Kez/kez-chat/document.md
Tudisco a1d1aa6983 plan(kez-chat): add web app design — Svelte SPA served by chat-server
The test UI is a Svelte 5 + TypeScript + Vite + Tailwind single-page
app served as static files by kez-chat-server. The web app uses the
exact same HTTP API a native client would use, so every action in the
UI dogfoods the API contract.

Architecture changes:

- kez-chat-server now serves `/` as the SPA (tower-http ServeDir)
  alongside the existing /v1 API
- Web app talks NATS over WebSocket (nats.ws + nats-server's
  built-in websocket transport — same auth callout, same nkey auth,
  same JetStream durable consumers)
- Web app cannot do Iroh: browsers can't open raw UDP sockets and
  Iroh's WebTransport story isn't ready in 2026. Web shows manifests
  and prompts "Download requires CLI" for actual file transfer.
- Key storage in browser: passphrase-encrypted IndexedDB (documented
  limitation — native clients use OS keychain)

New / updated sections in document.md:

- §1: opening pitch mentions the web app + that it dogfoods the API
- §4.1: responsibilities table adds "serves the test web app"
- §4.4 NEW: full design of the web app — stack, capabilities, what
  it can't do in v0, deployment model
- §4.5: endpoint list now includes / (the SPA) and /assets/*
- §4.3: nats.conf snippet enables WebSocket transport alongside the
  existing native NATS port; both transports hit the same auth
  callout
- §5.4: file-sharing flow notes the web app caveat (visible manifest,
  CLI required for actual download)
- §6.1: folder layout adds web/ subdirectory with Svelte/Vite/Tailwind
  scaffolding and an updated Dockerfile (multi-stage: build web →
  build rust → ship)
- §6.3: dependencies split into Rust server vs Web app sections.
  Web app pulls in svelte, typescript, vite, nats.ws, @noble/curves,
  @scure/base, canonicalize, svelte-spa-router, tailwindcss,
  idb-keyval.
- §7 MVP scope: full Web app checklist added; CLI section renamed
  and clarified ("same Rust core powers CLI and future native GUI")
- §8 out-of-scope: "file transfer from the browser" added
- §11 sequenced plan: split into 12 steps; new phases 7-10 are the
  web app build (scaffold → account/contacts → chat → manifest);
  step 12 deferred native GUI
- §12 summary: rewritten to reflect "two Rust services + a Svelte
  web app + a CLI"
- Decisions-locked table: added rows for test UI choice, browser
  file transfer, manifest format, frontend framework, in-browser
  key storage
2026-05-24 23:10:48 -06:00

# KEZ Chat & File Share — Design Document

**Status:** Pre-implementation planning. No code yet.

**Last updated:** 2026-05-24

**Goal:** A Keybase-class chat + file sharing experience built on the KEZ
identity stack, with NATS for messaging and Iroh for file transfer.

---
## 1. What this is
A real-time chat + file-sharing application with verified identities.
- Users get human-friendly handles like `tudisco@kez.lat`
(email-style — no leading `@`; that's mention syntax in chat, not
part of the handle itself, see §3.1).
- The handle is bound to a KEZ ed25519 primary key; the same key
authenticates to the chat infrastructure.
- Conversations are end-to-end encrypted; the broker is dumb.
- Files are visible in the sender's "shared files" list but only
downloaded when a recipient actually wants them. No background sync.
- Identity is portable: the underlying key + sigchain survives the home
server going dark. Handles can be migrated to other servers later.
- A **Svelte web app** served directly by `kez-chat-server` is the test
UI. It uses the same HTTP API any native client would use, so the
web app dogfoods the API. See §4.4.

This is the Keybase model rebuilt on a decentralized substrate:
- **Identity layer** → KEZ (instead of Keybase's central account system)
- **Chat layer** → NATS with client-side E2E (instead of Keybase Chat)
- **File layer** → Iroh peer-to-peer with content addressing (instead of KBFS)
---
## 2. Three-layer architecture
```
┌─────────────────────────────────────────────────────────────┐
│                    kez-chat application                     │
│           (chat UI, file browser, profile views)            │
└────┬──────────────┬─────────────────────┬───────────────────┘
     │              │                     │
     ▼              ▼                     ▼
┌─────────┐   ┌────────────┐     ┌────────────────┐
│   KEZ   │   │    NATS    │     │      Iroh      │
│         │   │            │     │                │
│ ↓ who   │   │ ↓ chat     │     │ ↓ file blobs   │
│ ↓ what  │   │ ↓ tickets  │     │ ↓ on-demand    │
│   they  │   │ ↓ presence │     │ ↓ NAT travers. │
│   own   │   │ ↓ small    │     │ ↓ E2E in QUIC  │
│ ↓ where │   │   stuff    │     │                │
│   they  │   │            │     │                │
│   listen│   │ dumb       │     │                │
│         │   │ broker;    │     │                │
│         │   │ clients    │     │                │
│         │   │ do E2E     │     │                │
└─────────┘   └────────────┘     └────────────────┘
     │              ▲                     ▲
     └──────────────┴───── sigchain ──────┘
        (handle → KEZ primary → endpoints
         and links to other identities)
```
Each layer does one thing well. Each is replaceable without touching the
others. The KEZ sigchain is the bridge that ties them together — it tells
a verifier "this user's broker is X, their Iroh nodes are Y₁ and Y₂."

---
## 3. Identity & username model
### 3.1 Handles
Handles look like email addresses — `local@server`, no leading `@`:
```
tudisco@kez.lat
chris@kez.lat
alice@chris.com ← custom domain, opted out of default (future)
```
**The leading `@` is mention syntax, not part of the handle.**
When a user writes "hey @tudisco, check this" in a chat message,
that's exactly like @-mentions in Slack/Twitter/Discord — the `@` is
a UI convention that says "this token is a person reference." The
underlying handle being referenced is `tudisco@kez.lat`.
Three forms in the system:
| Form | Where | Example |
|---|---|---|
| Storage / wire | Database, sigchain, registry lookups | `tudisco@kez.lat` (always fully qualified) |
| Display | UI, profile pages | `tudisco` when the server is the app's default; full `tudisco@kez.lat` when cross-server |
| Mention | Inside chat messages | `@tudisco` (chat-app convention; UI resolves to the full handle in context) |

`kez.lat` is the placeholder default home server domain. We'll lock in
the real production domain before launch.
For v0, **the handle namespace is global** — registration is on the one
default home server. Federation (multiple servers with their own
namespaces) is deliberately not in v0, but the design must not preclude
it. See §3.5.
In the UI, since there's only one home server in v0, handles are
displayed bare (`tudisco`). The `@kez.lat` suffix is implied. Storage
and the wire always use the fully-qualified form.
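
The three forms above can be sketched as a small set of conversions. This is a minimal illustration, not the project's code: `parseHandle`, `toStorage`, and `toDisplay` are hypothetical names, and the default server is the `kez.lat` placeholder from this section.

```typescript
// Hypothetical helpers illustrating the storage / display / mention forms.
// DEFAULT_SERVER is the placeholder domain used in this document.
const DEFAULT_SERVER = "kez.lat";

interface Handle {
  local: string;  // e.g. "tudisco"
  server: string; // e.g. "kez.lat"
}

// Accepts "tudisco", "@tudisco" (mention syntax), or "tudisco@kez.lat".
function parseHandle(input: string, defaultServer: string = DEFAULT_SERVER): Handle {
  // A single leading "@" is mention syntax, not part of the handle.
  const bare = input.trim().replace(/^@/, "");
  const parts = bare.split("@");
  const local = parts[0] ?? "";
  const server = parts.length > 1 ? (parts[1] ?? "") : defaultServer;
  if (parts.length > 2 || local === "" || server === "") {
    throw new Error(`invalid handle: ${input}`);
  }
  return { local, server };
}

// Storage / wire form: always fully qualified.
function toStorage(h: Handle): string {
  return `${h.local}@${h.server}`;
}

// Display form: bare on the app's default server, fully qualified otherwise.
function toDisplay(h: Handle, appDefault: string = DEFAULT_SERVER): string {
  return h.server === appDefault ? h.local : toStorage(h);
}
```

Under these assumptions, a mention such as `@tudisco` parses to the storage form `tudisco@kez.lat` and displays as `tudisco`, while `alice@chris.com` keeps its full form in the UI.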
### 3.2 Key generation tied to username
When a user creates an account:
1. App generates a **fresh ed25519 keypair** locally.
   - This is the user's KEZ primary key.
   - It's also their NATS nkey for the chat broker (same key, same algorithm).
   - It's also their Iroh node identity (same primitive again).
2. App **registers the handle** (`username@kez.lat`) on the home server's handle registry.
   - Sends a signed registration request proving control of the private key.
   - Registry rejects a handle that is already taken (first come, first served).
3. App **initializes a sigchain** for the new primary.
   - First event: `add_endpoint` advertising the NATS broker the app will use.
   - Second event: `add_endpoint` advertising the Iroh NodeId of the local device.
4. App **uploads the sigchain** to the deployed `kez-sig-server`.

After this flow the user has a fully working KEZ identity:
- `tudisco@kez.lat` resolves via the handle registry to their primary key.
- That key's sigchain (on `kez-sig-server`) advertises their NATS broker and Iroh nodes.
- Other users can verify them and reach them.
### 3.3 Why ed25519 only for this app
Both KEZ primary types work in general, but the chat app **requires** ed25519:
- **NATS nkeys are ed25519.** Direct alignment: the user's KEZ primary key
is their NATS credential. No second auth scheme.
- **Iroh node IDs are ed25519.** Same primitive, native fit.
- **One key type to manage.** Users with a pre-existing nostr key can
still attach it to their KEZ sigchain as a verifiable claim (so they're
cross-referenced on nostr too), but the primary that runs the app is
ed25519. The nostr key never participates in chat or file transport.
### 3.4 Account recovery: paper backup (Keybase-style)
The user's ed25519 private key is the only thing that can prove their
identity. Lose it, lose the account.
Recovery model for v0:
- On account creation, the app converts the 32-byte ed25519 seed to a
**mnemonic phrase** (BIP-39 style, 24 words). Standard, well-tested
word lists, deterministic encoding.
- App **forces the user to write it down** before continuing — shows
the words, asks for confirmation, then asks them to retype a few
random words back to prove they recorded it.
- App stores the seed locally in OS-protected storage (Keychain,
Credential Manager, libsecret). Mnemonic is shown only at creation
and on-demand from settings.
- **Lost device flow:** user installs the app on a new device, types
their mnemonic, app regenerates the same ed25519 keypair, then pulls
the sigchain from `kez-sig-server` to restore their identity state.
- The handle is still theirs because the registry knows the primary key.
No server-side recovery. No email reset. No customer support. Same model
Bitcoin wallets and Keybase used — user holds the seed phrase, user is
responsible for it.
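
The 24-word figure follows from BIP-39 arithmetic: 256 bits of entropy plus an 8-bit checksum gives 264 bits, which splits into 24 groups of 11 bits. The sketch below shows that calculation and one possible way to pick the "retype a few random words" positions. Both function names are hypothetical, and the word list and encoding themselves are left to a BIP-39 library.

```typescript
// BIP-39 arithmetic: ENT bits of entropy plus ENT/32 checksum bits,
// split into 11-bit groups (one mnemonic word per group).
function mnemonicWordCount(entropyBytes: number): number {
  const entropyBits = entropyBytes * 8;
  if (entropyBits < 128 || entropyBits > 256 || entropyBits % 32 !== 0) {
    throw new Error("BIP-39 entropy must be 128 to 256 bits in 32-bit steps");
  }
  const checksumBits = entropyBits / 32;
  return (entropyBits + checksumBits) / 11;
}

// Pick k distinct word positions for the recall check (hypothetical helper).
// The positions are not secret, so Math.random is acceptable here; it must
// never be used to generate the seed itself.
function pickChallengeIndices(
  wordCount: number,
  k: number,
  rand: () => number = Math.random,
): number[] {
  if (k > wordCount) throw new Error("cannot pick more positions than words");
  const idx: number[] = [];
  for (let i = 0; i < wordCount; i++) idx.push(i);
  // Partial Fisher-Yates shuffle: the first k slots end up as a random sample.
  for (let i = 0; i < k; i++) {
    const j = i + Math.floor(rand() * (wordCount - i));
    const tmp = idx[i]!;
    idx[i] = idx[j]!;
    idx[j] = tmp;
  }
  return idx.slice(0, k).sort((a, b) => a - b);
}
```

For the 32-byte ed25519 seed, `mnemonicWordCount(32)` yields 24, matching the phrase length stated above.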
### 3.5 Federation-ready design (not in v0)
For v0 we have **one** home server (`kez.lat`). All handles live there.
To make sure we don't paint ourselves into a corner:
1. **Internal representation of a handle is always the qualified form**
(`tudisco@kez.lat`), never just `tudisco`. The UI strips the suffix
for display; storage always keeps the full form.
2. **Handle resolution is HTTP-based**, not hard-coded. The chat app
looks up `chris@kez.lat` by hitting `https://kez.lat/v1/u/chris`.
When federation lands, looking up `chris@example.com` hits
`https://example.com/v1/u/chris` instead.
3. **WebFinger endpoint included from v0** — so cross-server discovery
already works via standard tooling, even if our app only uses our
own server for now.
4. **Sigchain endpoint URLs are fully qualified.** A user's sigchain
lives at `https://sig.kez.lat/v1/sigchains/ed25519/<hex>` — when
another server's user wants to verify ours, the URL is right there.
The v0 chat app might hard-code "lookups go to `kez.lat`" for now;
flipping that to "lookups go to whatever's after the `@`" is a config
change later, not a redesign.
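
A minimal sketch of "lookups go to whatever's after the `@`", assuming the URL shapes listed above. The function names are hypothetical, and the WebFinger `resource` value is percent-encoded here as in the RFC 7033 examples.

```typescript
// Split a fully-qualified handle into its local part and home server.
function splitHandle(handle: string): { local: string; server: string } {
  const at = handle.indexOf("@");
  const valid = at > 0 && at < handle.length - 1 && handle.indexOf("@", at + 1) === -1;
  if (!valid) throw new Error(`not a fully-qualified handle: ${handle}`);
  return { local: handle.slice(0, at), server: handle.slice(at + 1) };
}

// Handle lookup URL: the server comes from the handle, not from configuration.
function lookupUrl(handle: string): string {
  const { local, server } = splitHandle(handle);
  return `https://${server}/v1/u/${encodeURIComponent(local)}`;
}

// WebFinger discovery URL for the same handle.
function webfingerUrl(handle: string): string {
  const { server } = splitHandle(handle);
  const resource = encodeURIComponent(`acct:${handle}`);
  return `https://${server}/.well-known/webfinger?resource=${resource}`;
}

// Sigchain URL on a given sig-server host (host and key are caller-supplied).
function sigchainUrl(sigHost: string, pubkeyHex: string): string {
  return `https://${sigHost}/v1/sigchains/ed25519/${pubkeyHex}`;
}
```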
---
## 4. The home server (`kez-chat-server`)
A single Rust binary, deployed as one container alongside other
microservices (NATS broker, sigchain server).
### 4.1 What it does (and what it doesn't)
| Responsibility | This server? |
|---|---|
| **Handle registry** | ✅ Yes |
| **NATS auth callout** | ✅ Yes |
| **WebFinger endpoint** | ✅ Yes |
| **HTTP API for clients** | ✅ Yes |
| **Serves the test web app** (Svelte SPA, bundled into the Docker image) | ✅ Yes (§4.4) |
| **Sigchain storage** | ❌ No — defer to `kez-sig-server` (separate container) |
| **NATS broker** | ❌ No — separate `nats-server` (Go) container |
| **Iroh pinning** | ❌ No for v0 — files transfer P2P when both peers are online. Pinning is a future tier. |
| **Channel verification (gist/dns/etc.)** | ❌ No — clients do it locally via `kez-channels`. KEZ system is only used for identity, not as part of chat. |

The chat server is deliberately small. Microservices: each service does
one thing, deployed independently. Operator runs three containers
(chat-server + nats-server + sig-server). When pinning lands later, that
becomes a fourth optional container.
### 4.2 Process / deployment model
NATS is **not embedded in our Rust code** — it's a separate process
(the official Go `nats-server`). But we **do bundle it in our deployment
recipe** so operators get a turn-key setup. Same pattern as projects
that ship docker-compose with Postgres included: we don't write the
database, but we wire it up so you can `docker compose up` and have
everything working.
```
┌────────────────── our deployment (docker-compose) ────────────────┐
│                                                                   │
│  ┌──────────────┐   ┌─────────────────┐   ┌────────────────┐      │
│  │ nats-server  │   │ kez-chat-server │   │ kez-sig-server │      │
│  │ (Go)         │◄──┤ (Rust)          ├──►│ (Rust)         │      │
│  │ + JetStream  │   │                 │   │ (existing)     │      │
│  │              │   │ ↓ handles       │   │ ↓ sigchain     │      │
│  │ ↓ chat msgs  │   │ ↓ nats auth     │   │   storage      │      │
│  │ ↓ tickets    │   │ ↓ HTTP API      │   │                │      │
│  └──────────────┘   └─────────────────┘   └────────────────┘      │
│         ▲                   ▲                      ▲              │
└─────────┼───────────────────┼──────────────────────┼──────────────┘
          │                   │                      │
   ┌──────┴───────────────────┴──────────────────────┴───────────┐
   │ Chat app (per user, runs on phone/desktop)                  │
   │                                                             │
   │ • talks to nats-server over native NATS protocol            │
   │ • talks to kez-chat-server over HTTPS                       │
   │ • talks to kez-sig-server over HTTPS                        │
   │ • runs local iroh::Node for file send/receive               │
   └─────────────────────────────────────────────────────────────┘
```
The chat-server orchestrates auth between NATS and the handle registry.
NATS runs in its own container; we ship the config wired up.
**Operators who already run NATS** can disable our bundled `nats`
service and point `NATS_URL` at their own broker — same auth_callout
config snippet works in any NATS deployment. Bundled NATS is the
default for convenience, not a requirement.
### 4.3 docker-compose recipe
```yaml
# deploy/docker-compose.yml
services:
  nats:
    image: nats:latest
    command: ["-c", "/etc/nats/nats.conf", "--jetstream"]
    volumes:
      - ./nats.conf:/etc/nats/nats.conf:ro
      - nats-data:/data
    ports:
      - "4222:4222"   # client connections (TLS in prod)
      - "8222:8222"   # monitoring

  chat-server:
    build: .          # kez-chat-server Rust binary
    environment:
      NATS_URL: nats://nats:4222
      SIG_SERVER_URL: http://sig-server:7878
      DB_PATH: /data/handles.db
      AUTH_CALLOUT_NKEY_PATH: /etc/kez/auth-callout.nkey
    volumes:
      - chat-data:/data
      - ./auth-callout.nkey:/etc/kez/auth-callout.nkey:ro
    depends_on: [nats, sig-server]
    ports:
      - "8080:8080"   # HTTP API for clients

  sig-server:
    image: kez-sig-server:latest   # the existing rust-sig-server
    environment:
      KEZ_DB: /data/sigchains.db
    volumes:
      - sig-data:/data
    ports:
      - "7878:7878"

volumes:
  nats-data:
  chat-data:
  sig-data:
```
We ship a reference `deploy/nats.conf` with the auth_callout wired up
to talk to our chat-server. Operators who want to bring their own
NATS:
1. Comment out (or delete) the `nats` service from the compose file.
2. Set `NATS_URL=nats://your-broker:4222` in the chat-server's env.
3. Apply our reference `nats.conf` snippet to their NATS deployment.
The auth_callout config snippet (and WebSocket for browser clients):
```conf
# nats.conf — patched into whichever NATS deployment is used

# Enable WebSocket transport so the browser SPA can connect.
# Native CLI clients use the standard NATS port (4222).
websocket {
  port: 8443
  no_tls: true   # plain WebSocket here; TLS is terminated by the reverse proxy in prod
}

authorization {
  auth_callout {
    issuer: "<our auth-callout signing nkey public part>"
    auth_users: ["AUTHUSER"]   # placeholder identity NATS uses
    account: "DEFAULT"
  }
}
```
The chat-server signs auth-callout responses with a long-lived nkey
that NATS trusts. When a client connects to NATS with their KEZ
ed25519 key — whether via native protocol (CLI) or WebSocket
(browser) — NATS forwards the auth request to our chat-server,
which checks the handle registry and signs a yes/no response. Same
auth path for both transports.
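
The yes/no decision described above can be sketched as a pure function over the handle registry. This is an illustration only, written in TypeScript for consistency with the other examples even though the server is planned in Rust. The names `decideAuth` and `NatsPermissions` are hypothetical, the subject scheme reuses `kez.inbox.<pk>` from §5.1, and the permission lists are assumptions: a real deployment would also need the JetStream API subjects for durable consumers, and the nkey signature check happens before this step.

```typescript
// Hypothetical shape of the permissions returned to NATS for an accepted user.
interface NatsPermissions {
  publish: string[];
  subscribe: string[];
}

type AuthDecision =
  | { allow: true; handle: string; permissions: NatsPermissions }
  | { allow: false; reason: string };

// registry maps a primary public key (hex) to its fully-qualified handle.
// Assumes the caller has already verified the connecting client's nkey signature.
function decideAuth(registry: Record<string, string>, pubkeyHex: string): AuthDecision {
  const known = Object.prototype.hasOwnProperty.call(registry, pubkeyHex);
  const handle = known ? registry[pubkeyHex] : undefined;
  if (handle === undefined) {
    return { allow: false, reason: "primary key is not registered" };
  }
  return {
    allow: true,
    handle,
    permissions: {
      // Assumption: any registered user may publish to any inbox...
      publish: ["kez.inbox.*"],
      // ...but may only read from their own.
      subscribe: [`kez.inbox.${pubkeyHex}`],
    },
  };
}
```

Because the decision depends only on the registry and the presented key, the same function would serve both the native and the WebSocket transport.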
### 4.4 The test web app
The chat-server serves a Svelte single-page app as static files under
`/`. The web app is the test UI for the project — and crucially, it
**uses the exact same HTTP API a native client would use.** No
backend-rendered pages, no server-side state for the SPA. Every action
in the web UI goes through the public `/v1/...` API, which means the
web app is also a continuous test that the API contract works end to
end.
```
Browser hits https://kez.lat/ → SPA HTML+JS+CSS
SPA calls https://kez.lat/v1/u/chris → handle lookup (same as CLI)
SPA opens wss://kez.lat/nats → NATS broker over WebSocket
SPA calls https://sig.kez.lat/v1/sigchains/... → fetch sigchain (same as CLI)
```
#### Stack
| Layer | Pick |
|---|---|
| Framework | **Svelte 5 + TypeScript** |
| Build | **Vite** (output: static files served by chat-server) |
| Routing | **`svelte-spa-router`** (hash routing — works under any subpath) |
| NATS client | **`nats.ws`** — the official NATS WebSocket client. nkey auth supported. |
| Crypto | **`@noble/curves`** + **`@noble/hashes`** (same primitives our Node port uses) |
| Key storage | **passphrase-encrypted IndexedDB.** User enters a passphrase on first launch; the seed is encrypted with a key derived from that passphrase. Documented limitation: browsers don't have an equivalent of the OS keychain. Native clients (CLI, future Tauri) get better protection via the OS keychain. |
| State | **Svelte stores** (built-in; no Redux needed) |
| Styling | **Tailwind** (default; trivially swappable) |
#### What the web app can do (v0)
- Account creation: generate ed25519 key in-browser, display mnemonic
for paper backup, register handle via `POST /v1/register`, upload
sigchain endpoint events to `kez-sig-server` via HTTP.
- Contacts: look up handles, fetch sigchains, display verified
identities (uses the same channel-verification logic via in-browser
TypeScript, sharing `@kez/core` and `@kez/channels`).
- 1:1 chat: subscribe to NATS inbox over WebSocket, decrypt incoming,
encrypt + publish outgoing. Real-time messaging works.
- Manifest browse: fetch and decrypt `@chris`'s shared-files manifest;
display the list of files; show metadata.
- Identity verification view: show the user's sigchain visually (claims,
channel proofs, rotations).
- Settings: show/re-display the mnemonic, re-verify it, log out
(clears IndexedDB).
#### What the web app can't do (v0)
- **File download / upload via Iroh.** Browsers can't open raw UDP
sockets, and Iroh's WebTransport story isn't ready in 2026. The
web app shows the manifest entries and a "Download (requires
desktop client)" button that points at the CLI. v0.5 may revisit
if Iroh-in-browser matures.
- Anything that requires the OS keychain (proper offline crypto).
#### Deployment
The web app's static build output (e.g. `kez-chat-server/web/dist/`)
is bundled into the chat-server's Docker image at build time and
served by `tower-http`'s `ServeDir`. No separate static host, no separate
CDN. `docker compose up` deploys the SPA along with everything else.
The build pipeline:
1. `cd web && npm install && npm run build` → produces `web/dist/`
2. Dockerfile copies `web/dist/` into the runtime image
3. Rust binary serves it as `GET /*` (with the API mounted at `/v1/...`)
For dev: `npm run dev` runs Vite's dev server on port 5173, proxying
`/v1` requests to the locally-running chat-server. Hot module reload
works as normal Svelte dev.
### 4.5 Endpoints
```
GET / the web app (Svelte SPA)
GET /assets/* SPA static assets (CSS, JS, images)
GET /v1/healthz
GET /v1/u/:handle handle → { primary, sigchain_url, endpoints }
POST /v1/register claim a handle (signed body)
GET /.well-known/webfinger?resource=acct:tudisco@kez.lat
# NATS auth callout (called BY nats-server, not by users)
POST /internal/nats/auth verify nkey signature, return permissions
```
Sigchain endpoints are **not** on this server — both web and native
clients talk directly to `kez-sig-server` for those.

---
## 5. End-to-end flows
### 5.1 Account creation — `tudisco@kez.lat`
```
1. User opens chat app, clicks "Create account"
2. App: generates ed25519 keypair locally
3. App: converts seed to 24-word mnemonic, makes user write it down,
verifies recall before continuing
4. App: user picks handle "tudisco"
5. App → chat-server:
POST /v1/register
{ "handle": "tudisco",
"primary": "ed25519:<pubkey-hex>",
"registration_sig": "<sig over canonical message>" }
6. Server: validates signature, checks handle is free, stores in registry
7. Server: 201 Created
8. App: initializes sigchain locally, signs:
- add_endpoint { protocol: "nats", url: "...", inbox: "kez.inbox.<pk>" }
- add_endpoint { protocol: "iroh", node_id: "<local iroh id>" }
9. App → sig-server: POST /v1/sigchains/ed25519/<pk>/events (one per event)
10. App: connects to nats-server with nkey auth (signed challenge,
nats-server invokes chat-server's auth callout, gets back yes/no
+ allowed subjects)
11. App: subscribes to JetStream durable consumer on its inbox subject
12. Done — `tudisco@kez.lat` is live and reachable
```
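
Step 5 can be sketched as follows. The document does not yet define the "canonical message", so this sketch assumes it is the canonical JSON of the unsigned fields; `canonicalJson` is a simplified stand-in that only handles flat objects with string values (the planned `canonicalize` package covers full RFC 8785). The signer is injected, and all names here are hypothetical.

```typescript
// Signs a message with the user's ed25519 key and returns an encoded signature.
// Injected so the sketch stays independent of any crypto library.
type Signer = (message: string) => string;

interface RegistrationBody {
  handle: string;
  primary: string;
  registration_sig: string;
}

// Simplified canonical form: keys sorted, no whitespace.
// Only valid for flat objects whose values are strings.
function canonicalJson(obj: Record<string, string>): string {
  const keys = Object.keys(obj).sort();
  const members = keys.map((k) => `${JSON.stringify(k)}:${JSON.stringify(obj[k])}`);
  return `{${members.join(",")}}`;
}

// Build the POST /v1/register body from the chosen handle and public key.
function buildRegistration(handle: string, pubkeyHex: string, sign: Signer): RegistrationBody {
  const unsigned = { handle, primary: `ed25519:${pubkeyHex}` };
  // Assumption: the signature covers the canonical JSON of the unsigned fields.
  const message = canonicalJson(unsigned);
  return { handle: unsigned.handle, primary: unsigned.primary, registration_sig: sign(message) };
}
```

Sorting keys before signing means the browser and a native client produce the same bytes for the same fields, which is what lets the server verify either with one code path.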
### 5.2 Adding a contact
```
1. Tudisco types "@chris" in app
2. App → chat-server: GET /v1/u/chris
Returns: { primary: "ed25519:abc...", sigchain_url: "https://sig.kez.lat/..." }
3. App → sig-server (URL from above): fetch sigchain
4. App walks events to extract:
- NATS broker URL + inbox subject (from add_endpoint nats)
- Iroh node IDs (from add_endpoint iroh)
- Other identity claims (github:chris, dns:chris.com, etc. — for display)
5. App caches LOCALLY: { "chris@kez.lat" => ed25519:abc..., endpoints: {...} }
(TOFU — trust on first use)
```
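
Steps 4 and 5 can be sketched as two small functions: one walks the events, the other applies trust on first use. The event fields mirror the `add_endpoint` examples in §5.1; everything else (type names, "latest NATS endpoint wins", no removal ops) is an assumption, and the events are assumed to be already signature-verified and in chain order.

```typescript
// Loosely-typed sigchain event; only the fields used here are modelled.
interface SigchainEvent {
  op: string;
  protocol?: string;
  url?: string;
  inbox?: string;
  node_id?: string;
}

interface Endpoints {
  nats?: { url: string; inbox: string };
  irohNodeIds: string[];
}

// Walk verified events in order and collect the advertised endpoints.
function extractEndpoints(events: readonly SigchainEvent[]): Endpoints {
  const result: Endpoints = { irohNodeIds: [] };
  for (const ev of events) {
    if (ev.op !== "add_endpoint") continue;
    if (ev.protocol === "nats" && ev.url !== undefined && ev.inbox !== undefined) {
      result.nats = { url: ev.url, inbox: ev.inbox }; // assumption: latest wins
    } else if (ev.protocol === "iroh" && ev.node_id !== undefined) {
      if (result.irohNodeIds.indexOf(ev.node_id) === -1) result.irohNodeIds.push(ev.node_id);
    }
  }
  return result;
}

type TofuResult = "new" | "match" | "mismatch";

// Trust on first use: remember the first primary seen for a handle and
// flag any later lookup that returns a different key.
function tofuCheck(cache: Record<string, string>, handle: string, primary: string): TofuResult {
  const seen = Object.prototype.hasOwnProperty.call(cache, handle);
  if (!seen) {
    cache[handle] = primary;
    return "new";
  }
  return cache[handle] === primary ? "match" : "mismatch";
}
```

A `"mismatch"` result is the case the UI would need to surface, since it means the registry now returns a different key than the one cached at first contact.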
### 5.3 Sending a chat message
```
1. Tudisco types "hello" to Chris
2. App looks up Chris's primary key + NATS endpoint from local cache
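3. App derives the pairwise symmetric key (reused for every message between
these two users in v0, see §8), after converting both ed25519 keys to X25519:
X25519(tudisco_priv, chris_pub) → HKDF → 32-byte ChaCha20-Poly1305 key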
4. App encrypts "hello" with that key (+ random nonce)
5. App signs ciphertext with tudisco's KEZ primary
6. App publishes to subject `kez.inbox.<chris-pubkey-hex>` on the NATS
broker, JetStream-published so the broker stores it durably
7. Chris's app (subscribed via durable consumer) receives the message
whenever next online — broker buffers it if offline
8. Chris's app verifies signature against tudisco's key, decrypts,
shows "tudisco: hello"
```
The broker sees:
- An nkey-authenticated client publishing encrypted bytes to a subject
- It does NOT see: who's reading the subject, message contents, sender
identity (sender identity is in the signed payload, not the NATS frame)
### 5.4 Sharing a file (v0: both peers online)
```
1. Tudisco drags `report.pdf` into the chat with Chris
2. App imports the blob into local Iroh node → BLAKE3 hash + ticket
3. App optionally adds an entry to tudisco's "shared files" manifest
(visible if Chris later browses tudisco's profile)
4. App generates a per-file symmetric content key
5. App encrypts the blob in place (or stores both plaintext + encrypted —
detail for later) with the content key
6. App wraps the content key for chris's KEZ key (X25519 → HKDF)
7. App sends a NATS message to chris's inbox:
{ type: "file_share",
filename: "report.pdf",
size: 1234567,
iroh_ticket: "blobac://...",
wrapped_content_key: "..." }
(same encryption as chat messages, so chris can read this)
8. Chris's app sees the notification: "tudisco shared report.pdf (1.2 MB)"
File NOT downloaded yet.
9. Chris clicks Download.
10. Chris's app opens an Iroh connection to tudisco's NodeId (from
tudisco's sigchain), pulls the blob via the ticket, decrypts with
the unwrapped content key, verifies BLAKE3 hash. File appears.
```
**v0 limitations:**
1. If tudisco is offline at step 10, chris waits. Iroh will retry;
download starts when tudisco's node comes back. Pinning (the
server holding a copy) is **not** in v0 — we accept this
limitation in exchange for zero server-side storage cost and
the simplest possible architecture.
2. **The browser SPA can do steps 1-7 (sender side) and step 8
(notification + manifest entry visible), but cannot do step 10
(fetch the blob)**, because browsers can't speak native Iroh. Web users
see "File available, open in CLI to download." The native CLI does
the whole flow. v0.5 may revisit when Iroh's WebTransport story
matures.
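
The `file_share` message from step 7 and the client-dependent behaviour at steps 8 to 10 can be sketched like this. The message fields are taken from the flow above; the function names and the `ClientKind` type are hypothetical, and sizes are formatted in decimal units to match the "1.2 MB" shown for 1234567 bytes.

```typescript
// Decrypted payload of a file-share notification (fields from step 7).
interface FileShareMessage {
  type: "file_share";
  filename: string;
  size: number; // bytes
  iroh_ticket: string;
  wrapped_content_key: string;
}

type ClientKind = "web" | "cli";

// Human-readable size in decimal units (1 MB = 1,000,000 bytes).
function formatSize(bytes: number): string {
  const units = ["B", "KB", "MB", "GB", "TB"];
  let value = bytes;
  let unit = 0;
  while (value >= 1000 && unit < units.length - 1) {
    value /= 1000;
    unit += 1;
  }
  return unit === 0 ? `${value} B` : `${value.toFixed(1)} ${units[unit]}`;
}

// Step 8: the notification is shown, but nothing is downloaded yet.
function notificationText(sender: string, msg: FileShareMessage): string {
  return `${sender} shared ${msg.filename} (${formatSize(msg.size)})`;
}

// Steps 9-10: only a client with a local Iroh node can fetch the blob.
function downloadAction(client: ClientKind): string {
  return client === "cli" ? "Download" : "File available, open in CLI to download.";
}
```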
### 5.5 Browsing someone's files (Keybase-style)
```
1. Chris opens tudisco's profile
2. App resolves @tudisco → primary → sigchain
3. Sigchain has a `set_shared_files` op pointing at a manifest blob hash
4. App fetches the manifest blob via Iroh (small, fast)
5. App decrypts entries wrapped for chris's key, ignores ones it can't
decrypt (those are wrapped for other people)
6. App renders the visible entries: name, size, share date,
thumbnail (if present)
7. Chris clicks an entry → flow continues like §5.4 step 9
```
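
Steps 3 and 5 can be sketched as below, under several assumptions: the `set_shared_files` event carries its pointer in a field called `manifest_hash` (a hypothetical name), each manifest entry is an opaque blob wrapped for one recipient, and decryption is injected as a function that returns `null` when the entry is not wrapped for the current user.

```typescript
// Only the fields needed to find the manifest pointer are modelled.
interface SharedFilesEvent {
  op: string;
  manifest_hash?: string; // hypothetical field name
}

interface ManifestEntry {
  name: string;
  size: number;     // bytes
  sharedAt: string; // ISO date
}

// Step 3: the most recent set_shared_files event wins.
function latestManifestHash(events: readonly SharedFilesEvent[]): string | undefined {
  let latest: string | undefined;
  for (const ev of events) {
    if (ev.op === "set_shared_files" && ev.manifest_hash !== undefined) {
      latest = ev.manifest_hash;
    }
  }
  return latest;
}

// Step 5: keep the entries this user can decrypt, silently skip the rest.
function visibleEntries(
  wrapped: readonly string[],
  tryDecrypt: (blob: string) => ManifestEntry | null,
): ManifestEntry[] {
  const visible: ManifestEntry[] = [];
  for (const blob of wrapped) {
    const entry = tryDecrypt(blob);
    if (entry !== null) visible.push(entry);
  }
  return visible;
}
```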
Manifest is small (KB-scale); blobs are MB-to-GB. Browsing is cheap;
fetching is per-file deliberate. **Recipient never auto-syncs.**

---
## 6. Project & folder layout
### 6.1 Where this project lives
```
/Kez
├── rust-lib/                 ← (proposed refactor) shared Rust libraries
│   ├── Cargo.toml              workspace
│   └── crates/
│       ├── kez-core/           moved from rust/crates/
│       └── kez-channels/       moved from rust/crates/
├── rust/                     ← Rust CLI (kez binary)
│   └── crates/
│       └── kez-cli/            depends on ../../rust-lib/crates/...
├── rust-sig-server/          ← existing sigchain storage (reused as-is)
├── kez-chat/                 ← THIS PROJECT
│   ├── document.md             (this file)
│   ├── Cargo.toml
│   ├── src/                    Rust server
│   │   ├── main.rs             binary entry
│   │   ├── handles.rs          handle registry (sqlite-backed)
│   │   ├── nats_auth.rs        NATS auth callout endpoint
│   │   ├── webfinger.rs        WebFinger discovery endpoint
│   │   ├── static_files.rs     serves the built web app (tower-http ServeDir)
│   │   └── api.rs              axum routes + state
│   ├── web/                    Svelte web app (the test UI)
│   │   ├── package.json
│   │   ├── vite.config.ts
│   │   ├── svelte.config.ts
│   │   ├── tailwind.config.ts
│   │   ├── src/
│   │   │   ├── routes/         pages (login, contacts, chat, profile, settings)
│   │   │   ├── lib/            crypto, nats client, sigchain helpers
│   │   │   └── app.svelte
│   │   └── dist/               build output (copied into Docker image)
│   ├── deploy/
│   │   ├── docker-compose.yml  chat-server + nats + sig-server
│   │   ├── nats.conf           with auth_callout + websocket config
│   │   ├── Dockerfile          multi-stage: build web → build rust → ship
│   │   └── systemd/            alternative deployment
│   └── tests/
│       └── http.rs             integration tests
├── nodejs/                   ← (unchanged)
└── crosstest.sh              ← (path updates if rust-lib moves)
```
### 6.2 The `rust-lib/` proposal — share code, no duplication
Right now, `kez-core` and `kez-channels` live inside `rust/crates/`. The
sig-server and the new chat-server both want to use them. Today's
downstream path-dep is:
```toml
kez-core = { path = "../rust/crates/kez-core" }
```
…which works but reaches into another project's crate tree.
**Recommendation:** move the pure libraries out into a top-level
`rust-lib/` workspace. The CLI stays in `rust/`. Downstream servers
depend on `../rust-lib/crates/kez-core`. Clean structure, no
duplication, no confusion about which folder owns what.
Refactor steps:
- `mv rust/crates/kez-core rust-lib/crates/`
- `mv rust/crates/kez-channels rust-lib/crates/`
- Create `rust-lib/Cargo.toml` (workspace).
- Update `rust/Cargo.toml` to have just kez-cli.
- Update path deps in: `rust/crates/kez-cli/Cargo.toml`,
`rust-sig-server/Cargo.toml`.
- Update `crosstest.sh` if any paths are hardcoded.
**Suggested order:** do the refactor *before* starting kez-chat so we
import cleanly from the start.
### 6.3 Dependencies (planned)
**Rust server (`kez-chat-server`):**
| Crate | Why |
|---|---|
| `kez-core` (path) | Identity types, ed25519, signing |
| `kez-channels` (path) | Verify users' linked accounts when displayed |
| `axum` 0.8 | HTTP API |
| `tokio` | Async runtime |
| `rusqlite` (bundled) | Handle registry |
| `async-nats` | NATS client (admin work + the auth callout glue) |
| `serde` / `serde_json` | Standard |
| `thiserror` / `anyhow` | Standard |
| `tracing` / `tracing-subscriber` | Logging |
| `tower-http` | CORS, request tracing, **`fs` feature for serving the SPA static files** |
| `clap` | CLI args |

**Not** depended on by the chat-server:
- `iroh` — server doesn't run an Iroh node in v0 (no pinning)
- nats-server (Go) — separate container, not a Rust dep
**Web app (`web/`):**
| Package | Why |
|---|---|
| `svelte` 5.x | Framework |
| `typescript` | Types |
| `vite` | Build + dev server |
| `nats.ws` | NATS client over WebSocket (browser-native NATS protocol) |
| `@noble/curves`, `@noble/hashes` | Same crypto primitives used in our Node port |
| `@scure/base` | bech32 (nsec/npub if needed), base64url |
| `canonicalize` | RFC 8785 JCS — for signature interop with native clients |
| `svelte-spa-router` | Hash-based routing |
| `tailwindcss` | Styling (default; trivially swappable) |
| `idb-keyval` | Tiny IndexedDB wrapper for the encrypted seed + cache |
### 6.4 NATS broker — bundled in compose, not in code
NATS is **not embedded in the Rust binary** — it's the official Go
`nats-server` running as its own container. But we **do include it
in the docker-compose deployment** so `docker compose up` is the
whole setup for new operators. Same pattern as projects shipping
Postgres-in-compose: it's bundled for convenience, not because we
wrote a database.
What we ship:
- `deploy/docker-compose.yml` with a `nats` service alongside our
Rust services
- `deploy/nats.conf` — reference config with auth_callout wired up
- `async-nats` client inside chat-server for admin/utility work
- The auth-callout HTTP endpoint chat-server exposes for NATS to call
What NATS we require (whether bundled or BYO):
| Requirement | Why |
|---|---|
| **NATS 2.10+** (for auth_callout) | We use auth_callout to bridge KEZ identity into NATS |
| **JetStream enabled** | For offline message buffering (durable consumers) |
| **TCP reachable** from chat-server and clients | Standard |
| **TLS** (in production) | Standard |
| **auth_callout configured** to hit our chat-server endpoint | Required for client auth |

**Swapping in your own NATS** is a config change, not a code change:
disable the bundled `nats` service in the compose, set `NATS_URL` to
your own broker, apply our `nats.conf` snippet there. Useful for
operators with existing NATS infrastructure, Synadia Cloud users, etc.
Why bundled rather than embedded:
- NATS is a 200KLOC Go service with its own ops story. We're not
rewriting it in Rust just to embed it.
- Bundling it as a separate process keeps the architecture honest —
if NATS misbehaves, it's a separate process to restart, debug, log.
- Operators can swap to a different broker deployment without touching
our code.
### 6.5 Iroh — client-side only
Clients run a local Iroh node for sending and receiving files. The
**chat-server does NOT run an Iroh node** in v0.
Implication: when @tudisco shares a file with @chris, the bytes go
directly from tudisco's device to chris's device via Iroh. If tudisco
is offline, chris waits. There's no fallback to a server-stored copy.
This is the simplest possible operational model. Pinning (server-side
fallback storage) is a future addition (§8).

---
## 7. MVP scope
### Server (`kez-chat-server`)
- [ ] HTTP API scaffold (axum + tokio)
- [ ] Handle registry (POST /v1/register, GET /v1/u/:handle)
- [ ] Registration signature validation (uses kez-core)
- [ ] WebFinger endpoint
- [ ] NATS auth callout (POST /internal/nats/auth)
- [ ] Static-file serving for the SPA (`tower-http` `ServeDir`)
- [ ] Healthz / metrics
- [ ] Integration tests against real nats-server + sig-server in a
test docker-compose
### Web app (`web/`)
- [ ] Project scaffold (Svelte 5 + Vite + TypeScript + Tailwind)
- [ ] Account creation flow (key gen in-browser, mnemonic prompt,
registration POST, sigchain upload)
- [ ] Login flow (passphrase in → decrypt seed from IndexedDB; on a new browser, mnemonic in → re-derive key)
- [ ] Contacts list (handle lookup, sigchain fetch + display)
- [ ] 1:1 chat (NATS-over-WebSocket subscribe/publish, E2E encrypt/decrypt)
- [ ] Manifest browse (fetch from sigchain → list entries)
- [ ] "Download requires CLI" affordance on manifest entries
- [ ] Identity verification view (visualize the sigchain)
- [ ] Settings (re-show mnemonic, verify it, log out)
- [ ] Build script integrated into the chat-server Docker image
### CLI client (`kez-chat-cli`)
Same Rust core powers both the CLI and (later) a native GUI. CLI gets
the **file** capabilities the web app can't have:
- [ ] Account creation (key gen + mnemonic backup + handle registration)
- [ ] Contact lookup + verification
- [ ] Send / receive 1:1 chat messages (E2E via NATS native)
- [ ] Send / receive files (E2E via Iroh)
- [ ] Browse a `<handle>`'s shared-files manifest + download files
### Deployment
- [ ] docker-compose.yml (chat-server [includes SPA] + nats + sig-server)
- [ ] nats.conf with auth_callout + websocket configured
- [ ] Multi-stage Dockerfile (build web → build rust → final image)
- [ ] systemd alternative deployment recipe
- [ ] README with TLS / reverse proxy guidance (Caddy recommended)
---
## 8. Out of scope (v0)
- **Iroh pinning** (sender must be online for receiver to fetch)
- **File transfer from the browser** (the web app can browse manifests
but file download/upload needs the CLI; browsers can't speak Iroh
natively in 2026)
- **Group chat** (only 1:1 for v0)
- **Forward secrecy / ratcheting** (Double Ratchet, MLS): chat is
encrypted, but every message between a given pair of users reuses the
same X25519-derived key
- **Voice / video calls**
- **Multi-device key sync** — one device per user in v0
- **Account recovery beyond mnemonic** — paper backup is the only recovery
- **Federation across home servers** — one server (kez.lat) in v0;
design preserves the option
- **Channel-based identity verification** — the CLI already does
`kez verify id ...`; not duplicated in the chat-server. Users add
KEZ channel proofs (gist, dns, etc.) via the existing CLI separately.
- **Avatars / display names** — defer the design. For v0 the UI shows
the handle and that's enough.
---
## 9. The one remaining open question
**Manifest format** for "@chris's shared files":
| Option | How | Tradeoff |
|---|---|---|
| **A. Signed JSON blob, hash in sigchain** | Manifest is a JSON blob stored on Iroh. A new sigchain op `set_shared_files` commits the latest manifest hash. Recipients walk the sigchain → find the pointer → fetch the manifest blob from Iroh. | Simpler. No Iroh Docs dep. Sigchain anchors the version (signed). Update = new sigchain event. |
| **B. Iroh Doc** | Manifest is a mutable CRDT document. Recipients subscribe; updates sync in near-real-time. | Fancier UX (live updates). Requires Iroh Docs subsystem (heavier dep, less stable). |
**Recommended default: A.** Simpler, fewer moving parts, reuses
primitives we already have. We can upgrade to B later if real users
need real-time profile feed updates.
Settle yes/no on this and the design is locked.
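
To make Option A concrete, here is a sketch of the recipient side: walk the sigchain, take the most recent `set_shared_files` pointer, and accept a fetched blob only if it hashes to the committed value. The `SigchainOp` enum is a simplified stand-in (only `set_shared_files` is named in this document), and the hash function is injected because the hash algorithm is not specified here.

```rust
/// Simplified sigchain operation. Only `SetSharedFiles` comes from this
/// document; `Other` stands in for every other op type.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum SigchainOp {
    SetSharedFiles { manifest_hash: String },
    Other,
}

/// Scan the chain from newest to oldest (the slice is ordered oldest first)
/// and return the most recently committed manifest hash, if any.
pub fn latest_manifest_hash(chain: &[SigchainOp]) -> Option<&str> {
    chain.iter().rev().find_map(|op| match op {
        SigchainOp::SetSharedFiles { manifest_hash } => Some(manifest_hash.as_str()),
        SigchainOp::Other => None,
    })
}

/// Accept a fetched manifest blob only if it hashes to the committed value.
/// `hash_hex` is supplied by the caller (hypothetical parameter) so the
/// sketch does not assume a particular hash algorithm.
pub fn manifest_matches(
    blob: &[u8],
    committed: &str,
    hash_hex: impl Fn(&[u8]) -> String,
) -> bool {
    hash_hex(blob) == committed
}
```

Because every update is a new sigchain event, an older manifest blob simply stops matching the latest pointer. The signature check on the sigchain events themselves is assumed to have happened before this step.
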
---
## Decisions locked from earlier discussion
| Question | Decision |
|---|---|
| Bundle sigchain in chat-server? | **No.** Use existing `kez-sig-server`. Microservices. |
| Bundle NATS into Rust server? | **Not in the Rust code** — NATS stays the official Go `nats-server` running as its own process. **Yes in our docker-compose** — operators get `nats + chat-server + sig-server` wired up out of the box. Operators with existing NATS deployments can disable the bundled service and set `NATS_URL` to point elsewhere. |
| KEZ + nostr coexistence for chat? | **No nostr in chat.** KEZ is identity-only; nostr only as a verifiable claim in someone's sigchain, not as transport. |
| Handle scope: federation or global? | **Global for v0**, federation-ready design (see §3.5). |
| Recovery if key lost? | **Paper backup (24-word mnemonic), Keybase-style.** No server-side recovery. |
| Iroh pinning in v0? | **No.** Sender must be online for receiver to fetch. Pinning is a future tier. |
| Test UI: TUI / web / native GUI? | **Web app, served by `kez-chat-server` as static files.** Built in Svelte 5 + TypeScript + Vite + Tailwind. Uses the same HTTP API native clients use, so it dogfoods the contract. Talks NATS over WebSocket (`nats.ws`). |
| Browser file transfer? | **Not in v0.** Browsers can't speak Iroh natively in 2026. The web app shows manifests and prompts "Download requires CLI" for actual files. v0.5 revisit if Iroh's WebTransport story matures. |
| Manifest format? | **Option A** — signed JSON blob, hash committed via a new `set_shared_files` sigchain op. Simpler, reuses sigchain primitives. |
| Frontend framework? | **Svelte 5 + TypeScript + Vite**. Tailwind for styling (trivially swappable). |
| In-browser key storage? | **Passphrase-encrypted IndexedDB.** Documented limitation: browsers lack a Keychain equivalent. Native clients (CLI, future GUI) use OS keychain. |
---
## 10. Risks & honest concerns
1. **NATS auth callout integration depth.** Documented but fiddly.
nkey signature verification is straightforward; the per-user subject
permission glue needs care. Test cases for "user can publish to
their own inbox only" / "user can subscribe to their own inbox
only" matter.
2. **Iroh is pre-1.0.** Pin a version. Migration is a chore but only
touches client code, not identity. Identity stays stable (KEZ).
3. **Single-device assumption.** Real users have phones AND laptops.
v0 assumes one device per primary key. Designing multi-device support
is a real follow-up.
4. **No offline file delivery.** A natural user complaint will be
"Chris sent me a file but he's offline now." We've made the trade
knowingly; document the limitation clearly in-app ("File will
download when @chris is back online").
5. **Handle squatting.** First-come-first-served. Mitigations:
   - Rate-limit registration by IP
   - Reserve some handles (`@admin`, common project names)
   - Accept that some squatting will happen; document the policy
6. **NAT traversal.** Iroh handles it with relays. Test on hostile
networks (corporate firewalls, mobile carriers with CGNAT) before
claiming "just works."
7. **Operational cost.** Three containers (chat + nats + sig-server)
+ bandwidth + a domain. Cheap at small scale, scales with users.
Need a "running kez.lat for 1k users — what does it cost?" answer
before community adoption.
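
For the permission test cases named in risk 1, a small subject matcher is useful on the test side. The sketch below implements standard NATS wildcard rules (`*` matches exactly one token, `>` matches one or more trailing tokens). The `chat.inbox.<handle>` subject layout used in the usage note is a hypothetical example, not the scheme this design has settled on.

```rust
/// Return true if `subject` is covered by `pattern` under NATS wildcard
/// rules: `*` matches exactly one token, `>` matches one or more trailing
/// tokens and must be the last token of the pattern.
pub fn subject_matches(pattern: &str, subject: &str) -> bool {
    let mut pat = pattern.split('.');
    let mut sub = subject.split('.');
    loop {
        match (pat.next(), sub.next()) {
            // `>` swallows the rest of the subject, but only as the final token.
            (Some(">"), Some(_)) => return pat.next().is_none(),
            // `*` accepts any single token; keep comparing.
            (Some("*"), Some(_)) => {}
            // Literal tokens must be equal.
            (Some(p), Some(s)) => {
                if p != s {
                    return false;
                }
            }
            // Both exhausted at the same time: full match.
            (None, None) => return true,
            // One side ran out before the other.
            _ => return false,
        }
    }
}

/// True if any pattern in the allow list covers the subject.
pub fn is_allowed(allow: &[&str], subject: &str) -> bool {
    allow.iter().any(|p| subject_matches(p, subject))
}
```

With an allow list of `["chat.inbox.tudisco"]`, `is_allowed` accepts `chat.inbox.tudisco` and rejects `chat.inbox.chris`, which is the shape of the "own inbox only" assertions. The integration tests against a real nats-server remain the authority; this only keeps the expected permission tables easy to check.
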
---
## 11. The plan, sequenced
When we start building:
1. **Refactor: move `kez-core` + `kez-channels` to `rust-lib/`.**
Small but unblocks clean imports from kez-chat.
2. **Scaffold `kez-chat-server`** (axum + tokio + sqlite + tracing).
Handle registry + WebFinger first — these unblock client-side
account creation.
3. **NATS auth callout.** Wire up the `nats` service in our compose
(or, in dev, run `nats-server -c deploy/nats.conf --jetstream`
locally). Its auth_callout hits our chat-server's
`/internal/nats/auth`. End-to-end: a client can register a handle
and then connect to NATS authenticated by its KEZ key.
4. **Minimal `kez-chat-cli` client** (separate project) that does:
   - `kez-chat register tudisco`
   - `kez-chat add @chris`
   - `kez-chat send @chris "hello"`
   - `kez-chat listen`

   No UI. Enough to prove the chat flow works end-to-end against
   the server.
5. **Iroh integration in the CLI** (not the server).
   - CLI runs a local Iroh node
   - `kez-chat share @chris ./file.pdf`
   - `kez-chat fetch <ticket>`
6. **Shared-files manifest.** New `set_shared_files` sigchain op.
`kez-chat browse @tudisco` lists his shared files.
7. **Web app scaffold** (Svelte 5 + Vite + Tailwind). Set up
`kez-chat/web/`, wire Vite dev proxy to the running chat-server,
"hello world" SPA served by axum's `ServeDir`. Multi-stage
Dockerfile builds `web/dist/` then bakes it into the runtime image.
8. **Web app: account + contacts + identity.** Account creation in
the browser (key gen, mnemonic backup, registration, sigchain
upload). Contacts list with sigchain-based verification. Identity
view (visualize the sigchain). No chat yet.
9. **Web app: chat.** `nats.ws` connection, nkey auth, subscribe to
inbox subject, encrypt/decrypt with `@noble/curves`. Real-time
chat in the browser. End-to-end: messages sent from CLI arrive
in the web app and vice versa.
10. **Web app: manifest browse.** Fetch and decrypt the
`set_shared_files` manifest of any contact; display the entries;
show "Download requires CLI" affordances on each.
11. **Deployment recipe finalized.** Production-ready docker-compose
(chat-server with embedded SPA + nats + sig-server), Caddy config
for TLS, systemd alternative.
12. **Then** native GUI (Tauri, etc.) — if web app + CLI isn't
enough. Likely v1 stretch, not MVP.
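
The CLI subcommands listed in steps 4 to 6 can be sketched as a small parser over the argument list. This uses only the standard library to stay self-contained; the `Command` enum, its field names, and `parse_command` are hypothetical, and a real CLI would more likely use an argument-parsing crate.

```rust
/// One variant per subcommand named in the plan (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
pub enum Command {
    Register { handle: String },
    Add { contact: String },
    Send { to: String, message: String },
    Listen,
    Share { to: String, path: String },
    Fetch { ticket: String },
    Browse { handle: String },
}

/// Parse the arguments that follow the binary name.
pub fn parse_command(args: &[&str]) -> Result<Command, String> {
    match args {
        ["register", handle] => Ok(Command::Register { handle: handle.to_string() }),
        ["add", contact] => Ok(Command::Add { contact: contact.to_string() }),
        ["send", to, message] => Ok(Command::Send {
            to: to.to_string(),
            message: message.to_string(),
        }),
        ["listen"] => Ok(Command::Listen),
        ["share", to, path] => Ok(Command::Share {
            to: to.to_string(),
            path: path.to_string(),
        }),
        ["fetch", ticket] => Ok(Command::Fetch { ticket: ticket.to_string() }),
        ["browse", handle] => Ok(Command::Browse { handle: handle.to_string() }),
        [] => Err("missing subcommand".to_string()),
        // Covers both unknown subcommands and wrong argument counts.
        [other, ..] => Err(format!("unknown or malformed subcommand: {}", other)),
    }
}
```

In a binary, the input would come from `std::env::args().skip(1)`, collected into a `Vec<String>` and borrowed as `&str` slices before the call.
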
---
## 12. One-paragraph summary
`kez-chat` is a Keybase-class chat and file-sharing app built on the
KEZ identity stack. Users get `username@kez.lat` handles
(email-style; the leading `@` is mention syntax in chat, not part of
the handle) backed by an Ed25519 primary key. The same key
authenticates to a NATS broker
(chat, presence, file tickets — broker is dumb, clients do E2E with
ChaCha20-Poly1305 over X25519-derived keys) and identifies an Iroh
node (P2P bulk transfer, content-addressed blobs, on-demand fetch).
**Our project ships two Rust services + a Svelte web app + a CLI:**
`kez-chat-server` (handle registry + NATS auth callout + HTTP API +
serves the SPA), the existing `kez-sig-server` (sigchain storage),
the `web/` Svelte app (the test UI, served as static files by the
chat-server, uses the same HTTP API any native client would —
dogfoods the contract), and `kez-chat-cli` (Rust binary that's
also the scripted-test surface). NATS isn't in our Rust code — it's
the official Go binary running as its own container — but it's
wired up in our docker-compose so operators can `docker compose up`
and have everything working. Operators with existing NATS deployments
can disable the bundled service. The chat-server does not run an
Iroh node and does not pin files in v0; file transfer is pure P2P
between online peers, and **the browser can't speak Iroh natively —
so the web app shows manifests but file download requires the CLI**.
Account recovery is via a 24-word paper-backup mnemonic. Federation
across home servers is deferred but the design keeps it as a
flip-the-switch future change.
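
The handle convention above (email-style `username@kez.lat`, with a leading `@` being mention syntax only) can be sketched as a small parser. The `Handle` struct, the `parse_handle` function, the validation rules, and the choice to default a bare mention to `kez.lat` are assumptions for illustration, the last one based on v0 having a single home server.

```rust
/// Parsed handle (hypothetical type).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Handle {
    pub username: String,
    pub domain: String,
}

/// v0 has a single home server, so bare mentions resolve to it (assumption).
pub const DEFAULT_DOMAIN: &str = "kez.lat";

/// Accepts `tudisco@kez.lat`, `@chris`, or `chris`.
/// The leading `@` is mention syntax and is stripped before parsing.
pub fn parse_handle(input: &str) -> Option<Handle> {
    let s = input.strip_prefix('@').unwrap_or(input);
    let (username, domain) = match s.split_once('@') {
        Some((u, d)) => (u, d),
        None => (s, DEFAULT_DOMAIN),
    };
    // Minimal validation only; the real character rules are not specified here.
    let invalid = username.is_empty()
        || domain.is_empty()
        || domain.contains('@')
        || username.chars().any(char::is_whitespace)
        || domain.chars().any(char::is_whitespace);
    if invalid {
        return None;
    }
    Some(Handle {
        username: username.to_string(),
        domain: domain.to_string(),
    })
}
```

Keeping the domain as an explicit field, even though v0 only ever sees `kez.lat`, is one way to leave the federation option open without changing call sites later.
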
---
## Appendix A: running just NATS during development
The full deployment is `docker compose up` in `deploy/` — that brings
nats, chat-server, and sig-server together. But if you're iterating on
chat-server in `cargo watch` and want a standalone NATS to point at:
```sh
docker run -d --name kez-dev-nats \
  -p 4222:4222 -p 8222:8222 \
  -v "$PWD/deploy/nats.conf:/etc/nats/nats.conf:ro" \
  nats:latest -c /etc/nats/nats.conf --jetstream
```
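Point your locally-running chat-server at it with
`NATS_URL=nats://127.0.0.1:4222`. The auth_callout in the same
`nats.conf` will reach back to `http://host.docker.internal:8080/internal/nats/auth`.
On Linux with plain Docker Engine, `host.docker.internal` is not defined
by default; add `--add-host=host.docker.internal:host-gateway` to the
`docker run` command so the container can resolve the host.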
Tear down with `docker rm -f kez-dev-nats` when done.
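
On the chat-server side, reading `NATS_URL` could look like the sketch below. The function names and the fallback to the local dev address are assumptions; the design only states that `NATS_URL` selects the broker.

```rust
/// Fallback used when `NATS_URL` is unset or blank (an assumption, chosen
/// to match the local dev container's published port).
pub const DEFAULT_NATS_URL: &str = "nats://127.0.0.1:4222";

/// Resolve the broker URL from an optional override.
/// Split out from the environment lookup so it can be tested directly.
pub fn resolve_nats_url(env_value: Option<String>) -> String {
    match env_value {
        Some(v) if !v.trim().is_empty() => v.trim().to_string(),
        _ => DEFAULT_NATS_URL.to_string(),
    }
}

/// Read `NATS_URL` from the process environment.
pub fn nats_url_from_env() -> String {
    resolve_nats_url(std::env::var("NATS_URL").ok())
}
```

Inside the bundled compose network the variable would instead point at the `nats` service, and operators with an existing deployment set it to their own broker.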