From f58612978702248d2472eeb34cf045d45aca02a8 Mon Sep 17 00:00:00 2001
From: Tudisco
Date: Sun, 24 May 2026 22:37:08 -0600
Subject: [PATCH] plan(kez-chat): lock design decisions; rewrite document.md
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Sweep through the design doc with all the open questions resolved:

- Microservices: chat-server does NOT bundle sigchain mirror — depends on the existing kez-sig-server as a separate container.
- NATS: not embedded in the Rust server. nats-server (Go) runs as its own container; chat-server provides an auth callout endpoint that nats-server invokes on each client connection.
- No nostr in chat. KEZ is identity-only; nostr only participates as a verifiable claim in someone's sigchain, not as transport.
- Global handle namespace for v0, federation-ready design (qualified internal handles, HTTP-based lookups, WebFinger from day one).
- Paper-backup recovery (24-word BIP-39-style mnemonic shown at account creation, user writes it down, app verifies recall). No server-side recovery.
- No Iroh pinning in v0. Files transfer pure P2P; if sender is offline, receiver waits. Chat-server doesn't run an Iroh node at all.

Concrete additions to the document:

- §3.4 Paper-backup recovery flow
- §3.5 Federation-ready design notes (qualified handle storage, HTTP-based lookups, WebFinger)
- §4.1 Responsibility table now explicitly lists what's NOT in this server (sigchain, NATS, Iroh, channel verification)
- §4.3 Sketch of docker-compose.yml showing the three-container microservices layout
- §9 collapsed: only one open question remains (manifest format — signed blob via sigchain op vs Iroh Doc). Recommended default: A.
- New "Decisions locked" table at the end of §9 summarizing all the closed questions
- §5.4 file sharing flow notes "both peers online for v0"
- §6.5 explicitly states "chat-server doesn't run an Iroh node"
- §7 MVP scope trimmed (no Iroh pinning checkbox)
- §11 sequenced plan reflects microservices ordering

Ready to attack once the manifest format decision lands.
---
 kez-chat/document.md | 741 ++++++++++++++++++++++++-------------------
 1 file changed, 418 insertions(+), 323 deletions(-)

diff --git a/kez-chat/document.md b/kez-chat/document.md
index 875e2cc..edbd406 100644
--- a/kez-chat/document.md
+++ b/kez-chat/document.md
@@ -12,18 +12,17 @@ identity stack, with NATS for messaging and Iroh for file transfer.
A real-time chat + file-sharing application with verified identities. - Users get human-friendly handles like `@tudisco@kez.lat`. -- The handle is bound to a KEZ primary key (ed25519); the same key +- The handle is bound to a KEZ ed25519 primary key; the same key authenticates to the chat infrastructure. - Conversations are end-to-end encrypted; the broker is dumb. - Files are visible in the sender's "shared files" list but only downloaded when a recipient actually wants them. No background sync. - Identity is portable: the underlying key + sigchain survives the home - server going dark. Handles can be migrated to other servers. + server going dark. Handles can be migrated to other servers later. 
This is the Keybase model rebuilt on a decentralized substrate: - **Identity layer** → KEZ (instead of Keybase's central account system) -- **Chat layer** → NATS broker with E2E in the client (instead of Keybase - Chat servers) +- **Chat layer** → NATS with client-side E2E (instead of Keybase Chat) - **File layer** → Iroh peer-to-peer with content addressing (instead of KBFS) --- @@ -72,18 +71,20 @@ Handles look like email and Mastodon addresses: ``` @tudisco@kez.lat @chris@kez.lat -@alice@chris.com ← custom domain, opted out of default +@alice@chris.com ← custom domain, opted out of default (future) ``` -`kez.lat` is the placeholder default home server domain. We'll replace -this with the actual production domain once chosen. The application -treats whatever's after the `@` as the user's home server — multiple -servers can exist, federation is by convention (same model as email). +`kez.lat` is the placeholder default home server domain. We'll lock in +the real production domain before launch. -In the UI, when the home server matches the app's default, handles are -displayed bare (`@tudisco`). Custom domains always display the full form -(`@chris@chris.com`) so users can tell when they're talking to a -non-default-server user. +For v0, **the handle namespace is global** — registration is on the one +default home server. Federation (multiple servers with their own +namespaces) is deliberately not in v0, but the design must not preclude +it. See §3.5. + +In the UI, since there's only one home server in v0, handles are +displayed bare (`@tudisco`). The `@kez.lat` suffix is implied and stored +internally. ### 3.2 Key generation tied to username @@ -92,104 +93,200 @@ When a user creates an account: 1. App generates a **fresh ed25519 keypair** locally. - This is the user's KEZ primary key. - It's also their NATS nkey for the chat broker (same key, same algorithm). -2. 
App **registers `@username` on the home server's handle registry** - - POSTs a signed registration request: `{ "handle": "tudisco", "primary": "ed25519:" }` - - The signature proves the user controls the private key. - - The registry rejects squatting (first-come-first-served per home server). -3. App **initializes a sigchain** for the new primary + - It's also their Iroh node identity (same primitive again). +2. App **registers `@username`** on the home server's handle registry. + - Sends a signed registration request proving control of the private key. + - Registry refuses a handle that is already claimed (first-come-first-served). +3. App **initializes a sigchain** for the new primary. - First event: `add_endpoint` advertising the NATS broker the app will use. - - Second event: `add_endpoint` advertising the Iroh NodeId the local app is using. -4. App **uploads the sigchain** to a kez-sig-server (optional but - recommended; otherwise the chain lives only on the user's device). + - Second event: `add_endpoint` advertising the Iroh NodeId of the local device. +4. App **uploads the sigchain** to the deployed `kez-sig-server`. After this flow the user has a fully working KEZ identity: - `@tudisco@kez.lat` resolves via the handle registry to their primary key. -- That key's sigchain advertises their NATS broker and Iroh nodes. +- That key's sigchain (on `kez-sig-server`) advertises their NATS broker and Iroh nodes. - Other users can verify them and reach them. -### 3.3 Why ed25519 (not nostr/secp256k1) for this app +### 3.3 Why ed25519 only for this app -Both KEZ primaries work in general, but the chat app **must** use ed25519 -because: +Both KEZ primary types work in general, but the chat app **requires** ed25519: - **NATS nkeys are ed25519.** Direct alignment: the user's KEZ primary key is their NATS credential. No second auth scheme. - **Iroh node IDs are ed25519.** Same primitive, native fit. 
- **One key type to manage.** Users with a pre-existing nostr key can - still attach it to their KEZ sigchain as a claim (so they're verifiable - on nostr too), but the primary that runs the app is ed25519. + still attach it to their KEZ sigchain as a verifiable claim (so they're + cross-referenced on nostr too), but the primary that runs the app is + ed25519. The nostr key never participates in chat or file transport. + +### 3.4 Account recovery: paper backup (Keybase-style) + +The user's ed25519 private key is the only thing that can prove their +identity. Lose it, lose the account. + +Recovery model for v0: + +- On account creation, the app converts the 32-byte ed25519 seed to a + **mnemonic phrase** (BIP-39 style, 24 words). Standard, well-tested + word lists, deterministic encoding. +- App **forces the user to write it down** before continuing — shows + the words, asks for confirmation, then asks them to retype a few + random words back to prove they recorded it. +- App stores the seed locally in OS-protected storage (Keychain, + Credential Manager, libsecret). Mnemonic is shown only at creation + and on-demand from settings. +- **Lost device flow:** user installs the app on a new device, types + their mnemonic, app regenerates the same ed25519 keypair, then pulls + the sigchain from `kez-sig-server` to restore their identity state. +- The handle is still theirs because the registry knows the primary key. + +No server-side recovery. No email reset. No customer support. Same model +Bitcoin wallets and Keybase used — user holds the seed phrase, user is +responsible for it. + +### 3.5 Federation-ready design (not in v0) + +For v0 we have **one** home server (`kez.lat`). All handles live there. +To make sure we don't paint ourselves into a corner: + +1. **Internal representation of a handle is always the qualified form** + (`tudisco@kez.lat`), never just `tudisco`. The UI strips the suffix + for display; storage always keeps the full form. +2. 
**Handle resolution is HTTP-based**, not hard-coded. The chat app + looks up `chris@kez.lat` by hitting `https://kez.lat/v1/u/chris`. + When federation lands, looking up `chris@example.com` hits + `https://example.com/v1/u/chris` instead. +3. **WebFinger endpoint included from v0** — so cross-server discovery + already works via standard tooling, even if our app only uses our + own server for now. +4. **Sigchain endpoint URLs are fully qualified.** A user's sigchain + lives at `https://sig.kez.lat/v1/sigchains/ed25519/` — when + another server's user wants to verify ours, the URL is right there. + +The v0 chat app might hard-code "lookups go to `kez.lat`" for now; +flipping that to "lookups go to whatever's after the `@`" is a config +change later, not a redesign. --- ## 4. The home server (`kez-chat-server`) -A single Rust binary that bundles the home-server responsibilities. One -process. Self-hostable. Anyone can run their own to be their own home for -their own users. +A single Rust binary, deployed as one container alongside other +microservices (NATS broker, sigchain server). -### 4.1 What it does +### 4.1 What it does (and what it doesn't) -| Responsibility | How | +| Responsibility | This server? | |---|---| -| **Handle registry** | `POST /v1/register` to claim `@username`, `GET /v1/u/` to look one up. SQLite-backed. Same shape as `kez-id-server` discussed earlier. | -| **Sigchain mirror** (optional) | Mirrors `kez-sig-server` endpoints for users who don't want to publish elsewhere — `POST /v1/sigchains/.../events`, `GET /v1/sigchains/...`. Or proxies through to a separate `kez-sig-server` instance. | -| **NATS broker host** | Runs (or co-runs) a NATS server with JetStream enabled for offline message delivery. Configured to use nkey-based auth tied to KEZ primary keys. | -| **Iroh pinning node** | Runs an Iroh node that users can opt to push their blobs to, so files are served even when the user's own device is offline. (Optional per user.) 
| -| **WebFinger endpoint** | `/.well-known/webfinger?resource=acct:tudisco@kez.lat` returns user discovery info — interop with fediverse tools. | -| **HTTP API for clients** | Thin REST surface for the chat app to register, look up handles, fetch endpoints, manage settings. | +| **Handle registry** | ✅ Yes | +| **NATS auth callout** | ✅ Yes | +| **WebFinger endpoint** | ✅ Yes | +| **HTTP API for clients** | ✅ Yes | +| **Sigchain storage** | ❌ No — defer to `kez-sig-server` (separate container) | +| **NATS broker** | ❌ No — separate `nats-server` (Go) container | +| **Iroh pinning** | ❌ No for v0 — files transfer P2P when both peers are online. Pinning is a future tier. | +| **Channel verification (gist/dns/etc.)** | ❌ No — clients do it locally via `kez-channels`. KEZ system is only used for identity, not as part of chat. | -### 4.2 Process model +The chat server is deliberately small. Microservices: each service does +one thing, deployed independently. Operator runs three containers +(chat-server + nats-server + sig-server). When pinning lands later, that +becomes a fourth optional container. 
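A minimal sketch of two rules described above: the qualified-storage rule from §3.5 and the first-come-first-served handle registry that §4.1 assigns to this server. It is illustrative only. The allowed username characters, the `RegisterError` type, showing the full form for non-default home servers, and the in-memory `HashMap` standing in for the SQLite table are all assumptions, and registration-signature verification is assumed to have happened before `register` is called.

```rust
use std::collections::hash_map::Entry;
use std::collections::HashMap;

/// v0's single home server (placeholder domain from §3.1).
pub const DEFAULT_HOME: &str = "kez.lat";

/// A handle is always held in qualified form: user + home server.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct Handle {
    pub user: String,
    pub home: String,
}

impl Handle {
    /// Accepts "@tudisco", "tudisco", "@tudisco@kez.lat" or "tudisco@kez.lat".
    /// A bare name gets DEFAULT_HOME attached, so nothing is ever stored bare.
    pub fn parse(input: &str) -> Option<Handle> {
        let s = input.strip_prefix('@').unwrap_or(input);
        let (user, home) = s.split_once('@').unwrap_or((s, DEFAULT_HOME));
        // Assumed username rule: lowercase ASCII letters, digits and '_'.
        let user_ok = !user.is_empty()
            && user
                .chars()
                .all(|c| c.is_ascii_lowercase() || c.is_ascii_digit() || c == '_');
        if !user_ok || home.is_empty() || home.contains('@') {
            return None;
        }
        Some(Handle { user: user.to_string(), home: home.to_string() })
    }

    /// Storage form, e.g. "tudisco@kez.lat".
    pub fn qualified(&self) -> String {
        format!("{}@{}", self.user, self.home)
    }

    /// UI form: the suffix is dropped only for the default home server.
    pub fn display(&self) -> String {
        if self.home == DEFAULT_HOME {
            format!("@{}", self.user)
        } else {
            format!("@{}@{}", self.user, self.home)
        }
    }

    /// Lookup URL built from whatever follows the '@' (§3.5, point 2).
    pub fn lookup_url(&self) -> String {
        format!("https://{}/v1/u/{}", self.home, self.user)
    }
}

#[derive(Debug, PartialEq)]
pub enum RegisterError {
    Taken,
}

/// In-memory stand-in for the registry table: qualified handle -> primary key.
#[derive(Default)]
pub struct Registry {
    by_handle: HashMap<String, String>,
}

impl Registry {
    /// First come, first served: a second claim on a taken handle is refused.
    pub fn register(&mut self, handle: &Handle, primary: &str) -> Result<(), RegisterError> {
        match self.by_handle.entry(handle.qualified()) {
            Entry::Occupied(_) => Err(RegisterError::Taken),
            Entry::Vacant(slot) => {
                slot.insert(primary.to_string());
                Ok(())
            }
        }
    }

    /// Resolve a handle to the primary key it was registered with.
    pub fn lookup(&self, handle: &Handle) -> Option<&str> {
        self.by_handle.get(&handle.qualified()).map(String::as_str)
    }
}
```

Because `parse` qualifies on the way in, `@tudisco`, `tudisco` and `tudisco@kez.lat` all map to the same registry key, which is what makes the first-come-first-served check meaningful, and switching lookups to another server later only changes the `home` part.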
-For MVP, the server is a **coordinator + adapter**, not a full -reimplementation: +### 4.2 Process / deployment model ``` -┌───────────────────────────────────────────────────────────┐ -│ kez-chat-server process (one Rust binary) │ -│ - HTTP API (axum) │ -│ - Handle registry (SQLite) │ -│ - NATS auth callout (validates nkey signatures) │ -│ - Sigchain mirror (axum routes — could reuse │ -│ rust-sig-server code) │ -└──┬──────────────────────┬────────────────────────────────┘ - │ launches/manages │ talks to via API - ▼ ▼ -┌──────────────┐ ┌──────────────┐ -│ nats-server │ │ iroh-relay │ (optional, for users -│ (Go binary) │ │ (Rust) │ who want pinning) -│ + JetStream │ │ │ -└──────────────┘ └──────────────┘ +┌──────────────────────────────────────────────────────────────┐ +│ docker-compose / systemd / Kubernetes │ +│ │ +│ ┌──────────────┐ ┌─────────────────┐ ┌────────────────┐ │ +│ │ nats-server │ │ kez-chat-server │ │ kez-sig-server │ │ +│ │ (Go) │◄──┤ (Rust) ├──►│ (Rust) │ │ +│ │ + JetStream │ │ │ │ (existing) │ │ +│ │ │ │ ↓ handles │ │ ↓ sigchain │ │ +│ │ │ │ ↓ nats auth │ │ storage │ │ +│ │ │ │ ↓ HTTP API │ │ │ │ +│ └──────────────┘ └─────────────────┘ └────────────────┘ │ +│ ▲ ▲ ▲ │ +│ │ │ │ │ +└─────────┼───────────────────┼──────────────────────┼─────────┘ + │ │ │ + │ │ │ + ┌──────┴───────────────────┴──────────────────────┴─────┐ + │ Chat app (per user, runs on phone/desktop) │ + │ │ + │ • talks to nats-server over native NATS protocol │ + │ • talks to kez-chat-server over HTTPS (handles, etc.) │ + │ • talks to kez-sig-server over HTTPS (sigchain) │ + │ • runs local iroh::Node for file send/receive │ + └────────────────────────────────────────────────────────┘ ``` -The Rust server doesn't reimplement NATS or Iroh — it sits beside them. -Operator runs the three processes together (Docker compose, systemd -unit, or whatever). 
The chat-server provides the KEZ-aware integration: -authenticating NATS connections against the handle registry, serving -sigchain endpoints, exposing a clean HTTP API to client apps. +The Rust chat-server orchestrates auth between NATS and the handle +registry, but doesn't host either NATS or the sigchains. -### 4.3 Endpoints (sketch) +### 4.3 docker-compose sketch -``` -GET /v1/healthz -GET /v1/u/:handle handle → { primary, sigchain_url, endpoints } -POST /v1/register claim a handle (signed body) -GET /.well-known/webfinger?resource=... +```yaml +# deploy/docker-compose.yml +services: + nats: + image: nats:latest + # --store_dir puts JetStream data on the nats-data volume (default is /tmp) + command: ["-c", "/etc/nats/nats.conf", "--jetstream", "--store_dir", "/data"] + volumes: + - ./nats.conf:/etc/nats/nats.conf:ro + - nats-data:/data + ports: + - "4222:4222" # client connections (TLS in prod) + - "8222:8222" # monitoring -# Sigchain mirror (same as kez-sig-server) -GET /v1/sigchains/:scheme/:id -POST /v1/sigchains/:scheme/:id/events -GET /v1/sigchains/:scheme/:id/head + chat-server: + build: . # kez-chat-server Rust binary + environment: + NATS_URL: nats://nats:4222 + SIG_SERVER_URL: http://sig-server:7878 + DB_PATH: /data/handles.db + AUTH_CALLOUT_NKEY_PATH: /etc/kez/auth-callout.nkey + volumes: + - chat-data:/data + - ./auth-callout.nkey:/etc/kez/auth-callout.nkey:ro + depends_on: [nats, sig-server] + ports: + - "8080:8080" # HTTP API for clients -# NATS auth callout (called by nats-server, not by users) -POST /internal/nats/auth verify nkey signature, return permissions + sig-server: + image: kez-sig-server:latest # the existing rust-sig-server + environment: + KEZ_DB: /data/sigchains.db + volumes: + - sig-data:/data + ports: + - "7878:7878" -# Iroh pinning (optional) -POST /v1/pin pin a blob for offline serving -GET /v1/pin/:hash check pinning status +volumes: + nats-data: + chat-data: + sig-data: ``` -The NATS broker and Iroh node are *out-of-process* — clients connect to -them directly (`mqtt://nats.kez.lat:4222`, Iroh direct or via relays). 
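With nats-server wired to the chat-server, the interesting part of the auth callout is the permission decision it makes once a connection's nkey signature checks out. A minimal sketch of that decision, assuming inbox subjects have the form `kez.inbox.<token>` for some per-user token (the exact suffix isn't spelled out here, so `own_token` is a placeholder). Publishing is opened to every inbox because senders must reach other people's inboxes; subscribing is limited to the caller's own. JWT encoding and signing are left out, as are the extra `$JS.API.>` and reply-subject permissions a JetStream durable consumer would also need.

```rust
/// Allow lists the auth callout would place in the user JWT it returns.
#[derive(Debug, PartialEq)]
pub struct Permissions {
    pub publish_allow: Vec<String>,
    pub subscribe_allow: Vec<String>,
}

/// Assumed subject prefix for inboxes; the per-user suffix is app-defined.
const INBOX_PREFIX: &str = "kez.inbox.";

/// Conservative NATS token check: non-empty, no '.', no wildcards, no whitespace.
fn is_plain_token(token: &str) -> bool {
    !token.is_empty()
        && !token
            .chars()
            .any(|c| c == '.' || c == '*' || c == '>' || c.is_whitespace())
}

/// Decide what a connection may do once its nkey signature has been checked.
/// `registered` says whether the nkey maps to a handle in the registry;
/// `own_token` is that user's inbox token. `None` means "deny the connection".
pub fn permissions_for(registered: bool, own_token: &str) -> Option<Permissions> {
    if !registered || !is_plain_token(own_token) {
        return None;
    }
    Some(Permissions {
        // Anyone may drop ciphertext into any single inbox ('*' = one token).
        publish_allow: vec![format!("{}*", INBOX_PREFIX)],
        // Only the owner may read their own inbox.
        subscribe_allow: vec![format!("{}{}", INBOX_PREFIX, own_token)],
    })
}
```

Rejecting tokens that contain `.`, `*` or `>` matters here: a token such as `>` would otherwise turn the subscribe grant into a wildcard over every inbox.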
+NATS's auth callout is configured in `nats.conf` (an `auth_callout` block +inside `authorization`). It is not an HTTP call: nats-server publishes +each connection's authorization request on the `$SYS.REQ.USER.AUTH` +subject, and the chat-server, connected to NATS as one of the callout's +`auth_users`, answers it. The chat-server verifies the nkey signature +against the handle registry and replies with a user JWT, signed with the +callout's issuer nkey, listing the allowed subjects (subscribe is +typically limited to the user's own inbox). + +### 4.4 Endpoints + +``` +GET /v1/healthz +GET /v1/u/:handle handle → { primary, sigchain_url, endpoints } +POST /v1/register claim a handle (signed body) +GET /.well-known/webfinger?resource=acct:tudisco@kez.lat + +# NATS auth callout (called BY nats-server, not by users; a NATS +# subscription on the chat-server's connection, not an HTTP route) +SUB $SYS.REQ.USER.AUTH verify nkey signature, return permissions +``` + +Sigchain endpoints are **not** on this server — clients talk directly to +`kez-sig-server` for those. --- @@ -198,111 +295,114 @@ them directly (`mqtt://nats.kez.lat:4222`, Iroh direct or via relays). ### 5.1 Account creation — `@tudisco@kez.lat` ``` -1. User opens kez-chat-app, clicks "Create account" +1. User opens chat app, clicks "Create account" 2. App: generates ed25519 keypair locally -3. App: user picks handle "tudisco" -4. App → kez-chat-server: +3. App: converts seed to 24-word mnemonic, makes user write it down, + verifies recall before continuing +4. App: user picks handle "tudisco" +5. App → chat-server: POST /v1/register { "handle": "tudisco", "primary": "ed25519:", "registration_sig": "" } -5. Server: validates signature, checks handle is free, stores in registry -6. Server: 201 Created -7. App: initializes sigchain locally, signs: - { op: "add_endpoint", - payload: { protocol: "nats", - url: "nats://nats.kez.lat:4222", - inbox: "kez.inbox." } } - { op: "add_endpoint", - payload: { protocol: "iroh", - node_id: "" } } -8. App → server: - POST /v1/sigchains/ed25519//events (twice, one per event) -9. App: connects to NATS broker with nkey auth, subscribes to inbox topic -10. Done — user is @tudisco@kez.lat, online, reachable +6. Server: validates signature, checks handle is free, stores in registry +7. Server: 201 Created +8. 
App: initializes sigchain locally, signs: + - add_endpoint { protocol: "nats", url: "...", inbox: "kez.inbox." } + - add_endpoint { protocol: "iroh", node_id: "" } +9. App → sig-server: POST /v1/sigchains/ed25519//events (one per event) +10. App: connects to nats-server with nkey auth (signed challenge, + nats-server invokes chat-server's auth callout, gets back yes/no + + allowed subjects) +11. App: subscribes to JetStream durable consumer on its inbox subject +12. Done — @tudisco@kez.lat is live and reachable ``` ### 5.2 Adding a contact ``` -1. Tudisco wants to add Chris. Types "@chris" in app. -2. App → kez-chat-server: GET /v1/u/chris - Returns: { primary: "ed25519:abc...", sigchain_url: "..." } -3. App fetches the sigchain → walks events → extracts: - - nostr/github/dns/etc. claims (for verification) - - NATS broker URL + inbox topic - - Iroh node IDs -4. App displays Chris's profile: verified accounts, avatar (from sigchain - metadata if present), join date -5. App stores LOCAL binding: { "@chris@kez.lat" => ed25519:abc... } +1. Tudisco types "@chris" in app +2. App → chat-server: GET /v1/u/chris + Returns: { primary: "ed25519:abc...", sigchain_url: "https://sig.kez.lat/..." } +3. App → sig-server (URL from above): fetch sigchain +4. App walks events to extract: + - NATS broker URL + inbox subject (from add_endpoint nats) + - Iroh node IDs (from add_endpoint iroh) + - Other identity claims (github:chris, dns:chris.com, etc. — for display) +5. App caches LOCALLY: { "@chris@kez.lat" => ed25519:abc..., endpoints: {...} } (TOFU — trust on first use) ``` ### 5.3 Sending a chat message ``` -1. Tudisco types "hello" in the chat with Chris. -2. App: looks up Chris's primary key + NATS endpoint from local store. -3. App: derives a symmetric key via ECDH: - X25519(tudisco_priv, chris_pub) → KDF → 32-byte symmetric key -4. App: encrypts "hello" with ChaCha20-Poly1305 + the derived key. -5. 
App: signs the ciphertext with tudisco's KEZ primary (so chris can - verify the sender, not just decrypt). -6. App: publishes to NATS subject `kez.inbox.` on - chris's broker, with JetStream delivery (durable, will queue if - chris is offline). -7. Chris's app receives from his subscribed inbox subject. -8. Chris's app: verifies signature against tudisco's key, decrypts, shows - "tudisco: hello". +1. Tudisco types "hello" to Chris +2. App looks up Chris's primary key + NATS endpoint from local cache +3. App derives the symmetric key for this pair (static ECDH, so every + message between the two uses the same key; the ed25519 keys are + converted to X25519 form first): + X25519(tudisco_priv, chris_pub) → HKDF → 32-byte ChaCha20-Poly1305 key +4. App encrypts "hello" with that key and a fresh random nonce per message +5. App signs ciphertext with tudisco's KEZ primary +6. App publishes to subject `kez.inbox.` on the NATS + broker, JetStream-published so the broker stores it durably +7. Chris's app (subscribed via durable consumer) receives the message + whenever next online — broker buffers it if offline +8. Chris's app verifies signature against tudisco's key, decrypts, + shows "tudisco: hello" ``` -For 1:1 chat, the broker never sees: -- The message contents -- Who tudisco is talking to (the subject is chris's inbox, but anyone could - publish there) -- The relationship between sender and recipient (sender's identity is in - the encrypted+signed payload, not in the NATS metadata) +The broker sees: +- The authenticated nkey of each publishing client, which is the sender's + KEZ primary (the same key is the NATS credential) +- The subject it publishes to (the recipient's inbox), plus timing and size +- Which authenticated client subscribes to each inbox +- It does NOT see: message contents (only ciphertext crosses the broker) +So content is private, but who-talks-to-whom metadata is visible to the +broker operator. -### 5.4 Sharing a file +### 5.4 Sharing a file (v0: both peers online) ``` -1. Tudisco drags `report.pdf` into the chat with Chris. -2. App: imports blob into local Iroh node → gets BLAKE3 hash + ticket. -3. App: optionally adds entry to tudisco's shared-files manifest - (visible in his profile if Chris later browses it). -4. 
App: encrypts the Iroh ticket (and a content key for the blob, if - the file is wrapped with a per-recipient symmetric key) with the - same E2E mechanism as chat messages. -5. App: publishes to chris's NATS inbox: { type: "file_share", - filename: "report.pdf", ticket: "...", content_key: "..." } -6. Chris's app receives the notification, displays: - "tudisco shared report.pdf (1.2 MB)" [Download] -7. Chris clicks Download. -8. App: opens Iroh connection to tudisco's NodeId (from sigchain), pulls - the blob via the ticket, decrypts with the content key, verifies - BLAKE3 hash. File appears. +1. Tudisco drags `report.pdf` into the chat with Chris +2. App imports the blob into local Iroh node → BLAKE3 hash + ticket +3. App optionally adds an entry to tudisco's "shared files" manifest + (visible if Chris later browses tudisco's profile) +4. App generates a per-file symmetric content key +5. App encrypts the blob in place (or stores both plaintext + encrypted — + detail for later) with the content key +6. App wraps the content key for chris's KEZ key (X25519 → HKDF) +7. App sends a NATS message to chris's inbox: + { type: "file_share", + filename: "report.pdf", + size: 1234567, + iroh_ticket: "blobac://...", + wrapped_content_key: "..." } + (same encryption as chat messages, so chris can read this) +8. Chris's app sees the notification: "tudisco shared report.pdf (1.2 MB)" + File NOT downloaded yet. +9. Chris clicks Download. +10. Chris's app opens an Iroh connection to tudisco's NodeId (from + tudisco's sigchain), pulls the blob via the ticket, decrypts with + the unwrapped content key, verifies BLAKE3 hash. File appears. ``` -If tudisco is offline at step 8 and he's opted into pinning, Chris's -app fetches from `kez.lat`'s pinning node instead. Same protocol, just -a different source. +**v0 limitation:** If tudisco is offline at step 10, chris waits. +Iroh will retry; download starts when tudisco's node comes back. 
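Steps 7 and 8 above rely on the share message being metadata only, so the recipient can show the offer without touching the blob. A minimal sketch of that receiving side, with field names copied from the JSON in step 7; treating `size` as a byte count and formatting it in decimal units (which yields the "1.2 MB" shown for 1,234,567 bytes) are assumptions, and the ticket and wrapped key are kept as opaque strings.

```rust
/// Metadata carried inside the E2E-encrypted `file_share` message (§5.4 step 7).
/// The ticket and wrapped key stay opaque strings in this sketch.
#[derive(Debug, Clone)]
pub struct FileShare {
    pub filename: String,
    pub size: u64,
    pub iroh_ticket: String,
    pub wrapped_content_key: String,
}

/// Human-readable size in decimal units (1 MB = 1,000,000 bytes).
pub fn human_size(bytes: u64) -> String {
    const UNITS: [&str; 5] = ["B", "KB", "MB", "GB", "TB"];
    let mut value = bytes as f64;
    let mut unit = 0;
    while value >= 1000.0 && unit < UNITS.len() - 1 {
        value /= 1000.0;
        unit += 1;
    }
    if unit == 0 {
        format!("{} B", bytes)
    } else {
        format!("{:.1} {}", value, UNITS[unit])
    }
}

/// Step 8: render the offer from metadata alone. Nothing is fetched from Iroh
/// here; the download only starts when the user clicks.
pub fn offer_text(sender: &str, share: &FileShare) -> String {
    format!("{} shared {} ({})", sender, share.filename, human_size(share.size))
}
```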
+Pinning (the server holding a copy) is **not** in v0 — we accept this +limitation in exchange for zero server-side storage cost and the +simplest possible architecture. ### 5.5 Browsing someone's files (Keybase-style) ``` -1. Chris opens tudisco's profile. -2. App: resolves @tudisco → primary → sigchain. -3. Sigchain has a `set_shared_files` op with a manifest blob hash. -4. App: fetches the manifest blob (small, fast) via Iroh. -5. App: decrypts entries that are wrapped for chris's key, ignores ones - it can't decrypt (those are wrapped for other people). -6. App: renders the visible entries with name, size, share date, - thumbnail if present. -7. Chris clicks an entry to download — same as 5.4 step 8. +1. Chris opens tudisco's profile +2. App resolves @tudisco → primary → sigchain +3. Sigchain has a `set_shared_files` op pointing at a manifest blob hash +4. App fetches the manifest blob via Iroh (small, fast) +5. App decrypts entries wrapped for chris's key, ignores ones it can't + decrypt (those are wrapped for other people) +6. App renders the visible entries: name, size, share date, + thumbnail (if present) +7. Chris clicks an entry → flow continues like §5.4 step 9 ``` -The manifest is **small** (KBs); only blobs Chris actually wants are -fetched. No background sync of multi-GB folders. +Manifest is small (KB-scale); blobs are MB-to-GB. Browsing is cheap; +fetching is per-file deliberate. **Recipient never auto-syncs.** --- @@ -312,7 +412,7 @@ fetched. No background sync of multi-GB folders. ``` /Kez -├── rust-lib/ ← (proposed) shared Rust libraries +├── rust-lib/ ← (proposed refactor) shared Rust libraries │ ├── Cargo.toml workspace │ └── crates/ │ ├── kez-core/ moved from rust/crates/ @@ -322,23 +422,23 @@ fetched. No background sync of multi-GB folders. │ └── crates/ │ └── kez-cli/ depends on ../../rust-lib/crates/... 
│ -├── rust-sig-server/ ← optional sigchain HTTP store +├── rust-sig-server/ ← existing sigchain storage (reused as-is) │ ├── kez-chat/ ← THIS PROJECT │ ├── document.md (this file) │ ├── Cargo.toml │ ├── src/ -│ │ ├── main.rs -│ │ ├── handles.rs handle registry -│ │ ├── sigchain.rs sigchain mirror (or proxy) -│ │ ├── nats_auth.rs NATS auth callout -│ │ ├── pin.rs Iroh pinning -│ │ └── api.rs HTTP routes +│ │ ├── main.rs binary entry +│ │ ├── handles.rs handle registry (sqlite-backed) +│ │ ├── nats_auth.rs NATS auth callout endpoint +│ │ ├── webfinger.rs WebFinger discovery endpoint +│ │ └── api.rs axum routes + state │ ├── deploy/ -│ │ ├── docker-compose.yml chat-server + nats + iroh -│ │ ├── nats.conf -│ │ └── systemd/ +│ │ ├── docker-compose.yml chat-server + nats + sig-server +│ │ ├── nats.conf with auth_callout config +│ │ └── systemd/ alternative deployment │ └── tests/ +│ └── http.rs integration tests │ ├── nodejs/ ← (unchanged) └── crosstest.sh ← (path updates if rust-lib moves) @@ -346,23 +446,22 @@ fetched. No background sync of multi-GB folders. ### 6.2 The `rust-lib/` proposal — share code, no duplication -Right now, kez-core and kez-channels live inside `rust/crates/`. The -sig-server and the chat-server both want to use them. With everything in -`rust/`, downstream projects have to do: +Right now, `kez-core` and `kez-channels` live inside `rust/crates/`. The +sig-server and the new chat-server both want to use them. Today's +downstream path-dep is: ```toml kez-core = { path = "../rust/crates/kez-core" } ``` -…which works but feels off (why does a separate project reach into -another project's `crates/`?). +…which works but reaches into another project's crate tree. **Recommendation:** move the pure libraries out into a top-level `rust-lib/` workspace. The CLI stays in `rust/`. Downstream servers -depend on `../rust-lib/crates/kez-core`. Clean structure, no duplication, -no confusion about which folder owns the library code. 
+depend on `../rust-lib/crates/kez-core`. Clean structure, no +duplication, no confusion about which folder owns what. -Refactor effort: small but real. +Refactor steps: - `mv rust/crates/kez-core rust-lib/crates/` - `mv rust/crates/kez-channels rust-lib/crates/` @@ -372,183 +471,173 @@ Refactor effort: small but real. `rust-sig-server/Cargo.toml`. - Update `crosstest.sh` if any paths are hardcoded. -Suggested order: **do the refactor first, then start kez-chat with clean -imports.** Otherwise we'll write `path = "../rust/crates/..."` for the -chat-server and have to fix it later anyway. +**Suggested order:** do the refactor *before* starting kez-chat so we +import cleanly from the start. ### 6.3 Dependencies (planned) | Crate | Why | |---|---| -| `kez-core` (path) | Identity types, sigchain, claim signing | +| `kez-core` (path) | Identity types, ed25519, signing | | `kez-channels` (path) | Verify users' linked accounts when displayed | | `axum` 0.8 | HTTP API | | `tokio` | Async runtime | | `rusqlite` (bundled) | Handle registry | -| `async-nats` | NATS client (for the auth callout and maybe utility) | -| `iroh` | Iroh node (for pinning) | -| `iroh-blobs` | Blob handling | +| `async-nats` | NATS client (admin work + the auth callout glue) | | `serde` / `serde_json` | Standard | | `thiserror` / `anyhow` | Standard | | `tracing` / `tracing-subscriber` | Logging | | `tower-http` | CORS, request tracing | | `clap` | CLI args | -### 6.4 The actual NATS broker +**Not** depended on by the chat-server: +- `iroh` — server doesn't run an Iroh node in v0 (no pinning) +- nats-server (Go) — separate container, not a Rust dep -We don't write a NATS broker. We **run one** alongside the Rust server: +### 6.4 NATS broker — separate container -- Use the official `nats-server` Go binary (downloaded from nats.io or - built from source). -- Configure with JetStream enabled (for offline delivery via durable - consumers). 
-- Configure auth callout pointing at the kez-chat-server's internal - endpoint, so connection auth defers to the KEZ registry. -- Run in the same Docker compose / systemd target as the Rust server. +We don't write or embed a NATS broker. Run the official Go binary: -NATS clustering for redundancy is a later concern. +- `nats-server` from nats.io +- JetStream enabled (for offline message buffering) +- Auth callout configured in `nats.conf`; the chat-server answers the + requests nats-server publishes on `$SYS.REQ.USER.AUTH` +- Run as its own docker-compose service (see §4.3) -### 6.5 The actual Iroh node +Why not embed: NATS is Go; no production-grade Rust port. Docker-compose +keeps the deployment honest (each service in its own container, normal +operational tooling applies). One config change to swap broker +implementations or run a cluster. -We DO embed Iroh in-process — Iroh is a Rust library and works as such. -The chat-server runs an `iroh::Node` and offers it as a pinning service -for users who opt in. +### 6.5 Iroh — client-side only -For client apps: they run their own Iroh node locally too. The -chat-server's Iroh node is just a peer — albeit one that's always online -and willing to hold blobs. +Clients run a local Iroh node for sending and receiving files. The +**chat-server does NOT run an Iroh node** in v0. + +Implication: when @tudisco shares a file with @chris, the bytes go +directly from tudisco's device to chris's device via Iroh. If tudisco +is offline, chris waits. There's no fallback to a server-stored copy. + +This is the simplest possible operational model. Pinning (server-side +fallback storage) is a future addition (§8). --- ## 7. 
MVP scope -What ships in v0: +### Server (`kez-chat-server`) -- [ ] kez-chat-server binary - - [ ] Handle registry (POST /register, GET /u/:handle) - - [ ] Sigchain mirror (proxy or own copy) - - [ ] NATS auth callout - - [ ] WebFinger endpoint - - [ ] HTTP healthz/metrics -- [ ] NATS broker config + deployment recipe -- [ ] Iroh pinning node embedded (optional per user) -- [ ] Docker compose for the whole bundle (server + nats + iroh node) -- [ ] Integration tests against a real NATS + Iroh +- [ ] HTTP API scaffold (axum + tokio) +- [ ] Handle registry (POST /register, GET /u/:handle) +- [ ] Registration signature validation (uses kez-core) +- [ ] WebFinger endpoint +- [ ] NATS auth callout responder (subscribes to `$SYS.REQ.USER.AUTH`) +- [ ] Healthz / metrics +- [ ] Integration tests against real nats-server + sig-server in a + test docker-compose -What the **client app** needs to do (separate project? `kez-chat-app/`?): -- [ ] Account creation flow (key gen + handle registration) +### Deployment + +- [ ] docker-compose.yml (chat + nats + sig-server) +- [ ] nats.conf with auth_callout configured +- [ ] systemd alternative deployment recipe +- [ ] README with TLS / reverse proxy guidance + +### Client (`kez-chat-cli` — separate project later) + +Out of scope for the server work, but the **server isn't usable without** +at least a CLI client that does: +- [ ] Account creation (key gen + mnemonic backup + handle registration) - [ ] Contact lookup + verification -- [ ] 1:1 chat (E2E via NATS) -- [ ] File send/receive (E2E via Iroh) -- [ ] Shared-files manifest browse + fetch -- [ ] Profile view (sigchain visualization) +- [ ] Send / receive 1:1 chat messages (E2E via NATS) +- [ ] Send / receive files (E2E via Iroh) +- [ ] Browse @user shared-files manifest -For v0, **CLI client is fine** (`kez-chat send @chris "hello"`). UI app -comes later. +UI app comes after CLI proves the flow works. --- ## 8. 
Out of scope (v0) -- Group chat -- Forward secrecy (Double Ratchet / MLS) — chat is encrypted but not - ratcheting in v0 -- Voice / video calls -- Multi-device key sync — user has one device with their key for v0 -- Account recovery / lost-key flows — protocol's `rotate` op exists but - UX for recovery isn't designed yet -- Federation across home servers — protocol allows it, but the v0 - app may only resolve handles on its configured default server -- Channel publishing (gist, DNS, ActivityPub, bluesky) — the kez CLI - already has these; not duplicated here. User can run `kez claim ...` - separately to add channel proofs to their sigchain. -- Avatars / display name — could just use `nostr:npub` metadata or a - separate sigchain op; defer the design +- **Iroh pinning** (sender must be online for receiver to fetch) +- **Group chat** (only 1:1 for v0) +- **Forward secrecy / ratcheting** (Double Ratchet, MLS) — chat is + encrypted but each message uses the same X25519-derived key per pair +- **Voice / video calls** +- **Multi-device key sync** — one device per user in v0 +- **Account recovery beyond mnemonic** — paper backup is the only recovery +- **Federation across home servers** — one server (kez.lat) in v0; + design preserves the option +- **Channel-based identity verification** — the CLI already does + `kez verify id ...`; not duplicated in the chat-server. Users add + KEZ channel proofs (gist, dns, etc.) via the existing CLI separately. +- **Avatars / display names** — defer the design. For v0 the UI shows + the handle and that's enough. --- -## 9. Open design questions +## 9. The one remaining open question -These need resolving before serious implementation: +**Manifest format** for "@chris's shared files": -1. 
**Bundle or separate sigchain server?** - - kez-chat-server includes its own sigchain mirror (one less moving piece for operators) - - …or it depends on a separate kez-sig-server (proper layering) - - Lean: bundle for MVP, factor out later if multiple chat servers want to share. +| Option | How | Tradeoff | +|---|---|---| +| **A. Signed JSON blob, hash in sigchain** | Manifest is a JSON blob stored on Iroh. A new sigchain op `set_shared_files` commits the latest manifest hash. Recipients walk the sigchain → find the pointer → fetch the manifest blob from Iroh. | Simpler. No Iroh Docs dep. Sigchain anchors the version (signed). Update = new sigchain event. | +| **B. Iroh Doc** | Manifest is a mutable CRDT document. Recipients subscribe; updates sync in near-real-time. | Fancier UX (live updates). Requires Iroh Docs subsystem (heavier dep, less stable). | -2. **Iroh pinning by default or opt-in?** - - Default-on: better UX, more storage cost for the server operator - - Opt-in: simpler operator story, worse first-use UX for users whose phones are off - - Lean: opt-in for v0; let users push the pin button per-file. Default-on later. +**Recommended default: A.** Simpler, fewer moving parts, reuses +primitives we already have. We can upgrade to B later if real users +need real-time profile feed updates. -3. **NATS broker: bundled or BYO?** - - kez-chat-server can spawn/manage `nats-server` as a child process - - …or it can assume operator runs NATS separately and just point at it - - Lean: BYO with documented config. We don't reinvent process management. +Settle yes/no on this and the design is locked. -4. **Manifest format** - - Single JSON blob, signed, hash committed via sigchain `set_shared_files` op - - …or Iroh Doc (CRDT-synced) - - Lean: single signed blob for v0; simpler, no Iroh Docs dep. +--- -5. 
**Handle uniqueness scope** - - Per home server (`tudisco@kez.lat` vs `tudisco@example.com` can be different people) - - Globally enforced somehow (not really possible without a central registry) - - Lean: per home server. Federation handles global resolution. +## Decisions locked from earlier discussion -6. **What about KEZ's existing `nostr:` channel for messaging?** - - It already works for chat-like messages via NIP-44 DMs - - NATS is a separate stack — not interoperable - - Lean: NATS is the chat substrate for this app. Users who want - to send a nostr DM can use a separate nostr client. The KEZ - identity is the same; the transport is the user's choice per - conversation. Document this in the UI. - -7. **Recovery story when you lose your key** - - Spec has `rotate` op — old key signs that new key is now primary - - But if you lost the old key, you can't sign the rotation - - Possible solutions: - - User must keep paper-backup of their key (Bitcoin model) - - User can pre-sign rotation events to multiple device keys - (multi-device redundancy) - - Home server holds an offline emergency-recovery key (centralized - fallback; opt-in) - - Defer detailed design to a later doc. +| Question | Decision | +|---|---| +| Bundle sigchain in chat-server? | **No.** Use existing `kez-sig-server`. Microservices. | +| Bundle NATS into Rust server? | **No.** Run `nats-server` as a separate container; chat-server provides the auth callout. | +| KEZ + nostr coexistence for chat? | **No nostr in chat.** KEZ is identity-only; nostr only as a verifiable claim in someone's sigchain, not as transport. | +| Handle scope: federation or global? | **Global for v0**, federation-ready design (see §3.5). | +| Recovery if key lost? | **Paper backup (24-word mnemonic), Keybase-style.** No server-side recovery. | +| Iroh pinning in v0? | **No.** Sender must be online for receiver to fetch. Pinning is a future tier. | --- ## 10. Risks & honest concerns -1. 
**NATS auth callout integration depth.** The callout pattern is - documented but the chat-server needs to handle it correctly for - security. nkey signature verification is straightforward but the - integration glue (subject permissions per user, JetStream stream - creation) needs care. +1. **NATS auth callout integration depth.** Documented but fiddly. + nkey signature verification is straightforward; the per-user subject + permission glue needs care. Test cases for "user can publish to + their own inbox only" / "user can subscribe to their own inbox + only" matter. -2. **Iroh is pre-1.0.** API may shift. Pin a version, plan for a future - upgrade pass. The good news: identity stays stable (it's KEZ); only - the transport library needs to be migrated. +2. **Iroh is pre-1.0.** Pin a version. Migration is a chore but only + touches client code, not identity. Identity stays stable (KEZ). -3. **Multi-device.** The MVP assumes one device per user, one key. Real - users have phones + laptops. Multi-device key management is a deep - topic — addressed in a follow-up doc. +3. **Single-device assumption.** Real users have phones AND laptops. + v0 assumes one device per primary. Designing multi-device is a + real follow-up. -4. **Spam in handle registration.** First-come-first-served is easy to - game. Mitigations: - - Proof-of-work on registration? - - Email-based gating (introduces centralization)? - - Rate-limit by IP, accept the leakage - - Defer to v0; revisit if it becomes a problem. +4. **No offline file delivery.** A natural user complaint will be + "Chris sent me a file but he's offline now." We've made the trade + knowingly; document the limitation clearly in-app ("File will + download when @chris is back online"). -5. **NAT traversal for Iroh.** Iroh handles it via relays, but corporate - networks are sometimes hostile. Have a "use server's pinning as - relay" fallback documented. +5. **Handle squatting.** First-come-first-served. 
Mitigations: + - Rate-limit registration by IP + - Reserve some handles (`@admin`, common project names) + - Accept that some squatting will happen; document the policy -6. **Operational cost.** Running NATS + Iroh + a Rust server isn't free. - - NATS scales horizontally, low resource use - - Iroh nodes can chew through disk if pinning is enabled liberally - - Need a clear "I'm running kez.lat for 1000 users — what does it cost?" - answer before community adoption. +6. **NAT traversal.** Iroh handles it with relays. Test on hostile + networks (corporate firewalls, mobile carriers with CGNAT) before + claiming "just works." + +7. **Operational cost.** Three containers (chat + nats + sig-server) + + bandwidth + a domain. Cheap at small scale, scales with users. + Need a "running kez.lat for 1k users — what does it cost?" answer + before community adoption. --- @@ -556,48 +645,54 @@ These need resolving before serious implementation: When we start building: -1. **Refactor: move `kez-core` + `kez-channels` to `rust-lib/`**. - Tiny but unblocks everything else from having clean imports. +1. **Refactor: move `kez-core` + `kez-channels` to `rust-lib/`.** + Small but unblocks clean imports from kez-chat. -2. **Build `kez-chat-server` scaffold** (axum + sqlite + tracing). - Handle registry + WebFinger first — these are the simplest endpoints - and unblock client-side account creation. +2. **Scaffold `kez-chat-server`** (axum + tokio + sqlite + tracing). + Handle registry + WebFinger first — these unblock client-side + account creation. -3. **Add NATS auth callout.** Spawn `nats-server` separately, configure - it to call our `/internal/nats/auth` endpoint. End-to-end: client - can register a handle and connect to NATS with their nkey. +3. **NATS auth callout.** Run nats-server in a sibling container with + the callout configured to hit our endpoint. End-to-end: a client + can register a handle and then connect to NATS authenticated by + its KEZ key. -4. 
**Build a minimal `kez-chat` CLI client** that does: +4. **Minimal `kez-chat-cli` client** (separate project) that does: - `kez-chat register tudisco` - `kez-chat add @chris` - `kez-chat send @chris "hello"` - `kez-chat listen` - No UI yet. Enough to prove the chat flow works end-to-end. + No UI. Enough to prove the chat flow works end-to-end against + the server. -5. **Add Iroh integration** to both server and CLI client. - - Server: embedded iroh node for pinning - - Client: local iroh node, blob send/receive - - CLI: `kez-chat share @chris ./file.pdf`, `kez-chat browse @tudisco` +5. **Iroh integration in the client** (not the server). + - Client runs a local Iroh node + - `kez-chat share @chris ./file.pdf` + - `kez-chat fetch ` -6. **Shared-files manifest** (sigchain `set_shared_files` op, manifest - blob format). +6. **Shared-files manifest.** New `set_shared_files` sigchain op. + `kez-chat browse @tudisco` lists his shared files. -7. **Deployment recipe**: docker-compose, systemd unit, deployment doc. +7. **Deployment recipe.** docker-compose, systemd, deployment doc. 8. **Then** start the GUI app. Could be Tauri (Rust + web frontend), - Iced (pure Rust UI), or whatever the user wants. + Iced (pure Rust UI), or something else. --- ## 12. One-paragraph summary -`kez-chat` is a Keybase-class chat and file-sharing app built on the KEZ -identity stack. Users get `@username@kez.lat` handles backed by an -ed25519 primary key. The same key authenticates to a NATS broker (chat, -presence, file tickets — broker is dumb, clients do E2E with -ChaCha20-Poly1305 over ECDH-derived keys) and identifies an Iroh node -(P2P bulk transfer, content-addressed blobs, on-demand fetch). A single -Rust binary (`kez-chat-server`) coordinates the handle registry, NATS -auth, optional sigchain mirror, and optional Iroh pinning. The chat-app -itself is a separate project that consumes the server's HTTP API plus -talks directly to NATS and Iroh. 
+`kez-chat` is a Keybase-class chat and file-sharing app built on the +KEZ identity stack. Users get `@username@kez.lat` handles backed by an +ed25519 primary key. The same key authenticates to a NATS broker +(chat, presence, file tickets — broker is dumb, clients do E2E with +ChaCha20-Poly1305 over X25519-derived keys) and identifies an Iroh +node (P2P bulk transfer, content-addressed blobs, on-demand fetch). +The server side is a microservices deployment: a thin Rust +`kez-chat-server` handles the handle registry + NATS auth + HTTP API; +a separate `nats-server` container runs the broker; the existing +`kez-sig-server` stores sigchains. The chat-server does not run an +Iroh node and does not pin files in v0 — file transfer is pure P2P +between online peers. Account recovery is via a 24-word paper-backup +mnemonic. Federation across home servers is deferred but the design +keeps it as a flip-the-switch future change.
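The per-user permission glue flagged in §10 (risk 1) can be unit-tested without a running broker: nats-server enforces whatever publish/subscribe allow-lists the callout hands back in the user JWT, so the chat-server only has to emit the right patterns. The sketch below is a minimal, std-only check of such patterns, assuming standard NATS wildcard semantics (`*` matches exactly one token, `>` matches one or more trailing tokens) and a hypothetical subject layout in which `kez.inbox.<handle>.>` is a user's inbox. `subject_matches`, `allowed`, and `inbox_subscribe_grant` are illustrative names, not part of any NATS or KEZ API.

```rust
/// NATS-style wildcard match, assuming well-formed patterns and subjects:
/// `*` matches exactly one token, `>` (last token only) matches one or more.
pub fn subject_matches(pattern: &str, subject: &str) -> bool {
    let mut p = pattern.split('.');
    let mut s = subject.split('.');
    loop {
        match (p.next(), s.next()) {
            // `>` swallows the rest, but needs at least one subject token.
            (Some(">"), Some(_)) => return true,
            // `*` consumes exactly one token.
            (Some("*"), Some(_)) => continue,
            // Literal tokens must be equal.
            (Some(pt), Some(st)) if pt == st => continue,
            // Both ran out together: every token matched.
            (None, None) => return true,
            _ => return false,
        }
    }
}

/// True if any allow-pattern in a grant matches the subject.
pub fn allowed(allow: &[String], subject: &str) -> bool {
    allow.iter().any(|pattern| subject_matches(pattern, subject))
}

/// Hypothetical subscribe grant: only the caller's own inbox subtree.
/// Assumes `handle` was already validated (no `.`, `*`, or `>`), otherwise
/// a crafted handle could widen the grant.
pub fn inbox_subscribe_grant(handle: &str) -> Vec<String> {
    vec![format!("kez.inbox.{handle}.>")]
}
```

The same matcher can back the publish-side test cases once the real subject layout is pinned down; only the grant-building function changes.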
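The handle-squatting mitigations in §10 (risk 5), rate-limiting registration by IP and reserving some handles, amount to a few checks in front of `POST /register`. A minimal in-memory sketch, assuming a fixed-window limiter, a hypothetical reserved list, and a hypothetical handle charset (lowercase ASCII letters, digits, `_`, at most 32 characters). `Registrar` and `Rejection` are illustrative names; a real registry would persist to sqlite and read time from a clock instead of taking `now_secs` as a parameter.

```rust
use std::collections::{HashMap, HashSet};
use std::net::IpAddr;

/// Why a registration attempt was refused.
#[derive(Debug, PartialEq)]
pub enum Rejection {
    RateLimited,
    InvalidHandle,
    Reserved,
    Taken,
}

/// Hypothetical in-memory registrar (first-come-first-served).
pub struct Registrar {
    reserved: HashSet<&'static str>,
    taken: HashSet<String>,
    window_secs: u64,
    max_per_window: u32,
    // Per-IP fixed window: (window start in seconds, attempts in window).
    attempts: HashMap<IpAddr, (u64, u32)>,
}

impl Registrar {
    pub fn new(window_secs: u64, max_per_window: u32) -> Self {
        Registrar {
            // Illustrative reserved names only.
            reserved: ["admin", "root", "support", "kez"].iter().copied().collect(),
            taken: HashSet::new(),
            window_secs,
            max_per_window,
            attempts: HashMap::new(),
        }
    }

    /// Counts every attempt (successful or not) against the caller's IP,
    /// then validates the handle and claims it if still free.
    pub fn register(&mut self, ip: IpAddr, handle: &str, now_secs: u64) -> Result<(), Rejection> {
        let (window, max) = (self.window_secs, self.max_per_window);
        let slot = self.attempts.entry(ip).or_insert((now_secs, 0));
        if now_secs.saturating_sub(slot.0) >= window {
            *slot = (now_secs, 0); // window expired: start a fresh one
        }
        if slot.1 >= max {
            return Err(Rejection::RateLimited);
        }
        slot.1 += 1;

        let valid = !handle.is_empty()
            && handle.len() <= 32
            && handle
                .chars()
                .all(|c| c.is_ascii_lowercase() || c.is_ascii_digit() || c == '_');
        if !valid {
            return Err(Rejection::InvalidHandle);
        }
        if self.reserved.contains(handle) {
            return Err(Rejection::Reserved);
        }
        if !self.taken.insert(handle.to_string()) {
            return Err(Rejection::Taken);
        }
        Ok(())
    }
}
```

Counting failed attempts too is a deliberate choice: otherwise a squatter could probe for free names without ever hitting the limit.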
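Under manifest option A in §9, a recipient finds the current shared-files manifest by walking the sigchain and taking the most recent `set_shared_files` pointer, then fetches that blob from Iroh by hash. A minimal sketch of the walk, assuming the links were already signature-checked (hypothetically by kez-core) and using an illustrative `SigchainOp` enum rather than KEZ's real op encoding; `latest_manifest_hash` and `seqnos_contiguous` are hypothetical helpers.

```rust
/// Illustrative subset of sigchain operations; not KEZ's actual encoding.
#[derive(Debug, Clone, PartialEq)]
pub enum SigchainOp {
    Rotate { new_key: String },
    SetSharedFiles { manifest_hash: String },
    Other(String),
}

/// One link of an already-verified chain.
#[derive(Debug, Clone)]
pub struct Link {
    pub seqno: u64,
    pub op: SigchainOp,
}

/// Rejects chains with gaps or reordering before any pointer is trusted.
pub fn seqnos_contiguous(chain: &[Link]) -> bool {
    chain.windows(2).all(|w| w[1].seqno == w[0].seqno + 1)
}

/// Hash committed by the most recent `set_shared_files` link,
/// or None if the user has never published a manifest.
pub fn latest_manifest_hash(chain: &[Link]) -> Option<&str> {
    chain.iter().rev().find_map(|link| match &link.op {
        SigchainOp::SetSharedFiles { manifest_hash } => Some(manifest_hash.as_str()),
        _ => None,
    })
}
```

Because every manifest update is a new sigchain event, "latest wins" falls out of reading the chain backwards; no mutable state lives outside the chain.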