plan(kez-chat): lock design decisions; rewrite document.md

Sweep through the design doc with all the open questions resolved:

- Microservices: chat-server does NOT bundle sigchain mirror — depends
  on the existing kez-sig-server as a separate container.
- NATS: not embedded in the Rust server. nats-server (Go) runs as its
  own container; chat-server provides an auth callout endpoint that
  nats-server invokes on each client connection.
- No nostr in chat. KEZ is identity-only; nostr only participates as a
  verifiable claim in someone's sigchain, not as transport.
- Global handle namespace for v0, federation-ready design (qualified
  internal handles, HTTP-based lookups, WebFinger from day one).
- Paper-backup recovery (24-word BIP-39-style mnemonic shown at
  account creation, user writes it down, app verifies recall). No
  server-side recovery.
- No Iroh pinning in v0. Files transfer pure P2P; if sender is offline,
  receiver waits. Chat-server doesn't run an Iroh node at all.

Concrete additions to the document:

- §3.4 Paper-backup recovery flow
- §3.5 Federation-ready design notes (qualified handle storage,
  HTTP-based lookups, WebFinger)
- §4.1 Responsibility table now explicitly lists what's NOT in this
  server (sigchain, NATS, Iroh, channel verification)
- §4.3 Sketch of docker-compose.yml showing the three-container
  microservices layout
- §9 collapsed: only one open question remains (manifest format —
  signed blob via sigchain op vs Iroh Doc). Recommended default: A.
- New "Decisions locked" table at the end of §9 summarizing all the
  closed questions
- §5.4 file sharing flow notes "both peers online for v0"
- §6.5 explicitly states "chat-server doesn't run an Iroh node"
- §7 MVP scope trimmed (no Iroh pinning checkbox)
- §11 sequenced plan reflects microservices ordering

Ready to attack once the manifest format decision lands.
Tudisco 2026-05-24 22:37:08 -06:00
parent 008875a2ad
commit f586129787


@ -12,18 +12,17 @@ identity stack, with NATS for messaging and Iroh for file transfer.
A real-time chat + file-sharing application with verified identities. A real-time chat + file-sharing application with verified identities.
- Users get human-friendly handles like `@tudisco@kez.lat`. - Users get human-friendly handles like `@tudisco@kez.lat`.
- The handle is bound to a KEZ primary key (ed25519); the same key - The handle is bound to a KEZ ed25519 primary key; the same key
authenticates to the chat infrastructure. authenticates to the chat infrastructure.
- Conversations are end-to-end encrypted; the broker is dumb. - Conversations are end-to-end encrypted; the broker is dumb.
- Files are visible in the sender's "shared files" list but only - Files are visible in the sender's "shared files" list but only
downloaded when a recipient actually wants them. No background sync. downloaded when a recipient actually wants them. No background sync.
- Identity is portable: the underlying key + sigchain survives the home - Identity is portable: the underlying key + sigchain survives the home
server going dark. Handles can be migrated to other servers. server going dark. Handles can be migrated to other servers later.
This is the Keybase model rebuilt on a decentralized substrate: This is the Keybase model rebuilt on a decentralized substrate:
- **Identity layer** → KEZ (instead of Keybase's central account system) - **Identity layer** → KEZ (instead of Keybase's central account system)
- **Chat layer** → NATS broker with E2E in the client (instead of Keybase - **Chat layer** → NATS with client-side E2E (instead of Keybase Chat)
Chat servers)
- **File layer** → Iroh peer-to-peer with content addressing (instead of KBFS) - **File layer** → Iroh peer-to-peer with content addressing (instead of KBFS)
--- ---
@ -72,18 +71,20 @@ Handles look like email and Mastodon addresses:
``` ```
@tudisco@kez.lat @tudisco@kez.lat
@chris@kez.lat @chris@kez.lat
@alice@chris.com ← custom domain, opted out of default @alice@chris.com ← custom domain, opted out of default (future)
``` ```
`kez.lat` is the placeholder default home server domain. We'll replace `kez.lat` is the placeholder default home server domain. We'll lock in
this with the actual production domain once chosen. The application the real production domain before launch.
treats whatever's after the `@` as the user's home server — multiple
servers can exist, federation is by convention (same model as email).
In the UI, when the home server matches the app's default, handles are For v0, **the handle namespace is global** — registration is on the one
displayed bare (`@tudisco`). Custom domains always display the full form default home server. Federation (multiple servers with their own
(`@chris@chris.com`) so users can tell when they're talking to a namespaces) is deliberately not in v0, but the design must not preclude
non-default-server user. it. See §3.5.
In the UI, since there's only one home server in v0, handles are
displayed bare (`@tudisco`). The `@kez.lat` suffix is implied and stored
internally.
### 3.2 Key generation tied to username ### 3.2 Key generation tied to username
@ -92,104 +93,200 @@ When a user creates an account:
1. App generates a **fresh ed25519 keypair** locally. 1. App generates a **fresh ed25519 keypair** locally.
- This is the user's KEZ primary key. - This is the user's KEZ primary key.
- It's also their NATS nkey for the chat broker (same key, same algorithm). - It's also their NATS nkey for the chat broker (same key, same algorithm).
2. App **registers `@username` on the home server's handle registry** - It's also their Iroh node identity (same primitive again).
- POSTs a signed registration request: `{ "handle": "tudisco", "primary": "ed25519:<hex>" }` 2. App **registers `@username`** on the home server's handle registry.
- The signature proves the user controls the private key. - Sends a signed registration request proving control of the private key.
- The registry rejects squatting (first-come-first-served per home server). - Registry rejects squatting (first-come-first-served).
3. App **initializes a sigchain** for the new primary 3. App **initializes a sigchain** for the new primary.
- First event: `add_endpoint` advertising the NATS broker the app will use. - First event: `add_endpoint` advertising the NATS broker the app will use.
- Second event: `add_endpoint` advertising the Iroh NodeId the local app is using. - Second event: `add_endpoint` advertising the Iroh NodeId of the local device.
4. App **uploads the sigchain** to a kez-sig-server (optional but 4. App **uploads the sigchain** to the deployed `kez-sig-server`.
recommended; otherwise the chain lives only on the user's device).
After this flow the user has a fully working KEZ identity: After this flow the user has a fully working KEZ identity:
- `@tudisco@kez.lat` resolves via the handle registry to their primary key. - `@tudisco@kez.lat` resolves via the handle registry to their primary key.
- That key's sigchain advertises their NATS broker and Iroh nodes. - That key's sigchain (on `kez-sig-server`) advertises their NATS broker and Iroh nodes.
- Other users can verify them and reach them. - Other users can verify them and reach them.
### 3.3 Why ed25519 (not nostr/secp256k1) for this app ### 3.3 Why ed25519 only for this app
Both KEZ primaries work in general, but the chat app **must** use ed25519 Both KEZ primary types work in general, but the chat app **requires** ed25519:
because:
- **NATS nkeys are ed25519.** Direct alignment: the user's KEZ primary key - **NATS nkeys are ed25519.** Direct alignment: the user's KEZ primary key
is their NATS credential. No second auth scheme. is their NATS credential. No second auth scheme.
- **Iroh node IDs are ed25519.** Same primitive, native fit. - **Iroh node IDs are ed25519.** Same primitive, native fit.
- **One key type to manage.** Users with a pre-existing nostr key can - **One key type to manage.** Users with a pre-existing nostr key can
still attach it to their KEZ sigchain as a claim (so they're verifiable still attach it to their KEZ sigchain as a verifiable claim (so they're
on nostr too), but the primary that runs the app is ed25519. cross-referenced on nostr too), but the primary that runs the app is
ed25519. The nostr key never participates in chat or file transport.
### 3.4 Account recovery: paper backup (Keybase-style)
The user's ed25519 private key is the only thing that can prove their
identity. Lose it, lose the account.
Recovery model for v0:
- On account creation, the app converts the 32-byte ed25519 seed to a
**mnemonic phrase** (BIP-39 style, 24 words). Standard, well-tested
word lists, deterministic encoding.
- App **forces the user to write it down** before continuing — shows
the words, asks for confirmation, then asks them to retype a few
random words back to prove they recorded it.
- App stores the seed locally in OS-protected storage (Keychain,
Credential Manager, libsecret). Mnemonic is shown only at creation
and on-demand from settings.
- **Lost device flow:** user installs the app on a new device, types
their mnemonic, app regenerates the same ed25519 keypair, then pulls
the sigchain from `kez-sig-server` to restore their identity state.
- The handle is still theirs because the registry knows the primary key.
No server-side recovery. No email reset. No customer support. Same model
Bitcoin wallets and Keybase used — user holds the seed phrase, user is
responsible for it.
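
The 24-word count follows from the BIP-39 layout: a 256-bit seed plus an 8-bit checksum gives 264 bits, which splits into 24 groups of 11 bits, each indexing a 2048-word list. Below is a minimal sketch of that split. It assumes the checksum byte (the first byte of SHA-256 over the seed) is computed by the caller. `word_indices` is a hypothetical helper name, not something taken from `kez-core`, and the word-list lookup is left out.

```rust
/// Split a 32-byte seed plus its 8-bit checksum into 24 eleven-bit
/// word indices (BIP-39 layout for 256-bit entropy).
/// Hypothetical helper: `checksum` is assumed to be the first byte of
/// SHA-256(seed), computed elsewhere.
pub fn word_indices(seed: &[u8; 32], checksum: u8) -> [u16; 24] {
    // Concatenate seed and checksum into one 264-bit buffer.
    let mut bytes = [0u8; 33];
    bytes[..32].copy_from_slice(seed);
    bytes[32] = checksum;

    let mut indices = [0u16; 24];
    for (i, slot) in indices.iter_mut().enumerate() {
        let mut value: u16 = 0;
        // Read 11 bits, most significant bit first.
        for bit in 0..11 {
            let pos = i * 11 + bit;
            let b = (bytes[pos / 8] >> (7 - (pos % 8))) & 1;
            value = (value << 1) | u16::from(b);
        }
        *slot = value;
    }
    indices
}
```

On the lost-device path the app would run the inverse: map the 24 typed words back to indices, rebuild the 33 bytes, and reject the phrase if the checksum byte does not match.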
### 3.5 Federation-ready design (not in v0)
For v0 we have **one** home server (`kez.lat`). All handles live there.
To make sure we don't paint ourselves into a corner:
1. **Internal representation of a handle is always the qualified form**
(`tudisco@kez.lat`), never just `tudisco`. The UI strips the suffix
for display; storage always keeps the full form.
2. **Handle resolution is HTTP-based**, not hard-coded. The chat app
looks up `chris@kez.lat` by hitting `https://kez.lat/v1/u/chris`.
When federation lands, looking up `chris@example.com` hits
`https://example.com/v1/u/chris` instead.
3. **WebFinger endpoint included from v0** — so cross-server discovery
already works via standard tooling, even if our app only uses our
own server for now.
4. **Sigchain endpoint URLs are fully qualified.** A user's sigchain
lives at `https://sig.kez.lat/v1/sigchains/ed25519/<hex>` — when
another server's user wants to verify ours, the URL is right there.
The v0 chat app might hard-code "lookups go to `kez.lat`" for now;
flipping that to "lookups go to whatever's after the `@`" is a config
change later, not a redesign.
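
Points 1 and 2 above can be sketched in a few lines of Rust. This is an illustration under stated assumptions, not the chat app's actual code: the `Handle` type and its method names are hypothetical, and validation of allowed characters in the local part is omitted.

```rust
/// Default home server for v0 (the placeholder domain used in this document).
const DEFAULT_DOMAIN: &str = "kez.lat";

/// A handle in its qualified internal form, e.g. `tudisco@kez.lat`.
/// Hypothetical type for illustration only.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Handle {
    pub local: String,
    pub domain: String,
}

impl Handle {
    /// Parse `chris`, `@chris`, or `@chris@example.com`.
    /// A missing domain is filled in with the default home server.
    pub fn parse(input: &str) -> Option<Handle> {
        let trimmed = input.trim().trim_start_matches('@');
        let (local, domain) = match trimmed.split_once('@') {
            Some((l, d)) => (l, d),
            None => (trimmed, DEFAULT_DOMAIN),
        };
        if local.is_empty() || domain.is_empty() || domain.contains('@') {
            return None;
        }
        Some(Handle {
            local: local.to_string(),
            domain: domain.to_string(),
        })
    }

    /// Storage form: always qualified.
    pub fn qualified(&self) -> String {
        format!("{}@{}", self.local, self.domain)
    }

    /// Lookup URL derived from whatever follows the `@`.
    pub fn lookup_url(&self) -> String {
        format!("https://{}/v1/u/{}", self.domain, self.local)
    }

    /// UI form: bare when the user is on the default home server.
    pub fn display(&self) -> String {
        if self.domain == DEFAULT_DOMAIN {
            format!("@{}", self.local)
        } else {
            format!("@{}", self.qualified())
        }
    }
}
```

Because `lookup_url` already reads the domain from the handle, hard-coding `kez.lat` in v0 only affects what `parse` fills in for bare input.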
--- ---
## 4. The home server (`kez-chat-server`) ## 4. The home server (`kez-chat-server`)
A single Rust binary that bundles the home-server responsibilities. One A single Rust binary, deployed as one container alongside other
process. Self-hostable. Anyone can run their own to be their own home for microservices (NATS broker, sigchain server).
their own users.
### 4.1 What it does ### 4.1 What it does (and what it doesn't)
| Responsibility | How | | Responsibility | This server? |
|---|---| |---|---|
| **Handle registry** | `POST /v1/register` to claim `@username`, `GET /v1/u/<handle>` to look one up. SQLite-backed. Same shape as `kez-id-server` discussed earlier. | | **Handle registry** | ✅ Yes |
| **Sigchain mirror** (optional) | Mirrors `kez-sig-server` endpoints for users who don't want to publish elsewhere — `POST /v1/sigchains/.../events`, `GET /v1/sigchains/...`. Or proxies through to a separate `kez-sig-server` instance. | | **NATS auth callout** | ✅ Yes |
| **NATS broker host** | Runs (or co-runs) a NATS server with JetStream enabled for offline message delivery. Configured to use nkey-based auth tied to KEZ primary keys. | | **WebFinger endpoint** | ✅ Yes |
| **Iroh pinning node** | Runs an Iroh node that users can opt to push their blobs to, so files are served even when the user's own device is offline. (Optional per user.) | | **HTTP API for clients** | ✅ Yes |
| **WebFinger endpoint** | `/.well-known/webfinger?resource=acct:tudisco@kez.lat` returns user discovery info — interop with fediverse tools. | | **Sigchain storage** | ❌ No — defer to `kez-sig-server` (separate container) |
| **HTTP API for clients** | Thin REST surface for the chat app to register, look up handles, fetch endpoints, manage settings. | | **NATS broker** | ❌ No — separate `nats-server` (Go) container |
| **Iroh pinning** | ❌ No for v0 — files transfer P2P when both peers are online. Pinning is a future tier. |
| **Channel verification (gist/dns/etc.)** | ❌ No — clients do it locally via `kez-channels`. KEZ system is only used for identity, not as part of chat. |
### 4.2 Process model The chat server is deliberately small. Microservices: each service does
one thing, deployed independently. Operator runs three containers
(chat-server + nats-server + sig-server). When pinning lands later, that
becomes a fourth optional container.
For MVP, the server is a **coordinator + adapter**, not a full ### 4.2 Process / deployment model
reimplementation:
``` ```
┌───────────────────────────────────────────────────────────┐ ┌──────────────────────────────────────────────────────────────┐
│ kez-chat-server process (one Rust binary) │ │ docker-compose / systemd / Kubernetes │
│ - HTTP API (axum) │ │ │
│ - Handle registry (SQLite) │ │ ┌──────────────┐ ┌─────────────────┐ ┌────────────────┐ │
│ - NATS auth callout (validates nkey signatures) │ │ │ nats-server │ │ kez-chat-server │ │ kez-sig-server │ │
│ - Sigchain mirror (axum routes — could reuse │ │ │ (Go) │◄──┤ (Rust) ├──►│ (Rust) │ │
│ rust-sig-server code) │ │ │ + JetStream │ │ │ │ (existing) │ │
└──┬──────────────────────┬────────────────────────────────┘ │ │ │ │ ↓ handles │ │ ↓ sigchain │ │
│ launches/manages │ talks to via API │ │ │ │ ↓ nats auth │ │ storage │ │
▼ ▼ │ │ │ │ ↓ HTTP API │ │ │ │
┌──────────────┐ ┌──────────────┐ │ └──────────────┘ └─────────────────┘ └────────────────┘ │
│ nats-server │ │ iroh-relay │ (optional, for users │ ▲ ▲ ▲ │
│ (Go binary) │ │ (Rust) │ who want pinning) │ │ │ │ │
│ + JetStream │ │ │ └─────────┼───────────────────┼──────────────────────┼─────────┘
└──────────────┘ └──────────────┘ │ │ │
│ │ │
┌──────┴───────────────────┴──────────────────────┴─────┐
│ Chat app (per user, runs on phone/desktop) │
│ │
│ • talks to nats-server over native NATS protocol │
│ • talks to kez-chat-server over HTTPS (handles, etc.) │
│ • talks to kez-sig-server over HTTPS (sigchain) │
│ • runs local iroh::Node for file send/receive │
└────────────────────────────────────────────────────────┘
``` ```
The Rust server doesn't reimplement NATS or Iroh — it sits beside them. The Rust chat-server orchestrates auth between NATS and the handle
Operator runs the three processes together (Docker compose, systemd registry, but doesn't host either NATS or the sigchains.
unit, or whatever). The chat-server provides the KEZ-aware integration:
authenticating NATS connections against the handle registry, serving
sigchain endpoints, exposing a clean HTTP API to client apps.
### 4.3 Endpoints (sketch) ### 4.3 docker-compose sketch
```yaml
# deploy/docker-compose.yml
services:
nats:
image: nats:latest
command: ["-c", "/etc/nats/nats.conf", "--jetstream"]
volumes:
- ./nats.conf:/etc/nats/nats.conf:ro
- nats-data:/data
ports:
- "4222:4222" # client connections (TLS in prod)
- "8222:8222" # monitoring
chat-server:
build: . # kez-chat-server Rust binary
environment:
NATS_URL: nats://nats:4222
SIG_SERVER_URL: http://sig-server:7878
DB_PATH: /data/handles.db
AUTH_CALLOUT_NKEY_PATH: /etc/kez/auth-callout.nkey
volumes:
- chat-data:/data
- ./auth-callout.nkey:/etc/kez/auth-callout.nkey:ro
depends_on: [nats, sig-server]
ports:
- "8080:8080" # HTTP API for clients
sig-server:
image: kez-sig-server:latest # the existing rust-sig-server
environment:
KEZ_DB: /data/sigchains.db
volumes:
- sig-data:/data
ports:
- "7878:7878"
volumes:
nats-data:
chat-data:
sig-data:
```
NATS's auth-callout is configured in `nats.conf` to send connection
requests to `chat-server:8080/internal/nats/auth`. The chat-server
verifies the nkey signature against the handle registry and returns
allowed subjects (typically just the user's own inbox).
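
The subject-scoping decision described above might look roughly like the following. This sketch covers only the mapping from a verified public key to allowed subjects. The callout wire format, signature verification, and response signing are out of scope. `Permissions` and `permissions_for` are hypothetical names. The publish rule `kez.inbox.*` is an assumption drawn from §5.3, where senders publish to other users' inboxes, and a real deployment would likely also need JetStream API subjects.

```rust
/// Subjects a connection may use, as decided by the auth callout.
/// Hypothetical type for illustration only.
#[derive(Debug, PartialEq, Eq)]
pub struct Permissions {
    pub subscribe: Vec<String>,
    pub publish: Vec<String>,
}

/// Build permissions for an already-verified ed25519 public key,
/// given as 64 lowercase hex characters (assumed encoding).
/// Anything else is rejected, so a crafted value such as `>` or
/// `a.*` can never widen the subscribe subject via wildcards.
pub fn permissions_for(pubkey_hex: &str) -> Option<Permissions> {
    let is_hex = pubkey_hex.len() == 64
        && pubkey_hex
            .bytes()
            .all(|b| b.is_ascii_digit() || (b'a'..=b'f').contains(&b));
    if !is_hex {
        return None;
    }
    Some(Permissions {
        // Only the user's own inbox can be read.
        subscribe: vec![format!("kez.inbox.{pubkey_hex}")],
        // Assumption: any single inbox can be written to.
        publish: vec!["kez.inbox.*".to_string()],
    })
}
```

Validating the key format before interpolating it into a subject matters because NATS treats `*` and `>` as wildcards.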
### 4.4 Endpoints
``` ```
GET /v1/healthz GET /v1/healthz
GET /v1/u/:handle handle → { primary, sigchain_url, endpoints } GET /v1/u/:handle handle → { primary, sigchain_url, endpoints }
POST /v1/register claim a handle (signed body) POST /v1/register claim a handle (signed body)
GET /.well-known/webfinger?resource=... GET /.well-known/webfinger?resource=acct:tudisco@kez.lat
# Sigchain mirror (same as kez-sig-server) # NATS auth callout (called BY nats-server, not by users)
GET /v1/sigchains/:scheme/:id
POST /v1/sigchains/:scheme/:id/events
GET /v1/sigchains/:scheme/:id/head
# NATS auth callout (called by nats-server, not by users)
POST /internal/nats/auth verify nkey signature, return permissions POST /internal/nats/auth verify nkey signature, return permissions
# Iroh pinning (optional)
POST /v1/pin pin a blob for offline serving
GET /v1/pin/:hash check pinning status
``` ```
The NATS broker and Iroh node are *out-of-process* — clients connect to Sigchain endpoints are **not** on this server — clients talk directly to
them directly (`mqtt://nats.kez.lat:4222`, Iroh direct or via relays). `kez-sig-server` for those.
--- ---
@ -198,111 +295,114 @@ them directly (`mqtt://nats.kez.lat:4222`, Iroh direct or via relays).
### 5.1 Account creation — `@tudisco@kez.lat` ### 5.1 Account creation — `@tudisco@kez.lat`
``` ```
1. User opens kez-chat-app, clicks "Create account" 1. User opens chat app, clicks "Create account"
2. App: generates ed25519 keypair locally 2. App: generates ed25519 keypair locally
3. App: user picks handle "tudisco" 3. App: converts seed to 24-word mnemonic, makes user write it down,
4. App → kez-chat-server: verifies recall before continuing
4. App: user picks handle "tudisco"
5. App → chat-server:
POST /v1/register POST /v1/register
{ "handle": "tudisco", { "handle": "tudisco",
"primary": "ed25519:<pubkey-hex>", "primary": "ed25519:<pubkey-hex>",
"registration_sig": "<sig over canonical message>" } "registration_sig": "<sig over canonical message>" }
5. Server: validates signature, checks handle is free, stores in registry 6. Server: validates signature, checks handle is free, stores in registry
6. Server: 201 Created 7. Server: 201 Created
7. App: initializes sigchain locally, signs: 8. App: initializes sigchain locally, signs:
{ op: "add_endpoint", - add_endpoint { protocol: "nats", url: "...", inbox: "kez.inbox.<pk>" }
payload: { protocol: "nats", - add_endpoint { protocol: "iroh", node_id: "<local iroh id>" }
url: "nats://nats.kez.lat:4222", 9. App → sig-server: POST /v1/sigchains/ed25519/<pk>/events (one per event)
inbox: "kez.inbox.<pubkey-hex>" } } 10. App: connects to nats-server with nkey auth (signed challenge,
{ op: "add_endpoint", nats-server invokes chat-server's auth callout, gets back yes/no
payload: { protocol: "iroh", + allowed subjects)
node_id: "<local iroh node id>" } } 11. App: subscribes to JetStream durable consumer on its inbox subject
8. App → server: 12. Done — @tudisco@kez.lat is live and reachable
POST /v1/sigchains/ed25519/<pubkey-hex>/events (twice, one per event)
9. App: connects to NATS broker with nkey auth, subscribes to inbox topic
10. Done — user is @tudisco@kez.lat, online, reachable
``` ```
### 5.2 Adding a contact ### 5.2 Adding a contact
``` ```
1. Tudisco wants to add Chris. Types "@chris" in app. 1. Tudisco types "@chris" in app
2. App → kez-chat-server: GET /v1/u/chris 2. App → chat-server: GET /v1/u/chris
Returns: { primary: "ed25519:abc...", sigchain_url: "..." } Returns: { primary: "ed25519:abc...", sigchain_url: "https://sig.kez.lat/..." }
3. App fetches the sigchain → walks events → extracts: 3. App → sig-server (URL from above): fetch sigchain
- nostr/github/dns/etc. claims (for verification) 4. App walks events to extract:
- NATS broker URL + inbox topic - NATS broker URL + inbox subject (from add_endpoint nats)
- Iroh node IDs - Iroh node IDs (from add_endpoint iroh)
4. App displays Chris's profile: verified accounts, avatar (from sigchain - Other identity claims (github:chris, dns:chris.com, etc. — for display)
metadata if present), join date 5. App caches LOCALLY: { "@chris@kez.lat" => ed25519:abc..., endpoints: {...} }
5. App stores LOCAL binding: { "@chris@kez.lat" => ed25519:abc... }
(TOFU — trust on first use) (TOFU — trust on first use)
``` ```
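
The TOFU step in the flow above amounts to pinning the first key seen for a handle and flagging any later change. Below is a minimal in-memory sketch. `ContactCache` and `Tofu` are hypothetical names, persistence is omitted, and what the app does on a mismatch (warn, block, ask the user) is a policy choice this document does not specify.

```rust
use std::collections::HashMap;

/// Outcome of checking a looked-up key against the local pin.
#[derive(Debug, PartialEq, Eq)]
pub enum Tofu {
    /// No pin existed; the key has now been stored.
    FirstUse,
    /// The key matches the stored pin.
    Match,
    /// The registry returned a different key than the one pinned.
    Mismatch { pinned: String },
}

/// Local handle -> primary key bindings (hypothetical, in-memory only).
#[derive(Default)]
pub struct ContactCache {
    pinned: HashMap<String, String>,
}

impl ContactCache {
    /// Record or check the primary key returned for a qualified handle.
    /// An existing pin is never overwritten here.
    pub fn observe(&mut self, handle: &str, primary: &str) -> Tofu {
        match self.pinned.get(handle) {
            None => {
                self.pinned.insert(handle.to_string(), primary.to_string());
                Tofu::FirstUse
            }
            Some(p) if p.as_str() == primary => Tofu::Match,
            Some(p) => Tofu::Mismatch { pinned: p.clone() },
        }
    }
}
```

Keying the cache on the qualified handle (`chris@kez.lat`) rather than the bare form keeps it consistent with §3.5.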
### 5.3 Sending a chat message ### 5.3 Sending a chat message
``` ```
1. Tudisco types "hello" in the chat with Chris. 1. Tudisco types "hello" to Chris
2. App: looks up Chris's primary key + NATS endpoint from local store. 2. App looks up Chris's primary key + NATS endpoint from local cache
3. App: derives a symmetric key via ECDH: 3. App derives a per-message symmetric key:
X25519(tudisco_priv, chris_pub) → KDF → 32-byte symmetric key X25519(tudisco_priv, chris_pub) → HKDF → 32-byte ChaCha20-Poly1305 key
4. App: encrypts "hello" with ChaCha20-Poly1305 + the derived key. 4. App encrypts "hello" with that key (+ random nonce)
5. App: signs the ciphertext with tudisco's KEZ primary (so chris can 5. App signs ciphertext with tudisco's KEZ primary
verify the sender, not just decrypt). 6. App publishes to subject `kez.inbox.<chris-pubkey-hex>` on the NATS
6. App: publishes to NATS subject `kez.inbox.<chris-pubkey-hex>` on broker, JetStream-published so the broker stores it durably
chris's broker, with JetStream delivery (durable, will queue if 7. Chris's app (subscribed via durable consumer) receives the message
chris is offline). whenever next online — broker buffers it if offline
7. Chris's app receives from his subscribed inbox subject. 8. Chris's app verifies signature against tudisco's key, decrypts,
8. Chris's app: verifies signature against tudisco's key, decrypts, shows shows "tudisco: hello"
"tudisco: hello".
``` ```
For 1:1 chat, the broker never sees: The broker sees:
- The message contents - An nkey-authenticated client publishing encrypted bytes to a subject
- Who tudisco is talking to (the subject is chris's inbox, but anyone could - It does NOT see: who's reading the subject, message contents, sender
publish there) identity (sender identity is in the signed payload, not the NATS frame)
- The relationship between sender and recipient (sender's identity is in
the encrypted+signed payload, not in the NATS metadata)
### 5.4 Sharing a file ### 5.4 Sharing a file (v0: both peers online)
``` ```
1. Tudisco drags `report.pdf` into the chat with Chris. 1. Tudisco drags `report.pdf` into the chat with Chris
2. App: imports blob into local Iroh node → gets BLAKE3 hash + ticket. 2. App imports the blob into local Iroh node → BLAKE3 hash + ticket
3. App: optionally adds entry to tudisco's shared-files manifest 3. App optionally adds an entry to tudisco's "shared files" manifest
(visible in his profile if Chris later browses it). (visible if Chris later browses tudisco's profile)
4. App: encrypts the Iroh ticket (and a content key for the blob, if 4. App generates a per-file symmetric content key
the file is wrapped with a per-recipient symmetric key) with the 5. App encrypts the blob in place (or stores both plaintext + encrypted —
same E2E mechanism as chat messages. detail for later) with the content key
5. App: publishes to chris's NATS inbox: { type: "file_share", 6. App wraps the content key for chris's KEZ key (X25519 → HKDF)
filename: "report.pdf", ticket: "...", content_key: "..." } 7. App sends a NATS message to chris's inbox:
6. Chris's app receives the notification, displays: { type: "file_share",
"tudisco shared report.pdf (1.2 MB)" [Download] filename: "report.pdf",
7. Chris clicks Download. size: 1234567,
8. App: opens Iroh connection to tudisco's NodeId (from sigchain), pulls iroh_ticket: "blobac://...",
the blob via the ticket, decrypts with the content key, verifies wrapped_content_key: "..." }
BLAKE3 hash. File appears. (same encryption as chat messages, so chris can read this)
8. Chris's app sees the notification: "tudisco shared report.pdf (1.2 MB)"
File NOT downloaded yet.
9. Chris clicks Download.
10. Chris's app opens an Iroh connection to tudisco's NodeId (from
tudisco's sigchain), pulls the blob via the ticket, decrypts with
the unwrapped content key, verifies BLAKE3 hash. File appears.
``` ```
If tudisco is offline at step 8 and he's opted into pinning, Chris's **v0 limitation:** If tudisco is offline at step 10, chris waits.
app fetches from `kez.lat`'s pinning node instead. Same protocol, just Iroh will retry; download starts when tudisco's node comes back.
a different source. Pinning (the server holding a copy) is **not** in v0 — we accept this
limitation in exchange for zero server-side storage cost and the
simplest possible architecture.
### 5.5 Browsing someone's files (Keybase-style) ### 5.5 Browsing someone's files (Keybase-style)
``` ```
1. Chris opens tudisco's profile. 1. Chris opens tudisco's profile
2. App: resolves @tudisco → primary → sigchain. 2. App resolves @tudisco → primary → sigchain
3. Sigchain has a `set_shared_files` op with a manifest blob hash. 3. Sigchain has a `set_shared_files` op pointing at a manifest blob hash
4. App: fetches the manifest blob (small, fast) via Iroh. 4. App fetches the manifest blob via Iroh (small, fast)
5. App: decrypts entries that are wrapped for chris's key, ignores ones 5. App decrypts entries wrapped for chris's key, ignores ones it can't
it can't decrypt (those are wrapped for other people). decrypt (those are wrapped for other people)
6. App: renders the visible entries with name, size, share date, 6. App renders the visible entries: name, size, share date,
thumbnail if present. thumbnail (if present)
7. Chris clicks an entry to download — same as 5.4 step 8. 7. Chris clicks an entry → flow continues like §5.4 step 9
``` ```
The manifest is **small** (KBs); only blobs Chris actually wants are Manifest is small (KB-scale); blobs are MB-to-GB. Browsing is cheap;
fetched. No background sync of multi-GB folders. fetching is per-file deliberate. **Recipient never auto-syncs.**
--- ---
@ -312,7 +412,7 @@ fetched. No background sync of multi-GB folders.
``` ```
/Kez /Kez
├── rust-lib/ ← (proposed) shared Rust libraries ├── rust-lib/ ← (proposed refactor) shared Rust libraries
│ ├── Cargo.toml workspace │ ├── Cargo.toml workspace
│ └── crates/ │ └── crates/
│ ├── kez-core/ moved from rust/crates/ │ ├── kez-core/ moved from rust/crates/
@ -322,23 +422,23 @@ fetched. No background sync of multi-GB folders.
│ └── crates/ │ └── crates/
│ └── kez-cli/ depends on ../../rust-lib/crates/... │ └── kez-cli/ depends on ../../rust-lib/crates/...
├── rust-sig-server/ ← optional sigchain HTTP store ├── rust-sig-server/ ← existing sigchain storage (reused as-is)
├── kez-chat/ ← THIS PROJECT ├── kez-chat/ ← THIS PROJECT
│ ├── document.md (this file) │ ├── document.md (this file)
│ ├── Cargo.toml │ ├── Cargo.toml
│ ├── src/ │ ├── src/
│ │ ├── main.rs │ │ ├── main.rs binary entry
│ │ ├── handles.rs handle registry │ │ ├── handles.rs handle registry (sqlite-backed)
│ │ ├── sigchain.rs sigchain mirror (or proxy) │ │ ├── nats_auth.rs NATS auth callout endpoint
│ │ ├── nats_auth.rs NATS auth callout │ │ ├── webfinger.rs WebFinger discovery endpoint
│ │ ├── pin.rs Iroh pinning │ │ └── api.rs axum routes + state
│ │ └── api.rs HTTP routes
│ ├── deploy/ │ ├── deploy/
│ │ ├── docker-compose.yml chat-server + nats + iroh │ │ ├── docker-compose.yml chat-server + nats + sig-server
│ │ ├── nats.conf │ │ ├── nats.conf with auth_callout config
│ │ └── systemd/ │ │ └── systemd/ alternative deployment
│ └── tests/ │ └── tests/
│ └── http.rs integration tests
├── nodejs/ ← (unchanged) ├── nodejs/ ← (unchanged)
└── crosstest.sh ← (path updates if rust-lib moves) └── crosstest.sh ← (path updates if rust-lib moves)
@ -346,23 +446,22 @@ fetched. No background sync of multi-GB folders.
### 6.2 The `rust-lib/` proposal — share code, no duplication ### 6.2 The `rust-lib/` proposal — share code, no duplication
Right now, kez-core and kez-channels live inside `rust/crates/`. The Right now, `kez-core` and `kez-channels` live inside `rust/crates/`. The
sig-server and the chat-server both want to use them. With everything in sig-server and the new chat-server both want to use them. Today's
`rust/`, downstream projects have to do: downstream path-dep is:
```toml ```toml
kez-core = { path = "../rust/crates/kez-core" } kez-core = { path = "../rust/crates/kez-core" }
``` ```
…which works but feels off (why does a separate project reach into …which works but reaches into another project's crate tree.
another project's `crates/`?).
**Recommendation:** move the pure libraries out into a top-level **Recommendation:** move the pure libraries out into a top-level
`rust-lib/` workspace. The CLI stays in `rust/`. Downstream servers `rust-lib/` workspace. The CLI stays in `rust/`. Downstream servers
depend on `../rust-lib/crates/kez-core`. Clean structure, no duplication, depend on `../rust-lib/crates/kez-core`. Clean structure, no
no confusion about which folder owns the library code. duplication, no confusion about which folder owns what.
Refactor effort: small but real. Refactor steps:
- `mv rust/crates/kez-core rust-lib/crates/` - `mv rust/crates/kez-core rust-lib/crates/`
- `mv rust/crates/kez-channels rust-lib/crates/` - `mv rust/crates/kez-channels rust-lib/crates/`
@ -372,183 +471,173 @@ Refactor effort: small but real.
`rust-sig-server/Cargo.toml`. `rust-sig-server/Cargo.toml`.
- Update `crosstest.sh` if any paths are hardcoded. - Update `crosstest.sh` if any paths are hardcoded.
Suggested order: **do the refactor first, then start kez-chat with clean **Suggested order:** do the refactor *before* starting kez-chat so we
imports.** Otherwise we'll write `path = "../rust/crates/..."` for the import cleanly from the start.
chat-server and have to fix it later anyway.
### 6.3 Dependencies (planned) ### 6.3 Dependencies (planned)
| Crate | Why | | Crate | Why |
|---|---| |---|---|
| `kez-core` (path) | Identity types, sigchain, claim signing | | `kez-core` (path) | Identity types, ed25519, signing |
| `kez-channels` (path) | Verify users' linked accounts when displayed | | `kez-channels` (path) | Verify users' linked accounts when displayed |
| `axum` 0.8 | HTTP API | | `axum` 0.8 | HTTP API |
| `tokio` | Async runtime | | `tokio` | Async runtime |
| `rusqlite` (bundled) | Handle registry | | `rusqlite` (bundled) | Handle registry |
| `async-nats` | NATS client (for the auth callout and maybe utility) | | `async-nats` | NATS client (admin work + the auth callout glue) |
| `iroh` | Iroh node (for pinning) |
| `iroh-blobs` | Blob handling |
| `serde` / `serde_json` | Standard | | `serde` / `serde_json` | Standard |
| `thiserror` / `anyhow` | Standard | | `thiserror` / `anyhow` | Standard |
| `tracing` / `tracing-subscriber` | Logging | | `tracing` / `tracing-subscriber` | Logging |
| `tower-http` | CORS, request tracing | | `tower-http` | CORS, request tracing |
| `clap` | CLI args | | `clap` | CLI args |
### 6.4 The actual NATS broker **Not** depended on by the chat-server:
- `iroh` — server doesn't run an Iroh node in v0 (no pinning)
- nats-server (Go) — separate container, not a Rust dep
### 6.4 NATS broker: separate container

We don't write or embed a NATS broker. Run the official Go binary:

- `nats-server` from nats.io
- JetStream enabled (for offline message buffering)
- Auth callout configured to hit `chat-server:8080/internal/nats/auth`
- Run as its own docker-compose service (see §4.3)

Note that nats-server delivers auth callout requests over NATS itself
(the `$SYS.REQ.USER.AUTH` subject), not over HTTP, so reaching the HTTP
endpoint needs a small bridge. That is presumably the role of the
`async-nats` "auth callout glue" listed in §6.3.

Why not embed: NATS is written in Go, and there is no production-grade
Rust port. Docker-compose keeps the deployment honest (each service in
its own container, normal operational tooling applies). Swapping broker
implementations or running a cluster is a single config change.

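
The auth callout ultimately has to hand nats-server a set of per-user subject permissions. Below is a minimal sketch of that check. It assumes a hypothetical subject layout `chat.inbox.<handle>` (the document does not fix subject names) and standard NATS wildcard semantics, where `*` matches one token and `>` matches the remaining tokens. The names `permissions_for`, `allowed`, and `is_safe_token` are illustrative, not part of any existing KEZ or NATS API.

```rust
/// NATS-style subject match: `*` matches exactly one token,
/// `>` matches one or more trailing tokens.
fn subject_matches(pattern: &str, subject: &str) -> bool {
    let mut pat = pattern.split('.');
    let mut sub = subject.split('.');
    loop {
        match (pat.next(), sub.next()) {
            (Some(">"), Some(_)) => return true,
            (Some("*"), Some(_)) => continue,
            (Some(p), Some(s)) if p == s => continue,
            (None, None) => return true,
            _ => return false,
        }
    }
}

/// Handles become subject tokens, so reject anything that could
/// smuggle in extra tokens or wildcards.
fn is_safe_token(handle: &str) -> bool {
    !handle.is_empty()
        && handle
            .chars()
            .all(|c| c.is_ascii_alphanumeric() || c == '_' || c == '-')
}

struct Permissions {
    publish: Vec<String>,
    subscribe: Vec<String>,
}

/// Hypothetical policy mirroring the document's two test cases:
/// a user may publish and subscribe only on their own inbox.
fn permissions_for(handle: &str) -> Option<Permissions> {
    if !is_safe_token(handle) {
        return None;
    }
    let inbox = format!("chat.inbox.{}", handle);
    Some(Permissions {
        publish: vec![inbox.clone()],
        subscribe: vec![inbox],
    })
}

/// True if any granted pattern covers the requested subject.
fn allowed(patterns: &[String], subject: &str) -> bool {
    patterns.iter().any(|p| subject_matches(p, subject))
}

fn main() {
    let perms = permissions_for("tudisco").expect("valid handle");
    println!("own inbox: {}", allowed(&perms.subscribe, "chat.inbox.tudisco"));
    println!("other inbox: {}", allowed(&perms.subscribe, "chat.inbox.chris"));
}
```

Validating the handle before building the subject matters as much as the match itself: a handle containing `.` or `>` would otherwise widen the grant.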
### 6.5 Iroh: client-side only

Clients run a local Iroh node for sending and receiving files. The
**chat-server does NOT run an Iroh node** in v0.

Implication: when @tudisco shares a file with @chris, the bytes go
directly from tudisco's device to chris's device via Iroh. If tudisco
is offline, chris waits. There's no fallback to a server-stored copy.

This is the simplest possible operational model. Pinning (server-side
fallback storage) is a future addition (§8).

---

## 7. MVP scope

### Server (`kez-chat-server`)

- [ ] HTTP API scaffold (axum + tokio)
- [ ] Handle registry (POST /register, GET /u/:handle)
- [ ] Registration signature validation (uses kez-core)
- [ ] WebFinger endpoint
- [ ] NATS auth callout (POST /internal/nats/auth)
- [ ] Healthz / metrics
- [ ] Integration tests against real nats-server + sig-server in a
      test docker-compose

### Deployment

- [ ] docker-compose.yml (chat + nats + sig-server)
- [ ] nats.conf with auth_callout configured
- [ ] systemd alternative deployment recipe
- [ ] README with TLS / reverse proxy guidance

### Client (`kez-chat-cli`, a separate project, later)

Out of scope for the server work, but the **server isn't usable without**
at least a CLI client that does:

- [ ] Account creation (key gen + mnemonic backup + handle registration)
- [ ] Contact lookup + verification
- [ ] Send / receive 1:1 chat messages (E2E via NATS)
- [ ] Send / receive files (E2E via Iroh)
- [ ] Browse @user shared-files manifest
- [ ] Profile view (sigchain visualization)

UI app comes after CLI proves the flow works.

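
Both the handle registry and the client's contact lookup need to agree on what a handle looks like. The following is a minimal sketch of parsing user input into a fully qualified handle, in line with the federation-ready idea of storing qualified handles internally. The allowed character set, the `Handle` and `HandleError` types, and the contents of `RESERVED` (beyond `admin`, which §10 mentions) are assumptions for illustration.

```rust
const DEFAULT_SERVER: &str = "kez.lat";
/// Hypothetical reserved list; only `admin` is named in the document.
const RESERVED: &[&str] = &["admin"];

#[derive(Debug, PartialEq)]
struct Handle {
    user: String,
    server: String,
}

#[derive(Debug, PartialEq)]
enum HandleError {
    Empty,
    BadChar,
    BadServer,
}

impl Handle {
    /// Fully qualified form, e.g. `@tudisco@kez.lat`.
    fn qualified(&self) -> String {
        format!("@{}@{}", self.user, self.server)
    }
}

/// Accepts `@user`, `user`, `@user@server`, or `user@server`.
/// Unqualified handles resolve to the default home server.
fn parse_handle(input: &str) -> Result<Handle, HandleError> {
    let lowered = input.trim().to_ascii_lowercase();
    let body = lowered.strip_prefix('@').unwrap_or(lowered.as_str());
    let (user, server) = match body.split_once('@') {
        Some((u, s)) => (u, s),
        None => (body, DEFAULT_SERVER),
    };
    if user.is_empty() {
        return Err(HandleError::Empty);
    }
    let user_ok = user
        .chars()
        .all(|c| c.is_ascii_lowercase() || c.is_ascii_digit() || c == '_' || c == '-');
    if !user_ok {
        return Err(HandleError::BadChar);
    }
    if server.is_empty() || server.contains('@') || server.contains(char::is_whitespace) {
        return Err(HandleError::BadServer);
    }
    Ok(Handle {
        user: user.to_string(),
        server: server.to_string(),
    })
}

/// Registration would refuse reserved names; lookups would not care.
fn is_reserved(handle: &Handle) -> bool {
    RESERVED.contains(&handle.user.as_str())
}

fn main() {
    match parse_handle("@Tudisco") {
        Ok(h) => println!("{} (reserved: {})", h.qualified(), is_reserved(&h)),
        Err(e) => println!("invalid handle: {:?}", e),
    }
}
```

Keeping the reserved-name check separate from parsing lets `POST /register` reject `@admin` while `GET /u/:handle` still parses it normally.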
---

## 8. Out of scope (v0)

- **Iroh pinning** (sender must be online for receiver to fetch)
- **Group chat** (only 1:1 for v0)
- **Forward secrecy / ratcheting** (Double Ratchet, MLS): chat is
  encrypted but each message uses the same X25519-derived key per pair
- **Voice / video calls**
- **Multi-device key sync**: one device per user in v0
- **Account recovery beyond mnemonic**: paper backup is the only recovery
- **Federation across home servers**: one server (kez.lat) in v0;
  design preserves the option
- **Channel-based identity verification**: the CLI already does
  `kez verify id ...`; not duplicated in the chat-server. Users add
  KEZ channel proofs (gist, dns, etc.) via the existing CLI separately.
- **Avatars / display names**: defer the design. For v0 the UI shows
  the handle and that's enough.

---

## 9. The one remaining open question

**Manifest format** for "@chris's shared files":

| Option | How | Tradeoff |
|---|---|---|
| **A. Signed JSON blob, hash in sigchain** | Manifest is a JSON blob stored on Iroh. A new sigchain op `set_shared_files` commits the latest manifest hash. Recipients walk the sigchain → find the pointer → fetch the manifest blob from Iroh. | Simpler. No Iroh Docs dep. Sigchain anchors the version (signed). Update = new sigchain event. |
| **B. Iroh Doc** | Manifest is a mutable CRDT document. Recipients subscribe; updates sync in near-real-time. | Fancier UX (live updates). Requires Iroh Docs subsystem (heavier dep, less stable). |

**Recommended default: A.** Simpler, fewer moving parts, reuses
primitives we already have. We can upgrade to B later if real users
need real-time profile feed updates.

Settle yes/no on this and the design is locked.

---

## Decisions locked from earlier discussion

| Question | Decision |
|---|---|
| Bundle sigchain in chat-server? | **No.** Use existing `kez-sig-server`. Microservices. |
| Bundle NATS into Rust server? | **No.** Run `nats-server` as a separate container; chat-server provides the auth callout. |
| KEZ + nostr coexistence for chat? | **No nostr in chat.** KEZ is identity-only; nostr only as a verifiable claim in someone's sigchain, not as transport. |
| Handle scope: federation or global? | **Global for v0**, federation-ready design (see §3.5). |
| Recovery if key lost? | **Paper backup (24-word mnemonic), Keybase-style.** No server-side recovery. |
| Iroh pinning in v0? | **No.** Sender must be online for receiver to fetch. Pinning is a future tier. |

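
For the manifest question, the "signed blob, hash committed via a `set_shared_files` sigchain op" approach boils down to finding the most recent pointer in a user's sigchain. Here is a minimal sketch of that lookup. The `SigchainEvent` and `SigchainOp` shapes are hypothetical (the real types would come from `kez-core`), and the sketch assumes every event's signature has already been verified before this function runs.

```rust
/// Hypothetical, simplified view of sigchain operations.
#[derive(Debug, Clone, PartialEq)]
enum SigchainOp {
    /// Commits the hash of the latest shared-files manifest blob.
    SetSharedFiles { manifest_hash: String },
    /// Any other operation (claims, rotations, ...), ignored here.
    Other,
}

#[derive(Debug, Clone, PartialEq)]
struct SigchainEvent {
    /// Monotonic sequence number within one user's chain (assumed).
    seq: u64,
    op: SigchainOp,
}

/// Returns the manifest hash from the highest-sequence
/// `SetSharedFiles` event, or `None` if the user never published one.
fn latest_manifest_hash(chain: &[SigchainEvent]) -> Option<&str> {
    chain
        .iter()
        .filter_map(|event| match &event.op {
            SigchainOp::SetSharedFiles { manifest_hash } => {
                Some((event.seq, manifest_hash.as_str()))
            }
            SigchainOp::Other => None,
        })
        .max_by_key(|(seq, _)| *seq)
        .map(|(_, hash)| hash)
}

fn main() {
    let chain = vec![
        SigchainEvent { seq: 1, op: SigchainOp::Other },
        SigchainEvent {
            seq: 2,
            op: SigchainOp::SetSharedFiles { manifest_hash: "hash-v1".to_string() },
        },
        SigchainEvent {
            seq: 3,
            op: SigchainOp::SetSharedFiles { manifest_hash: "hash-v2".to_string() },
        },
    ];
    // The client would next fetch this blob from Iroh by its hash.
    println!("{:?}", latest_manifest_hash(&chain));
}
```

Selecting by sequence number rather than list position keeps the result stable even if events arrive out of order from a mirror.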
---

## 10. Risks & honest concerns

1. **NATS auth callout integration depth.** Documented but fiddly.
   nkey signature verification is straightforward; the per-user subject
   permission glue needs care. Test cases for "user can publish to
   their own inbox only" / "user can subscribe to their own inbox
   only" matter.

2. **Iroh is pre-1.0.** Pin a version. Migration is a chore but only
   touches client code, not identity. Identity stays stable (KEZ).

3. **Single-device assumption.** Real users have phones AND laptops.
   v0 assumes one device per primary key. Designing multi-device is a
   real follow-up.

4. **No offline file delivery.** A natural user complaint will be
   "Chris sent me a file but he's offline now." We've made the trade
   knowingly; document the limitation clearly in-app ("File will
   download when @chris is back online").

5. **Handle squatting.** First-come-first-served. Mitigations:
   - Rate-limit registration by IP
   - Reserve some handles (`@admin`, common project names)
   - Accept that some squatting will happen; document the policy

6. **NAT traversal.** Iroh handles it with relays. Test on hostile
   networks (corporate firewalls, mobile carriers with CGNAT) before
   claiming "just works."

7. **Operational cost.** Three containers (chat + nats + sig-server)
   + bandwidth + a domain. Cheap at small scale, scales with users.
   Need a "running kez.lat for 1k users: what does it cost?" answer
   before community adoption.

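
Rate-limiting registration by IP is the easiest of the listed mitigations to prototype. Below is a minimal fixed-window sketch using only the standard library. The type name `RegistrationLimiter`, the window size, and the per-window limit are assumptions; a real deployment behind a reverse proxy would also need to decide which header carries the client IP. Time is passed in explicitly so the logic can be tested without sleeping.

```rust
use std::collections::HashMap;
use std::net::IpAddr;
use std::time::{Duration, Instant};

/// Fixed-window limiter: at most `max_per_window` registrations
/// per IP within each `window`.
struct RegistrationLimiter {
    max_per_window: u32,
    window: Duration,
    /// Per-IP window start time and count within that window.
    seen: HashMap<IpAddr, (Instant, u32)>,
}

impl RegistrationLimiter {
    fn new(max_per_window: u32, window: Duration) -> Self {
        Self {
            max_per_window,
            window,
            seen: HashMap::new(),
        }
    }

    /// Returns true and records the attempt if `ip` is under its limit.
    fn allow(&mut self, ip: IpAddr, now: Instant) -> bool {
        let entry = self.seen.entry(ip).or_insert((now, 0));
        // Start a fresh window once the previous one has elapsed.
        if now.saturating_duration_since(entry.0) >= self.window {
            *entry = (now, 0);
        }
        if entry.1 < self.max_per_window {
            entry.1 += 1;
            true
        } else {
            false
        }
    }
}

fn main() {
    let mut limiter = RegistrationLimiter::new(2, Duration::from_secs(3600));
    let ip: IpAddr = "203.0.113.7".parse().expect("valid IP literal");
    let now = Instant::now();
    for attempt in 1..=3 {
        println!("attempt {}: allowed = {}", attempt, limiter.allow(ip, now));
    }
}
```

This in-memory map never evicts old entries; a long-running server would want periodic cleanup or a bounded cache.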
---
@ -556,48 +645,54 @@ These need resolving before serious implementation:
When we start building:

1. **Refactor: move `kez-core` + `kez-channels` to `rust-lib/`.**
   Small but unblocks clean imports from kez-chat.

2. **Scaffold `kez-chat-server`** (axum + tokio + sqlite + tracing).
   Handle registry + WebFinger first; these unblock client-side
   account creation.

3. **NATS auth callout.** Run nats-server in a sibling container with
   the callout configured to hit our endpoint. End-to-end: a client
   can register a handle and then connect to NATS authenticated by
   its KEZ key.

4. **Minimal `kez-chat-cli` client** (separate project) that does:
   - `kez-chat register tudisco`
   - `kez-chat add @chris`
   - `kez-chat send @chris "hello"`
   - `kez-chat listen`

   No UI. Enough to prove the chat flow works end-to-end against
   the server.

5. **Iroh integration in the client** (not the server).
   - Client runs a local Iroh node
   - `kez-chat share @chris ./file.pdf`
   - `kez-chat fetch <ticket>`

6. **Shared-files manifest.** New `set_shared_files` sigchain op.
   `kez-chat browse @tudisco` lists his shared files.

7. **Deployment recipe.** docker-compose, systemd, deployment doc.

8. **Then** start the GUI app. Could be Tauri (Rust + web frontend),
   Iced (pure Rust UI), or something else.

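
The CLI subcommands named in the plan map naturally onto an enum. The following is a minimal, dependency-free sketch of that mapping. A real client would more likely use `clap`, and the `Command` type and `parse_command` function here are illustrative names rather than an existing interface.

```rust
/// One variant per `kez-chat` subcommand mentioned in the plan.
#[derive(Debug, PartialEq)]
enum Command {
    Register { handle: String },
    Add { contact: String },
    Send { to: String, message: String },
    Listen,
    Share { to: String, path: String },
    Fetch { ticket: String },
    Browse { user: String },
}

/// Parses the arguments that follow the program name.
fn parse_command(args: &[&str]) -> Result<Command, String> {
    match args {
        ["register", handle] => Ok(Command::Register { handle: handle.to_string() }),
        ["add", contact] => Ok(Command::Add { contact: contact.to_string() }),
        ["send", to, message] => Ok(Command::Send {
            to: to.to_string(),
            message: message.to_string(),
        }),
        ["listen"] => Ok(Command::Listen),
        ["share", to, path] => Ok(Command::Share {
            to: to.to_string(),
            path: path.to_string(),
        }),
        ["fetch", ticket] => Ok(Command::Fetch { ticket: ticket.to_string() }),
        ["browse", user] => Ok(Command::Browse { user: user.to_string() }),
        [] => Err("missing subcommand".to_string()),
        [other, ..] => Err(format!("unknown or malformed subcommand: {}", other)),
    }
}

fn main() {
    let owned: Vec<String> = std::env::args().skip(1).collect();
    let args: Vec<&str> = owned.iter().map(String::as_str).collect();
    match parse_command(&args) {
        Ok(cmd) => println!("{:?}", cmd),
        Err(msg) => eprintln!("error: {}", msg),
    }
}
```

Keeping parsing separate from execution makes it straightforward to test the command surface before any NATS or Iroh code exists.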
---

## 12. One-paragraph summary

`kez-chat` is a Keybase-class chat and file-sharing app built on the
KEZ identity stack. Users get `@username@kez.lat` handles backed by an
ed25519 primary key. The same key authenticates to a NATS broker
(chat, presence, file tickets; the broker is dumb, and clients do E2E
with ChaCha20-Poly1305 over X25519-derived keys) and identifies an Iroh
node (P2P bulk transfer, content-addressed blobs, on-demand fetch).
The server side is a microservices deployment: a thin Rust
`kez-chat-server` handles the handle registry + NATS auth + HTTP API;
a separate `nats-server` container runs the broker; the existing
`kez-sig-server` stores sigchains. The chat-server does not run an
Iroh node and does not pin files in v0: file transfer is pure P2P
between online peers. Account recovery is via a 24-word paper-backup
mnemonic. Federation across home servers is deferred but the design
keeps it as a flip-the-switch future change.