plan(kez-chat): add design doc for the chat + file share project

Pre-implementation planning document for kez-chat — a Keybase-class chat
and file sharing app built on the KEZ stack.

Architecture (no code yet, just the plan):

- Identity: KEZ ed25519 primary keys; handles look like
  @username@kez.lat (placeholder default home server).
- Messaging: NATS broker, dumb relay, clients do E2E with
  ChaCha20-Poly1305 over X25519-derived keys. nkeys-auth means the
  user's KEZ primary key literally IS their NATS credential.
  JetStream handles offline delivery.
- File transfer: Iroh peer-to-peer, content-addressed blobs.
  On-demand fetch (no folder sync, no surprise downloads).
  Shared-files manifest committed via a new sigchain `set_shared_files`
  op; per-entry encryption for private shares.

Server: a single Rust binary `kez-chat-server` that bundles the
handle registry, NATS auth callout, optional sigchain mirror, and
optional Iroh pinning. NATS broker and Iroh node run alongside it.

Includes:
- End-to-end flows (account creation, add contact, send message,
  share file, browse files)
- Proposed folder restructure: pull kez-core + kez-channels out into
  a top-level `rust-lib/` workspace so downstream projects (sig-server,
  chat-server, future) can path-depend cleanly without reaching into
  each other's crate trees
- MVP scope and explicit out-of-scope list
- 7 open design questions with my recommended defaults
- Sequenced build plan (refactor first → server scaffold → NATS auth
  → CLI client → Iroh → manifest → deploy → GUI)
Tudisco 2026-05-24 22:21:03 -06:00
parent eae98fead0
commit 008875a2ad

kez-chat/document.md Normal file

@ -0,0 +1,603 @@
# KEZ Chat & File Share — Design Document
**Status:** Pre-implementation planning. No code yet.
**Last updated:** 2026-05-24
**Goal:** A Keybase-class chat + file sharing experience built on the KEZ
identity stack, with NATS for messaging and Iroh for file transfer.
---
## 1. What this is
A real-time chat + file-sharing application with verified identities.
- Users get human-friendly handles like `@tudisco@kez.lat`.
- The handle is bound to a KEZ primary key (ed25519); the same key
authenticates to the chat infrastructure.
- Conversations are end-to-end encrypted; the broker is dumb.
- Files are visible in the sender's "shared files" list but only
downloaded when a recipient actually wants them. No background sync.
- Identity is portable: the underlying key + sigchain survives the home
server going dark. Handles can be migrated to other servers.
This is the Keybase model rebuilt on a decentralized substrate:
- **Identity layer** → KEZ (instead of Keybase's central account system)
- **Chat layer** → NATS broker with E2E in the client (instead of Keybase
Chat servers)
- **File layer** → Iroh peer-to-peer with content addressing (instead of KBFS)
---
## 2. Three-layer architecture
```
┌─────────────────────────────────────────────────────────────┐
│                    kez-chat application                     │
│           (chat UI, file browser, profile views)            │
└────┬──────────────┬─────────────────────┬───────────────────┘
     │              │                     │
     ▼              ▼                     ▼
┌─────────┐   ┌────────────┐     ┌────────────────┐
│   KEZ   │   │    NATS    │     │      Iroh      │
│         │   │            │     │                │
│ ↓ who   │   │ ↓ chat     │     │ ↓ file blobs   │
│ ↓ what  │   │ ↓ tickets  │     │ ↓ on-demand    │
│   they  │   │ ↓ presence │     │ ↓ NAT travers. │
│   own   │   │ ↓ small    │     │ ↓ E2E in QUIC  │
│ ↓ where │   │   stuff    │     │                │
│   they  │   │            │     │                │
│   listen│   │ dumb       │     │                │
│         │   │ broker;    │     │                │
│         │   │ clients    │     │                │
│         │   │ do E2E     │     │                │
└─────────┘   └────────────┘     └────────────────┘
     │              ▲                     ▲
     └──────────────┴───── sigchain ──────┘
          (handle → KEZ primary → endpoints
           and links to other identities)
```
Each layer does one thing well. Each is replaceable without touching the
others. The KEZ sigchain is the bridge that ties them together — it tells
a verifier "this user's broker is X, their Iroh nodes are Y₁ and Y₂."
---
## 3. Identity & username model
### 3.1 Handles
Handles look like email and Mastodon addresses:
```
@tudisco@kez.lat
@chris@kez.lat
@alice@chris.com ← custom domain, opted out of default
```
`kez.lat` is the placeholder default home server domain; it will be
replaced with the production domain once one is chosen. The application
treats everything after the second `@` as the user's home server. Multiple
servers can exist, and federation is by convention (same model as email).
In the UI, when the home server matches the app's default, handles are
displayed bare (`@tudisco`). Custom domains always display the full form
(`@alice@chris.com`) so users can tell when they're talking to someone on
a non-default server.
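A minimal sketch of the parsing and display rule described above. The `Handle` type and both function names are hypothetical, and lowercasing both parts is an assumption this document doesn't fix anywhere:

```rust
/// A parsed handle such as `@tudisco@kez.lat`.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Handle {
    pub user: String,
    pub domain: String,
}

/// Parses `@user@domain`, or a bare `@user` (which falls back to `default_domain`).
/// Lowercasing is an assumption, not something the design specifies.
pub fn parse_handle(input: &str, default_domain: &str) -> Option<Handle> {
    let rest = input.trim().strip_prefix('@')?;
    let (user, domain) = match rest.split_once('@') {
        Some((user, domain)) => (user, domain),
        None => (rest, default_domain),
    };
    // Reject empty parts and anything with a third '@'.
    if user.is_empty() || domain.is_empty() || domain.contains('@') {
        return None;
    }
    Some(Handle {
        user: user.to_ascii_lowercase(),
        domain: domain.to_ascii_lowercase(),
    })
}

/// Bare form for the default home server, full form for any other domain.
pub fn display_handle(handle: &Handle, default_domain: &str) -> String {
    if handle.domain.eq_ignore_ascii_case(default_domain) {
        format!("@{}", handle.user)
    } else {
        format!("@{}@{}", handle.user, handle.domain)
    }
}
```

With `kez.lat` as the default, `@tudisco@kez.lat` would display as `@tudisco`, while `@alice@chris.com` keeps its full form.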
### 3.2 Key generation tied to username
When a user creates an account:
1. App generates a **fresh ed25519 keypair** locally.
   - This is the user's KEZ primary key.
   - It's also their NATS nkey for the chat broker (same key, same algorithm).
2. App **registers `@username` on the home server's handle registry**.
   - POSTs a signed registration request: `{ "handle": "tudisco", "primary": "ed25519:<hex>" }`
   - The signature proves the user controls the private key.
   - The registry rejects duplicate claims (first-come-first-served per home server).
3. App **initializes a sigchain** for the new primary.
   - First event: `add_endpoint` advertising the NATS broker the app will use.
   - Second event: `add_endpoint` advertising the Iroh NodeId the local app is using.
4. App **uploads the sigchain** to a kez-sig-server (optional but
   recommended; otherwise the chain lives only on the user's device).
After this flow the user has a fully working KEZ identity:
- `@tudisco@kez.lat` resolves via the handle registry to their primary key.
- That key's sigchain advertises their NATS broker and Iroh nodes.
- Other users can verify them and reach them.
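The first-come-first-served rule from step 2 could look roughly like this. It's an in-memory sketch only: the real registry is planned as SQLite-backed, signature verification is assumed to have happened before `register` is called, and the handle character rules (lowercase ASCII, digits, `_`, at most 32 bytes) are placeholders I made up for illustration.

```rust
use std::collections::HashMap;

#[derive(Debug, PartialEq, Eq)]
pub enum RegisterError {
    /// Handle fails the (hypothetical) character/length rules.
    InvalidHandle,
    /// Someone already claimed this handle on this home server.
    Taken,
}

/// In-memory stand-in for the handle registry: handle -> primary key string.
#[derive(Default)]
pub struct Registry {
    by_handle: HashMap<String, String>,
}

impl Registry {
    /// Claims `handle` for `primary`. The caller is assumed to have already
    /// verified the registration signature against `primary`.
    pub fn register(&mut self, handle: &str, primary: &str) -> Result<(), RegisterError> {
        let valid = !handle.is_empty()
            && handle.len() <= 32
            && handle
                .bytes()
                .all(|b| b.is_ascii_lowercase() || b.is_ascii_digit() || b == b'_');
        if !valid {
            return Err(RegisterError::InvalidHandle);
        }
        // First come, first served: an existing binding is never overwritten.
        if self.by_handle.contains_key(handle) {
            return Err(RegisterError::Taken);
        }
        self.by_handle.insert(handle.to_string(), primary.to_string());
        Ok(())
    }

    /// Resolves a handle to its primary key, if registered.
    pub fn lookup(&self, handle: &str) -> Option<&str> {
        self.by_handle.get(handle).map(String::as_str)
    }
}
```

In the SQLite version the same guarantee would more likely come from a uniqueness constraint on the handle column than from a check-then-insert.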
### 3.3 Why ed25519 (not nostr/secp256k1) for this app
Both KEZ primary key types work in general, but the chat app **must** use
ed25519 because:
- **NATS nkeys are ed25519.** Direct alignment: the user's KEZ primary key
is their NATS credential. No second auth scheme.
- **Iroh node IDs are ed25519.** Same primitive, native fit.
- **One key type to manage.** Users with a pre-existing nostr key can
still attach it to their KEZ sigchain as a claim (so they're verifiable
on nostr too), but the primary that runs the app is ed25519.
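To make the "same key" point concrete, here is a sketch of how a raw ed25519 public key would map to a NATS user nkey string. It follows my understanding of the nkeys encoding (prefix byte `20 << 3`, CRC-16/XMODEM appended little-endian, unpadded base32); treat that layout as an assumption to verify against the official nkeys library. The function names are hypothetical.

```rust
const B32_ALPHABET: &[u8; 32] = b"ABCDEFGHIJKLMNOPQRSTUVWXYZ234567";

/// CRC-16/XMODEM (polynomial 0x1021, initial value 0).
pub fn crc16_xmodem(data: &[u8]) -> u16 {
    let mut crc: u16 = 0;
    for &byte in data {
        crc ^= (byte as u16) << 8;
        for _ in 0..8 {
            crc = if crc & 0x8000 != 0 { (crc << 1) ^ 0x1021 } else { crc << 1 };
        }
    }
    crc
}

/// RFC 4648 base32 without `=` padding.
pub fn base32_nopad(data: &[u8]) -> String {
    let mut out = String::new();
    let mut buf: u32 = 0;
    let mut bits: u32 = 0;
    for &byte in data {
        buf = (buf << 8) | byte as u32;
        bits += 8;
        while bits >= 5 {
            bits -= 5;
            out.push(B32_ALPHABET[((buf >> bits) & 0x1f) as usize] as char);
        }
    }
    if bits > 0 {
        // Left-align the remaining bits into a final 5-bit group.
        out.push(B32_ALPHABET[((buf << (5 - bits)) & 0x1f) as usize] as char);
    }
    out
}

/// Encodes a 32-byte ed25519 public key as a user nkey (assumed layout:
/// prefix byte, key bytes, little-endian CRC, all base32-encoded).
pub fn user_nkey(pubkey: &[u8; 32]) -> String {
    const PREFIX_USER: u8 = 20 << 3; // makes the encoded string start with 'U'
    let mut raw = Vec::with_capacity(35);
    raw.push(PREFIX_USER);
    raw.extend_from_slice(pubkey);
    let crc = crc16_xmodem(&raw);
    raw.extend_from_slice(&crc.to_le_bytes());
    base32_nopad(&raw)
}
```

The point of the sketch is that no second key is involved: the nkey is just a different text encoding of the KEZ primary's public key bytes.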
---
## 4. The home server (`kez-chat-server`)
A single Rust binary that bundles the home-server responsibilities. One
process. Self-hostable. Anyone can run their own instance as the home
server for their own users.
### 4.1 What it does
| Responsibility | How |
|---|---|
| **Handle registry** | `POST /v1/register` to claim `@username`, `GET /v1/u/<handle>` to look one up. SQLite-backed. Same shape as `kez-id-server` discussed earlier. |
| **Sigchain mirror** (optional) | Mirrors `kez-sig-server` endpoints for users who don't want to publish elsewhere — `POST /v1/sigchains/.../events`, `GET /v1/sigchains/...`. Or proxies through to a separate `kez-sig-server` instance. |
| **NATS broker host** | Runs (or co-runs) a NATS server with JetStream enabled for offline message delivery. Configured to use nkey-based auth tied to KEZ primary keys. |
| **Iroh pinning node** | Runs an Iroh node that users can opt to push their blobs to, so files are served even when the user's own device is offline. (Optional per user.) |
| **WebFinger endpoint** | `/.well-known/webfinger?resource=acct:tudisco@kez.lat` returns user discovery info — interop with fediverse tools. |
| **HTTP API for clients** | Thin REST surface for the chat app to register, look up handles, fetch endpoints, manage settings. |
### 4.2 Process model
For MVP, the server is a **coordinator + adapter**, not a full
reimplementation:
```
┌───────────────────────────────────────────────────────────┐
│ kez-chat-server process (one Rust binary)                 │
│   - HTTP API (axum)                                       │
│   - Handle registry (SQLite)                              │
│   - NATS auth callout (validates nkey signatures)         │
│   - Sigchain mirror (axum routes; could reuse             │
│     rust-sig-server code)                                 │
└──┬──────────────────────┬─────────────────────────────────┘
   │ launches/manages     │ talks to via API
   ▼                      ▼
┌──────────────┐   ┌──────────────┐
│ nats-server  │   │ iroh-relay   │  (optional, for users
│ (Go binary)  │   │ (Rust)       │   who want pinning)
│ + JetStream  │   │              │
└──────────────┘   └──────────────┘
```
The Rust server doesn't reimplement NATS or Iroh — it sits beside them.
Operator runs the three processes together (Docker compose, systemd
unit, or whatever). The chat-server provides the KEZ-aware integration:
authenticating NATS connections against the handle registry, serving
sigchain endpoints, exposing a clean HTTP API to client apps.
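One piece of that integration glue is deciding which subjects an authenticated key may use. The sketch below assumes the `kez.inbox.<pubkey-hex>` naming used elsewhere in this document and a simple policy (subscribe only to your own inbox, publish to anyone's); the `Permissions` struct and function names are hypothetical, and the extra subjects a client needs for JetStream are not covered.

```rust
/// Subjects a connection may subscribe and publish to (hypothetical shape).
#[derive(Debug, PartialEq, Eq)]
pub struct Permissions {
    pub subscribe: Vec<String>,
    pub publish: Vec<String>,
}

/// True only for exactly 64 lowercase hex characters (a 32-byte key).
/// Strict validation matters here: `.`, `*` and `>` are meaningful in NATS
/// subjects, so an unvalidated "key" could smuggle in a wildcard.
pub fn is_pubkey_hex(s: &str) -> bool {
    s.len() == 64 && s.bytes().all(|b| matches!(b, b'0'..=b'9' | b'a'..=b'f'))
}

/// Inbox subject for a key, or `None` if the key is malformed.
pub fn inbox_subject(pubkey_hex: &str) -> Option<String> {
    if is_pubkey_hex(pubkey_hex) {
        Some(format!("kez.inbox.{}", pubkey_hex))
    } else {
        None
    }
}

/// Assumed policy: read your own inbox, write to any single inbox.
pub fn permissions_for(pubkey_hex: &str) -> Option<Permissions> {
    let inbox = inbox_subject(pubkey_hex)?;
    Some(Permissions {
        subscribe: vec![inbox],
        publish: vec!["kez.inbox.*".to_string()],
    })
}
```

Requiring lowercase hex is itself an assumption; whatever canonical form the registry stores should be the one used to build subjects.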
### 4.3 Endpoints (sketch)
```
GET  /v1/healthz
GET  /v1/u/:handle                       handle → { primary, sigchain_url, endpoints }
POST /v1/register                        claim a handle (signed body)
GET  /.well-known/webfinger?resource=...

# Sigchain mirror (same as kez-sig-server)
GET  /v1/sigchains/:scheme/:id
POST /v1/sigchains/:scheme/:id/events
GET  /v1/sigchains/:scheme/:id/head

# NATS auth callout (called by nats-server, not by users)
POST /internal/nats/auth                 verify nkey signature, return permissions

# Iroh pinning (optional)
POST /v1/pin                             pin a blob for offline serving
GET  /v1/pin/:hash                       check pinning status
```
The NATS broker and Iroh node are *out-of-process*: clients connect to
them directly (`nats://nats.kez.lat:4222`, Iroh direct or via relays).
---
## 5. End-to-end flows
### 5.1 Account creation — `@tudisco@kez.lat`
```
1. User opens kez-chat-app, clicks "Create account"
2. App: generates ed25519 keypair locally
3. App: user picks handle "tudisco"
4. App → kez-chat-server:
POST /v1/register
{ "handle": "tudisco",
"primary": "ed25519:<pubkey-hex>",
"registration_sig": "<sig over canonical message>" }
5. Server: validates signature, checks handle is free, stores in registry
6. Server: 201 Created
7. App: initializes sigchain locally, signs:
{ op: "add_endpoint",
payload: { protocol: "nats",
url: "nats://nats.kez.lat:4222",
inbox: "kez.inbox.<pubkey-hex>" } }
{ op: "add_endpoint",
payload: { protocol: "iroh",
node_id: "<local iroh node id>" } }
8. App → server:
POST /v1/sigchains/ed25519/<pubkey-hex>/events (twice, one per event)
9. App: connects to NATS broker with nkey auth, subscribes to inbox topic
10. Done — user is @tudisco@kez.lat, online, reachable
```
### 5.2 Adding a contact
```
1. Tudisco wants to add Chris. Types "@chris" in app.
2. App → kez-chat-server: GET /v1/u/chris
Returns: { primary: "ed25519:abc...", sigchain_url: "..." }
3. App fetches the sigchain → walks events → extracts:
- nostr/github/dns/etc. claims (for verification)
- NATS broker URL + inbox topic
- Iroh node IDs
4. App displays Chris's profile: verified accounts, avatar (from sigchain
metadata if present), join date
5. App stores LOCAL binding: { "@chris@kez.lat" => ed25519:abc... }
(TOFU — trust on first use)
```
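The TOFU binding from step 5 can be sketched as a small local store. The type and method names are hypothetical, and what the app should do on a mismatch (for example, check the sigchain for a `rotate` op before alarming the user) is left open here.

```rust
use std::collections::HashMap;

/// Result of checking a resolved key against the locally pinned one.
#[derive(Debug, PartialEq, Eq)]
pub enum Tofu {
    /// No binding existed; the key has now been pinned.
    FirstUse,
    /// The resolved key matches the pinned key.
    Match,
    /// The resolved key differs from the pinned key.
    Mismatch { pinned: String },
}

/// Local handle -> primary key bindings (in-memory stand-in).
#[derive(Default)]
pub struct ContactStore {
    pinned: HashMap<String, String>,
}

impl ContactStore {
    /// Pins on first use; afterwards only compares and never overwrites.
    pub fn check(&mut self, handle: &str, primary: &str) -> Tofu {
        let existing = self.pinned.get(handle).cloned();
        match existing {
            None => {
                self.pinned.insert(handle.to_string(), primary.to_string());
                Tofu::FirstUse
            }
            Some(p) if p == primary => Tofu::Match,
            Some(p) => Tofu::Mismatch { pinned: p },
        }
    }
}
```

Keeping the pinned key on a mismatch, rather than silently replacing it, is the part that makes this "trust on first use" and not "trust on every use".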
### 5.3 Sending a chat message
```
1. Tudisco types "hello" in the chat with Chris.
2. App: looks up Chris's primary key + NATS endpoint from local store.
3. App: derives a symmetric key via ECDH (both ed25519 keys are first
converted to their X25519 form):
X25519(tudisco_priv, chris_pub) → KDF → 32-byte symmetric key
4. App: encrypts "hello" with ChaCha20-Poly1305 + the derived key.
5. App: signs the ciphertext with tudisco's KEZ primary (so chris can
verify the sender, not just decrypt).
6. App: publishes to NATS subject `kez.inbox.<chris-pubkey-hex>` on
chris's broker, with JetStream delivery (durable, will queue if
chris is offline).
7. Chris's app receives from his subscribed inbox subject.
8. Chris's app: verifies signature against tudisco's key, decrypts, shows
"tudisco: hello".
```
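For 1:1 chat, the broker never sees:
- The message contents
- The sender's identity in the message itself (it is in the
encrypted+signed payload, not in the NATS metadata; the subject is
chris's inbox, but anyone could publish there)
One caveat: because every connection is authenticated with an nkey, the
broker can still observe which authenticated connection publishes to
which inbox subject. Hiding that relationship from the broker would need
additional measures beyond what is described here.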
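For steps 4 to 6, the bytes published to NATS need some framing. The layout below (`nonce || signature || ciphertext`) is purely a hypothetical wire format for illustration; the design doesn't specify one. The fixed sizes are the standard ones: a 12-byte ChaCha20-Poly1305 nonce, a 16-byte Poly1305 tag at the end of the ciphertext, and a 64-byte ed25519 signature. No actual cryptography happens in this sketch.

```rust
pub const NONCE_LEN: usize = 12; // ChaCha20-Poly1305 (IETF) nonce
pub const SIG_LEN: usize = 64; // ed25519 signature
pub const TAG_LEN: usize = 16; // Poly1305 tag appended to the ciphertext

/// Hypothetical on-the-wire message: nonce || signature || ciphertext.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Envelope {
    pub nonce: [u8; NONCE_LEN],
    pub signature: [u8; SIG_LEN],
    pub ciphertext: Vec<u8>, // includes the authentication tag
}

impl Envelope {
    /// Serializes the envelope into a single byte vector.
    pub fn encode(&self) -> Vec<u8> {
        let mut out = Vec::with_capacity(NONCE_LEN + SIG_LEN + self.ciphertext.len());
        out.extend_from_slice(&self.nonce);
        out.extend_from_slice(&self.signature);
        out.extend_from_slice(&self.ciphertext);
        out
    }

    /// Parses an envelope; rejects inputs too short to hold even an
    /// empty message (nonce + signature + tag).
    pub fn decode(bytes: &[u8]) -> Option<Envelope> {
        if bytes.len() < NONCE_LEN + SIG_LEN + TAG_LEN {
            return None;
        }
        let mut nonce = [0u8; NONCE_LEN];
        nonce.copy_from_slice(&bytes[..NONCE_LEN]);
        let mut signature = [0u8; SIG_LEN];
        signature.copy_from_slice(&bytes[NONCE_LEN..NONCE_LEN + SIG_LEN]);
        Some(Envelope {
            nonce,
            signature,
            ciphertext: bytes[NONCE_LEN + SIG_LEN..].to_vec(),
        })
    }
}
```

Since the ECDH key between two users is static, each message needs a fresh nonce; how nonces are generated is another detail the design would still have to pin down.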
### 5.4 Sharing a file
```
1. Tudisco drags `report.pdf` into the chat with Chris.
2. App: imports blob into local Iroh node → gets BLAKE3 hash + ticket.
3. App: optionally adds entry to tudisco's shared-files manifest
(visible in his profile if Chris later browses it).
4. App: encrypts the Iroh ticket (and a content key for the blob, if
the file is wrapped with a per-recipient symmetric key) with the
same E2E mechanism as chat messages.
5. App: publishes to chris's NATS inbox: { type: "file_share",
filename: "report.pdf", ticket: "...", content_key: "..." }
6. Chris's app receives the notification, displays:
"tudisco shared report.pdf (1.2 MB)" [Download]
7. Chris clicks Download.
8. App: opens Iroh connection to tudisco's NodeId (from sigchain), pulls
the blob via the ticket, decrypts with the content key, verifies
BLAKE3 hash. File appears.
```
If tudisco is offline at step 8 and he's opted into pinning, Chris's
app fetches from `kez.lat`'s pinning node instead. Same protocol, just
a different source.
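The fallback described here amounts to trying sources in order. A sketch under stated assumptions: `Source`, `fetch_order`, and `fetch_with_fallback` are hypothetical names, and the actual Iroh fetch is abstracted into a caller-supplied closure so no real Iroh API is implied.

```rust
/// Where a blob can be fetched from (node IDs kept as opaque strings).
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Source {
    SenderNode(String),
    PinningNode(String),
}

/// Sender's own node first; the home server's pinning node second,
/// and only if the sender opted into pinning.
pub fn fetch_order(sender_node_id: &str, pinning_node_id: Option<&str>) -> Vec<Source> {
    let mut order = vec![Source::SenderNode(sender_node_id.to_string())];
    if let Some(id) = pinning_node_id {
        order.push(Source::PinningNode(id.to_string()));
    }
    order
}

/// Tries each source in order and returns the first successful fetch,
/// together with the source that served it.
pub fn fetch_with_fallback<F>(sources: &[Source], mut try_fetch: F) -> Option<(Source, Vec<u8>)>
where
    F: FnMut(&Source) -> Option<Vec<u8>>,
{
    sources
        .iter()
        .find_map(|source| try_fetch(source).map(|bytes| (source.clone(), bytes)))
}
```

Because blobs are content-addressed, the BLAKE3 check from step 8 applies the same way regardless of which source answered.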
### 5.5 Browsing someone's files (Keybase-style)
```
1. Chris opens tudisco's profile.
2. App: resolves @tudisco → primary → sigchain.
3. Sigchain has a `set_shared_files` op with a manifest blob hash.
4. App: fetches the manifest blob (small, fast) via Iroh.
5. App: decrypts entries that are wrapped for chris's key, ignores ones
it can't decrypt (those are wrapped for other people).
6. App: renders the visible entries with name, size, share date,
thumbnail if present.
7. Chris clicks an entry to download — same as 5.4 step 8.
```
The manifest is **small** (KBs); only blobs Chris actually wants are
fetched. No background sync of multi-GB folders.
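Step 5 (show what decrypts, skip what doesn't) can be sketched as a filter over sealed entries. The manifest format is still an open question in this document, so `SealedEntry` and `FileEntry` are hypothetical shapes, and decryption is passed in as a closure instead of being implemented.

```rust
/// One manifest entry as stored: an opaque per-recipient ciphertext.
#[derive(Debug, Clone)]
pub struct SealedEntry {
    pub sealed: Vec<u8>,
}

/// What the viewer sees after a successful unwrap.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct FileEntry {
    pub name: String,
    pub size: u64,
    pub blob_hash: String,
}

/// Returns only the entries the viewer can unwrap. Entries wrapped for
/// other people simply fail to decrypt and are skipped without error.
pub fn visible_entries<F>(manifest: &[SealedEntry], try_unwrap: F) -> Vec<FileEntry>
where
    F: Fn(&SealedEntry) -> Option<FileEntry>,
{
    manifest.iter().filter_map(|entry| try_unwrap(entry)).collect()
}
```

Only the entries returned here would ever trigger a blob fetch, which is what keeps browsing cheap even when the sender shares large files.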
---
## 6. Project & folder layout
### 6.1 Where this project lives
```
/Kez
├── rust-lib/                    ← (proposed) shared Rust libraries
│   ├── Cargo.toml                  workspace
│   └── crates/
│       ├── kez-core/               moved from rust/crates/
│       └── kez-channels/           moved from rust/crates/
├── rust/                        ← Rust CLI (kez binary)
│   └── crates/
│       └── kez-cli/                depends on ../../../rust-lib/crates/...
├── rust-sig-server/             ← optional sigchain HTTP store
├── kez-chat/                    ← THIS PROJECT
│   ├── document.md                 (this file)
│   ├── Cargo.toml
│   ├── src/
│   │   ├── main.rs
│   │   ├── handles.rs              handle registry
│   │   ├── sigchain.rs             sigchain mirror (or proxy)
│   │   ├── nats_auth.rs            NATS auth callout
│   │   ├── pin.rs                  Iroh pinning
│   │   └── api.rs                  HTTP routes
│   ├── deploy/
│   │   ├── docker-compose.yml      chat-server + nats + iroh
│   │   ├── nats.conf
│   │   └── systemd/
│   └── tests/
├── nodejs/                      ← (unchanged)
└── crosstest.sh                 ← (path updates if rust-lib moves)
```
### 6.2 The `rust-lib/` proposal — share code, no duplication
Right now, kez-core and kez-channels live inside `rust/crates/`. The
sig-server and the chat-server both want to use them. With everything in
`rust/`, downstream projects have to do:
```toml
kez-core = { path = "../rust/crates/kez-core" }
```
…which works but feels off (why does a separate project reach into
another project's `crates/`?).
**Recommendation:** move the pure libraries out into a top-level
`rust-lib/` workspace. The CLI stays in `rust/`. Downstream servers
depend on `../rust-lib/crates/kez-core`. Clean structure, no duplication,
no confusion about which folder owns the library code.
Refactor effort: small but real.
- `mv rust/crates/kez-core rust-lib/crates/`
- `mv rust/crates/kez-channels rust-lib/crates/`
- Create `rust-lib/Cargo.toml` (workspace).
- Update `rust/Cargo.toml` to have just kez-cli.
- Update path deps in: `rust/crates/kez-cli/Cargo.toml`,
`rust-sig-server/Cargo.toml`.
- Update `crosstest.sh` if any paths are hardcoded.
Suggested order: **do the refactor first, then start kez-chat with clean
imports.** Otherwise we'll write `path = "../rust/crates/..."` for the
chat-server and have to fix it later anyway.
### 6.3 Dependencies (planned)
| Crate | Why |
|---|---|
| `kez-core` (path) | Identity types, sigchain, claim signing |
| `kez-channels` (path) | Verify users' linked accounts when displayed |
| `axum` 0.8 | HTTP API |
| `tokio` | Async runtime |
| `rusqlite` (bundled) | Handle registry |
| `async-nats` | NATS client (for the auth callout and maybe utility) |
| `iroh` | Iroh node (for pinning) |
| `iroh-blobs` | Blob handling |
| `serde` / `serde_json` | Standard |
| `thiserror` / `anyhow` | Standard |
| `tracing` / `tracing-subscriber` | Logging |
| `tower-http` | CORS, request tracing |
| `clap` | CLI args |
### 6.4 The actual NATS broker
We don't write a NATS broker. We **run one** alongside the Rust server:
- Use the official `nats-server` Go binary (downloaded from nats.io or
built from source).
- Configure with JetStream enabled (for offline delivery via durable
consumers).
- Configure auth callout pointing at the kez-chat-server's internal
endpoint, so connection auth defers to the KEZ registry.
- Run in the same Docker compose / systemd target as the Rust server.
NATS clustering for redundancy is a later concern.
### 6.5 The actual Iroh node
We DO embed Iroh in-process — Iroh is a Rust library and works as such.
The chat-server runs an `iroh::Node` and offers it as a pinning service
for users who opt in.
For client apps: they run their own Iroh node locally too. The
chat-server's Iroh node is just a peer — albeit one that's always online
and willing to hold blobs.
---
## 7. MVP scope
What ships in v0:
- [ ] kez-chat-server binary
- [ ] Handle registry (POST /register, GET /u/:handle)
- [ ] Sigchain mirror (proxy or own copy)
- [ ] NATS auth callout
- [ ] WebFinger endpoint
- [ ] HTTP healthz/metrics
- [ ] NATS broker config + deployment recipe
- [ ] Iroh pinning node embedded (optional per user)
- [ ] Docker compose for the whole bundle (server + nats + iroh node)
- [ ] Integration tests against a real NATS + Iroh
What the **client app** needs to do (separate project? `kez-chat-app/`?):
- [ ] Account creation flow (key gen + handle registration)
- [ ] Contact lookup + verification
- [ ] 1:1 chat (E2E via NATS)
- [ ] File send/receive (E2E via Iroh)
- [ ] Shared-files manifest browse + fetch
- [ ] Profile view (sigchain visualization)
For v0, **CLI client is fine** (`kez-chat send @chris "hello"`). UI app
comes later.
---
## 8. Out of scope (v0)
- Group chat
- Forward secrecy (Double Ratchet / MLS) — chat is encrypted but not
ratcheting in v0
- Voice / video calls
- Multi-device key sync — user has one device with their key for v0
- Account recovery / lost-key flows — protocol's `rotate` op exists but
UX for recovery isn't designed yet
- Federation across home servers — protocol allows it, but the v0
app may only resolve handles on its configured default server
- Channel publishing (gist, DNS, ActivityPub, bluesky) — the kez CLI
already has these; not duplicated here. User can run `kez claim ...`
separately to add channel proofs to their sigchain.
- Avatars / display name — could just use `nostr:npub` metadata or a
separate sigchain op; defer the design
---
## 9. Open design questions
These need resolving before serious implementation:
1. **Bundle or separate sigchain server?**
- kez-chat-server includes its own sigchain mirror (one less moving piece for operators)
- …or it depends on a separate kez-sig-server (proper layering)
- Lean: bundle for MVP, factor out later if multiple chat servers want to share.
2. **Iroh pinning by default or opt-in?**
- Default-on: better UX, more storage cost for the server operator
- Opt-in: simpler operator story, worse first-use UX for users whose phones are off
- Lean: opt-in for v0; let users push the pin button per-file. Default-on later.
3. **NATS broker: bundled or BYO?**
- kez-chat-server can spawn/manage `nats-server` as a child process
- …or it can assume operator runs NATS separately and just point at it
- Lean: BYO with documented config. We don't reinvent process management.
4. **Manifest format**
- Single JSON blob, signed, hash committed via sigchain `set_shared_files` op
- …or Iroh Doc (CRDT-synced)
- Lean: single signed blob for v0; simpler, no Iroh Docs dep.
5. **Handle uniqueness scope**
- Per home server (`tudisco@kez.lat` vs `tudisco@example.com` can be different people)
- Globally enforced somehow (not really possible without a central registry)
- Lean: per home server. Federation handles global resolution.
6. **What about KEZ's existing `nostr:` channel for messaging?**
- It already works for chat-like messages via NIP-44 DMs
- NATS is a separate stack — not interoperable
- Lean: NATS is the chat substrate for this app. Users who want
to send a nostr DM can use a separate nostr client. The KEZ
identity is the same; the transport is the user's choice per
conversation. Document this in the UI.
7. **Recovery story when you lose your key**
- Spec has `rotate` op — old key signs that new key is now primary
- But if you lost the old key, you can't sign the rotation
- Possible solutions:
- User must keep paper-backup of their key (Bitcoin model)
- User can pre-sign rotation events to multiple device keys
(multi-device redundancy)
- Home server holds an offline emergency-recovery key (centralized
fallback; opt-in)
- Defer detailed design to a later doc.
---
## 10. Risks & honest concerns
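1. **NATS auth callout integration depth.** The callout pattern is
documented but the chat-server needs to handle it correctly for
security. nkey signature verification is straightforward but the
integration glue (subject permissions per user, JetStream stream
creation) needs care. In particular, as far as I can tell nats-server
delivers callout requests as NATS messages on `$SYS.REQ.USER.AUTH` and
expects a signed JWT in reply, so the `/internal/nats/auth` route
sketched in 4.3 will likely need to be an `async-nats` subscriber (or
sit behind a small adapter) instead of a plain HTTP handler.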
2. **Iroh is pre-1.0.** API may shift. Pin a version, plan for a future
upgrade pass. The good news: identity stays stable (it's KEZ); only
the transport library needs to be migrated.
3. **Multi-device.** The MVP assumes one device per user, one key. Real
users have phones + laptops. Multi-device key management is a deep
topic — addressed in a follow-up doc.
4. **Spam in handle registration.** First-come-first-served is easy to
game. Mitigations:
- Proof-of-work on registration?
- Email-based gating (introduces centralization)?
- Rate-limit by IP, accept the leakage
- Defer for v0; revisit if it becomes a problem.
5. **NAT traversal for Iroh.** Iroh handles it via relays, but corporate
networks are sometimes hostile. Have a "use server's pinning as
relay" fallback documented.
6. **Operational cost.** Running NATS + Iroh + a Rust server isn't free.
- NATS scales horizontally, low resource use
- Iroh nodes can chew through disk if pinning is enabled liberally
- Need a clear "I'm running kez.lat for 1000 users — what does it cost?"
answer before community adoption.
---
## 11. The plan, sequenced
When we start building:
1. **Refactor: move `kez-core` + `kez-channels` to `rust-lib/`**.
Tiny but unblocks everything else from having clean imports.
2. **Build `kez-chat-server` scaffold** (axum + sqlite + tracing).
Handle registry + WebFinger first — these are the simplest endpoints
and unblock client-side account creation.
3. **Add NATS auth callout.** Spawn `nats-server` separately, configure
it to call our `/internal/nats/auth` endpoint. End-to-end: client
can register a handle and connect to NATS with their nkey.
4. **Build a minimal `kez-chat` CLI client** that does:
- `kez-chat register tudisco`
- `kez-chat add @chris`
- `kez-chat send @chris "hello"`
- `kez-chat listen`
No UI yet. Enough to prove the chat flow works end-to-end.
5. **Add Iroh integration** to both server and CLI client.
- Server: embedded iroh node for pinning
- Client: local iroh node, blob send/receive
- CLI: `kez-chat share @chris ./file.pdf`, `kez-chat browse @tudisco`
6. **Shared-files manifest** (sigchain `set_shared_files` op, manifest
blob format).
7. **Deployment recipe**: docker-compose, systemd unit, deployment doc.
8. **Then** start the GUI app. Could be Tauri (Rust + web frontend),
Iced (pure Rust UI), or whatever the user wants.
---
## 12. One-paragraph summary
`kez-chat` is a Keybase-class chat and file-sharing app built on the KEZ
identity stack. Users get `@username@kez.lat` handles backed by an
ed25519 primary key. The same key authenticates to a NATS broker (chat,
presence, file tickets — broker is dumb, clients do E2E with
ChaCha20-Poly1305 over ECDH-derived keys) and identifies an Iroh node
(P2P bulk transfer, content-addressed blobs, on-demand fetch). A single
Rust binary (`kez-chat-server`) coordinates the handle registry, NATS
auth, optional sigchain mirror, and optional Iroh pinning. The chat-app
itself is a separate project that consumes the server's HTTP API plus
talks directly to NATS and Iroh.