Kez/kez-chat/document.md
Tudisco 008875a2ad plan(kez-chat): add design doc for the chat + file share project
Pre-implementation planning document for kez-chat — a Keybase-class chat
and file sharing app built on the KEZ stack.

Architecture (no code yet, just the plan):

- Identity: KEZ ed25519 primary keys; handles look like
  @username@kez.lat (placeholder default home server).
- Messaging: NATS broker, dumb relay, clients do E2E with
  ChaCha20-Poly1305 over X25519-derived keys. nkeys-auth means the
  user's KEZ primary key literally IS their NATS credential.
  JetStream handles offline delivery.
- File transfer: Iroh peer-to-peer, content-addressed blobs.
  On-demand fetch (no folder sync, no surprise downloads).
  Shared-files manifest committed via a new sigchain `set_shared_files`
  op; per-entry encryption for private shares.

Server: a single Rust binary `kez-chat-server` that bundles the
handle registry, NATS auth callout, optional sigchain mirror, and
optional Iroh pinning. NATS broker and Iroh node run alongside it.

Includes:
- End-to-end flows (account creation, add contact, send message,
  share file, browse files)
- Proposed folder restructure: pull kez-core + kez-channels out into
  a top-level `rust-lib/` workspace so downstream projects (sig-server,
  chat-server, future) can path-depend cleanly without reaching into
  each other's crate trees
- MVP scope and explicit out-of-scope list
- 7 open design questions with my recommended defaults
- Sequenced build plan (refactor first → server scaffold → NATS auth
  → CLI client → Iroh → manifest → deploy → GUI)
2026-05-24 22:21:03 -06:00


KEZ Chat & File Share — Design Document

Status: Pre-implementation planning. No code yet.
Last updated: 2026-05-24
Goal: A Keybase-class chat + file sharing experience built on the KEZ identity stack, with NATS for messaging and Iroh for file transfer.


1. What this is

A real-time chat + file-sharing application with verified identities.

  • Users get human-friendly handles like @tudisco@kez.lat.
  • The handle is bound to a KEZ primary key (ed25519); the same key authenticates to the chat infrastructure.
  • Conversations are end-to-end encrypted; the broker is dumb.
  • Files are visible in the sender's "shared files" list but only downloaded when a recipient actually wants them. No background sync.
  • Identity is portable: the underlying key + sigchain survives the home server going dark. Handles can be migrated to other servers.

This is the Keybase model rebuilt on a decentralized substrate:

  • Identity layer → KEZ (instead of Keybase's central account system)
  • Chat layer → NATS broker with E2E in the client (instead of Keybase Chat servers)
  • File layer → Iroh peer-to-peer with content addressing (instead of KBFS)

2. Three-layer architecture

┌─────────────────────────────────────────────────────────────┐
│                kez-chat application                         │
│           (chat UI, file browser, profile views)            │
└────┬──────────────┬─────────────────────┬───────────────────┘
     │              │                     │
     ▼              ▼                     ▼
┌─────────┐   ┌───────────┐        ┌────────────────┐
│   KEZ   │   │   NATS    │        │     Iroh       │
│         │   │           │        │                │
│ ↓ who   │   │ ↓ chat    │        │ ↓ file blobs   │
│ ↓ what  │   │ ↓ tickets │        │ ↓ on-demand    │
│   they  │   │ ↓ presence│        │ ↓ NAT travers. │
│   own   │   │ ↓ small   │        │ ↓ E2E in QUIC  │
│ ↓ where │   │   stuff   │        │                │
│   they  │   │           │        │                │
│   listen│   │ dumb      │        │                │
│         │   │ broker;   │        │                │
│         │   │ clients   │        │                │
│         │   │ do E2E    │        │                │
└─────────┘   └───────────┘        └────────────────┘
     │              ▲                       ▲
     └─────────── sigchain ──────────────────┘
       (handle → KEZ primary → endpoints
        and links to other identities)

Each layer does one thing well. Each is replaceable without touching the others. The KEZ sigchain is the bridge that ties them together — it tells a verifier "this user's broker is X, their Iroh nodes are Y₁ and Y₂."


3. Identity & username model

3.1 Handles

Handles look like email and Mastodon addresses:

@tudisco@kez.lat
@chris@kez.lat
@alice@chris.com         ← custom domain, opted out of default

kez.lat is the placeholder default home server domain; it will be replaced with the production domain once one is chosen. The application treats whatever follows the second @ as the user's home server. Multiple servers can exist, and federation is by convention (the same model as email).

In the UI, when the home server matches the app's default, handles are displayed bare (@tudisco). Custom domains always display the full form (@chris@chris.com) so users can tell when they're talking to a non-default-server user.
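
The parsing and display rule above can be sketched in a few lines of Rust. This is a minimal illustration, not the planned implementation: the `Handle` type and its method names are hypothetical, and the validation is deliberately loose (no character-set rules, which the doc does not define yet).

```rust
/// A parsed handle such as `@tudisco@kez.lat`.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Handle {
    pub user: String,
    pub server: String,
}

impl Handle {
    /// Parses `@user@server`, or bare `@user`, which falls back to `default_server`.
    pub fn parse(input: &str, default_server: &str) -> Option<Handle> {
        let rest = input.strip_prefix('@')?;
        let (user, server) = match rest.split_once('@') {
            Some((u, s)) => (u, s),
            None => (rest, default_server),
        };
        // Reject empty parts and stray extra '@' separators.
        if user.is_empty() || server.is_empty() || server.contains('@') {
            return None;
        }
        Some(Handle {
            user: user.to_string(),
            server: server.to_string(),
        })
    }

    /// Bare form on the app's default server, full form everywhere else.
    pub fn display(&self, default_server: &str) -> String {
        if self.server.eq_ignore_ascii_case(default_server) {
            format!("@{}", self.user)
        } else {
            format!("@{}@{}", self.user, self.server)
        }
    }
}
```

With `kez.lat` as the default, `@tudisco@kez.lat` renders as `@tudisco`, while `@chris@chris.com` keeps its full form.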

3.2 Key generation tied to username

When a user creates an account:

  1. App generates a fresh ed25519 keypair locally.
    • This is the user's KEZ primary key.
    • It's also their NATS nkey for the chat broker (same key, same algorithm).
  2. App registers @username on the home server's handle registry
    • POSTs a signed registration request: { "handle": "tudisco", "primary": "ed25519:<hex>" }
    • The signature proves the user controls the private key.
    • The registry rejects squatting (first-come-first-served per home server).
  3. App initializes a sigchain for the new primary
    • First event: add_endpoint advertising the NATS broker the app will use.
    • Second event: add_endpoint advertising the Iroh NodeId the local app is using.
  4. App uploads the sigchain to a kez-sig-server (optional but recommended; otherwise the chain lives only on the user's device).

After this flow the user has a fully working KEZ identity:

  • @tudisco@kez.lat resolves via the handle registry to their primary key.
  • That key's sigchain advertises their NATS broker and Iroh nodes.
  • Other users can verify them and reach them.
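
The first-come-first-served rule from step 2 could look roughly like the sketch below. It uses an in-memory map as a stand-in for the SQLite-backed registry and assumes the request signature was already verified by the caller. `Registry`, `RegisterError`, and the handle character set (lowercase ASCII letters, digits, underscore) are assumptions made for illustration; the only grounded detail is the `ed25519:<hex>` form, where a 32-byte ed25519 public key gives 64 hex digits.

```rust
use std::collections::HashMap;

#[derive(Debug, PartialEq, Eq)]
pub enum RegisterError {
    InvalidHandle,
    InvalidPrimary,
    HandleTaken,
}

/// In-memory stand-in for the SQLite-backed handle registry.
#[derive(Default)]
pub struct Registry {
    by_handle: HashMap<String, String>,
}

/// Checks the `ed25519:<hex>` shape: 32 bytes, i.e. 64 hex digits.
fn is_ed25519_primary(primary: &str) -> bool {
    match primary.strip_prefix("ed25519:") {
        Some(hex) => hex.len() == 64 && hex.bytes().all(|b| b.is_ascii_hexdigit()),
        None => false,
    }
}

impl Registry {
    /// Claims `handle` for `primary`. Signature verification is assumed to
    /// have happened before this call.
    pub fn register(&mut self, handle: &str, primary: &str) -> Result<(), RegisterError> {
        // Hypothetical handle rule: non-empty, [a-z0-9_] only.
        let valid_handle = !handle.is_empty()
            && handle
                .bytes()
                .all(|b| b.is_ascii_lowercase() || b.is_ascii_digit() || b == b'_');
        if !valid_handle {
            return Err(RegisterError::InvalidHandle);
        }
        if !is_ed25519_primary(primary) {
            return Err(RegisterError::InvalidPrimary);
        }
        // First come, first served: an existing claim is never overwritten.
        if self.by_handle.contains_key(handle) {
            return Err(RegisterError::HandleTaken);
        }
        self.by_handle.insert(handle.to_string(), primary.to_string());
        Ok(())
    }

    /// Resolves a handle to its primary key, if registered.
    pub fn lookup(&self, handle: &str) -> Option<&str> {
        self.by_handle.get(handle).map(String::as_str)
    }
}
```

In the real server the uniqueness check would more likely be a UNIQUE constraint in SQLite, so that two concurrent registrations cannot both succeed.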

3.3 Why ed25519 (not nostr/secp256k1) for this app

Both KEZ primary key types work in general, but the chat app must use ed25519 because:

  • NATS nkeys are ed25519. Direct alignment: the user's KEZ primary key is their NATS credential. No second auth scheme.
  • Iroh node IDs are ed25519. Same primitive, native fit.
  • One key type to manage. Users with a pre-existing nostr key can still attach it to their KEZ sigchain as a claim (so they're verifiable on nostr too), but the primary that runs the app is ed25519.

4. The home server (kez-chat-server)

A single Rust binary that bundles the home-server responsibilities. One process. Self-hostable. Anyone can run an instance to act as the home server for their own users.

4.1 What it does

| Responsibility | How |
|---|---|
| Handle registry | POST /v1/register to claim @username, GET /v1/u/<handle> to look one up. SQLite-backed. Same shape as kez-id-server discussed earlier. |
| Sigchain mirror (optional) | Mirrors kez-sig-server endpoints for users who don't want to publish elsewhere (POST /v1/sigchains/.../events, GET /v1/sigchains/...), or proxies through to a separate kez-sig-server instance. |
| NATS broker host | Runs (or co-runs) a NATS server with JetStream enabled for offline message delivery. Configured to use nkey-based auth tied to KEZ primary keys. |
| Iroh pinning node | Runs an Iroh node that users can opt to push their blobs to, so files are served even when the user's own device is offline. (Optional per user.) |
| WebFinger endpoint | /.well-known/webfinger?resource=acct:tudisco@kez.lat returns user discovery info, for interop with fediverse tools. |
| HTTP API for clients | Thin REST surface for the chat app to register, look up handles, fetch endpoints, manage settings. |
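
As a rough illustration of the WebFinger row, the handler needs to turn the `resource` value into a handle and return a small JSON document. The sketch below assumes the `acct:` form shown above. The `subject` and `links` fields are standard WebFinger, but pointing a `self` link at `/v1/u/<handle>` is a hypothetical choice, and the function names are invented. No JSON escaping is done, on the assumption that user and host were validated earlier.

```rust
/// Extracts `(user, host)` from a WebFinger `resource` value
/// such as `acct:tudisco@kez.lat`.
pub fn parse_acct(resource: &str) -> Option<(&str, &str)> {
    let rest = resource.strip_prefix("acct:")?;
    let (user, host) = rest.split_once('@')?;
    if user.is_empty() || host.is_empty() || host.contains('@') {
        return None;
    }
    Some((user, host))
}

/// Builds a minimal response body. The `self` link target is an assumption;
/// inputs are assumed to be pre-validated, so nothing is escaped here.
pub fn webfinger_body(user: &str, host: &str) -> String {
    format!(
        r#"{{"subject":"acct:{user}@{host}","links":[{{"rel":"self","href":"https://{host}/v1/u/{user}"}}]}}"#,
        user = user,
        host = host
    )
}
```

A server would also want to check that `host` is a domain it actually serves before answering.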

4.2 Process model

For MVP, the server is a coordinator + adapter, not a full reimplementation:

┌───────────────────────────────────────────────────────────┐
│         kez-chat-server process (one Rust binary)         │
│  - HTTP API (axum)                                        │
│  - Handle registry (SQLite)                               │
│  - NATS auth callout (validates nkey signatures)          │
│  - Sigchain mirror (axum routes — could reuse             │
│    rust-sig-server code)                                  │
└──┬──────────────────────┬────────────────────────────────┘
   │ launches/manages     │ talks to via API
   ▼                      ▼
┌──────────────┐      ┌──────────────┐
│ nats-server  │      │ iroh-relay   │  (optional, for users
│ (Go binary)  │      │ (Rust)       │   who want pinning)
│ + JetStream  │      │              │
└──────────────┘      └──────────────┘

The Rust server doesn't reimplement NATS or Iroh — it sits beside them. Operator runs the three processes together (Docker compose, systemd unit, or whatever). The chat-server provides the KEZ-aware integration: authenticating NATS connections against the handle registry, serving sigchain endpoints, exposing a clean HTTP API to client apps.

4.3 Endpoints (sketch)

GET  /v1/healthz
GET  /v1/u/:handle                       handle → { primary, sigchain_url, endpoints }
POST /v1/register                        claim a handle (signed body)
GET  /.well-known/webfinger?resource=...

# Sigchain mirror (same as kez-sig-server)
GET  /v1/sigchains/:scheme/:id
POST /v1/sigchains/:scheme/:id/events
GET  /v1/sigchains/:scheme/:id/head

# NATS auth callout (called by nats-server, not by users)
POST /internal/nats/auth                 verify nkey signature, return permissions

# Iroh pinning (optional)
POST /v1/pin                             pin a blob for offline serving
GET  /v1/pin/:hash                       check pinning status

The NATS broker and Iroh node are out-of-process; clients connect to them directly (nats://nats.kez.lat:4222, Iroh direct or via relays).
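
The auth callout in the endpoint sketch has to hand back per-user subject permissions. One plausible policy, shown here purely as an assumption, is that anyone may publish to any inbox but a user may subscribe only to their own. The `Permissions` struct and `permissions_for` are hypothetical names, and the sketch leaves out the JetStream API and reply subjects a real client would also need. The matcher follows NATS wildcard rules: `*` matches exactly one token and `>` matches one or more trailing tokens.

```rust
/// Publish/subscribe allow-lists returned to nats-server for one user.
#[derive(Debug, PartialEq, Eq)]
pub struct Permissions {
    pub publish_allow: Vec<String>,
    pub subscribe_allow: Vec<String>,
}

/// Hypothetical policy: publish to any inbox, subscribe only to your own.
pub fn permissions_for(pubkey_hex: &str) -> Permissions {
    Permissions {
        publish_allow: vec!["kez.inbox.*".to_string()],
        subscribe_allow: vec![format!("kez.inbox.{}", pubkey_hex)],
    }
}

/// NATS-style subject matching. `>` is only meaningful as the last
/// pattern token, which this sketch does not validate.
pub fn subject_matches(pattern: &str, subject: &str) -> bool {
    let mut p = pattern.split('.');
    let mut s = subject.split('.');
    loop {
        match (p.next(), s.next()) {
            // `>` swallows the rest of the subject (at least one token).
            (Some(">"), Some(_)) => return true,
            // `*` matches any single non-empty token.
            (Some("*"), Some(tok)) => {
                if tok.is_empty() {
                    return false;
                }
            }
            // Literal tokens must be equal.
            (Some(a), Some(b)) => {
                if a != b {
                    return false;
                }
            }
            // Both exhausted at the same time: full match.
            (None, None) => return true,
            // One side ran out before the other.
            _ => return false,
        }
    }
}
```

In practice nats-server enforces the allow-lists itself; a local matcher like this is mainly useful for unit-testing the policy before it is wired into the callout.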


5. End-to-end flows

5.1 Account creation — @tudisco@kez.lat

1. User opens kez-chat-app, clicks "Create account"
2. App: generates ed25519 keypair locally
3. App: user picks handle "tudisco"
4. App → kez-chat-server:
     POST /v1/register
     { "handle": "tudisco",
       "primary": "ed25519:<pubkey-hex>",
       "registration_sig": "<sig over canonical message>" }
5. Server: validates signature, checks handle is free, stores in registry
6. Server: 201 Created
7. App: initializes sigchain locally, signs:
     { op: "add_endpoint",
       payload: { protocol: "nats",
                  url: "nats://nats.kez.lat:4222",
                  inbox: "kez.inbox.<pubkey-hex>" } }
     { op: "add_endpoint",
       payload: { protocol: "iroh",
                  node_id: "<local iroh node id>" } }
8. App → server:
     POST /v1/sigchains/ed25519/<pubkey-hex>/events  (twice, one per event)
9. App: connects to NATS broker with nkey auth, subscribes to inbox subject
10. Done — user is @tudisco@kez.lat, online, reachable
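
Several values in this flow are derived mechanically from the public key: the inbox subject, the sigchain URL path, and the two initial endpoint payloads. A small sketch, assuming lowercase hex encoding (the doc only says `<pubkey-hex>`) and using hypothetical type and function names:

```rust
/// Lowercase hex encoding of a 32-byte ed25519 public key.
/// Lowercase is an assumption; the doc only specifies "hex".
pub fn pubkey_hex(pubkey: &[u8; 32]) -> String {
    pubkey.iter().map(|b| format!("{:02x}", b)).collect()
}

/// NATS inbox subject: `kez.inbox.<pubkey-hex>`.
pub fn inbox_subject(pubkey: &[u8; 32]) -> String {
    format!("kez.inbox.{}", pubkey_hex(pubkey))
}

/// Path used in step 8 to append events to the user's sigchain.
pub fn sigchain_events_path(pubkey: &[u8; 32]) -> String {
    format!("/v1/sigchains/ed25519/{}/events", pubkey_hex(pubkey))
}

/// Payloads of the two `add_endpoint` events signed in step 7.
#[derive(Debug, PartialEq, Eq)]
pub enum Endpoint {
    Nats { url: String, inbox: String },
    Iroh { node_id: String },
}

/// Builds the initial endpoint set for a fresh account.
pub fn initial_endpoints(pubkey: &[u8; 32], broker_url: &str, iroh_node_id: &str) -> Vec<Endpoint> {
    vec![
        Endpoint::Nats {
            url: broker_url.to_string(),
            inbox: inbox_subject(pubkey),
        },
        Endpoint::Iroh {
            node_id: iroh_node_id.to_string(),
        },
    ]
}
```

Signing and event serialization are left out because they depend on the kez-core sigchain format.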

5.2 Adding a contact

1. Tudisco wants to add Chris. Types "@chris" in app.
2. App → kez-chat-server: GET /v1/u/chris
   Returns: { primary: "ed25519:abc...", sigchain_url: "..." }
3. App fetches the sigchain → walks events → extracts:
     - nostr/github/dns/etc. claims (for verification)
     - NATS broker URL + inbox subject
     - Iroh node IDs
4. App displays Chris's profile: verified accounts, avatar (from sigchain
   metadata if present), join date
5. App stores LOCAL binding: { "@chris@kez.lat" => ed25519:abc... }
   (TOFU — trust on first use)
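
The TOFU binding in step 5 amounts to a pin-on-first-sight map with a mismatch check on later lookups. A minimal in-memory sketch follows; `ContactStore` and `Tofu` are hypothetical names, and a real client would persist the map and would need to tell a legitimate `rotate` apart from an unexpected key change.

```rust
use std::collections::HashMap;

/// Outcome of checking a resolved key against the local pin.
#[derive(Debug, PartialEq, Eq)]
pub enum Tofu {
    /// No previous binding existed; the key is now pinned.
    FirstUse,
    /// The resolved key equals the pinned key.
    Match,
    /// The registry returned a different key than the one pinned earlier.
    Mismatch { pinned: String },
}

/// Local handle-to-primary bindings (in memory for this sketch).
#[derive(Default)]
pub struct ContactStore {
    pinned: HashMap<String, String>,
}

impl ContactStore {
    /// Pins on first use, otherwise compares against the existing pin.
    /// A mismatch never overwrites the pin; the caller decides what to do.
    pub fn check(&mut self, handle: &str, primary: &str) -> Tofu {
        let existing = self.pinned.get(handle).cloned();
        match existing {
            None => {
                self.pinned.insert(handle.to_string(), primary.to_string());
                Tofu::FirstUse
            }
            Some(p) if p == primary => Tofu::Match,
            Some(p) => Tofu::Mismatch { pinned: p },
        }
    }
}
```

Surfacing `Mismatch` to the user rather than silently accepting the new key is what makes the registry unable to swap keys unnoticed after first contact.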

5.3 Sending a chat message

1. Tudisco types "hello" in the chat with Chris.
2. App: looks up Chris's primary key + NATS endpoint from local store.
3. App: derives a symmetric key via ECDH (assuming both ed25519 keys are
   first converted to their X25519 form):
     X25519(tudisco_priv, chris_pub) → KDF → 32-byte symmetric key
4. App: encrypts "hello" with ChaCha20-Poly1305 + the derived key.
5. App: signs the ciphertext with tudisco's KEZ primary (so chris can
   verify the sender, not just decrypt).
6. App: publishes to NATS subject `kez.inbox.<chris-pubkey-hex>` on
   chris's broker, with JetStream delivery (durable, will queue if
   chris is offline).
7. Chris's app receives from his subscribed inbox subject.
8. Chris's app: verifies signature against tudisco's key, decrypts, shows
   "tudisco: hello".

For 1:1 chat, the broker never sees:

  • The message contents
  • The sender's identity inside the message (it lives in the encrypted and signed payload, not in the NATS metadata)

Because connections are nkey-authenticated, the broker can still see which key publishes to which inbox subject, so the sender-recipient relationship is visible to the home server as connection metadata.

5.4 Sharing a file

1. Tudisco drags `report.pdf` into the chat with Chris.
2. App: imports blob into local Iroh node → gets BLAKE3 hash + ticket.
3. App: optionally adds entry to tudisco's shared-files manifest
   (visible in his profile if Chris later browses it).
4. App: encrypts the Iroh ticket (and a content key for the blob, if
   the file is wrapped with a per-recipient symmetric key) with the
   same E2E mechanism as chat messages.
5. App: publishes to chris's NATS inbox: { type: "file_share",
   filename: "report.pdf", ticket: "...", content_key: "..." }
6. Chris's app receives the notification, displays:
   "tudisco shared report.pdf (1.2 MB)"  [Download]
7. Chris clicks Download.
8. App: opens Iroh connection to tudisco's NodeId (from sigchain), pulls
   the blob via the ticket, decrypts with the content key, verifies
   BLAKE3 hash. File appears.

If tudisco is offline at step 8 and he's opted into pinning, Chris's app fetches from kez.lat's pinning node instead. Same protocol, just a different source.
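
The fallback described here is an ordered list of sources tried one after another. The sketch below keeps Iroh out of the picture: `fetch` is a caller-supplied stand-in for the real blob download, and `BlobSource`, `candidate_sources`, and `fetch_with_fallback` are hypothetical names.

```rust
/// Where a blob can be fetched from.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum BlobSource {
    /// One of the sender's own Iroh nodes (from their sigchain).
    Owner(String),
    /// The home server's pinning node.
    Pin(String),
}

/// Sender's nodes first, pinning node (if the sender opted in) last.
pub fn candidate_sources(owner_nodes: &[&str], pinning_node: Option<&str>) -> Vec<BlobSource> {
    let mut sources: Vec<BlobSource> = owner_nodes
        .iter()
        .map(|n| BlobSource::Owner(n.to_string()))
        .collect();
    if let Some(pin) = pinning_node {
        sources.push(BlobSource::Pin(pin.to_string()));
    }
    sources
}

/// Tries each source in order and returns the first success together with
/// the source that served it. `fetch` stands in for the real Iroh download.
pub fn fetch_with_fallback<F>(
    sources: &[BlobSource],
    mut fetch: F,
) -> Result<(BlobSource, Vec<u8>), String>
where
    F: FnMut(&BlobSource) -> Result<Vec<u8>, String>,
{
    let mut last_err = String::from("no sources available");
    for source in sources {
        match fetch(source) {
            Ok(bytes) => return Ok((source.clone(), bytes)),
            Err(e) => last_err = e,
        }
    }
    Err(last_err)
}
```

Because blobs are content-addressed, the BLAKE3 check from step 8 applies unchanged no matter which source answered.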

5.5 Browsing someone's files (Keybase-style)

1. Chris opens tudisco's profile.
2. App: resolves @tudisco → primary → sigchain.
3. Sigchain has a `set_shared_files` op with a manifest blob hash.
4. App: fetches the manifest blob (small, fast) via Iroh.
5. App: decrypts entries that are wrapped for chris's key, ignores ones
   it can't decrypt (those are wrapped for other people).
6. App: renders the visible entries with name, size, share date,
   thumbnail if present.
7. Chris clicks an entry to download — same as 5.4 step 8.

The manifest is small (KBs); only blobs Chris actually wants are fetched. No background sync of multi-GB folders.
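
Step 5 of the browse flow is a filter: keep public entries, keep wrapped entries that decrypt under the viewer's key, and skip the rest without erroring. The manifest format is still an open question, so the shape below is only an assumption. `ManifestEntry`, `FileEntry`, and the `try_unwrap` callback (a stand-in for the real per-entry decryption) are hypothetical.

```rust
/// A file the viewer is allowed to see.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct FileEntry {
    pub name: String,
    pub size: u64,
    pub blob_hash: String,
}

/// One manifest slot: readable by anyone, or an opaque ciphertext
/// wrapped for some recipient's key.
pub enum ManifestEntry {
    Public(FileEntry),
    Wrapped(Vec<u8>),
}

/// Returns the entries the viewer can read. `try_unwrap` stands in for
/// decryption with the viewer's key and returns `None` for entries that
/// were wrapped for someone else.
pub fn visible_entries<F>(manifest: &[ManifestEntry], try_unwrap: F) -> Vec<FileEntry>
where
    F: Fn(&[u8]) -> Option<FileEntry>,
{
    manifest
        .iter()
        .filter_map(|entry| match entry {
            ManifestEntry::Public(file) => Some(file.clone()),
            ManifestEntry::Wrapped(ciphertext) => try_unwrap(ciphertext.as_slice()),
        })
        .collect()
}
```

Treating an undecryptable entry as "not for me" rather than as an error is what lets one manifest serve many recipients.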


6. Project & folder layout

6.1 Where this project lives

/Kez
├── rust-lib/                ← (proposed) shared Rust libraries
│   ├── Cargo.toml           workspace
│   └── crates/
│       ├── kez-core/        moved from rust/crates/
│       └── kez-channels/    moved from rust/crates/
│
├── rust/                    ← Rust CLI (kez binary)
│   └── crates/
│       └── kez-cli/         depends on ../../rust-lib/crates/...
│
├── rust-sig-server/         ← optional sigchain HTTP store
│
├── kez-chat/                ← THIS PROJECT
│   ├── document.md          (this file)
│   ├── Cargo.toml
│   ├── src/
│   │   ├── main.rs
│   │   ├── handles.rs       handle registry
│   │   ├── sigchain.rs      sigchain mirror (or proxy)
│   │   ├── nats_auth.rs     NATS auth callout
│   │   ├── pin.rs           Iroh pinning
│   │   └── api.rs           HTTP routes
│   ├── deploy/
│   │   ├── docker-compose.yml   chat-server + nats + iroh
│   │   ├── nats.conf
│   │   └── systemd/
│   └── tests/
│
├── nodejs/                  ← (unchanged)
└── crosstest.sh             ← (path updates if rust-lib moves)

6.2 The rust-lib/ proposal — share code, no duplication

Right now, kez-core and kez-channels live inside rust/crates/. The sig-server and the chat-server both want to use them. With everything in rust/, downstream projects have to do:

kez-core = { path = "../rust/crates/kez-core" }

…which works but feels off (why does a separate project reach into another project's crates/?).

Recommendation: move the pure libraries out into a top-level rust-lib/ workspace. The CLI stays in rust/. Downstream servers depend on ../rust-lib/crates/kez-core. Clean structure, no duplication, no confusion about which folder owns the library code.

Refactor effort: small but real.

  • mv rust/crates/kez-core rust-lib/crates/
  • mv rust/crates/kez-channels rust-lib/crates/
  • Create rust-lib/Cargo.toml (workspace).
  • Update rust/Cargo.toml to have just kez-cli.
  • Update path deps in: rust/crates/kez-cli/Cargo.toml, rust-sig-server/Cargo.toml.
  • Update crosstest.sh if any paths are hardcoded.

Suggested order: do the refactor first, then start kez-chat with clean imports. Otherwise we'll write path = "../rust/crates/..." for the chat-server and have to fix it later anyway.

6.3 Dependencies (planned)

| Crate | Why |
|---|---|
| kez-core (path) | Identity types, sigchain, claim signing |
| kez-channels (path) | Verify users' linked accounts when displayed |
| axum 0.8 | HTTP API |
| tokio | Async runtime |
| rusqlite (bundled) | Handle registry |
| async-nats | NATS client (for the auth callout and maybe utility) |
| iroh | Iroh node (for pinning) |
| iroh-blobs | Blob handling |
| serde / serde_json | Standard |
| thiserror / anyhow | Standard |
| tracing / tracing-subscriber | Logging |
| tower-http | CORS, request tracing |
| clap | CLI args |

6.4 The actual NATS broker

We don't write a NATS broker. We run one alongside the Rust server:

  • Use the official nats-server Go binary (downloaded from nats.io or built from source).
  • Configure with JetStream enabled (for offline delivery via durable consumers).
  • Configure auth callout pointing at the kez-chat-server's internal endpoint, so connection auth defers to the KEZ registry.
  • Run in the same Docker compose / systemd target as the Rust server.

NATS clustering for redundancy is a later concern.

6.5 The actual Iroh node

We DO embed Iroh in-process — Iroh is a Rust library and works as such. The chat-server runs an iroh::Node and offers it as a pinning service for users who opt in.

For client apps: they run their own Iroh node locally too. The chat-server's Iroh node is just a peer — albeit one that's always online and willing to hold blobs.


7. MVP scope

What ships in v0:

  • kez-chat-server binary
    • Handle registry (POST /register, GET /u/:handle)
    • Sigchain mirror (proxy or own copy)
    • NATS auth callout
    • WebFinger endpoint
    • HTTP healthz/metrics
  • NATS broker config + deployment recipe
  • Iroh pinning node embedded (optional per user)
  • Docker compose for the whole bundle (server + nats + iroh node)
  • Integration tests against a real NATS + Iroh

What the client app needs to do (separate project? kez-chat-app/?):

  • Account creation flow (key gen + handle registration)
  • Contact lookup + verification
  • 1:1 chat (E2E via NATS)
  • File send/receive (E2E via Iroh)
  • Shared-files manifest browse + fetch
  • Profile view (sigchain visualization)

For v0, CLI client is fine (kez-chat send @chris "hello"). UI app comes later.


8. Out of scope (v0)

  • Group chat
  • Forward secrecy (Double Ratchet / MLS) — chat is encrypted but not ratcheting in v0
  • Voice / video calls
  • Multi-device key sync — user has one device with their key for v0
  • Account recovery / lost-key flows — protocol's rotate op exists but UX for recovery isn't designed yet
  • Federation across home servers — protocol allows it, but the v0 app may only resolve handles on its configured default server
  • Channel publishing (gist, DNS, ActivityPub, bluesky) — the kez CLI already has these; not duplicated here. User can run kez claim ... separately to add channel proofs to their sigchain.
  • Avatars / display name — could just use nostr:npub metadata or a separate sigchain op; defer the design

9. Open design questions

These need resolving before serious implementation:

  1. Bundle or separate sigchain server?

    • kez-chat-server includes its own sigchain mirror (one less moving piece for operators)
    • …or it depends on a separate kez-sig-server (proper layering)
    • Lean: bundle for MVP, factor out later if multiple chat servers want to share.
  2. Iroh pinning by default or opt-in?

    • Default-on: better UX, more storage cost for the server operator
    • Opt-in: simpler operator story, worse first-use UX for users whose phones are off
    • Lean: opt-in for v0; let users push the pin button per-file. Default-on later.
  3. NATS broker: bundled or BYO?

    • kez-chat-server can spawn/manage nats-server as a child process
    • …or it can assume operator runs NATS separately and just point at it
    • Lean: BYO with documented config. We don't reinvent process management.
  4. Manifest format

    • Single JSON blob, signed, hash committed via sigchain set_shared_files op
    • …or Iroh Doc (CRDT-synced)
    • Lean: single signed blob for v0; simpler, no Iroh Docs dep.
  5. Handle uniqueness scope

    • Per home server (tudisco@kez.lat vs tudisco@example.com can be different people)
    • Globally enforced somehow (not really possible without a central registry)
    • Lean: per home server. Federation handles global resolution.
  6. What about KEZ's existing nostr: channel for messaging?

    • It already works for chat-like messages via NIP-44 DMs
    • NATS is a separate stack — not interoperable
    • Lean: NATS is the chat substrate for this app. Users who want to send a nostr DM can use a separate nostr client. The KEZ identity is the same; the transport is the user's choice per conversation. Document this in the UI.
  7. Recovery story when you lose your key

    • Spec has rotate op — old key signs that new key is now primary
    • But if you lost the old key, you can't sign the rotation
    • Possible solutions:
      • User must keep paper-backup of their key (Bitcoin model)
      • User can pre-sign rotation events to multiple device keys (multi-device redundancy)
      • Home server holds an offline emergency-recovery key (centralized fallback; opt-in)
    • Defer detailed design to a later doc.

10. Risks & honest concerns

  1. NATS auth callout integration depth. The callout pattern is documented but the chat-server needs to handle it correctly for security. nkey signature verification is straightforward but the integration glue (subject permissions per user, JetStream stream creation) needs care.

  2. Iroh is pre-1.0. API may shift. Pin a version, plan for a future upgrade pass. The good news: identity stays stable (it's KEZ); only the transport library needs to be migrated.

  3. Multi-device. The MVP assumes one device per user, one key. Real users have phones + laptops. Multi-device key management is a deep topic — addressed in a follow-up doc.

  4. Spam in handle registration. First-come-first-served is easy to game. Mitigations:

    • Proof-of-work on registration?
    • Email-based gating (introduces centralization)?
    • Rate-limit by IP, accept the leakage
    • Defer for v0; revisit if it becomes a problem.
  5. NAT traversal for Iroh. Iroh handles it via relays, but corporate networks are sometimes hostile. Have a "use server's pinning as relay" fallback documented.

  6. Operational cost. Running NATS + Iroh + a Rust server isn't free.

    • NATS scales horizontally, low resource use
    • Iroh nodes can chew through disk if pinning is enabled liberally
    • Need a clear "I'm running kez.lat for 1000 users — what does it cost?" answer before community adoption.

11. The plan, sequenced

When we start building:

  1. Refactor: move kez-core + kez-channels to rust-lib/. Tiny but unblocks everything else from having clean imports.

  2. Build kez-chat-server scaffold (axum + sqlite + tracing). Handle registry + WebFinger first — these are the simplest endpoints and unblock client-side account creation.

  3. Add NATS auth callout. Spawn nats-server separately, configure it to call our /internal/nats/auth endpoint. End-to-end: client can register a handle and connect to NATS with their nkey.

  4. Build a minimal kez-chat CLI client that does:

    • kez-chat register tudisco
    • kez-chat add @chris
    • kez-chat send @chris "hello"
    • kez-chat listen

    No UI yet. Enough to prove the chat flow works end-to-end.
  5. Add Iroh integration to both server and CLI client.

    • Server: embedded iroh node for pinning
    • Client: local iroh node, blob send/receive
    • CLI: kez-chat share @chris ./file.pdf, kez-chat browse @tudisco
  6. Shared-files manifest (sigchain set_shared_files op, manifest blob format).

  7. Deployment recipe: docker-compose, systemd unit, deployment doc.

  8. Then start the GUI app. Could be Tauri (Rust + web frontend), Iced (pure Rust UI), or whatever the user wants.
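
The CLI surface from steps 4 and 5 is small enough to sketch as one enum. The planned dependency for argument parsing is clap; the hand-rolled matcher below is used only to keep the example free of external crates, and the `Command` variants and field names are hypothetical.

```rust
/// Subcommands named in the build plan.
#[derive(Debug, PartialEq, Eq)]
pub enum Command {
    Register { handle: String },
    Add { contact: String },
    Send { to: String, text: String },
    Listen,
    Share { to: String, path: String },
    Browse { who: String },
}

/// Parses the arguments that follow the binary name.
pub fn parse_command(args: &[&str]) -> Result<Command, String> {
    match args {
        ["register", handle] => Ok(Command::Register { handle: handle.to_string() }),
        ["add", contact] => Ok(Command::Add { contact: contact.to_string() }),
        ["send", to, text] => Ok(Command::Send {
            to: to.to_string(),
            text: text.to_string(),
        }),
        ["listen"] => Ok(Command::Listen),
        ["share", to, path] => Ok(Command::Share {
            to: to.to_string(),
            path: path.to_string(),
        }),
        ["browse", who] => Ok(Command::Browse { who: who.to_string() }),
        [] => Err("missing subcommand".to_string()),
        // Unknown name, or a known name with the wrong number of arguments.
        [other, ..] => Err(format!("unknown or malformed subcommand: {}", other)),
    }
}
```

For example, `kez-chat send @chris "hello"` would reach this function as `["send", "@chris", "hello"]`.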


12. One-paragraph summary

kez-chat is a Keybase-class chat and file-sharing app built on the KEZ identity stack. Users get @username@kez.lat handles backed by an ed25519 primary key. The same key authenticates to a NATS broker (chat, presence, file tickets — broker is dumb, clients do E2E with ChaCha20-Poly1305 over ECDH-derived keys) and identifies an Iroh node (P2P bulk transfer, content-addressed blobs, on-demand fetch). A single Rust binary (kez-chat-server) coordinates the handle registry, NATS auth, optional sigchain mirror, and optional Iroh pinning. The chat-app itself is a separate project that consumes the server's HTTP API plus talks directly to NATS and Iroh.