# KEZ Chat & File Share — Design Document

**Status:** Pre-implementation planning. No code yet.
**Last updated:** 2026-05-24
**Goal:** A Keybase-class chat + file sharing experience built on the KEZ
identity stack, with NATS for messaging and Iroh for file transfer.

---

## 1. What this is

A real-time chat + file-sharing application with verified identities.

- Users get human-friendly handles like `@tudisco@kez.lat`.
- The handle is bound to a KEZ ed25519 primary key; the same key
  authenticates to the chat infrastructure.
- Conversations are end-to-end encrypted; the broker is dumb.
- Files are visible in the sender's "shared files" list but only
  downloaded when a recipient actually wants them. No background sync.
- Identity is portable: the underlying key + sigchain survives the home
  server going dark. Handles can be migrated to other servers later.

This is the Keybase model rebuilt on a decentralized substrate:
- **Identity layer** → KEZ (instead of Keybase's central account system)
- **Chat layer** → NATS with client-side E2E (instead of Keybase Chat)
- **File layer** → Iroh peer-to-peer with content addressing (instead of KBFS)

---

## 2. Three-layer architecture

```
┌─────────────────────────────────────────────────────────────┐
│                    kez-chat application                     │
│           (chat UI, file browser, profile views)            │
└────┬──────────────┬─────────────────────┬───────────────────┘
     │              │                     │
     ▼              ▼                     ▼
┌─────────┐   ┌────────────┐     ┌────────────────┐
│   KEZ   │   │    NATS    │     │      Iroh      │
│         │   │            │     │                │
│ ↓ who   │   │ ↓ chat     │     │ ↓ file blobs   │
│ ↓ what  │   │ ↓ tickets  │     │ ↓ on-demand    │
│   they  │   │ ↓ presence │     │ ↓ NAT travers. │
│   own   │   │ ↓ small    │     │ ↓ E2E in QUIC  │
│ ↓ where │   │   stuff    │     │                │
│   they  │   │            │     │                │
│   listen│   │ dumb       │     │                │
│         │   │ broker;    │     │                │
│         │   │ clients    │     │                │
│         │   │ do E2E     │     │                │
└─────────┘   └────────────┘     └────────────────┘
     │              ▲                     ▲
     └─────────── sigchain ───────────────┘
       (handle → KEZ primary → endpoints
        and links to other identities)
```

Each layer does one thing well. Each is replaceable without touching the
others. The KEZ sigchain is the bridge that ties them together — it tells
a verifier "this user's broker is X, their Iroh nodes are Y₁ and Y₂."

---

## 3. Identity & username model

### 3.1 Handles

Handles look like email and Mastodon addresses:

```
@tudisco@kez.lat
@chris@kez.lat
@alice@chris.com     ← custom domain, opted out of default (future)
```

`kez.lat` is the placeholder default home server domain. We'll lock in
the real production domain before launch.

For v0, **the handle namespace is global** — registration is on the one
default home server. Federation (multiple servers with their own
namespaces) is deliberately not in v0, but the design must not preclude
it. See §3.5.

In the UI, since there's only one home server in v0, handles are
displayed bare (`@tudisco`). The `@kez.lat` suffix is implied and stored
internally.

### 3.2 Key generation tied to username

When a user creates an account:

1. App generates a **fresh ed25519 keypair** locally.
   - This is the user's KEZ primary key.
   - It's also their NATS nkey for the chat broker (same key, same algorithm).
   - It's also their Iroh node identity (same primitive again).
2. App **registers `@username`** on the home server's handle registry.
   - Sends a signed registration request proving control of the private key.
   - Registry rejects claims on handles that are already taken (first-come-first-served).
3. App **initializes a sigchain** for the new primary.
   - First event: `add_endpoint` advertising the NATS broker the app will use.
   - Second event: `add_endpoint` advertising the Iroh NodeId of the local device.
4. App **uploads the sigchain** to the deployed `kez-sig-server`.

After this flow the user has a fully working KEZ identity:
- `@tudisco@kez.lat` resolves via the handle registry to their primary key.
- That key's sigchain (on `kez-sig-server`) advertises their NATS broker and Iroh nodes.
- Other users can verify them and reach them.

### 3.3 Why ed25519 only for this app

Both KEZ primary types work in general, but the chat app **requires** ed25519:

- **NATS nkeys are ed25519.** Direct alignment: the user's KEZ primary key
  is their NATS credential. No second auth scheme.
- **Iroh node IDs are ed25519.** Same primitive, native fit.
- **One key type to manage.** Users with a pre-existing nostr key can
  still attach it to their KEZ sigchain as a verifiable claim (so they're
  cross-referenced on nostr too), but the primary that runs the app is
  ed25519. The nostr key never participates in chat or file transport.

### 3.4 Account recovery: paper backup (Keybase-style)

The user's ed25519 private key is the only thing that can prove their
identity. Lose it, lose the account.

Recovery model for v0:

- On account creation, the app converts the 32-byte ed25519 seed to a
  **mnemonic phrase** (BIP-39 style, 24 words). Standard, well-tested
  word lists, deterministic encoding.
- App **forces the user to write it down** before continuing — shows
  the words, asks for confirmation, then asks them to retype a few
  random words back to prove they recorded it.
- App stores the seed locally in OS-protected storage (Keychain,
  Credential Manager, libsecret). Mnemonic is shown only at creation
  and on-demand from settings.
- **Lost device flow:** user installs the app on a new device, types
  their mnemonic, app regenerates the same ed25519 keypair, then pulls
  the sigchain from `kez-sig-server` to restore their identity state.
- The handle is still theirs because the registry knows the primary key.

No server-side recovery. No email reset. No customer support. This is
the same model Bitcoin wallets and Keybase use: the user holds the seed
phrase, and the user is responsible for it.

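
The 24-word figure follows from BIP-39 arithmetic: the checksum is one bit per 32 bits of entropy, and each word encodes 11 bits. A minimal sketch of that calculation; the function name is hypothetical and no word list or hashing is involved here.

```rust
/// Number of BIP-39 words needed to encode `entropy_bits` of entropy.
/// Returns None for sizes BIP-39 does not define.
fn bip39_word_count(entropy_bits: u32) -> Option<u32> {
    // BIP-39 defines entropy sizes from 128 to 256 bits in steps of 32.
    if !(128..=256).contains(&entropy_bits) || entropy_bits % 32 != 0 {
        return None;
    }
    // The checksum is the first entropy_bits / 32 bits of SHA-256(entropy).
    let checksum_bits = entropy_bits / 32;
    // Each mnemonic word carries 11 bits (index into a 2048-word list).
    Some((entropy_bits + checksum_bits) / 11)
}
```

For the 32-byte seed used here, 256 entropy bits plus 8 checksum bits give 264 bits, which is exactly 24 words of 11 bits each.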

### 3.5 Federation-ready design (not in v0)

For v0 we have **one** home server (`kez.lat`). All handles live there.
To make sure we don't paint ourselves into a corner:

1. **Internal representation of a handle is always the qualified form**
   (`tudisco@kez.lat`), never just `tudisco`. The UI strips the suffix
   for display; storage always keeps the full form.
2. **Handle resolution is HTTP-based**, not hard-coded. The chat app
   looks up `chris@kez.lat` by hitting `https://kez.lat/v1/u/chris`.
   When federation lands, looking up `chris@example.com` hits
   `https://example.com/v1/u/chris` instead.
3. **WebFinger endpoint included from v0** — so cross-server discovery
   already works via standard tooling, even if our app only uses our
   own server for now.
4. **Sigchain endpoint URLs are fully qualified.** A user's sigchain
   lives at `https://sig.kez.lat/v1/sigchains/ed25519/<hex>` — when
   another server's user wants to verify ours, the URL is right there.

The v0 chat app might hard-code "lookups go to `kez.lat`" for now;
flipping that to "lookups go to whatever's after the `@`" is a config
change later, not a redesign.

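
Points 1 and 2 can be sketched as a small handle type: storage keeps the qualified form, the UI strips the default domain, and the lookup URL is derived from whatever follows the `@`. This is an illustrative sketch, not the project's API; the `Handle` type, its method names, and `DEFAULT_DOMAIN` are hypothetical, and no character-set validation is attempted.

```rust
/// Placeholder default home server, as in the text above.
const DEFAULT_DOMAIN: &str = "kez.lat";

/// A handle in its fully qualified form, e.g. `tudisco@kez.lat`.
#[derive(Debug, Clone, PartialEq, Eq)]
struct Handle {
    user: String,
    domain: String,
}

impl Handle {
    /// Accepts `@user`, `user`, `@user@domain`, or `user@domain`.
    /// A missing domain falls back to DEFAULT_DOMAIN.
    fn parse(input: &str) -> Option<Handle> {
        let trimmed = input.trim();
        let body = trimmed.strip_prefix('@').unwrap_or(trimmed);
        let (user, domain) = match body.split_once('@') {
            Some((user, domain)) => (user, domain),
            None => (body, DEFAULT_DOMAIN),
        };
        if user.is_empty() || domain.is_empty() || domain.contains('@') {
            return None;
        }
        Some(Handle { user: user.to_string(), domain: domain.to_string() })
    }

    /// Storage form: always qualified.
    fn qualified(&self) -> String {
        format!("{}@{}", self.user, self.domain)
    }

    /// Display form: bare when the handle lives on the default server.
    fn display(&self) -> String {
        if self.domain == DEFAULT_DOMAIN {
            format!("@{}", self.user)
        } else {
            format!("@{}@{}", self.user, self.domain)
        }
    }

    /// Resolution URL derived from the domain part of the handle.
    fn lookup_url(&self) -> String {
        format!("https://{}/v1/u/{}", self.domain, self.user)
    }
}
```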

---

## 4. The home server (`kez-chat-server`)

A single Rust binary, deployed as one container alongside
`kez-sig-server`. The NATS broker is external infrastructure the
operator provides (§6.4).

### 4.1 What it does (and what it doesn't)

| Responsibility | This server? |
|---|---|
| **Handle registry** | ✅ Yes |
| **NATS auth callout** | ✅ Yes |
| **WebFinger endpoint** | ✅ Yes |
| **HTTP API for clients** | ✅ Yes |
| **Sigchain storage** | ❌ No — defer to `kez-sig-server` (separate container) |
| **NATS broker** | ❌ No — external infrastructure the operator provides (§6.4) |
| **Iroh pinning** | ❌ No for v0 — files transfer P2P when both peers are online. Pinning is a future tier. |
| **Channel verification (gist/dns/etc.)** | ❌ No — clients do it locally via `kez-channels`. KEZ system is only used for identity, not as part of chat. |

The chat server is deliberately small. Microservices: each service does
one thing, deployed independently. The operator runs our two containers
(chat-server + sig-server) and points them at a NATS broker they
provide. When pinning lands later, that becomes a third optional
container.

### 4.2 Process / deployment model

NATS is **not part of our deployment.** The operator runs NATS however
they want (Synadia Cloud, their own cluster, a friend's broker, a single
local container) and gives the chat-server a URL. Same idea as a
database: we connect to one; we don't ship one.

```
                External infrastructure
              (operator's responsibility)
                ┌──────────────────────┐
                │     NATS broker      │
                │     + JetStream      │
                │      somewhere       │
                └─────────▲─────▲──────┘
                          │     │
        chat-server ──────┘     │ ◄────── client app
       (auth callout)           │  (publish/subscribe)
                                │
┌─────────────── our deployment ─────────────────┐
│                                                │
│   ┌─────────────────┐    ┌────────────────┐    │
│   │ kez-chat-server │    │ kez-sig-server │    │
│   │     (Rust)      │    │     (Rust)     │    │
│   │                 │    │                │    │
│   │ ↓ handles       │    │ ↓ sigchain     │    │
│   │ ↓ nats auth     │    │   storage      │    │
│   │ ↓ HTTP API      │    │                │    │
│   └─────────────────┘    └────────────────┘    │
│         ▲                      ▲               │
└─────────┼──────────────────────┼───────────────┘
          │                      │
   ┌──────┴──────────────────────┴────────────────────────┐
   │  Chat app (per user, runs on phone/desktop)          │
   │                                                      │
   │  • talks to the operator's NATS broker (NATS proto)  │
   │  • talks to kez-chat-server over HTTPS               │
   │  • talks to kez-sig-server over HTTPS                │
   │  • runs local iroh::Node for file send/receive       │
   └──────────────────────────────────────────────────────┘
```

The chat-server orchestrates auth against whatever NATS broker is
configured, but doesn't run, host, supervise, or ship NATS in any form.

### 4.3 docker-compose sketch (our two services only)

```yaml
# deploy/docker-compose.yml — what we ship
services:
  chat-server:
    build: .                          # kez-chat-server Rust binary
    environment:
      NATS_URL: ${NATS_URL}           # operator points us at their NATS broker
      SIG_SERVER_URL: http://sig-server:7878
      DB_PATH: /data/handles.db
      AUTH_CALLOUT_NKEY_PATH: /etc/kez/auth-callout.nkey
    volumes:
      - chat-data:/data
      - ./auth-callout.nkey:/etc/kez/auth-callout.nkey:ro
    depends_on: [sig-server]
    ports:
      - "8080:8080"                   # HTTP API for clients

  sig-server:
    image: kez-sig-server:latest      # the existing rust-sig-server
    environment:
      KEZ_DB: /data/sigchains.db
    volumes:
      - sig-data:/data
    ports:
      - "7878:7878"

volumes:
  chat-data:
  sig-data:
```

**NATS is not in this file.** The operator brings their own — running
on a different host, in a different compose project, on Synadia Cloud,
or wherever. They give us `NATS_URL` and add an `auth_callout` block to
their `nats.conf`. That block names our signing key rather than a URL:
NATS delivers callout requests over NATS itself, so the chat-server
receives them on its own broker connection.

What the operator needs to add on the NATS side (in **their** config):

```conf
# nats.conf — added to whatever NATS deployment the operator runs
authorization {
  auth_callout {
    issuer: "<our auth-callout signing nkey public part>"
    auth_users: ["AUTHUSER"]   # user the chat-server connects as; exempt from the callout
    account: "DEFAULT"
  }
}
```

The chat-server signs auth-callout responses with a long-lived nkey
that NATS trusts. When a client connects to NATS with their KEZ
ed25519 key, NATS forwards the auth request to our chat-server
(as a request on the `$SYS.REQ.USER.AUTH` subject), which checks the
handle registry and signs a yes/no response.

We provide a reference `nats.conf` snippet in the docs. The operator
patches it into their own NATS deployment.

For local development, see Appendix A.
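
On the chat-server side, the environment variables from the compose sketch could be read into a small config struct at startup. The following is a sketch under assumptions: the `Config` struct and `load_config` function are hypothetical, and treating all four variables as required is a guess rather than a settled decision. The lookup is passed in as a closure so the logic can be tested without touching process state.

```rust
/// Startup configuration, mirroring the `environment:` block of the
/// compose sketch. Field names are illustrative.
#[derive(Debug, PartialEq)]
struct Config {
    nats_url: String,
    sig_server_url: String,
    db_path: String,
    auth_callout_nkey_path: String,
}

/// Builds a Config from a key lookup. In `main`, pass
/// `|key| std::env::var(key).ok()` to read the real environment.
fn load_config(lookup: impl Fn(&str) -> Option<String>) -> Result<Config, String> {
    // Assumption: every variable is required and must be non-empty.
    let required = |key: &str| {
        lookup(key)
            .filter(|value| !value.trim().is_empty())
            .ok_or_else(|| format!("missing required environment variable {key}"))
    };
    Ok(Config {
        nats_url: required("NATS_URL")?,
        sig_server_url: required("SIG_SERVER_URL")?,
        db_path: required("DB_PATH")?,
        auth_callout_nkey_path: required("AUTH_CALLOUT_NKEY_PATH")?,
    })
}
```

Failing fast on a missing `NATS_URL` matches the deployment model: the chat-server cannot do anything useful until the operator has pointed it at a broker.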

### 4.4 Endpoints

```
GET  /v1/healthz
GET  /v1/u/:handle            handle → { primary, sigchain_url, endpoints }
POST /v1/register             claim a handle (signed body)
GET  /.well-known/webfinger?resource=acct:tudisco@kez.lat

# NATS auth callout (invoked BY nats-server, not by users).
# Not an HTTP route: requests arrive over NATS on $SYS.REQ.USER.AUTH.
SUB  $SYS.REQ.USER.AUTH       verify nkey signature, return permissions
```

Sigchain endpoints are **not** on this server — clients talk directly to
`kez-sig-server` for those.

---

## 5. End-to-end flows

### 5.1 Account creation — `@tudisco@kez.lat`

```
1. User opens chat app, clicks "Create account"
2. App: generates ed25519 keypair locally
3. App: converts seed to 24-word mnemonic, makes user write it down,
   verifies recall before continuing
4. App: user picks handle "tudisco"
5. App → chat-server:
     POST /v1/register
     { "handle": "tudisco",
       "primary": "ed25519:<pubkey-hex>",
       "registration_sig": "<sig over canonical message>" }
6. Server: validates signature, checks handle is free, stores in registry
7. Server: 201 Created
8. App: initializes sigchain locally, signs:
     - add_endpoint { protocol: "nats", url: "...", inbox: "kez.inbox.<pk>" }
     - add_endpoint { protocol: "iroh", node_id: "<local iroh id>" }
9. App → sig-server: POST /v1/sigchains/ed25519/<pk>/events (one per event)
10. App: connects to nats-server with nkey auth (signed challenge,
    nats-server invokes chat-server's auth callout, gets back yes/no
    + allowed subjects)
11. App: subscribes to JetStream durable consumer on its inbox subject
12. Done — @tudisco@kez.lat is live and reachable
```
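
Step 6 (check the handle is free, then store it) can be sketched with an in-memory map standing in for the sqlite-backed registry. This is illustrative only: the `Registry` type, the error names, and the handle character rules are assumptions, and the sketch presumes the registration signature has already been verified.

```rust
use std::collections::hash_map::Entry;
use std::collections::HashMap;

#[derive(Debug, PartialEq)]
enum RegisterError {
    InvalidHandle,
    Taken,
}

/// In-memory stand-in for the handle registry.
/// Maps handle -> primary key string such as "ed25519:<pubkey-hex>".
#[derive(Default)]
struct Registry {
    by_handle: HashMap<String, String>,
}

impl Registry {
    /// First-come-first-served: the first valid claim wins, later claims fail.
    fn register(&mut self, handle: &str, primary: &str) -> Result<(), RegisterError> {
        // Assumed handle rules: 1 to 32 chars of lowercase ASCII, digits, '_'.
        let valid = !handle.is_empty()
            && handle.len() <= 32
            && handle
                .chars()
                .all(|c| c.is_ascii_lowercase() || c.is_ascii_digit() || c == '_');
        if !valid {
            return Err(RegisterError::InvalidHandle);
        }
        match self.by_handle.entry(handle.to_string()) {
            Entry::Occupied(_) => Err(RegisterError::Taken),
            Entry::Vacant(slot) => {
                slot.insert(primary.to_string());
                Ok(())
            }
        }
    }

    /// Lookup used by `GET /v1/u/:handle`.
    fn resolve(&self, handle: &str) -> Option<&str> {
        self.by_handle.get(handle).map(String::as_str)
    }
}
```

In a sqlite-backed version the same guarantee would typically come from a uniqueness constraint on the handle column, so that two concurrent claims cannot both succeed.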

### 5.2 Adding a contact

```
1. Tudisco types "@chris" in app
2. App → chat-server: GET /v1/u/chris
   Returns: { primary: "ed25519:abc...", sigchain_url: "https://sig.kez.lat/..." }
3. App → sig-server (URL from above): fetch sigchain
4. App walks events to extract:
     - NATS broker URL + inbox subject (from add_endpoint nats)
     - Iroh node IDs (from add_endpoint iroh)
     - Other identity claims (github:chris, dns:chris.com, etc. — for display)
5. App caches LOCALLY: { "@chris@kez.lat" => ed25519:abc..., endpoints: {...} }
   (TOFU — trust on first use)
```
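
The TOFU rule in step 5 amounts to pinning the first key seen for a handle and flagging any later mismatch. A minimal sketch, with hypothetical type and method names; what the app does on a key change (warn, block, ask the user) is outside the scope of this document.

```rust
use std::collections::hash_map::Entry;
use std::collections::HashMap;

#[derive(Debug, PartialEq)]
enum Trust {
    /// No earlier record existed; the key is now pinned.
    FirstUse,
    /// The key matches the one pinned earlier.
    Matches,
    /// The key differs from the pinned one; must not be accepted silently.
    KeyChanged,
}

/// Local contact cache: qualified handle -> pinned primary key.
#[derive(Default)]
struct ContactCache {
    pinned: HashMap<String, String>,
}

impl ContactCache {
    /// Records or checks the primary key the registry returned for `handle`.
    fn observe(&mut self, handle: &str, primary: &str) -> Trust {
        match self.pinned.entry(handle.to_string()) {
            Entry::Vacant(slot) => {
                slot.insert(primary.to_string());
                Trust::FirstUse
            }
            Entry::Occupied(slot) if slot.get().as_str() == primary => Trust::Matches,
            Entry::Occupied(_) => Trust::KeyChanged,
        }
    }
}
```

Note that a mismatch leaves the original pin in place, so a later lookup that returns the original key still reports `Matches`.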

### 5.3 Sending a chat message

```
1. Tudisco types "hello" to Chris
2. App looks up Chris's primary key + NATS endpoint from local cache
3. App derives the pairwise symmetric key:
     X25519(tudisco_priv, chris_pub) → HKDF → 32-byte ChaCha20-Poly1305 key
   (ed25519 keys are converted to their X25519 form first; the same
   derived key is reused for every message in this pair, see §8)
4. App encrypts "hello" with that key (+ random nonce)
5. App signs ciphertext with tudisco's KEZ primary
6. App publishes to subject `kez.inbox.<chris-pubkey-hex>` on the NATS
   broker, JetStream-published so the broker stores it durably
7. Chris's app (subscribed via durable consumer) receives the message
   whenever next online — broker buffers it if offline
8. Chris's app verifies signature against tudisco's key, decrypts,
   shows "tudisco: hello"
```

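The broker sees:
- An nkey-authenticated client publishing encrypted bytes to a subject
- It does NOT see message contents, and the sender identity is not
  carried in the NATS frame (it lives in the signed payload). The broker
  can still observe which authenticated connection published to which
  inbox subject and which connection consumes it, so traffic metadata is
  not hidden from the operator.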
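
The inbox subject in step 6 is derived purely from the recipient's public key, so any client can compute it without asking the server. A small sketch, assuming lowercase hex encoding of the 32-byte key (the document fixes the `kez.inbox.` prefix but not the hex case); the function names are hypothetical.

```rust
/// Lowercase hex encoding of raw key bytes.
fn hex_encode(bytes: &[u8]) -> String {
    bytes.iter().map(|b| format!("{b:02x}")).collect()
}

/// NATS subject for a user's inbox: `kez.inbox.<pubkey-hex>`.
/// Hex contains no '.', '*', '>' or whitespace, so it is a single
/// valid subject token.
fn inbox_subject(pubkey: &[u8; 32]) -> String {
    format!("kez.inbox.{}", hex_encode(pubkey))
}

/// True if `subject` is exactly the inbox belonging to `pubkey`.
/// The auth callout could use a check like this when deciding which
/// subscriptions to permit.
fn is_own_inbox(subject: &str, pubkey: &[u8; 32]) -> bool {
    subject == inbox_subject(pubkey)
}
```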

### 5.4 Sharing a file (v0: both peers online)

```
1. Tudisco drags `report.pdf` into the chat with Chris
2. App imports the blob into local Iroh node → BLAKE3 hash + ticket
3. App optionally adds an entry to tudisco's "shared files" manifest
   (visible if Chris later browses tudisco's profile)
4. App generates a per-file symmetric content key
5. App encrypts the blob in place (or stores both plaintext + encrypted —
   detail for later) with the content key
6. App wraps the content key for chris's KEZ key (X25519 → HKDF)
7. App sends a NATS message to chris's inbox:
     { type: "file_share",
       filename: "report.pdf",
       size: 1234567,
       iroh_ticket: "blobac://...",
       wrapped_content_key: "..." }
   (same encryption as chat messages, so chris can read this)
8. Chris's app sees the notification: "tudisco shared report.pdf (1.2 MB)"
   File NOT downloaded yet.
9. Chris clicks Download.
10. Chris's app opens an Iroh connection to tudisco's NodeId (from
    tudisco's sigchain), pulls the blob via the ticket, decrypts with
    the unwrapped content key, verifies BLAKE3 hash. File appears.
```

**v0 limitation:** If tudisco is offline at step 10, chris waits.
Iroh will retry; download starts when tudisco's node comes back.
Pinning (the server holding a copy) is **not** in v0 — we accept this
limitation in exchange for zero server-side storage cost and the
simplest possible architecture.
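
The `file_share` message from step 7 and the notification text from step 8 can be modelled as a plain struct plus a formatting helper. This is a sketch: the struct mirrors the fields shown above, while the decimal (1000-based) units and the helper names are assumptions chosen so that 1234567 bytes renders as "1.2 MB".

```rust
/// Decrypted body of a `file_share` message (fields as in step 7).
struct FileShare {
    filename: String,
    size: u64,
    iroh_ticket: String,
    wrapped_content_key: String,
}

/// Formats a byte count using decimal units, one fractional digit.
fn human_size(bytes: u64) -> String {
    const UNITS: [&str; 4] = ["B", "KB", "MB", "GB"];
    let mut value = bytes as f64;
    let mut unit = 0;
    while value >= 1000.0 && unit < UNITS.len() - 1 {
        value /= 1000.0;
        unit += 1;
    }
    if unit == 0 {
        format!("{bytes} B")
    } else {
        format!("{value:.1} {}", UNITS[unit])
    }
}

impl FileShare {
    /// Text shown to the recipient before anything is downloaded.
    fn notification(&self, sender: &str) -> String {
        format!("{sender} shared {} ({})", self.filename, human_size(self.size))
    }
}
```

Building the notification needs only the message itself; the ticket and wrapped key stay unused until the recipient clicks Download.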

### 5.5 Browsing someone's files (Keybase-style)

```
1. Chris opens tudisco's profile
2. App resolves @tudisco → primary → sigchain
3. Sigchain has a `set_shared_files` op pointing at a manifest blob hash
4. App fetches the manifest blob via Iroh (small, fast)
5. App decrypts entries wrapped for chris's key, ignores ones it can't
   decrypt (those are wrapped for other people)
6. App renders the visible entries: name, size, share date,
   thumbnail (if present)
7. Chris clicks an entry → flow continues like §5.4 step 9
```

Manifest is small (KB-scale); blobs are MB-to-GB. Browsing is cheap;
fetching is per-file deliberate. **Recipient never auto-syncs.**

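
Step 3 implies that the current manifest is whichever `set_shared_files` event comes last in the sigchain. A minimal sketch of that lookup follows. The `Event` enum is a hypothetical, heavily simplified stand-in for real sigchain events, and the slice is assumed to be in chain order with signatures already verified. Scanning from the end means an updated manifest needs only one new event; older pointers stay in the chain but are ignored.

```rust
/// Simplified sigchain events; real events carry signatures and more fields.
#[derive(Debug, Clone, PartialEq)]
enum Event {
    AddEndpoint { protocol: String, address: String },
    SetSharedFiles { manifest_hash: String },
}

/// Returns the manifest hash committed by the most recent
/// `set_shared_files` event, or None if the user never published one.
fn current_manifest(events: &[Event]) -> Option<&str> {
    events.iter().rev().find_map(|event| match event {
        Event::SetSharedFiles { manifest_hash } => Some(manifest_hash.as_str()),
        _ => None,
    })
}
```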

---

## 6. Project & folder layout

### 6.1 Where this project lives

```
/Kez
├── rust-lib/                   ← (proposed refactor) shared Rust libraries
│   ├── Cargo.toml                workspace
│   └── crates/
│       ├── kez-core/             moved from rust/crates/
│       └── kez-channels/         moved from rust/crates/
│
├── rust/                       ← Rust CLI (kez binary)
│   └── crates/
│       └── kez-cli/              depends on ../../../rust-lib/crates/...
│
├── rust-sig-server/            ← existing sigchain storage (reused as-is)
│
├── kez-chat/                   ← THIS PROJECT
│   ├── document.md               (this file)
│   ├── Cargo.toml
│   ├── src/
│   │   ├── main.rs               binary entry
│   │   ├── handles.rs            handle registry (sqlite-backed)
│   │   ├── nats_auth.rs          NATS auth callout handler
│   │   ├── webfinger.rs          WebFinger discovery endpoint
│   │   └── api.rs                axum routes + state
│   ├── deploy/
│   │   ├── docker-compose.yml    chat-server + sig-server
│   │   ├── nats.conf             reference auth_callout snippet for the operator's NATS
│   │   └── systemd/              alternative deployment
│   └── tests/
│       └── http.rs               integration tests
│
├── nodejs/                     ← (unchanged)
└── crosstest.sh                ← (path updates if rust-lib moves)
```

### 6.2 The `rust-lib/` proposal — share code, no duplication

Right now, `kez-core` and `kez-channels` live inside `rust/crates/`. The
sig-server and the new chat-server both want to use them. Today's
downstream path-dep is:

```toml
kez-core = { path = "../rust/crates/kez-core" }
```

…which works but reaches into another project's crate tree.

**Recommendation:** move the pure libraries out into a top-level
`rust-lib/` workspace. The CLI stays in `rust/`. Downstream servers
depend on `../rust-lib/crates/kez-core`. Clean structure, no
duplication, no confusion about which folder owns what.

Refactor steps:

- `mv rust/crates/kez-core rust-lib/crates/`
- `mv rust/crates/kez-channels rust-lib/crates/`
- Create `rust-lib/Cargo.toml` (workspace).
- Update `rust/Cargo.toml` to have just kez-cli.
- Update path deps in: `rust/crates/kez-cli/Cargo.toml`,
  `rust-sig-server/Cargo.toml`.
- Update `crosstest.sh` if any paths are hardcoded.

**Suggested order:** do the refactor *before* starting kez-chat so we
import cleanly from the start.

### 6.3 Dependencies (planned)

| Crate | Why |
|---|---|
| `kez-core` (path) | Identity types, ed25519, signing |
| `kez-channels` (path) | Verify users' linked accounts when displayed |
| `axum` 0.8 | HTTP API |
| `tokio` | Async runtime |
| `rusqlite` (bundled) | Handle registry |
| `async-nats` | NATS client (admin work + the auth callout glue) |
| `serde` / `serde_json` | Standard |
| `thiserror` / `anyhow` | Standard |
| `tracing` / `tracing-subscriber` | Logging |
| `tower-http` | CORS, request tracing |
| `clap` | CLI args |

**Not** depended on by the chat-server:
- `iroh` — server doesn't run an Iroh node in v0 (no pinning)
- nats-server (Go) — external infrastructure (§6.4), not a Rust dep

### 6.4 NATS broker — external dependency

NATS is **not part of our project**. It's external infrastructure the
operator provides, the same way they'd provide a database or an SMTP
relay. We ship:

- An `async-nats` client used by the chat-server (admin/utility work
  and the auth-callout connection)
- An auth-callout service that NATS invokes during client connection
  (requests arrive over NATS on `$SYS.REQ.USER.AUTH`, not over HTTP)
- A documented `nats.conf` snippet operators add to their NATS deployment
- A reference local-dev setup (Appendix A) for running NATS yourself
  while developing

What we require from the operator's NATS:

| Requirement | Why |
|---|---|
| **NATS 2.10+** (for auth_callout) | We rely on auth callout to bridge KEZ identity into NATS |
| **JetStream enabled** | For offline message buffering (durable consumers) |
| **TCP reachable** from chat-server and clients | Standard |
| **TLS** (in production) | Standard |
| **auth_callout configured** with our issuer key | Required for client auth |

That's it. Operator can run a single Docker container, a clustered
production deployment, or a managed service — we don't care, as long
as `NATS_URL` and the callout config are correct.

Why fully external rather than alongside us:

- NATS is a serious piece of infrastructure with its own scaling and
  operational concerns. Bundling it implies we're responsible for it.
  We're not.
- Operators with existing NATS deployments can reuse them. No "now run
  our copy of NATS too."
- Different teams might run different NATS topologies (single instance,
  cluster, mesh, leaf nodes). None of that is our problem.
- Swapping NATS implementations or moving to a managed provider is a
  config change, not a code change.
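
The "NATS 2.10+" requirement is easy to get wrong with a string comparison, since "2.9" sorts after "2.10" lexically. A small sketch of a numeric check the chat-server could run at startup; the function names are hypothetical, and how the version string is obtained from the broker is left out.

```rust
/// Parses "2.10.4" or "v2.10.4" into (major, minor).
/// The patch number and any suffix are ignored.
fn parse_major_minor(version: &str) -> Option<(u32, u32)> {
    let mut parts = version.trim().trim_start_matches('v').split('.');
    let major = parts.next()?.parse().ok()?;
    let minor = parts.next()?.parse().ok()?;
    Some((major, minor))
}

/// True if the broker version is at least 2.10, the first release
/// with auth callout. Tuples compare numerically, field by field.
fn supports_auth_callout(version: &str) -> bool {
    matches!(parse_major_minor(version), Some(v) if v >= (2, 10))
}
```

An unparseable version is treated as unsupported, which errs on the side of refusing to start against an unknown broker.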

### 6.5 Iroh — client-side only

Clients run a local Iroh node for sending and receiving files. The
**chat-server does NOT run an Iroh node** in v0.

Implication: when @tudisco shares a file with @chris, the bytes go
directly from tudisco's device to chris's device via Iroh. If tudisco
is offline, chris waits. There's no fallback to a server-stored copy.

This is the simplest possible operational model. Pinning (server-side
fallback storage) is a future addition (§8).

---

## 7. MVP scope

### Server (`kez-chat-server`)

- [ ] HTTP API scaffold (axum + tokio)
- [ ] Handle registry (POST /v1/register, GET /v1/u/:handle)
- [ ] Registration signature validation (uses kez-core)
- [ ] WebFinger endpoint
- [ ] NATS auth callout service (responds on `$SYS.REQ.USER.AUTH`)
- [ ] Healthz / metrics
- [ ] Integration tests against a real sig-server plus a dev NATS
  broker (Appendix A)

### Deployment

- [ ] docker-compose.yml (chat-server + sig-server)
- [ ] Reference nats.conf snippet with auth_callout for operators
- [ ] systemd alternative deployment recipe
- [ ] README with TLS / reverse proxy guidance

### Client (`kez-chat-cli` — separate project later)

Out of scope for the server work, but the **server isn't usable without**
at least a CLI client that does:
- [ ] Account creation (key gen + mnemonic backup + handle registration)
- [ ] Contact lookup + verification
- [ ] Send / receive 1:1 chat messages (E2E via NATS)
- [ ] Send / receive files (E2E via Iroh)
- [ ] Browse @user shared-files manifest

UI app comes after CLI proves the flow works.

---

## 8. Out of scope (v0)

- **Iroh pinning** (sender must be online for receiver to fetch)
- **Group chat** (only 1:1 for v0)
- **Forward secrecy / ratcheting** (Double Ratchet, MLS) — chat is
  encrypted but each message uses the same X25519-derived key per pair
- **Voice / video calls**
- **Multi-device key sync** — one device per user in v0
- **Account recovery beyond mnemonic** — paper backup is the only recovery
- **Federation across home servers** — one server (kez.lat) in v0;
  design preserves the option
- **Channel-based identity verification** — the CLI already does
  `kez verify id ...`; not duplicated in the chat-server. Users add
  KEZ channel proofs (gist, dns, etc.) via the existing CLI separately.
- **Avatars / display names** — defer the design. For v0 the UI shows
  the handle and that's enough.

---

## 9. The one remaining open question

**Manifest format** for "@chris's shared files":

| Option | How | Tradeoff |
|---|---|---|
| **A. Signed JSON blob, hash in sigchain** | Manifest is a JSON blob stored on Iroh. A new sigchain op `set_shared_files` commits the latest manifest hash. Recipients walk the sigchain → find the pointer → fetch the manifest blob from Iroh. | Simpler. No Iroh Docs dep. Sigchain anchors the version (signed). Update = new sigchain event. |
| **B. Iroh Doc** | Manifest is a mutable CRDT document. Recipients subscribe; updates sync in near-real-time. | Fancier UX (live updates). Requires Iroh Docs subsystem (heavier dep, less stable). |

**Recommended default: A.** Simpler, fewer moving parts, reuses
primitives we already have. We can upgrade to B later if real users
need real-time profile feed updates.

Settle yes/no on this and the design is locked.

---

## Decisions locked from earlier discussion

| Question | Decision |
|---|---|
| Bundle sigchain in chat-server? | **No.** Use existing `kez-sig-server`. Microservices. |
| Bundle NATS into Rust server? | **No.** NATS is external infrastructure the operator provides. We don't ship, embed, or supervise it. We connect to whatever broker `NATS_URL` points at. |
| KEZ + nostr coexistence for chat? | **No nostr in chat.** KEZ is identity-only; nostr only as a verifiable claim in someone's sigchain, not as transport. |
| Handle scope: federation or global? | **Global for v0**, federation-ready design (see §3.5). |
| Recovery if key lost? | **Paper backup (24-word mnemonic), Keybase-style.** No server-side recovery. |
| Iroh pinning in v0? | **No.** Sender must be online for receiver to fetch. Pinning is a future tier. |

---

## 10. Risks & honest concerns

1. **NATS auth callout integration depth.** Documented but fiddly.
   nkey signature verification is straightforward; the per-user subject
   permission glue needs care. Test cases for "user can publish to
   their own inbox only" / "user can subscribe to their own inbox
   only" matter.

2. **Iroh is pre-1.0.** Pin a version. Migration is a chore but only
   touches client code, not identity. Identity stays stable (KEZ).

3. **Single-device assumption.** Real users have phones AND laptops.
   v0 assumes one device per primary. Designing multi-device is a
   real follow-up.

4. **No offline file delivery.** A natural user complaint will be
   "Chris sent me a file but he's offline now." We've made the trade
   knowingly; document the limitation clearly in-app ("File will
   download when @chris is back online").

5. **Handle squatting.** First-come-first-served. Mitigations:
   - Rate-limit registration by IP
   - Reserve some handles (`@admin`, common project names)
   - Accept that some squatting will happen; document the policy

6. **NAT traversal.** Iroh handles it with relays. Test on hostile
   networks (corporate firewalls, mobile carriers with CGNAT) before
   claiming "just works."

7. **Operational cost.** Two containers (chat-server + sig-server),
   an operator-provided NATS broker, bandwidth, and a domain. Cheap at
   small scale, scales with users. Need a "running kez.lat for 1k
   users — what does it cost?" answer before community adoption.
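
The first mitigation under risk 5, rate-limiting registration by IP, could start as a fixed-window counter in front of the registration route. This is a sketch under assumptions: the type name, the limits, and the fixed-window strategy are illustrative choices, not decisions made in this document. The clock is passed in as a parameter so the behaviour can be tested deterministically, and a real deployment would also need to evict old entries and account for clients behind shared NATs.

```rust
use std::collections::HashMap;
use std::net::IpAddr;

/// Fixed-window limiter: at most `max_per_window` registrations per IP
/// in each window of `window_secs` seconds.
struct RegistrationLimiter {
    max_per_window: u32,
    window_secs: u64,
    /// ip -> (window index, registrations counted in that window)
    seen: HashMap<IpAddr, (u64, u32)>,
}

impl RegistrationLimiter {
    fn new(max_per_window: u32, window_secs: u64) -> Self {
        Self {
            max_per_window,
            window_secs: window_secs.max(1), // avoid division by zero
            seen: HashMap::new(),
        }
    }

    /// Returns true and counts the attempt if `ip` is under its limit.
    /// `now_secs` is a monotonically increasing timestamp in seconds.
    fn allow(&mut self, ip: IpAddr, now_secs: u64) -> bool {
        let window = now_secs / self.window_secs;
        let entry = self.seen.entry(ip).or_insert((window, 0));
        if entry.0 != window {
            // A new window has started: reset the counter.
            *entry = (window, 0);
        }
        if entry.1 >= self.max_per_window {
            return false;
        }
        entry.1 += 1;
        true
    }
}
```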

---

## 11. The plan, sequenced

When we start building:

1. **Refactor: move `kez-core` + `kez-channels` to `rust-lib/`.**
   Small but unblocks clean imports from kez-chat.

2. **Scaffold `kez-chat-server`** (axum + tokio + sqlite + tracing).
   Handle registry + WebFinger first — these unblock client-side
   account creation.

3. **NATS auth callout.** Bring up a NATS broker for development (see
   Appendix A), configure its `auth_callout` block to trust our
   chat-server's signing key, and have the chat-server answer callout
   requests. End-to-end: a client can register a handle and then
   connect to NATS authenticated by its KEZ key.

4. **Minimal `kez-chat-cli` client** (separate project) that does:
   - `kez-chat register tudisco`
   - `kez-chat add @chris`
   - `kez-chat send @chris "hello"`
   - `kez-chat listen`
   No UI. Enough to prove the chat flow works end-to-end against
   the server.

5. **Iroh integration in the client** (not the server).
   - Client runs a local Iroh node
   - `kez-chat share @chris ./file.pdf`
   - `kez-chat fetch <ticket>`

6. **Shared-files manifest.** New `set_shared_files` sigchain op.
   `kez-chat browse @tudisco` lists his shared files.

7. **Deployment recipe.** docker-compose, systemd, deployment doc.

8. **Then** start the GUI app. Could be Tauri (Rust + web frontend),
   Iced (pure Rust UI), or something else.

---

## 12. One-paragraph summary

`kez-chat` is a Keybase-class chat and file-sharing app built on the
KEZ identity stack. Users get `@username@kez.lat` handles backed by an
ed25519 primary key. The same key authenticates to a NATS broker
(chat, presence, file tickets — broker is dumb, clients do E2E with
ChaCha20-Poly1305 over X25519-derived keys) and identifies an Iroh
node (P2P bulk transfer, content-addressed blobs, on-demand fetch).
**Our project ships two services**: a thin Rust `kez-chat-server`
that handles the handle registry + NATS auth callout + HTTP API, and
the existing `kez-sig-server` that stores sigchains. **NATS is
external infrastructure the operator provides** — we never ship,
embed, or supervise it. The chat-server does not run an Iroh node
and does not pin files in v0; file transfer is pure P2P between
online peers. Account recovery is via a 24-word paper-backup
mnemonic. Federation across home servers is deferred but the design
keeps it as a flip-the-switch future change.

---

## Appendix A: running a NATS broker locally for development

NATS is not part of our project, but you need one running to test the
chat-server end-to-end. Easiest path during development:

```sh
docker run -d --name kez-dev-nats \
  -p 4222:4222 -p 8222:8222 \
  -v "$PWD/dev-nats.conf:/etc/nats/nats.conf:ro" \
  nats:latest -c /etc/nats/nats.conf --jetstream
```

Where `dev-nats.conf` enables the auth callout for your locally running
chat-server, using the same `auth_callout` block as in §4.3. The
chat-server then connects to this broker with
`NATS_URL=nats://localhost:4222`.

A full reference `dev-nats.conf` will live at `deploy/dev-nats.conf`
when we start building. This appendix exists so developers have a
one-liner to spin up NATS for testing; **it is not the production
deployment recipe** (operators run their own NATS however they want).

For production: see the NATS docs (https://docs.nats.io). Our project
has no opinion beyond "must be 2.10+ with JetStream + auth_callout
configured for our chat-server."