plan(kez-chat): NATS is external infrastructure, not part of our stack

Sharpen the framing: our project doesn't ship, embed, supervise, or
even sit next to NATS. NATS is external infrastructure the operator
provides (their own server, Synadia Cloud, whatever) and we connect
to it the way an app connects to a database.

Changes:

- §4.2 process model: redraw the diagram to show NATS *outside* our
  deployment boundary (with a dashed line for "external") and our two
  services on one side. chat-server reaches out to the operator's
  NATS via the auth callout.

- §4.3 docker-compose sketch: remove the nats container entirely.
  Our compose ships chat-server + sig-server only. NATS_URL is an
  environment variable the operator sets. We document the nats.conf
  snippet the operator needs to add to their own NATS deployment.

- §6.4 NATS broker section rewritten as "external dependency" — what
  we require from the operator's NATS (version, JetStream, callout
  config), and why we don't bundle it (NATS is its own ops problem;
  operators may already have one; we shouldn't lock them in).

- §11 sequenced plan step 3: developers spin up a local NATS for
  testing via Appendix A, not "run nats-server in a sibling container."

- Decisions-locked row for NATS now explicit: "We don't ship, embed,
  or supervise it. We connect to whatever broker NATS_URL points at."

- New Appendix A: "running a NATS broker locally for development" —
  one-liner docker run for testing, with explicit "this is dev only,
  not the production deployment recipe."

- §12 one-paragraph summary updated to reflect "our project ships two
  services" (chat-server + sig-server) and that NATS is external.
Tudisco 2026-05-24 22:40:15 -06:00
parent f586129787
commit f0aa86f71a

@ -194,62 +194,65 @@ becomes a fourth optional container.
### 4.2 Process / deployment model ### 4.2 Process / deployment model
NATS is **not part of our deployment.** The operator runs NATS however
they want (Synadia Cloud, their own cluster, a friend's broker, a single
local container) and gives the chat-server a URL. Same idea as a
database: we connect to one; we don't ship one.
``` ```
┌──────────────────────────────────────────────────────────────┐ External infrastructure
│ docker-compose / systemd / Kubernetes │ (operator's responsibility)
┌──────────────────────┐
│ NATS broker │
│ + JetStream │
│ somewhere │
└─────────▲─────▲──────┘
│ │
chat-server ──────┘ │ ◄────── client app
(auth callout) │ (publish/subscribe)
┌─────────────── our deployment ─────────────────┐
│ │ │ │
│ ┌──────────────┐ ┌─────────────────┐ ┌────────────────┐ │ │ ┌─────────────────┐ ┌────────────────┐ │
│ │ nats-server │ │ kez-chat-server │ │ kez-sig-server │ │ │ │ kez-chat-server │ │ kez-sig-server │ │
│ │ (Go) │◄──┤ (Rust) ├──►│ (Rust) │ │ │ │ (Rust) │ │ (Rust) │ │
│ │ + JetStream │ │ │ │ (existing) │ │ │ │ │ │ │ │
│ │ │ │ ↓ handles │ │ ↓ sigchain │ │ │ │ ↓ handles │ │ ↓ sigchain │ │
│ │ │ │ ↓ nats auth │ │ storage │ │ │ │ ↓ nats auth │ │ storage │ │
│ │ │ │ ↓ HTTP API │ │ │ │ │ │ ↓ HTTP API │ │ │ │
│ └──────────────┘ └─────────────────┘ └────────────────┘ │ │ └─────────────────┘ └────────────────┘ │
│ ▲ ▲ ▲ │ │ ▲ ▲ │
│ │ │ │ │ └─────────┼──────────────────────┼───────────────┘
└─────────┼───────────────────┼──────────────────────┼─────────┘ │ │
│ │ │ ┌──────┴──────────────────────┴────────────────────────┐
│ │ │
┌──────┴───────────────────┴──────────────────────┴─────┐
│ Chat app (per user, runs on phone/desktop) │ │ Chat app (per user, runs on phone/desktop) │
│ │ │ │
│ • talks to nats-server over native NATS protocol │ • talks to the operator's NATS broker (NATS proto)
│ • talks to kez-chat-server over HTTPS (handles, etc.) │ • talks to kez-chat-server over HTTPS
│ • talks to kez-sig-server over HTTPS (sigchain) │ • talks to kez-sig-server over HTTPS
│ • runs local iroh::Node for file send/receive │ │ • runs local iroh::Node for file send/receive │
└──────────────────────────────────────────────────────── └──────────────────────────────────────────────────────┘
``` ```
The Rust chat-server orchestrates auth between NATS and the handle The chat-server orchestrates auth against whatever NATS broker is
registry, but doesn't host either NATS or the sigchains. configured, but doesn't run, host, supervise, or ship NATS in any form.
### 4.3 docker-compose sketch ### 4.3 docker-compose sketch (our two services only)
```yaml ```yaml
# deploy/docker-compose.yml # deploy/docker-compose.yml — what we ship
services: services:
nats:
image: nats:latest
command: ["-c", "/etc/nats/nats.conf", "--jetstream"]
volumes:
- ./nats.conf:/etc/nats/nats.conf:ro
- nats-data:/data
ports:
- "4222:4222" # client connections (TLS in prod)
- "8222:8222" # monitoring
chat-server: chat-server:
build: . # kez-chat-server Rust binary build: . # kez-chat-server Rust binary
environment: environment:
NATS_URL: nats://nats:4222 NATS_URL: ${NATS_URL} # operator points us at their NATS broker
SIG_SERVER_URL: http://sig-server:7878 SIG_SERVER_URL: http://sig-server:7878
DB_PATH: /data/handles.db DB_PATH: /data/handles.db
AUTH_CALLOUT_NKEY_PATH: /etc/kez/auth-callout.nkey AUTH_CALLOUT_NKEY_PATH: /etc/kez/auth-callout.nkey
volumes: volumes:
- chat-data:/data - chat-data:/data
- ./auth-callout.nkey:/etc/kez/auth-callout.nkey:ro - ./auth-callout.nkey:/etc/kez/auth-callout.nkey:ro
depends_on: [nats, sig-server] depends_on: [sig-server]
ports: ports:
- "8080:8080" # HTTP API for clients - "8080:8080" # HTTP API for clients
@ -263,15 +266,37 @@ services:
- "7878:7878" - "7878:7878"
volumes: volumes:
nats-data:
chat-data: chat-data:
sig-data: sig-data:
``` ```
NATS's auth-callout is configured in `nats.conf` to send connection **NATS is not in this file.** The operator brings their own — running
requests to `chat-server:8080/internal/nats/auth`. The chat-server on a different host, in a different compose project, on Synadia Cloud,
verifies the nkey signature against the handle registry and returns or wherever. They give us `NATS_URL` and a place to put our auth
allowed subjects (typically just the user's own inbox). callout endpoint URL in their `nats.conf`.
What the operator needs to add on the NATS side (in **their** config):
```conf
# nats.conf — added to whatever NATS deployment the operator runs
authorization {
auth_callout {
issuer: "<our auth-callout signing nkey public part>"
auth_users: ["AUTHUSER"] # placeholder identity NATS uses
account: "DEFAULT"
}
}
```
The chat-server signs auth-callout responses with a long-lived nkey
that NATS trusts. When a client connects to NATS with their KEZ
ed25519 key, NATS forwards the auth request to our chat-server,
which checks the handle registry and signs a yes/no response.
We provide a reference `nats.conf` snippet in the docs. The operator
patches it into their own NATS deployment.
For local development, see Appendix A.
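
Because `NATS_URL` is now the only link between our services and the broker, the chat-server could fail fast at startup when it is missing or malformed. A minimal sketch of that check, assuming only the `nats://` and `tls://` schemes are accepted; the helper names `validate_nats_url` and `nats_url_from_env` are hypothetical and not part of the plan:

```rust
use std::env;

/// Hypothetical helper: check that a configured NATS URL looks usable.
/// Assumption for this sketch: only `nats://` and `tls://` are accepted.
fn validate_nats_url(raw: &str) -> Result<String, String> {
    let url = raw.trim();
    if url.is_empty() {
        return Err("NATS_URL is empty; point it at the operator's broker".to_string());
    }
    // Strip a known scheme; anything else is rejected.
    let rest = url
        .strip_prefix("nats://")
        .or_else(|| url.strip_prefix("tls://"))
        .ok_or_else(|| format!("NATS_URL must start with nats:// or tls://, got {url:?}"))?;
    if rest.is_empty() {
        return Err("NATS_URL has a scheme but no host".to_string());
    }
    Ok(url.to_string())
}

/// Hypothetical helper: read `NATS_URL` from the environment and validate it.
fn nats_url_from_env() -> Result<String, String> {
    let raw = env::var("NATS_URL").map_err(|_| "NATS_URL is not set".to_string())?;
    validate_nats_url(&raw)
}
```

Calling `nats_url_from_env()` once during startup and exiting on `Err` would surface a misconfigured compose environment before any client tries to connect.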
### 4.4 Endpoints ### 4.4 Endpoints
@ -494,19 +519,43 @@ import cleanly from the start.
- `iroh` — server doesn't run an Iroh node in v0 (no pinning) - `iroh` — server doesn't run an Iroh node in v0 (no pinning)
- nats-server (Go) — separate container, not a Rust dep - nats-server (Go) — separate container, not a Rust dep
### 6.4 NATS broker — separate container ### 6.4 NATS broker — external dependency
We don't write or embed a NATS broker. Run the official Go binary: NATS is **not part of our project**. It's external infrastructure the
operator provides, the same way they'd provide a database or an SMTP
relay. We ship:
- `nats-server` from nats.io - An `async-nats` client used by the chat-server (admin/utility work)
- JetStream enabled (for offline message buffering) - An auth-callout HTTP endpoint that NATS calls during client connection
- Auth callout configured to hit `chat-server:8080/internal/nats/auth` - A documented `nats.conf` snippet operators add to their NATS deployment
- Run as its own docker-compose service (see §4.3) - A reference local-dev setup (Appendix A) for running NATS yourself
while developing
Why not embed: NATS is Go; no production-grade Rust port. Docker-compose What we require from the operator's NATS:
keeps the deployment honest (each service in its own container, normal
operational tooling applies). One config change to swap broker | Requirement | Why |
implementations or run a cluster. |---|---|
| **NATS 2.10+** (for auth_callout) | We rely on auth callout to bridge KEZ identity into NATS |
| **JetStream enabled** | For offline message buffering (durable consumers) |
| **TCP reachable** from chat-server and clients | Standard |
| **TLS** (in production) | Standard |
| **auth_callout configured** to hit our endpoint | Required for client auth |
That's it. Operator can run a single Docker container, a clustered
production deployment, or a managed service — we don't care, as long
as `NATS_URL` and the callout config are correct.
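
The version requirement in the table could be checked mechanically, since a NATS server reports its version string in the `INFO` message it sends on connection. A minimal sketch of the comparison only, assuming the version string has already been obtained; the function names are hypothetical:

```rust
/// Hypothetical helper: extract (major, minor) from a version string
/// such as "2.10.4" or "v2.11.0-beta".
fn parse_major_minor(version: &str) -> Option<(u32, u32)> {
    let v = version.trim().trim_start_matches('v');
    let mut parts = v.split('.');
    let major: u32 = parts.next()?.parse().ok()?;
    // Keep only the leading digits so suffixes like "10-beta" still parse.
    let minor_digits: String = parts
        .next()?
        .chars()
        .take_while(|c| c.is_ascii_digit())
        .collect();
    let minor: u32 = minor_digits.parse().ok()?;
    Some((major, minor))
}

/// True when the broker version meets the 2.10+ requirement for auth_callout.
/// Unparseable versions are treated as unsupported.
fn supports_auth_callout(version: &str) -> bool {
    // Tuples compare lexicographically: major first, then minor.
    matches!(parse_major_minor(version), Some(mm) if mm >= (2, 10))
}
```

Treating an unparseable version as unsupported is a design choice in this sketch: it is safer to refuse to start than to assume the callout will work.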
Why fully external rather than alongside us:
- NATS is a serious piece of infrastructure with its own scaling and
operational concerns. Bundling it implies we're responsible for it.
We're not.
- Operators with existing NATS deployments can reuse them. No "now run
our copy of NATS too."
- Different teams might run different NATS topologies (single instance,
cluster, mesh, leaf nodes). None of that is our problem.
- Swapping NATS implementations or moving to a managed provider is a
config change, not a code change.
### 6.5 Iroh — client-side only ### 6.5 Iroh — client-side only
@ -597,7 +646,7 @@ Settle yes/no on this and the design is locked.
| Question | Decision | | Question | Decision |
|---|---| |---|---|
| Bundle sigchain in chat-server? | **No.** Use existing `kez-sig-server`. Microservices. | | Bundle sigchain in chat-server? | **No.** Use existing `kez-sig-server`. Microservices. |
| Bundle NATS into Rust server? | **No.** Run `nats-server` as a separate container; chat-server provides the auth callout. | | Bundle NATS into Rust server? | **No.** NATS is external infrastructure the operator provides. We don't ship, embed, or supervise it. We connect to whatever broker `NATS_URL` points at. |
| KEZ + nostr coexistence for chat? | **No nostr in chat.** KEZ is identity-only; nostr only as a verifiable claim in someone's sigchain, not as transport. | | KEZ + nostr coexistence for chat? | **No nostr in chat.** KEZ is identity-only; nostr only as a verifiable claim in someone's sigchain, not as transport. |
| Handle scope: federation or global? | **Global for v0**, federation-ready design (see §3.5). | | Handle scope: federation or global? | **Global for v0**, federation-ready design (see §3.5). |
| Recovery if key lost? | **Paper backup (24-word mnemonic), Keybase-style.** No server-side recovery. | | Recovery if key lost? | **Paper backup (24-word mnemonic), Keybase-style.** No server-side recovery. |
@ -652,10 +701,10 @@ When we start building:
Handle registry + WebFinger first — these unblock client-side Handle registry + WebFinger first — these unblock client-side
account creation. account creation.
3. **NATS auth callout.** Run nats-server in a sibling container with 3. **NATS auth callout.** Bring up a NATS broker for development (see
the callout configured to hit our endpoint. End-to-end: a client Appendix A), configure its auth_callout to hit our chat-server's
can register a handle and then connect to NATS authenticated by `/internal/nats/auth`. End-to-end: a client can register a handle
its KEZ key. and then connect to NATS authenticated by its KEZ key.
4. **Minimal `kez-chat-cli` client** (separate project) that does: 4. **Minimal `kez-chat-cli` client** (separate project) that does:
- `kez-chat register tudisco` - `kez-chat register tudisco`
@ -688,11 +737,38 @@ ed25519 primary key. The same key authenticates to a NATS broker
(chat, presence, file tickets — broker is dumb, clients do E2E with (chat, presence, file tickets — broker is dumb, clients do E2E with
ChaCha20-Poly1305 over X25519-derived keys) and identifies an Iroh ChaCha20-Poly1305 over X25519-derived keys) and identifies an Iroh
node (P2P bulk transfer, content-addressed blobs, on-demand fetch). node (P2P bulk transfer, content-addressed blobs, on-demand fetch).
The server side is a microservices deployment: a thin Rust **Our project ships two services**: a thin Rust `kez-chat-server`
`kez-chat-server` handles the handle registry + NATS auth + HTTP API; that handles the handle registry + NATS auth callout + HTTP API, and
a separate `nats-server` container runs the broker; the existing the existing `kez-sig-server` that stores sigchains. **NATS is
`kez-sig-server` stores sigchains. The chat-server does not run an external infrastructure the operator provides** — we never ship,
Iroh node and does not pin files in v0 — file transfer is pure P2P embed, or supervise it. The chat-server does not run an Iroh node
between online peers. Account recovery is via a 24-word paper-backup and does not pin files in v0; file transfer is pure P2P between
online peers. Account recovery is via a 24-word paper-backup
mnemonic. Federation across home servers is deferred but the design mnemonic. Federation across home servers is deferred but the design
keeps it as a flip-the-switch future change. keeps it as a flip-the-switch future change.
---
## Appendix A: running a NATS broker locally for development
NATS is not part of our project, but you need one running to test the
chat-server end-to-end. Easiest path during development:
```sh
docker run -d --name kez-dev-nats \
-p 4222:4222 -p 8222:8222 \
-v "$PWD/dev-nats.conf:/etc/nats/nats.conf:ro" \
nats:latest -c /etc/nats/nats.conf --jetstream
```
Where `dev-nats.conf` enables the auth callout pointing at your
locally-running chat-server (e.g. `http://host.docker.internal:8080/internal/nats/auth`).
A full reference `dev-nats.conf` will live at `deploy/dev-nats.conf`
when we start building. This appendix exists so developers have a
one-liner to spin up NATS for testing; **it is not the production
deployment recipe** (operators run their own NATS however they want).
For production: see the NATS docs (https://docs.nats.io). Our project
has no opinion beyond "must be 2.10+ with JetStream + auth_callout
configured to hit our endpoint."
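
After starting the dev container, a quick way to confirm the broker is accepting connections is a plain TCP connect to the client port. A minimal sketch using only the standard library; `broker_reachable` is a hypothetical name, and the check covers TCP reachability only, not JetStream or the auth callout:

```rust
use std::net::{SocketAddr, TcpStream};
use std::time::Duration;

/// Hypothetical helper: true if a TCP connection to `addr` succeeds
/// within `timeout`. This does not speak the NATS protocol.
fn broker_reachable(addr: &SocketAddr, timeout: Duration) -> bool {
    TcpStream::connect_timeout(addr, timeout).is_ok()
}
```

With the `docker run` command above, the address to probe would be `127.0.0.1:4222`, parsed with `"127.0.0.1:4222".parse::<SocketAddr>()`.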