Kez/kez-chat/TODO.md
Jason Tudisco c8370ffdf0 feat(kez-chat): three days of security + UX + protocol work
A multi-day batch covering: a security review punch list (Day 1-3),
the visual-encryption profile-picture feature, a fast deploy
infrastructure, ack resilience + UX polish, several nostr ecosystem
alignment changes, and a key server-independence fix.

Security review (Day 1)
  - Envelope v2 (ephemeral x25519 per message; AAD-bound AES-GCM;
    no plaintext from/to). Big metadata-leak fix flagged by reviews.
    Forward secrecy at the per-message level; v1 decrypt kept for
    a one-week migration window.
  - Routing tag rename h → q (NIP-29 collision fix).
  - Web Push payload now empty (was leaking recipient handle to FCM).
  - Log demotion + process-instance salt-hash of handles so debug
    logs don't permanently encode the social graph.

Security review (Day 2)
  - Replay protection: SEEN_CAP 500 → 10_000 ids; reject events with
    impossibly old/future created_at; openMessage enforces ±7d/+5min
    freshness on plaintext sent_at.
  - Reveal Recovery Phrase now requires a fresh passphrase prompt.
    Verifies via the same unlockIdentity that the initial unlock
    uses. Bonus: works for biometric-only sessions (recovers the
    phrase that wasn't in memory before).
  - POST /v1/messages per-IP rate limit (60/min, capacity 60, with
    a periodic idle-bucket sweep). New rate_limit.rs module + tests.

Security review (Day 3 Option A)
  - Unforgeable acks: kind-4244 events now carry a `kez-sig` tag,
    the recipient's ed25519 signature over the acked event id.
    Sender verifies against the conversation peer's KEZ primary.
    Unsigned acks still accepted during the migration window.
  - Default `since=` lookback shortened 7d → 48h (matches relay
    retention).
  - Bounded concurrency on push fanout: tokio::sync::Semaphore(32).

Security review (Day 3 Option B — nostr ecosystem alignment)
  - NIP-65 (kind:10002): publish our relay list so other clients
    can discover where to find us.
  - NIP-42 AUTH: attachSigner / detachSigner wires the user's seed
    into the relay pool so AUTH-gated relays (damus.io DMs) deliver.
  - Minimal kind:0 baseline on first session unlock (unblocks
    writes on relays that reject unknown pubkeys).
  - Acks now include ["p", senderNostrPubkey] for NIP-25 / NIP-10
    routing convention.

Web Push end-to-end
  - Server-side nostr listener (nostr_listener.rs): the chat-server
    subscribes to relays for every registered handle's addr, so
    Web Push fires even when chat goes over nostr (the live default).
  - Push fanout from messages.rs spawned with bounded concurrency.
  - Empty payload (no recipient handle leaked to FCM).
  - Self-heal endpoint GET /v1/push/subscriptions/:handle —
    auto-re-registers a subscription if server lost it.
  - Auto-enable push on first unlock (was opt-in toggle hunt).
  - In-chat nudge banner + iOS PWA install hint.

Persistent sessions
  - persistent-session.ts: AES-GCM encrypt seed under a
    non-extractable IDB key, 30-day sliding-window TTL, restore on
    every boot.
  - Auto-fetch own kind:0 from nostr on a fresh device so the user
    sees their own avatar without re-setting it.

Profile pictures + visual encryption
  - Avatar component accepts a `picture` prop (data URL); falls
    back to the deterministic identicon when absent.
  - profile-store.ts: pick → resize to 256×256 JPEG → save locally
    + publish as NIP-01 kind:0.
  - Visual encryption (visual-crypto.ts): keyed Fisher-Yates pixel
    permutation + xoshiro256** PRNG. Output is a valid PNG with
    scrambled content. Salt embedded as a #kez-visual-v1:<hex>
    URL fragment.
  - Default ON for new pictures. Strangers see colored noise on
    public nostr; contacts see the real face.
  - Per-recipient AES wraps embedded in kind:0 content
    (kez_visual_keys map). The picture's symmetric key is wrapped
    via the same SealedEnvelope crypto our DMs use.
  - Self-wrap (sender wraps to their own primary too) so a fresh
    device of the same user can descramble its own picture.
  - Stranger-view preview thumbnail in Settings (the badge tucked
    into the avatar's bottom-right corner — "this is what the
    world sees").
  - Tap-to-zoom: header avatar in a thread opens a fullscreen
    overlay.

Peer-profile resolution
  - peer-profile-store.ts: IDB-cached one-shot kind:0 fetch +
    descramble.
  - peer-profile-cell.svelte.ts: reactive mirror for UI.
  - 6h bulk-scan staleness + force-refresh on thread open.
  - Avatar usages in Messages.svelte pass peer.picture through.

Local-echo + delivery receipts
  - Outbound messages render instantly with status="sending"; flip
    to "sent" when ≥1 relay accepts; "delivered" (check-in-circle)
    when recipient's client publishes an ack.
  - SVG status icons inside the bubble; "via X" footer on outbound.
  - Persistent pending-ack queue with retry on next session start.
  - Catch-up scan (fetchAcksForEventIds) self-heals delivered
    state on conversation open.
  - markDeliveredByEventId verifies the ack signature.

Active-relay tracking + reply preference
  - SimplePool.trackRelays = true. Capture first-to-accept on send
    via Promise.any over per-relay publish promises.
  - InboxMessage.via_relay set from pool.seenOn on receive.
  - Conversation.peer_via_relay persisted on every inbound DM.
  - sendMessage takes `preferRelay` and orders publish targets
    accordingly. Acks bias the same way.
  - "via relay.X" footer renders on outbound bubbles.

Conversation list polish
  - Per-conversation `unread_count` on the Conversation type.
  - Bumped on every genuinely-new inbound; reset on thread open.
  - Accent-color pill badge in the sidebar (caps at "99+").

Server-independence fix
  - sendMessage skips the /v1/u/:handle lookup when the caller
    passes recipientPrimary (which Messages.svelte does, from the
    cached peer_primary on the conversation row). Chat over nostr
    no longer breaks when the chat-server is down — only brand-new
    conversations still need the directory lookup.

Relay set
  - Added wss://relay.snort.social and wss://nostr.wine to the
    default pool (was 3, now 5).

Fast-deploy infrastructure (new in this batch)
  - Dockerfile gains an `export` scratch stage (extracts binary +
    web/dist only).
  - Dockerfile.runtime: tiny runtime image that COPYs prebuilt
    artifacts — no rust/npm on the remote.
  - docker-compose.fast.yml: compose override pointing chat-server
    build at Dockerfile.runtime.
  - .dockerignore: excludes target/, node_modules/, prebuilt/,
    .buildx-cache/, .git, *.db. Critical: without this, an earlier
    bug had the buildx cache nested under the build context and
    blew up to 17GB by feeding itself into itself.
  - Old: ~10 min remote build. New: 3–5 min local + 5s remote
    runtime swap. Cache lives at ~/.cache/kez-chat-buildx
    (outside any project tree).

UI polish (margins, layout, banners)
  - Authenticated routes (Welcome / Settings / Identity / Dashboard
    / Claims / AddClaim) wrapped in max-w-2xl mx-auto px-4 py-6.
  - WhatsApp-style chat bubbles: shrink-wrap to content, asymmetric
    rounded corners, inline bottom-right timestamp.
  - Push-notification nudge banner at top of /chats with iOS
    install hint.
  - Relay state popover off the "● live (N)" indicator.

WebAuthn biometric fix
  - user.id now uses the raw 32-byte ed25519 pubkey (was the
    72-byte "ed25519:<hex>" identity string, which exceeded
    WebAuthn's 64-byte limit — Android Chrome rejected it with
    "user handle exceeds 64 bytes").

Documentation
  - kez-chat/TODO.md tracks every reviewer finding with status,
    file:line references, and a phased plan. All Day 1-3 items
    marked DONE; remaining roadmap items (Double Ratchet,
    WebAuthn-gated rehydrate, addr rotation, NIP-65 peer-relay
    fetch on send) documented for future sprints.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-06-08 05:11:24 -06:00

# kez-chat security + protocol TODO
Consolidated from a nostr-protocol expert review and an independent security
audit (both run 2026-06-08). Ordered by impact-per-hour-of-work, not by
review severity — a CRIT that's a half-week of design discussion isn't a
ship-stopper when there are real CRITs that take 30 minutes.
Update the status tag on each heading as we land things. Cross-links to the original
review reports are at the bottom.
## Status legend
- `TODO` — not started
- `WIP` — actively being worked
- `DONE` — landed in main
- `ROADMAP` — committed but multi-day; will need its own design doc
- `WONTFIX` — accepted trade-off, documented
---
## Day 1 — ~half a day of work, biggest wins
### #1. Strip envelope metadata (ephemeral x25519 per message) [DONE]
**Why it matters:** the `SealedEnvelope.from` (KEZ identity) and `to` (handle)
fields sit in cleartext alongside the ciphertext in `event.content`. Any
nostr relay can JSON-parse the content and build a perfect social graph:
who-messages-whom, when, how often. The actual message body stays encrypted;
all the metadata is wide open.
**Files:**
- `kez-chat/web/src/lib/crypto.ts:42-54` (the `SealedEnvelope` shape)
- `kez-chat/web/src/lib/crypto.ts:115-153` (`sealMessage` / `openMessage`)
**Fix:** replace `envelope.from` (the KEZ identity) with `envelope.eph_pub`
(a 32-byte x25519 public key sender generated for this message only).
Recipient does ECDH against `eph_pub` instead of deriving it from the KEZ
identity. The decrypted plaintext already carries `from` for identity
verification.
Bonus: a fresh ephemeral key per message gives partial forward secrecy —
compromise of the recipient's long-term seed still decrypts retained
ciphertexts, but a captured single-message ECDH key reveals only that one
message.
**Migration:** envelope `v: 1 → v: 2`. Recipient accepts both during a
1-week window, then drops v1 support.
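Below is a TypeScript sketch of the receive-side gate for that window. It is a minimal illustration and does not reproduce the shape in `crypto.ts`. The `nonce`/`ct` field names, the hex encoding of `eph_pub`, and the `acceptEnvelope` helper are assumptions, and the ECDH and AES-GCM steps are left out.

```typescript
// Hypothetical envelope shapes. Only `v`, `from`, `to`, and `eph_pub` come from
// this TODO; `nonce` and `ct` are assumed field names.
interface EnvelopeV1 {
  v: 1;
  from: string; // sender KEZ identity, in cleartext (the leak)
  to: string; // recipient handle, in cleartext
  nonce: string;
  ct: string;
}

interface EnvelopeV2 {
  v: 2;
  eph_pub: string; // 32-byte x25519 key as hex, fresh per message
  nonce: string;
  ct: string;
}

type SealedEnvelope = EnvelopeV1 | EnvelopeV2;

const HEX_32_BYTES = /^[0-9a-f]{64}$/;

// Accept well-formed v2 at any time; accept v1 only before `v1SunsetMs`.
function acceptEnvelope(raw: unknown, nowMs: number, v1SunsetMs: number): SealedEnvelope | null {
  if (typeof raw !== "object" || raw === null) return null;
  const env = raw as Record<string, unknown>;
  if (typeof env.nonce !== "string" || typeof env.ct !== "string") return null;

  if (env.v === 2) {
    // A v2 envelope must not carry the cleartext identity fields at all.
    if ("from" in env || "to" in env) return null;
    if (typeof env.eph_pub !== "string" || !HEX_32_BYTES.test(env.eph_pub)) return null;
    return env as unknown as EnvelopeV2;
  }
  if (env.v === 1) {
    if (nowMs >= v1SunsetMs) return null; // migration window has closed
    if (typeof env.from !== "string" || typeof env.to !== "string") return null;
    return env as unknown as EnvelopeV1;
  }
  return null; // unknown version
}
```

Rejecting a v2 envelope that still carries `from`/`to` stops a buggy or downgraded sender from quietly re-leaking the metadata this item removes.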
### #3. Empty Web Push payload [DONE]
**Why it matters:** we send `{type, to: <handle>, seq}` to FCM/APNs/Mozilla
on every fanout. RFC 8291 encrypts the payload so the push provider can't
read the bytes, but the provider already knows the endpoint's owner — the
`to` field adds no information for the recipient and *does* give Google a
clear "message arrived for alice at T" timeline.
**Files:**
- `kez-chat/src/messages.rs:123-127` (the payload we hand to `push.fanout`)
- `kez-chat/web/src/sw.ts:78-99` (the `push` handler)
**Fix:** send `{}` as the payload. The service worker shows a generic
"New kez-chat message" notification — it can still focus an existing tab,
which navigates to the conversation list. Deep-linking to a specific peer
goes away on cold-open (acceptable trade-off — one extra tap to open the
right thread is the price of *not* exporting metadata to FCM).
### #5. Rename the routing tag from `h` to something less claimed [DONE]
**Why it matters:** `h` is informally used by NIP-29 (Simple Groups) as the
group id. Today's three relays don't enforce NIP-29 semantics, but the
moment a NIP-29-aware relay enters our pool it will try to route our `#h`
filter as a group join, and we'll get cryptic failures.
**Files:**
- `kez-chat/web/src/lib/nostr-id.ts:28` (`ADDR_TAG = "h"`)
- `kez-chat/web/src/lib/nostr-transport.ts:201, 269` (publish + subscribe)
- `kez-chat/src/nostr_listener.rs:113` (server-side mirror)
**Fix:** switch to `q` (less-claimed single letter, still indexable per
NIP-01). Bump envelope/event `v` so the listener can tell old-tag events
from new ones during the migration window. Server-side listener subscribes
to BOTH `#h` and `#q` for one week.
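Below is a minimal TypeScript sketch of the dual-tag window. The `listenerFilters` / `routingAddr` helpers and the caller-supplied sunset timestamp are hypothetical, and the event `kinds` stay a parameter because this TODO doesn't name them. NIP-01 tag filters are keyed `#<letter>`, so the old and new subscriptions differ only in that key. The same logic would carry over to the Rust listener.

```typescript
// NIP-01 filter subset. Tag filters are keyed "#<single-letter>".
interface Filter {
  kinds?: number[];
  [tag: `#${string}`]: string[];
}

// Hypothetical: the listener's filters for a set of addrs. Inside the
// migration window it subscribes to BOTH the new `#q` and the legacy `#h`.
function listenerFilters(addrs: string[], kinds: number[], nowMs: number, hSunsetMs: number): Filter[] {
  const filters: Filter[] = [{ kinds, "#q": addrs }];
  if (nowMs < hSunsetMs) filters.push({ kinds, "#h": addrs });
  return filters;
}

// Hypothetical: read the routing addr off an incoming event's tags, preferring
// `q` and falling back to legacy `h` only while the window is open.
function routingAddr(tags: string[][], nowMs: number, hSunsetMs: number): string | null {
  const q = tags.find((t) => t[0] === "q");
  if (q && typeof q[1] === "string") return q[1];
  if (nowMs < hSunsetMs) {
    const h = tags.find((t) => t[0] === "h");
    if (h && typeof h[1] === "string") return h[1];
  }
  return null;
}
```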
### #17. Demote handle-revealing logs to `debug!` [DONE]
**Why it matters:** every fanout currently logs `push: fanout triggered
handle=<plain handle> sub_count=N` at INFO level. Operator-side log
retention turns this into a permanent "who's chatting" ledger. Even if
we trust the operator (it's us), forensics on a stolen log file leaks the
social graph in plaintext.
**Files:**
- `kez-chat/src/push.rs:259-262, 275-281` (fanout + send logging)
- `kez-chat/src/api.rs:387-393` (subscribe registration)
**Fix:** demote the handle-bearing INFO lines to DEBUG. Replace the visible
field with a short HMAC of the handle under a server-instance secret so we
can still group "all sends for X" in logs without exposing X. Set log level
in production to INFO, so DEBUG lines are off by default.
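One way to get a stable but opaque log field is sketched below in TypeScript with Node's `node:crypto`, for illustration only; the real server is Rust. `handleTag`, the 12-hex-char truncation, and the log line format are assumptions. The secret is random per process. Tags therefore group sends within one run, but without the secret nobody can confirm a guessed handle against them or match them across restarts.

```typescript
import { createHmac, randomBytes } from "node:crypto";

// Per-process secret: regenerated on every start and never written anywhere,
// so a stolen log file alone can't be used to test guesses of handles.
const LOG_SECRET: Uint8Array = randomBytes(32);

// Short keyed tag for a handle: stable within this process, opaque outside it.
function handleTag(handle: string, secret: Uint8Array = LOG_SECRET): string {
  return createHmac("sha256", secret).update(handle, "utf8").digest("hex").slice(0, 12);
}

// INFO-safe log line: groups "all sends for X" without printing X.
function logFanout(handle: string, subCount: number): string {
  return `push: fanout triggered handle_tag=${handleTag(handle)} sub_count=${subCount}`;
}

console.log(logFanout("alice", 2));
```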
---
## Day 2 — another half-day
### #2. Replay protection — bound + timestamp freshness [DONE]
**Why it matters:** `SEEN_CAP=500` evicts oldest event ids once we've seen
500 messages. An active user rolls past that in days, then a malicious
relay can re-broadcast any old event and we accept it as a fresh message —
the decrypted `sent_at` is never compared to wall-clock.
**Files:**
- `kez-chat/web/src/lib/nostr-transport.ts:107` (`SEEN_CAP = 500`)
- `kez-chat/web/src/lib/nostr-transport.ts:142` (`slice(-SEEN_CAP)`)
- `kez-chat/web/src/lib/crypto.ts:161-205` (`openMessage` — no freshness check)
**Fix:**
1. Bump `SEEN_CAP` to 10_000 and move from localStorage to IndexedDB so the
set isn't capped by the 5MB localStorage quota.
2. In `openMessage`, reject envelopes where `|now - sent_at| > 7 days`.
3. Also clamp `ev.created_at` to `[now - 7d, now + 5min]` before using it
as a seq generator — otherwise a relay can backdate or future-date
events and either replay or skip-ahead `bumpSince`.
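The three steps above are sketched below in TypeScript under stated assumptions. Times are in milliseconds except nostr's `created_at`, which is in seconds. `SeenSet` is a hypothetical in-memory stand-in for the IndexedDB-backed set. The bounds are the ones listed above.

```typescript
const DAY_MS = 24 * 60 * 60 * 1000;
const MAX_AGE_MS = 7 * DAY_MS;
const MAX_FUTURE_SKEW_MS = 5 * 60 * 1000;
const SEEN_CAP = 10_000;

// Step 2: reject a decrypted sent_at more than 7 days away from now.
// NaN fails the comparison, so garbage timestamps are rejected too.
function isFreshSentAt(sentAtMs: number, nowMs: number): boolean {
  return Math.abs(nowMs - sentAtMs) <= MAX_AGE_MS;
}

// Step 3: clamp a relay-supplied created_at (seconds) into [now - 7d, now + 5min].
function clampCreatedAt(createdAtSec: number, nowMs: number): number {
  const lo = Math.floor((nowMs - MAX_AGE_MS) / 1000);
  const hi = Math.floor((nowMs + MAX_FUTURE_SKEW_MS) / 1000);
  return Math.min(Math.max(createdAtSec, lo), hi);
}

// Step 1: bounded seen-id set that evicts the oldest id once over capacity.
class SeenSet {
  private readonly ids = new Set<string>();
  private readonly cap: number;

  constructor(cap: number = SEEN_CAP) {
    this.cap = cap;
  }

  // Returns false if `id` was already seen (a replay).
  add(id: string): boolean {
    if (this.ids.has(id)) return false;
    this.ids.add(id);
    if (this.ids.size > this.cap) {
      // Set iteration follows insertion order, so the first value is the oldest.
      const oldest = this.ids.values().next().value as string;
      this.ids.delete(oldest);
    }
    return true;
  }

  get size(): number {
    return this.ids.size;
  }
}
```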
### #4. Reveal-recovery-phrase requires fresh auth [DONE]
**Why it matters:** 30 seconds of access to an unlocked phone = full
identity exfil. The Settings → Reveal Phrase button decrypts straight from
the persistent-session blob with no re-prompt.
**Files:**
- `kez-chat/web/src/routes/Settings.svelte` (the Reveal flow)
**Fix:** gate Reveal Phrase + Lock + biometric setup behind a fresh
passphrase prompt OR a WebAuthn assertion. Same model Apple/1Password use:
"this action requires your password again".
### #15. Rate-limit `POST /v1/messages` [DONE]
**Why it matters:** the endpoint currently accepts anonymous posts (no
auth on send) capped at 256KB per envelope. A bot can fill any mailbox
until disk fills. Acknowledged in `messages.rs:18-20` ("Spam: v0.1 doesn't
gate POST").
**Files:**
- `kez-chat/src/messages.rs:70-133`
**Fix (v0.1):** per-IP token bucket — 60 messages/min per source IP. Drop
overflow with 429.
**Fix (v0.2):** require the sender to sign with their KEZ primary; the
chat-server verifies. For cross-server v0.2 this only helps if the
sender's server vouches for that primary.
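Below is a TypeScript sketch of the v0.1 per-IP token bucket. The server's `rate_limit.rs` is Rust, so this only illustrates the algorithm. Capacity 60 and a refill of 60 tokens/min come from this item. `IpRateLimiter` and the injected `nowMs` clock are assumptions that keep the sketch testable.

```typescript
const CAPACITY = 60; // burst size: a full minute's allowance
const REFILL_PER_MS = 60 / 60_000; // 60 tokens per minute
// A bucket idle this long has refilled to CAPACITY, so deleting it is
// indistinguishable from keeping it.
const IDLE_SWEEP_MS = 60_000;

interface Bucket {
  tokens: number;
  lastMs: number;
}

class IpRateLimiter {
  private readonly buckets = new Map<string, Bucket>();

  // true: let the POST through; false: answer 429.
  allow(ip: string, nowMs: number): boolean {
    const b = this.buckets.get(ip) ?? { tokens: CAPACITY, lastMs: nowMs };
    // Refill in proportion to elapsed time, never above capacity.
    b.tokens = Math.min(CAPACITY, b.tokens + (nowMs - b.lastMs) * REFILL_PER_MS);
    b.lastMs = nowMs;
    this.buckets.set(ip, b);
    if (b.tokens < 1) return false;
    b.tokens -= 1;
    return true;
  }

  // Periodic sweep of idle buckets; returns how many were removed.
  sweep(nowMs: number): number {
    let removed = 0;
    for (const [ip, b] of this.buckets) {
      if (nowMs - b.lastMs >= IDLE_SWEEP_MS) {
        this.buckets.delete(ip);
        removed++;
      }
    }
    return removed;
  }
}
```

Tying the sweep threshold to the full-refill time means the sweep bounds memory without ever changing a rate-limit decision.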
---
## Roadmap — multi-day, needs design pass
### #6. Forward secrecy (Double Ratchet) [ROADMAP]
**Why it matters:** today's static-static x25519 means whoever compromises
a seed once decrypts ALL retained history that any relay still has — and
relays retain indefinitely. The ephemeral-x25519 fix in #1 is partial
forward secrecy (per-message) but not post-compromise security.
**What's needed:** Signal-style X3DH + Double Ratchet. Significant
refactor of crypto.ts; needs careful API design so existing conversations
migrate cleanly. Owner: TBD. ETA: separate sprint.
### #7. WebAuthn-gated session rehydrate [ROADMAP]
**Why it matters:** the persistent-session blob's non-extractable AES key
blocks `exportKey` but NOT `decrypt`-then-read-plaintext. Any malicious
extension with `<all_urls>`, any XSS, any compromised npm dep can call
`restoreSession()` and lift the seed. My comment in
`persistent-session.ts:18-23` overstates the actual protection.
**Fix:** gate `restoreSession()` on a user-gesture WebAuthn assertion
(Touch ID / passkey). Background scripts can't fake a user gesture, so the
seed never gets decrypted unattended. Falls back to passphrase on devices
without WebAuthn.
### #8. Rotate addr daily (`info = "v1|YYYYMMDD"`) [ROADMAP]
**Why it matters:** a relay scrapes `#h` filter values + the public KEZ
directory + builds a rainbow table mapping `addr → primary → handle`. The
hash buys little when the input space is enumerable. Per-day addr
rotation forces the rainbow table to be rebuilt daily and stops long-term
correlation.
**Trade-off:** receivers need to subscribe to multiple addrs during the
boundary day (yesterday's + today's). Listener server-side needs the same.
Migration logic isn't hard but isn't free.
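Below is a hypothetical sketch of the rotation, using Node's `hkdfSync`. It assumes the addr is HKDF-SHA256 over the recipient's raw primary pubkey, with an empty salt and a 16-byte output; this TODO doesn't specify the current derivation. It also assumes `YYYYMMDD` is the UTC date.

```typescript
import { hkdfSync } from "node:crypto";
import { Buffer } from "node:buffer";

const DAY_MS = 24 * 60 * 60 * 1000;

// The info string from this item, "v1|YYYYMMDD", assuming UTC dates.
function addrInfo(date: Date): string {
  const y = date.getUTCFullYear();
  const m = String(date.getUTCMonth() + 1).padStart(2, "0");
  const d = String(date.getUTCDate()).padStart(2, "0");
  return `v1|${y}${m}${d}`;
}

// Hypothetical derivation: HKDF-SHA256(ikm = raw primary pubkey, empty salt,
// info = addrInfo(date)), 16 bytes, hex-encoded.
function dailyAddr(primaryPubkey: Uint8Array, date: Date): string {
  const out = hkdfSync("sha256", primaryPubkey, new Uint8Array(0), addrInfo(date), 16);
  return Buffer.from(out).toString("hex");
}

// Around the day boundary, watch both yesterday's and today's addr.
function addrsToWatch(primaryPubkey: Uint8Array, now: Date): string[] {
  const yesterday = new Date(now.getTime() - DAY_MS);
  return [dailyAddr(primaryPubkey, yesterday), dailyAddr(primaryPubkey, now)];
}
```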
### #9. Unforgeable delivery acks [DONE — Day 3 Option A]
**Why it matters:** anyone who saw an event id can publish a fake kind-4244
ack. Sender's UI shows false "delivered". Cosmetic-only today; will be a
real problem when someone builds a tracker bot.
**Fix:** ack payload = recipient's ed25519 signature over the acked event
id. Sender verifies against the recipient's known KEZ primary. Free —
already have ed25519 plumbing.
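Below is a TypeScript sketch of sign and verify using Node's built-in ed25519 support. It makes three assumptions. The signed message is the event id's raw 32 bytes; the hex string would work equally well if both sides agree. The `kez-sig` tag carries the signature as hex. The helper names are hypothetical.

```typescript
import { createPublicKey, generateKeyPairSync, sign, verify, type KeyObject } from "node:crypto";
import { Buffer } from "node:buffer";

const HEX_EVENT_ID = /^[0-9a-f]{64}$/;
const HEX_ED25519_SIG = /^[0-9a-f]{128}$/; // ed25519 signatures are 64 bytes

// Turn a raw 32-byte ed25519 primary into a Node KeyObject via JWK.
function primaryToKey(rawPub: Uint8Array): KeyObject {
  return createPublicKey({
    key: { kty: "OKP", crv: "Ed25519", x: Buffer.from(rawPub).toString("base64url") },
    format: "jwk",
  });
}

// Recipient side: sign the acked event id (assumed: its raw 32 bytes).
function signAck(eventIdHex: string, recipientKey: KeyObject): string {
  return sign(null, Buffer.from(eventIdHex, "hex"), recipientKey).toString("hex");
}

// Sender side: verify a `kez-sig` value against the peer's known KEZ primary.
function verifyAck(eventIdHex: string, sigHex: string, peerPrimary: KeyObject): boolean {
  if (!HEX_EVENT_ID.test(eventIdHex) || !HEX_ED25519_SIG.test(sigHex)) return false;
  return verify(null, Buffer.from(eventIdHex, "hex"), peerPrimary, Buffer.from(sigHex, "hex"));
}

// Round trip with a throwaway key pair standing in for the recipient's identity.
const { publicKey, privateKey } = generateKeyPairSync("ed25519");
const rawPrimary = Buffer.from(publicKey.export({ format: "jwk" }).x as string, "base64url");
const ackedId = "ab".repeat(32); // stand-in for a real nostr event id
const kezSig = signAck(ackedId, privateKey);
console.log(verifyAck(ackedId, kezSig, primaryToKey(rawPrimary))); // true
```

Because the signature covers the event id, an observer who only saw the id cannot produce a valid `kez-sig`.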
### #10. NIP-65 outbox model [PARTIAL — Day 3 Option B]
**Why it matters:** we hardcode 3 relays for every user. Real nostr clients
publish `kind:10002` listing their preferred read+write relays; senders
publish to each recipient's published read-relays. Without this, isolated
networks of users on different relay sets can't reach each other.
**Done (publish side):** we now emit a `kind:10002` event on first session
alongside the kind:0 baseline, listing our 3 default relays as read+write.
NIP-65-aware clients can discover where to reach us.
**Still missing:** when SENDING to a peer, we should fetch their
`kind:10002` and union their read-relays with ours. Deferred to v0.2: it
needs a deeper transport refactor (per-message relay set).
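Below is a TypeScript sketch of both halves. Per NIP-65, an `r` tag with no marker means the relay is used for both reading and writing, and a `read` or `write` third element narrows it. `buildRelayList`, `readRelaysOf`, and `sendTargets` are hypothetical names, and the event still needs signing before it is published.

```typescript
interface RelayListEvent {
  kind: number;
  created_at: number;
  tags: string[][];
  content: string;
}

// Our kind:10002: a bare ["r", url] tag marks the relay as read+write.
function buildRelayList(relays: string[], nowSec: number): RelayListEvent {
  return { kind: 10002, created_at: nowSec, tags: relays.map((url) => ["r", url]), content: "" };
}

// Relays a peer reads from: "r" tags with no marker or with the "read" marker.
function readRelaysOf(ev: RelayListEvent): string[] {
  return ev.tags
    .filter((t) => t[0] === "r" && typeof t[1] === "string" && (t.length < 3 || t[2] === "read"))
    .map((t) => t[1]);
}

// Outbox-style send set: our relays plus the peer's read relays, de-duplicated.
function sendTargets(ourRelays: string[], peerList: RelayListEvent | null): string[] {
  const peerReads = peerList !== null && peerList.kind === 10002 ? readRelaysOf(peerList) : [];
  return [...new Set([...ourRelays, ...peerReads])];
}
```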
### #11. NIP-42 AUTH support [DONE — Day 3 Option B]
damus.io regularly requires NIP-42 AUTH for DM-kind reads. Without it our
subscriptions get rejected silently. Add the client AUTH handshake +
support being prompted by the relay.
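The handshake itself is small, as the TypeScript sketch below shows. The relay sends `["AUTH", <challenge>]`. The client answers with `["AUTH", <signed event>]`, where the event is kind 22242 and carries `relay` and `challenge` tags (NIP-42). The helper names are hypothetical, signing is left to whatever signer the pool holds, and the empty `content` is this sketch's choice.

```typescript
interface AuthEventTemplate {
  kind: number;
  created_at: number;
  tags: string[][];
  content: string;
}

// If a relay frame is ["AUTH", <challenge>], return the challenge.
function parseAuthChallenge(frame: string): string | null {
  try {
    const msg: unknown = JSON.parse(frame);
    if (Array.isArray(msg) && msg[0] === "AUTH" && typeof msg[1] === "string") return msg[1];
  } catch {
    // Not JSON, so not an AUTH frame.
  }
  return null;
}

// Unsigned kind-22242 event answering that challenge; sign it, then send
// ["AUTH", signedEvent] back to the same relay.
function buildAuthEvent(relayUrl: string, challenge: string, nowSec: number): AuthEventTemplate {
  return {
    kind: 22242,
    created_at: nowSec,
    tags: [["relay", relayUrl], ["challenge", challenge]],
    content: "",
  };
}
```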
### #12. Publish a minimal kind-0 profile on first use [DONE — Day 3 Option B]
Some relays silently drop writes from "unknown" pubkeys (no kind-0). A
single minimal `kind:0` per derived nostr pubkey (just `{"name":"kez-chat
user"}`) unblocks this without revealing anything.
### #13. NIP-25 ack shape with `["p", senderNostrPubkey]` [DONE — Day 3 Option B]
Our kind-4244 ack is custom. Adopting the NIP-25 shape gets free interop
with nostr clients that already render reactions — handy if we ever expose
the underlying events.
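Below is a sketch of the resulting tag layout. NIP-25 reactions point at the reacted-to event with an `e` tag and at its author with a `p` tag. The hypothetical `buildAck` below keeps our kind 4244, appends the `kez-sig` tag from #9 when one is present, and assumes an empty `content`.

```typescript
interface AckTemplate {
  kind: number;
  created_at: number;
  tags: string[][];
  content: string;
}

const ACK_KIND = 4244;

// Unsigned ack event; `kezSigHex` is null for legacy (unsigned) acks.
function buildAck(ackedEventId: string, senderNostrPubkey: string, kezSigHex: string | null, nowSec: number): AckTemplate {
  const tags: string[][] = [
    ["e", ackedEventId], // which event is being acknowledged
    ["p", senderNostrPubkey], // who sent it, for NIP-25 / NIP-10 style routing
  ];
  if (kezSigHex !== null) tags.push(["kez-sig", kezSigHex]);
  return { kind: ACK_KIND, created_at: nowSec, tags, content: "" };
}
```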
### #14. Shorten `since=` default cursor [DONE — Day 3 Option A]
Default 7-day cursor exceeds most relay retention windows (often 1 to 3
days). Fresh devices on quiet conversations silently miss messages.
Shorten to 48h + augment with explicit "fetch full history" UI for the
rare resurrect case.
### #16. Bounded concurrency on push fanout [DONE — Day 3 Option A]
**Why it matters:** every send spawns an unbounded `tokio::spawn` to fan
out push. Under flood, OOM.
**Files:**
- `kez-chat/src/messages.rs:128`
**Fix:** semaphore-bound to ~32 concurrent fanouts. Excess queues; under
extreme flood we drop with a warn-log rather than swap-thrash.
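Below is a TypeScript analogue of the bound, for illustration only; the server itself uses `tokio::sync::Semaphore(32)` in Rust. `FanoutGate` and the queue bound passed to it are hypothetical, because this TODO doesn't specify a queue size.

```typescript
// Bounded concurrency plus a bounded wait queue; overflow is dropped with a warning.
class FanoutGate {
  private active = 0;
  private readonly waiters: Array<() => void> = [];
  private readonly maxActive: number;
  private readonly maxQueued: number;
  dropped = 0;

  constructor(maxActive: number, maxQueued: number) {
    this.maxActive = maxActive;
    this.maxQueued = maxQueued;
  }

  // Schedule `task`; returns false if it had to be dropped.
  submit(task: () => Promise<void>): boolean {
    if (this.active < this.maxActive) {
      this.start(task);
      return true;
    }
    if (this.waiters.length >= this.maxQueued) {
      this.dropped++;
      console.warn("push: fanout queue full, dropping");
      return false;
    }
    this.waiters.push(() => this.start(task));
    return true;
  }

  private start(task: () => Promise<void>): void {
    this.active++;
    Promise.resolve()
      .then(task)
      .catch((err: unknown) => console.warn("push: fanout failed", err))
      .finally(() => {
        this.active--;
        const next = this.waiters.shift(); // FIFO: oldest queued fanout runs next
        if (next) next();
      });
  }

  get inFlight(): number {
    return this.active;
  }

  get queued(): number {
    return this.waiters.length;
  }
}

// 32 matches the server's semaphore; 1024 is a hypothetical queue bound.
const fanoutGate = new FanoutGate(32, 1024);
```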
---
## Visually-encrypted profile pictures (new feature, in progress)
### Phase 1A — local scramble + per-contact key wrap [DONE this commit]
- `kez-chat/web/src/lib/visual-crypto.ts` — keyed Fisher-Yates pixel
permutation + xoshiro256** PRNG. Output is a valid PNG with same
dimensions, scrambled content. Salt embedded as `#kez-visual-v1:<hex>`
URL fragment so descramble doesn't need out-of-band metadata.
- `profile-store.ts` — profile gains `encrypted: boolean` (default
true) + `picture_key` (local-only). On publish: scramble the picture,
wrap the visual key for each contact via the existing
`sealMessage()` envelope, embed as `kez_visual_keys` map in the
kind:0 content.
- Settings — "Visually encrypt picture (recommended)" toggle, default ON.
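Below is a self-contained TypeScript sketch of the permutation step on decoded RGBA pixels (one `u32` per pixel, e.g. from a canvas). The xoshiro256** step function follows the published reference algorithm. The following are assumptions and not necessarily what `visual-crypto.ts` does: seeding from SHA-256 of key and salt, the `permutationFor` helper, and the omission of PNG encode/decode.

```typescript
import { createHash } from "node:crypto";

const MASK64 = (1n << 64n) - 1n;
const TWO64 = 1n << 64n;

function rotl(x: bigint, k: bigint): bigint {
  return ((x << k) | (x >> (64n - k))) & MASK64;
}

// xoshiro256** over four 64-bit words held as BigInts.
class Xoshiro256StarStar {
  private readonly s: bigint[];

  constructor(seed: Uint8Array) {
    if (seed.length !== 32) throw new Error("seed must be 32 bytes");
    const view = new DataView(seed.buffer, seed.byteOffset, 32);
    this.s = [0, 8, 16, 24].map((off) => view.getBigUint64(off, true));
    if (this.s.every((w) => w === 0n)) throw new Error("state must not be all zero");
  }

  next(): bigint {
    const s = this.s;
    const result = (rotl((s[1] * 5n) & MASK64, 7n) * 9n) & MASK64;
    const t = (s[1] << 17n) & MASK64;
    s[2] ^= s[0];
    s[3] ^= s[1];
    s[1] ^= s[2];
    s[0] ^= s[3];
    s[2] ^= t;
    s[3] = rotl(s[3], 45n);
    return result;
  }

  // Unbiased integer in [0, n) via rejection sampling.
  below(n: number): number {
    const range = BigInt(n);
    const limit = TWO64 - (TWO64 % range);
    let x = this.next();
    while (x >= limit) x = this.next();
    return Number(x % range);
  }
}

// Hypothetical: keyed Fisher-Yates shuffle of pixel indices, seeded by SHA-256(key || salt).
function permutationFor(key: Uint8Array, salt: Uint8Array, n: number): Uint32Array {
  const rng = new Xoshiro256StarStar(createHash("sha256").update(key).update(salt).digest());
  const perm = new Uint32Array(n);
  for (let i = 0; i < n; i++) perm[i] = i;
  for (let i = n - 1; i > 0; i--) {
    const j = rng.below(i + 1);
    const tmp = perm[i];
    perm[i] = perm[j];
    perm[j] = tmp;
  }
  return perm;
}

// Output pixel i takes input pixel perm[i].
function scramble(pixels: Uint32Array, perm: Uint32Array): Uint32Array {
  const out = new Uint32Array(pixels.length);
  for (let i = 0; i < pixels.length; i++) out[i] = pixels[perm[i]];
  return out;
}

// Inverse: put scrambled pixel i back at position perm[i].
function descramble(scrambled: Uint32Array, perm: Uint32Array): Uint32Array {
  const out = new Uint32Array(scrambled.length);
  for (let i = 0; i < scrambled.length; i++) out[perm[i]] = scrambled[i];
  return out;
}
```

The output is a permutation of the input pixels, so the color histogram survives scrambling and only the spatial layout is hidden. The AES-CTR option in Phase 1C would address that weakness.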
### Phase 1B — peer descramble [DONE this commit]
- `peer-profile-store.ts` (new): IDB cache + one-shot `pool.querySync`
fetch of the peer's kind:0 metadata event. On hit, looks up our
primary in `kez_visual_keys`, opens the SealedEnvelope wrap to
recover the visual key, descrambles `metadata.picture`, caches the
rendered data URL.
- `peer-profile-cell.svelte.ts` (new): reactive Svelte 5 mirror over
the IDB cache so component re-renders are automatic on fetch.
- `nostr-transport.ts`: surfaces `sender_nostr_pubkey` on every
inbound DM. `conversations-store.ts` persists it on the conversation
row so we can locate the peer's kind:0 later.
- `inbox-service.svelte.ts`: on every fresh DM, fires off a profile
fetch for the sender — first DM lights up their avatar.
- `Messages.svelte`: hydrates the cache on mount, kicks off refreshes
for every visible conversation, threads cached pictures through
both Avatar usages (conversation list + thread header).
- Conversation list re-renders on cache update; staleness window 24h.
Edges noted for later: peers we've only *sent* to (never received from)
have no `peer_nostr_pubkey` until they reply, so they don't get a
picture lookup yet. Easy follow-up: backfill pubkey from a NIP-05 or
WebFinger lookup, or proactively probe relays for `kind:0` events whose
content tags match a known primary.
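Below is a TypeScript sketch of the lookup side of Phase 1B. It splits the salt off the `#kez-visual-v1:<hex>` fragment and picks the wrap addressed to our primary from `kez_visual_keys`. Opening that wrap (the existing SealedEnvelope path) and descrambling are omitted. The helper names are hypothetical, and so is the assumption that the map is keyed by our `ed25519:<hex>` identity string.

```typescript
// Hypothetical kind:0 content shape; only `picture` and `kez_visual_keys` come from this TODO.
interface Kind0Content {
  name?: string;
  picture?: string;
  kez_visual_keys?: Record<string, unknown>;
}

const VISUAL_FRAGMENT = "#kez-visual-v1:";

// Split a scrambled picture URL into its image part and hex salt; null means "not scrambled".
function splitVisualUrl(url: string): { src: string; saltHex: string } | null {
  const at = url.lastIndexOf(VISUAL_FRAGMENT);
  if (at < 0) return null;
  const saltHex = url.slice(at + VISUAL_FRAGMENT.length);
  if (!/^(?:[0-9a-f]{2})+$/i.test(saltHex)) return null;
  return { src: url.slice(0, at), saltHex };
}

// Pick the wrap addressed to us; an undefined wrap means we see stranger noise.
function pictureAndWrapFor(contentJson: string, ourPrimary: string): { picture?: string; wrap?: unknown } {
  let parsed: unknown;
  try {
    parsed = JSON.parse(contentJson);
  } catch {
    return {};
  }
  if (typeof parsed !== "object" || parsed === null) return {};
  const content = parsed as Kind0Content;
  return { picture: content.picture, wrap: content.kez_visual_keys?.[ourPrimary] };
}
```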
### Phase 1C — UX polish [TODO]
- "X contacts can see your real picture" hint in Settings.
- Re-publish kind:0 automatically when a new conversation is created
(so the new contact gets key-wrapped without the user re-saving).
- Optional: per-image AES-CTR mode for uniform-noise output (stronger,
less "visually meaningful").
---
## Acknowledged trade-offs (won't fix in v0.1)
### Persistent-session is no stronger than the biometric path
The `non-extractable AES key` story stops `exportKey`, NOT
`decrypt`+`read`. Anyone with origin-execution access (XSS, malicious
extension) can lift the seed. Document this honestly in the README and the
file header. Real fix is #7 above.
### 30-day TTL is client-only
`expiresAt` in localStorage is editable by anyone with file-system access.
Server-side device binding (issue a signed nonce on unlock, expire at the
server) would help but adds round-trips. v0.2 candidate.
### Identity-key reuse is safe under current crypto
ed25519 seed → ed25519 (sigchain, envelope sig) + x25519 (ECDH) + HKDF →
secp256k1 (nostr signer). The auditor confirmed: no cross-curve
chosen-message attack path. Standard libsodium pattern.
---
## Tracking + cross-references
- Nostr-protocol review: see commit message of this commit; full report in
the audit-trail.
- Security audit: ditto.
- Owner: tudisco
- Last updated: 2026-06-08