Kez/kez-chat/TODO.md
Jason Tudisco c8370ffdf0 feat(kez-chat): three days of security + UX + protocol work
A multi-day batch covering: a security review punch list (Day 1-3),
the visual-encryption profile-picture feature, a fast deploy
infrastructure, ack resilience + UX polish, several nostr ecosystem
alignment changes, and a key server-independence fix.

Security review (Day 1)
  - Envelope v2 (ephemeral x25519 per message; AAD-bound AES-GCM;
    no plaintext from/to). Big metadata-leak fix flagged by reviews.
    Forward secrecy at the per-message level; v1 decrypt kept for
    a one-week migration window.
  - Routing tag rename h → q (NIP-29 collision fix).
  - Web Push payload now empty (was leaking recipient handle to FCM).
  - Log demotion + process-instance salt-hash of handles so debug
    logs don't permanently encode the social graph.

Security review (Day 2)
  - Replay protection: SEEN_CAP 500 → 10_000 ids; reject events with
    impossibly old/future created_at; openMessage enforces ±7d/+5min
    freshness on plaintext sent_at.
  - Reveal Recovery Phrase now requires a fresh passphrase prompt.
    Verifies via the same unlockIdentity that the initial unlock
    uses. Bonus: works for biometric-only sessions (recovers the
    phrase that wasn't in memory before).
  - POST /v1/messages per-IP rate limit (60/min, capacity 60, with
    a periodic idle-bucket sweep). New rate_limit.rs module + tests.

Security review (Day 3 Option A)
  - Unforgeable acks: kind-4244 events now carry a `kez-sig` tag,
    the recipient's ed25519 signature over the acked event id.
    Sender verifies against the conversation peer's KEZ primary.
    Unsigned acks still accepted during the migration window.
  - Default `since=` lookback shortened 7d → 48h (matches relay
    retention).
  - Bounded concurrency on push fanout: tokio::sync::Semaphore(32).

Security review (Day 3 Option B — nostr ecosystem alignment)
  - NIP-65 (kind:10002): publish our relay list so other clients
    can discover where to find us.
  - NIP-42 AUTH: attachSigner / detachSigner wires the user's seed
    into the relay pool so AUTH-gated relays (damus.io DMs) deliver.
  - Minimal kind:0 baseline on first session unlock (unblocks
    writes on relays that reject unknown pubkeys).
  - Acks now include ["p", senderNostrPubkey] for NIP-25 / NIP-10
    routing convention.

Web Push end-to-end
  - Server-side nostr listener (nostr_listener.rs): the chat-server
    subscribes to relays for every registered handle's addr, so
    Web Push fires even when chat goes over nostr (the live default).
  - Push fanout from messages.rs spawned with bounded concurrency.
  - Empty payload (no recipient handle leaked to FCM).
  - Self-heal endpoint GET /v1/push/subscriptions/:handle —
    auto-re-registers a subscription if server lost it.
  - Auto-enable push on first unlock (was opt-in toggle hunt).
  - In-chat nudge banner + iOS PWA install hint.

Persistent sessions
  - persistent-session.ts: AES-GCM encrypt seed under a
    non-extractable IDB key, 30-day sliding-window TTL, restore on
    every boot.
  - Auto-fetch own kind:0 from nostr on a fresh device so the user
    sees their own avatar without re-setting it.

Profile pictures + visual encryption
  - Avatar component accepts a `picture` prop (data URL); falls
    back to the deterministic identicon when absent.
  - profile-store.ts: pick → resize to 256×256 JPEG → save locally
    + publish as NIP-01 kind:0.
  - Visual encryption (visual-crypto.ts): keyed Fisher-Yates pixel
    permutation + xoshiro256** PRNG. Output is a valid PNG with
    scrambled content. Salt embedded as a #kez-visual-v1:<hex>
    URL fragment.
  - Default ON for new pictures. Strangers see colored noise on
    public nostr; contacts see the real face.
  - Per-recipient AES wraps embedded in kind:0 content
    (kez_visual_keys map). The picture's symmetric key is wrapped
    via the same SealedEnvelope crypto our DMs use.
  - Self-wrap (sender wraps to their own primary too) so a fresh
    device of the same user can descramble its own picture.
  - Stranger-view preview thumbnail in Settings (the badge tucked
    into the avatar's bottom-right corner — "this is what the
    world sees").
  - Tap-to-zoom: header avatar in a thread opens a fullscreen
    overlay.

Peer-profile resolution
  - peer-profile-store.ts: IDB-cached one-shot kind:0 fetch +
    descramble.
  - peer-profile-cell.svelte.ts: reactive mirror for UI.
  - 6h bulk-scan staleness + force-refresh on thread open.
  - Avatar usages in Messages.svelte pass peer.picture through.

Local-echo + delivery receipts
  - Outbound messages render instantly with status="sending"; flip
    to "sent" when ≥1 relay accepts; "delivered" (check-in-circle)
    when recipient's client publishes an ack.
  - SVG status icons inside the bubble; "via X" footer on outbound.
  - Persistent pending-ack queue with retry on next session start.
  - Catch-up scan (fetchAcksForEventIds) self-heals delivered
    state on conversation open.
  - markDeliveredByEventId verifies the ack signature.

Active-relay tracking + reply preference
  - SimplePool.trackRelays = true. Capture first-to-accept on send
    via Promise.any over per-relay publish promises.
  - InboxMessage.via_relay set from pool.seenOn on receive.
  - Conversation.peer_via_relay persisted on every inbound DM.
  - sendMessage takes `preferRelay` and orders publish targets
    accordingly. Acks bias the same way.
  - "via relay.X" footer renders on outbound bubbles.

Conversation list polish
  - Per-conversation `unread_count` on the Conversation type.
  - Bumped on every genuinely-new inbound; reset on thread open.
  - Accent-color pill badge in the sidebar (rounds at "99+").

Server-independence fix
  - sendMessage skips the /v1/u/:handle lookup when the caller
    passes recipientPrimary (which Messages.svelte does, from the
    cached peer_primary on the conversation row). Chat over nostr
    no longer breaks when the chat-server is down — only brand-new
    conversations still need the directory lookup.

Relay set
  - Added wss://relay.snort.social and wss://nostr.wine to the
    default pool (was 3, now 5).

Fast-deploy infrastructure (new in this batch)
  - Dockerfile gains an `export` scratch stage (extracts binary +
    web/dist only).
  - Dockerfile.runtime: tiny runtime image that COPYs prebuilt
    artifacts — no rust/npm on the remote.
  - docker-compose.fast.yml: compose override pointing chat-server
    build at Dockerfile.runtime.
  - .dockerignore: excludes target/, node_modules/, prebuilt/,
    .buildx-cache/, .git, *.db. Critical: without this, an earlier
    bug had the buildx cache nested under the build context and
    blew up to 17GB by feeding itself into itself.
  - Old: ~10 min remote build. New: 3–5 min local + 5s remote
    runtime swap. Cache lives at ~/.cache/kez-chat-buildx
    (outside any project tree).

UI polish (margins, layout, banners)
  - Authenticated routes (Welcome / Settings / Identity / Dashboard
    / Claims / AddClaim) wrapped in max-w-2xl mx-auto px-4 py-6.
  - WhatsApp-style chat bubbles: shrink-wrap to content, asymmetric
    rounded corners, inline bottom-right timestamp.
  - Push-notification nudge banner at top of /chats with iOS
    install hint.
  - Relay state popover off the "● live (N)" indicator.

WebAuthn biometric fix
  - user.id now uses the raw 32-byte ed25519 pubkey (was the
    72-byte "ed25519:<hex>" identity string, which exceeded
    WebAuthn's 64-byte limit — Android Chrome rejected it with
    "user handle exceeds 64 bytes").

Documentation
  - kez-chat/TODO.md tracks every reviewer finding with status,
    file:line references, and a phased plan. All Day 1-3 items
    marked DONE; remaining roadmap items (Double Ratchet,
    WebAuthn-gated rehydrate, addr rotation, NIP-65 peer-relay
    fetch on send) documented for future sprints.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-06-08 05:11:24 -06:00


kez-chat security + protocol TODO

Consolidated from a nostr-protocol expert review and an independent security audit (both run 2026-06-08). Ordered by impact-per-hour-of-work, not by review severity — a CRIT that's a half-week of design discussion isn't a ship-stopper when there are real CRITs that take 30 minutes.

Update the status column as we land things. Cross-links to the original review reports are at the bottom.

Status legend

  • TODO — not started
  • WIP — actively being worked
  • DONE — landed in main
  • ROADMAP — committed but multi-day; will need its own design doc
  • WONTFIX — accepted trade-off, documented

Day 1 — ~half a day of work, biggest wins

#1. Strip envelope metadata (ephemeral x25519 per message) [DONE]

Why it matters: the SealedEnvelope.from (KEZ identity) and to (handle) fields sit in cleartext alongside the ciphertext in event.content. Any nostr relay can JSON-parse the content and build a perfect social graph: who-messages-whom, when, how often. The actual message body stays encrypted; all the metadata is wide open.

Files:

  • kez-chat/web/src/lib/crypto.ts:42-54 (the SealedEnvelope shape)
  • kez-chat/web/src/lib/crypto.ts:115-153 (sealMessage / openMessage)

Fix: replace envelope.from (the KEZ identity) with envelope.eph_pub (a 32-byte x25519 public key sender generated for this message only). Recipient does ECDH against eph_pub instead of deriving it from the KEZ identity. The decrypted plaintext already carries from for identity verification.

Bonus: a fresh ephemeral key per message gives partial forward secrecy — compromise of the recipient's long-term seed still decrypts retained ciphertexts, but a captured single-message ECDH key reveals only that one message.

Migration: envelope v: 1 → v: 2. Recipient accepts both during a 1-week window, then drops v1 support.
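
A minimal sketch of how a reader might dispatch on the envelope version during the migration window. Only `v`, `from`, `to`, and `eph_pub` come from the text above; the `nonce` and `ct` fields, the `classifyEnvelope` name, and the cutoff parameter are assumptions for illustration, not the actual shape in crypto.ts.

```typescript
// Hypothetical envelope shapes. v1 exposes from/to in cleartext; v2 carries
// only a per-message ephemeral x25519 public key.
interface EnvelopeV1 { v: 1; from: string; to: string; nonce: string; ct: string }
interface EnvelopeV2 { v: 2; eph_pub: string; nonce: string; ct: string }
type SealedEnvelope = EnvelopeV1 | EnvelopeV2;

type EnvelopeDecision = "open-v2" | "open-v1-legacy" | "reject";

// Decide how to treat an incoming envelope. v1 is honored only until the
// migration cutoff (assumed here to be a wall-clock timestamp in ms).
function classifyEnvelope(
  env: SealedEnvelope,
  nowMs: number,
  v1CutoffMs: number,
): EnvelopeDecision {
  if (env.v === 2) return "open-v2";
  if (env.v === 1 && nowMs < v1CutoffMs) return "open-v1-legacy";
  return "reject";
}
```

Keeping the version decision in one place makes dropping v1 after the window a one-line change.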

#3. Empty Web Push payload [DONE]

Why it matters: we send {type, to: <handle>, seq} to FCM/APNs/Mozilla on every fanout. RFC 8291 encrypts the payload so the push provider can't read the bytes, but the provider already knows the endpoint's owner — the to field adds no information for the recipient and does give Google a clear "message arrived for alice at T" timeline.

Files:

  • kez-chat/src/messages.rs:123-127 (the payload we hand to push.fanout)
  • kez-chat/web/src/sw.ts:78-99 (the push handler)

Fix: send {} as the payload. The service worker shows a generic "New kez-chat message" notification — it can still focus an existing tab, which navigates to the conversation list. Deep-linking to a specific peer goes away on cold-open (acceptable trade-off — one extra tap to open the right thread is the price of not exporting metadata to FCM).

#5. Rename the routing tag from h to something less claimed [DONE]

Why it matters: h is informally used by NIP-29 (Simple Groups) as the group id. Today's three relays don't enforce NIP-29 semantics, but the moment a NIP-29-aware relay enters our pool it will try to route our #h filter as a group join, and we'll get cryptic failures.

Files:

  • kez-chat/web/src/lib/nostr-id.ts:28 (ADDR_TAG = "h")
  • kez-chat/web/src/lib/nostr-transport.ts:201, 269 (publish + subscribe)
  • kez-chat/src/nostr_listener.rs:113 (server-side mirror)

Fix: switch to q (less-claimed single letter, still indexable per NIP-01). Bump envelope/event v so the listener can tell old-tag events from new ones during the migration window. Server-side listener subscribes to BOTH #h and #q for one week.
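
A sketch of the dual-tag subscription for the migration week. Under NIP-01, conditions inside one filter are ANDed while multiple filters in a single REQ are ORed, so listening on both tags takes two filters rather than one filter with both keys. The `buildInboxFilters` name and the `kind` parameter are hypothetical; the real event kind is not stated here.

```typescript
// Loose NIP-01 filter shape: just enough for this example.
type NostrFilter = Record<string, number | number[] | string[]>;

// Build one filter per routing tag. Passing ["h", "q"] covers the migration
// window; passing ["q"] alone is the post-migration state.
function buildInboxFilters(
  addr: string,
  kind: number,
  sinceSec: number,
  tags: string[] = ["h", "q"],
): NostrFilter[] {
  return tags.map((tag) => {
    const filter: NostrFilter = { kinds: [kind], since: sinceSec };
    filter[`#${tag}`] = [addr];
    return filter;
  });
}
```

Putting `#h` and `#q` in the same filter would match only events that carry both tags, which is the opposite of what the migration needs.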

#17. Demote handle-revealing logs to debug! [DONE]

Why it matters: every fanout currently logs push: fanout triggered handle=<plain handle> sub_count=N at INFO level. Operator-side log retention turns this into a permanent "who's chatting" ledger. Even if we trust the operator (it's us), forensics on a stolen log file leaks the social graph in plaintext.

Files:

  • kez-chat/src/push.rs:259-262, 275-281 (fanout + send logging)
  • kez-chat/src/api.rs:387-393 (subscribe registration)

Fix: demote the handle-bearing INFO lines to DEBUG. Replace the visible field with a short HMAC of the handle under a server-instance secret so we can still group "all sends for X" in logs without exposing X. Set log level in production to INFO, so DEBUG lines are off by default.


Day 2 — another half-day

#2. Replay protection — bound + timestamp freshness [DONE]

Why it matters: SEEN_CAP=500 evicts oldest event ids once we've seen 500 messages. An active user rolls past that in days, then a malicious relay can re-broadcast any old event and we accept it as a fresh message — the decrypted sent_at is never compared to wall-clock.

Files:

  • kez-chat/web/src/lib/nostr-transport.ts:107 (SEEN_CAP = 500)
  • kez-chat/web/src/lib/nostr-transport.ts:142 (slice(-SEEN_CAP))
  • kez-chat/web/src/lib/crypto.ts:161-205 (openMessage — no freshness check)

Fix:

  1. Bump SEEN_CAP to 10_000 and move from localStorage to IndexedDB so the set isn't capped by the 5MB localStorage quota.
  2. In openMessage, reject envelopes where |now - sent_at| > 7 days.
  3. Also clamp ev.created_at to [now - 7d, now + 5min] before using it as a seq generator — otherwise a relay can backdate or future-date events and either replay or skip-ahead bumpSince.
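
The three steps above can be sketched as one acceptance check. This is an in-memory illustration only: the real seen-set lives in IndexedDB, and the `SeenSet`, `isFresh`, and `shouldAccept` names are hypothetical. Applying the same [now - 7d, now + 5min] window to both `created_at` and `sent_at` is an assumption made to keep the sketch short.

```typescript
const DAY_MS = 24 * 60 * 60 * 1000;
const MAX_PAST_MS = 7 * DAY_MS;       // 7 days back
const MAX_FUTURE_MS = 5 * 60 * 1000;  // 5 minutes of clock skew forward

// True when a timestamp lies inside [now - 7d, now + 5min].
function isFresh(tsMs: number, nowMs: number): boolean {
  return tsMs >= nowMs - MAX_PAST_MS && tsMs <= nowMs + MAX_FUTURE_MS;
}

// Bounded seen-set: remembers the most recent `cap` ids, evicting oldest first.
class SeenSet {
  private readonly ids = new Set<string>();
  private readonly cap: number;

  constructor(cap: number) {
    this.cap = cap;
  }

  // Returns true if the id is new (and records it), false if already seen.
  add(id: string): boolean {
    if (this.ids.has(id)) return false;
    this.ids.add(id);
    if (this.ids.size > this.cap) {
      // A Set iterates in insertion order, so the first value is the oldest.
      const oldest = this.ids.values().next().value as string;
      this.ids.delete(oldest);
    }
    return true;
  }
}

// Reject stale or future-dated events before consulting the seen-set, so an
// evicted id cannot be replayed once it falls outside the freshness window.
function shouldAccept(
  eventId: string,
  createdAtSec: number,
  sentAtMs: number,
  nowMs: number,
  seen: SeenSet,
): boolean {
  if (!isFresh(createdAtSec * 1000, nowMs)) return false;
  if (!isFresh(sentAtMs, nowMs)) return false;
  return seen.add(eventId);
}
```

The freshness check is what bounds the replay window; the larger cap only has to cover ids seen within that window.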

#4. Reveal-recovery-phrase requires fresh auth [DONE]

Why it matters: 30 seconds of access to an unlocked phone = full identity exfil. The Settings → Reveal Phrase button decrypts straight from the persistent-session blob with no re-prompt.

Files:

  • kez-chat/web/src/routes/Settings.svelte (the Reveal flow)

Fix: gate Reveal Phrase + Lock + biometric setup behind a fresh passphrase prompt OR a WebAuthn assertion. Same model Apple/1Password use: "this action requires your password again".

#15. Rate-limit POST /v1/messages [DONE]

Why it matters: the endpoint currently accepts anonymous posts (no auth on send) capped at 256KB per envelope. A bot can fill any mailbox until disk fills. Acknowledged in messages.rs:18-20 ("Spam: v0.1 doesn't gate POST").

Files:

  • kez-chat/src/messages.rs:70-133

Fix (v0.1): per-IP token bucket, 60 messages/min per source IP. Drop overflow with 429.

Fix (v0.2): require the sender to sign with their KEZ primary; chat-server verifies. This becomes useless for cross-server delivery in v0.2 unless the sender's server vouches.
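
To illustrate the v0.1 token-bucket policy (60/min, capacity 60, periodic idle sweep), here is a language-neutral sketch written in TypeScript. The actual implementation is the Rust rate_limit.rs module, which is not shown here; the class and method names below are hypothetical, and the clock is injected so the behavior is deterministic.

```typescript
// One bucket per source IP: at most `capacity` tokens, refilled continuously.
class TokenBucket {
  private tokens: number;
  private lastMs: number;
  private readonly capacity: number;
  private readonly refillPerMs: number;

  constructor(capacity: number, perMinute: number, nowMs: number) {
    this.capacity = capacity;
    this.refillPerMs = perMinute / 60000;
    this.tokens = capacity; // start full so a burst up to `capacity` passes
    this.lastMs = nowMs;
  }

  // Refill for the elapsed time, then spend one token if available.
  tryTake(nowMs: number): boolean {
    const elapsed = Math.max(0, nowMs - this.lastMs);
    this.tokens = Math.min(this.capacity, this.tokens + elapsed * this.refillPerMs);
    this.lastMs = nowMs;
    if (this.tokens < 1) return false;
    this.tokens -= 1;
    return true;
  }

  lastTouchedMs(): number {
    return this.lastMs;
  }
}

class IpRateLimiter {
  private readonly buckets = new Map<string, TokenBucket>();

  // false means the caller should answer with HTTP 429.
  allow(ip: string, nowMs: number): boolean {
    let bucket = this.buckets.get(ip);
    if (bucket === undefined) {
      bucket = new TokenBucket(60, 60, nowMs);
      this.buckets.set(ip, bucket);
    }
    return bucket.tryTake(nowMs);
  }

  // Drop buckets untouched for `idleMs` so the map cannot grow without bound.
  sweep(nowMs: number, idleMs: number): number {
    let removed = 0;
    this.buckets.forEach((bucket, ip) => {
      if (nowMs - bucket.lastTouchedMs() >= idleMs) {
        this.buckets.delete(ip);
        removed += 1;
      }
    });
    return removed;
  }
}
```

With capacity equal to the per-minute rate, a client can burst 60 messages at once and then settles to one per second.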


Roadmap — multi-day, needs design pass

#6. Forward secrecy (Double Ratchet) [ROADMAP]

Why it matters: today's static-static x25519 means whoever compromises a seed once decrypts ALL retained history that any relay still has — and relays retain indefinitely. The ephemeral-x25519 fix in #1 is partial forward secrecy (per-message) but not post-compromise security.

What's needed: Signal-style X3DH + Double Ratchet. Significant refactor of crypto.ts; needs careful API design so existing conversations migrate cleanly. Owner: TBD. ETA: separate sprint.

#7. WebAuthn-gated session rehydrate [ROADMAP]

Why it matters: the persistent-session blob's non-extractable AES key blocks exportKey but NOT decrypt-then-read-plaintext. Any malicious extension with <all_urls>, any XSS, any compromised npm dep can call restoreSession() and lift the seed. My comment in persistent-session.ts:18-23 overstates the actual protection.

Fix: gate restoreSession() on a user-gesture WebAuthn assertion (touchID / passkey). Background scripts can't fake a user gesture, so the seed never gets decrypted unattended. Falls back to passphrase on devices without WebAuthn.

#8. Rotate addr daily (info = "v1|YYYYMMDD") [ROADMAP]

Why it matters: a relay scrapes #h filter values + the public KEZ directory + builds a rainbow table mapping addr → primary → handle. The hash buys little when the input space is enumerable. Per-day addr rotation forces the rainbow table to be rebuilt daily and stops long-term correlation.

Trade-off: receivers need to subscribe to multiple addrs during the boundary day (yesterday's + today's). Listener server-side needs the same. Migration logic isn't hard but isn't free.
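
A small sketch of the boundary-day handling: compute today's and yesterday's `info` strings so a receiver can derive and subscribe to both addrs. Using UTC for the day boundary is an assumption (the proposal does not say which clock defines "daily"), and `dayStamp` / `addrInfosFor` are hypothetical names. How `info` feeds the addr derivation is not shown.

```typescript
// Format a date as YYYYMMDD in UTC.
function dayStamp(d: Date): string {
  const y = String(d.getUTCFullYear());
  const m = String(d.getUTCMonth() + 1).padStart(2, "0");
  const day = String(d.getUTCDate()).padStart(2, "0");
  return `${y}${m}${day}`;
}

// Info strings a receiver should cover: today first, then yesterday.
// A sender would use only the first entry.
function addrInfosFor(now: Date): string[] {
  const yesterday = new Date(now.getTime() - 24 * 60 * 60 * 1000);
  return [`v1|${dayStamp(now)}`, `v1|${dayStamp(yesterday)}`];
}
```

For example, shortly after midnight UTC on 1 January 2026 the receiver would cover `v1|20260101` and `v1|20251231`.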

#9. Unforgeable delivery acks [DONE — Day 3 Option A]

Why it matters: anyone who saw an event id can publish a fake kind-4244 ack. Sender's UI shows false "delivered". Cosmetic-only today; will be a real problem when someone builds a tracker bot.

Fix: ack payload = recipient's ed25519 signature over the acked event id. Sender verifies against the recipient's known KEZ primary. Free — already have ed25519 plumbing.
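
A sketch of the sender-side acceptance logic, including the migration-window case where unsigned acks are still tolerated. The ed25519 verification itself is injected as a function so the example stays dependency-free; `checkAck`, `VerifyFn`, and the hex encoding of the `kez-sig` value are assumptions, and the sketch assumes the caller has already matched the ack to the event id it refers to.

```typescript
interface AckEvent { kind: number; tags: string[][] }

// Injected verifier: returns true when `sig` is a valid signature over
// `message` by `pubkey`. Signature and argument order are hypothetical.
type VerifyFn = (sig: string, message: string, pubkey: string) => boolean;

type AckResult = "verified" | "unsigned-legacy" | "rejected";

function firstTagValue(tags: string[][], name: string): string | undefined {
  const tag = tags.find((t) => t[0] === name);
  return tag === undefined ? undefined : tag[1];
}

function checkAck(
  ack: AckEvent,
  ackedEventId: string,
  peerPrimary: string,
  verify: VerifyFn,
  acceptUnsigned: boolean, // true only during the migration window
): AckResult {
  if (ack.kind !== 4244) return "rejected";
  const sig = firstTagValue(ack.tags, "kez-sig");
  if (sig === undefined) return acceptUnsigned ? "unsigned-legacy" : "rejected";
  // The signature must cover the acked event id and come from the peer.
  return verify(sig, ackedEventId, peerPrimary) ? "verified" : "rejected";
}
```

Returning a distinct "unsigned-legacy" result lets the UI decide whether to show "delivered" for unsigned acks, and makes the flag easy to remove when the window closes.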

#10. NIP-65 outbox model [PARTIAL — Day 3 Option B]

Publish-side only. We now emit a kind:10002 event on first session alongside the kind:0 baseline, listing our default relays as read+write (the default pool grew from 3 to 5 in this batch). NIP-65-aware clients can discover where to reach us.

What's still missing: when SENDING to a peer, we should fetch their kind:10002 and union their read-relays with ours. v0.2 — needs a deeper transport refactor (per-message relay set).

Background: we hardcode the same default relay set for every user. Real nostr clients publish kind:10002 listing their preferred read+write relays; senders publish to each recipient's published read-relays. Without this, isolated networks of users on different relay sets can't reach each other.
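
The missing send-side step could look roughly like this: read the peer's kind:10002 tags, keep the relays marked for reading, and union them with our own set. Per NIP-65, an `r` tag with no marker counts as both read and write. The function names are hypothetical, and fetching the peer's event is out of scope for this sketch.

```typescript
// Extract read relays from the tags of a kind:10002 event.
// Tag shape per NIP-65: ["r", <url>] or ["r", <url>, "read" | "write"].
function readRelaysFromTags(tags: string[][]): string[] {
  const out: string[] = [];
  for (const t of tags) {
    const url = t[1];
    if (t[0] !== "r" || url === undefined || url === "") continue;
    const marker = t[2];
    if (marker === undefined || marker === "read") out.push(url);
  }
  return out;
}

// Union our relays with the peer's read relays, preserving order and
// dropping duplicates.
function publishTargets(ours: string[], peerRelayListTags: string[][]): string[] {
  return Array.from(new Set([...ours, ...readRelaysFromTags(peerRelayListTags)]));
}
```

Write-only relays are skipped because the peer has said they do not read from them.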

#11. NIP-42 AUTH support [DONE — Day 3 Option B]

damus.io regularly requires NIP-42 AUTH for DM-kind reads. Without it our subscriptions get rejected silently. Add the client AUTH handshake + support being prompted by the relay.

#12. Publish a minimal kind-0 profile on first use [DONE — Day 3 Option B]

Some relays silently drop writes from "unknown" pubkeys (no kind-0). A single minimal kind:0 per derived nostr pubkey (just {"name":"kez-chat user"}) unblocks this without revealing anything.

#13. NIP-25 ack shape with ["p", senderNostrPubkey] [DONE — Day 3 Option B]

Our kind-4244 ack is custom. Adopting the NIP-25 shape gets free interop with nostr clients that already render reactions — handy if we ever expose the underlying events.

#14. Shorten since= default cursor [DONE — Day 3 Option A]

Default 7-day cursor exceeds most relay retention windows (often 1-3 days). Fresh devices on quiet conversations silently miss messages. Shorten to 48h + augment with explicit "fetch full history" UI for the rare resurrect case.

#16. Bounded concurrency on push fanout [DONE — Day 3 Option A]

Why it matters: every send spawns an unbounded tokio::spawn to fan out push. Under flood, OOM.

Files:

  • kez-chat/src/messages.rs:128

Fix: semaphore-bound to ~32 concurrent fanouts. Excess queues; under extreme flood we drop with a warn-log rather than swap-thrash.
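
The landed fix uses tokio::sync::Semaphore(32) on the Rust side. The TypeScript sketch below only illustrates the policy described above (run up to N, queue the excess, drop beyond a bound); the `FanoutLimiter` name and the queue bound are hypothetical, and completion is signaled explicitly so the example stays synchronous.

```typescript
type SubmitOutcome = "started" | "queued" | "dropped";

class FanoutLimiter {
  private running = 0;
  private readonly waiting: Array<() => void> = [];
  private readonly maxRunning: number;
  private readonly maxWaiting: number;

  constructor(maxRunning: number, maxWaiting: number) {
    this.maxRunning = maxRunning;
    this.maxWaiting = maxWaiting;
  }

  // `start` kicks off one fanout; the caller must call finish() when it ends.
  submit(start: () => void): SubmitOutcome {
    if (this.running < this.maxRunning) {
      this.running += 1;
      start();
      return "started";
    }
    if (this.waiting.length < this.maxWaiting) {
      this.waiting.push(start);
      return "queued";
    }
    return "dropped"; // caller would emit the warn-log here
  }

  // Release a permit. If work is queued, the permit passes straight to it.
  finish(): void {
    const next = this.waiting.shift();
    if (next !== undefined) {
      next();
    } else if (this.running > 0) {
      this.running -= 1;
    }
  }
}
```

Bounding the queue as well as the running set is what keeps memory flat under flood; an unbounded queue would only move the OOM.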


Visually-encrypted profile pictures (new feature, in progress)

Phase 1A — local scramble + per-contact key wrap [DONE this commit]

  • kez-chat/web/src/lib/visual-crypto.ts — keyed Fisher-Yates pixel permutation + xoshiro256** PRNG. Output is a valid PNG with same dimensions, scrambled content. Salt embedded as #kez-visual-v1:<hex> URL fragment so descramble doesn't need out-of-band metadata.
  • profile-store.ts — profile gains encrypted: boolean (default true) + picture_key (local-only). On publish: scramble the picture, wrap the visual key for each contact via the existing sealMessage() envelope, embed as kez_visual_keys map in the kind:0 content.
  • Settings — "Visually encrypt picture (recommended)" toggle, default ON.
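
To illustrate the keyed-permutation idea, here is a minimal scramble/descramble round trip over pixels packed as 32-bit values. It is a sketch under loud assumptions: it swaps in the small mulberry32 generator with a 32-bit numeric seed instead of xoshiro256**, it is not cryptographically strong, and the function names are hypothetical rather than the visual-crypto.ts API. Key derivation from the salt and PNG encoding are omitted.

```typescript
// mulberry32: a tiny deterministic PRNG, used here ONLY as a stand-in for
// xoshiro256**. Returns floats in [0, 1).
function mulberry32(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Fisher-Yates shuffle of [0, n) driven by the seeded generator.
function keyedPermutation(n: number, seed: number): Uint32Array {
  const rand = mulberry32(seed);
  const perm = new Uint32Array(n);
  for (let i = 0; i < n; i++) perm[i] = i;
  for (let i = n - 1; i > 0; i--) {
    const j = Math.floor(rand() * (i + 1));
    const tmp = perm[i];
    perm[i] = perm[j];
    perm[j] = tmp;
  }
  return perm;
}

// Output position i takes the pixel at perm[i].
function scramblePixels(pixels: Uint32Array, seed: number): Uint32Array {
  const perm = keyedPermutation(pixels.length, seed);
  const out = new Uint32Array(pixels.length);
  for (let i = 0; i < pixels.length; i++) out[i] = pixels[perm[i]];
  return out;
}

// Inverse: put each scrambled pixel back at its original position.
function descramblePixels(scrambled: Uint32Array, seed: number): Uint32Array {
  const perm = keyedPermutation(scrambled.length, seed);
  const out = new Uint32Array(scrambled.length);
  for (let i = 0; i < scrambled.length; i++) out[perm[i]] = scrambled[i];
  return out;
}
```

Because the output is a pure permutation, the image keeps its dimensions and color histogram; only pixel positions change, which is why the result still encodes as a valid picture.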

Phase 1B — peer descramble [DONE this commit]

  • peer-profile-store.ts (new): IDB cache + one-shot pool.querySync fetch of the peer's kind:0 metadata event. On hit, looks up our primary in kez_visual_keys, opens the SealedEnvelope wrap to recover the visual key, descrambles metadata.picture, caches the rendered data URL.
  • peer-profile-cell.svelte.ts (new): reactive Svelte 5 mirror over the IDB cache so component re-renders are automatic on fetch.
  • nostr-transport.ts: surfaces sender_nostr_pubkey on every inbound DM. conversations-store.ts persists it on the conversation row so we can locate the peer's kind:0 later.
  • inbox-service.svelte.ts: on every fresh DM, fires off a profile fetch for the sender — first DM lights up their avatar.
  • Messages.svelte: hydrates the cache on mount, kicks off refreshes for every visible conversation, threads cached pictures through both Avatar usages (conversation list + thread header).
  • Conversation list re-renders on cache update; bulk-scan staleness window 6h, with a force-refresh on thread open.

Edges noted for later: peers we've only sent to (never received from) have no peer_nostr_pubkey until they reply, so they don't get a picture lookup yet. Easy follow-up: backfill pubkey from a NIP-05 or WebFinger lookup, or proactively probe relays for kind:0 events whose content tags match a known primary.

Phase 1C — UX polish [TODO]

  • "X contacts can see your real picture" hint in Settings.
  • Re-publish kind:0 automatically when a new conversation is created (so the new contact gets key-wrapped without the user re-saving).
  • Optional: per-image AES-CTR mode for uniform-noise output (stronger, less "visually meaningful").

Acknowledged trade-offs (won't fix in v0.1)

Persistent-session is no stronger than the biometric path

The non-extractable AES key story stops exportKey, NOT decrypt+read. Anyone with origin-execution access (XSS, malicious extension) can lift the seed. Document this honestly in the README and the file header. Real fix is #7 above.

30-day TTL is client-only

expiresAt in localStorage is editable by anyone with file-system access. Server-side device binding (issue a signed nonce on unlock, expire at the server) would help but adds round-trips. v0.2 candidate.

Identity-key reuse is safe under current crypto

ed25519 seed → ed25519 (sigchain, envelope sig) + x25519 (ECDH) + HKDF → secp256k1 (nostr signer). The auditor confirmed: no cross-curve chosen-message attack path. Standard libsodium pattern.


Tracking + cross-references

  • Nostr-protocol review: summarized in this commit's message; full report in the audit trail.
  • Security audit: summarized in this commit's message; full report in the audit trail.
  • Owner: tudisco
  • Last updated: 2026-06-08