Kez/kez-chat/src/rate_limit.rs
Jason Tudisco c8370ffdf0 feat(kez-chat): three days of security + UX + protocol work
A multi-day batch covering: a security review punch list (Day 1-3),
the visual-encryption profile-picture feature, a fast deploy
infrastructure, ack resilience + UX polish, several nostr ecosystem
alignment changes, and a key server-independence fix.

Security review (Day 1)
  - Envelope v2 (ephemeral x25519 per message; AAD-bound AES-GCM;
    no plaintext from/to). Big metadata-leak fix flagged by reviews.
    Forward secrecy at the per-message level; v1 decrypt kept for
    a one-week migration window.
  - Routing tag rename h → q (NIP-29 collision fix).
  - Web Push payload now empty (was leaking recipient handle to FCM).
  - Log demotion + process-instance salt-hash of handles so debug
    logs don't permanently encode the social graph.

Security review (Day 2)
  - Replay protection: SEEN_CAP 500 → 10_000 ids; reject events with
    impossibly old/future created_at; openMessage enforces ±7d/+5min
    freshness on plaintext sent_at.
  - Reveal Recovery Phrase now requires a fresh passphrase prompt.
    Verifies via the same unlockIdentity that the initial unlock
    uses. Bonus: works for biometric-only sessions (recovers the
    phrase that wasn't in memory before).
  - POST /v1/messages per-IP rate limit (60/min, capacity 60, with
    a periodic idle-bucket sweep). New rate_limit.rs module + tests.
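
The per-IP limit above is a token bucket. A minimal std-only sketch of that arithmetic follows. It is not the module's code: the real limiter is async, while `Limiter`, its fields, and the explicit `now` parameter are assumptions made here so the behaviour can be checked deterministically.

```rust
use std::collections::HashMap;
use std::net::IpAddr;
use std::time::{Duration, Instant};

/// Hypothetical synchronous stand-in for one client's bucket.
struct Bucket {
    tokens: f64,
    /// Updated on every request, so it doubles as "last seen".
    last_refill: Instant,
}

/// Hypothetical limiter; names are illustrative, not the module's API.
struct Limiter {
    capacity: f64,
    refill_per_sec: f64,
    idle_ttl: Duration,
    buckets: HashMap<IpAddr, Bucket>,
}

impl Limiter {
    fn new(capacity: u32, refill_per_sec: f64, idle_ttl: Duration) -> Self {
        Self {
            capacity: f64::from(capacity),
            refill_per_sec,
            idle_ttl,
            buckets: HashMap::new(),
        }
    }

    /// Take one token for `ip` at time `now`; `false` means "respond 429".
    fn try_acquire(&mut self, ip: IpAddr, now: Instant) -> bool {
        let capacity = self.capacity;
        let bucket = self.buckets.entry(ip).or_insert(Bucket {
            tokens: capacity,
            last_refill: now,
        });
        // Credit tokens for the time elapsed, capped at the burst capacity.
        let elapsed = now.saturating_duration_since(bucket.last_refill).as_secs_f64();
        bucket.tokens = (bucket.tokens + elapsed * self.refill_per_sec).min(capacity);
        bucket.last_refill = now;
        if bucket.tokens >= 1.0 {
            bucket.tokens -= 1.0;
            true
        } else {
            false
        }
    }

    /// Drop buckets idle for `idle_ttl` or longer; returns how many were removed.
    fn sweep(&mut self, now: Instant) -> usize {
        let before = self.buckets.len();
        let ttl = self.idle_ttl;
        self.buckets
            .retain(|_, b| now.saturating_duration_since(b.last_refill) < ttl);
        before - self.buckets.len()
    }
}
```

With capacity 60 and a refill of 1 token/s, a client can burst 60 requests, is refused on the 61st, and regains one request per second afterwards.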

Security review (Day 3 Option A)
  - Unforgeable acks: kind-4244 events now carry a `kez-sig` tag,
    the recipient's ed25519 signature over the acked event id.
    Sender verifies against the conversation peer's KEZ primary.
    Unsigned acks still accepted during the migration window.
  - Default `since=` lookback shortened 7d → 48h (matches relay
    retention).
  - Bounded concurrency on push fanout: tokio::sync::Semaphore(32).
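
The idea behind the bounded fanout can be sketched without tokio. The server uses `tokio::sync::Semaphore` with 32 permits; the version below swaps in a hand-rolled counting semaphore over OS threads purely so it runs with the standard library alone. `Semaphore`, `fan_out`, and the sleep standing in for the push request are all hypothetical.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, Condvar, Mutex};
use std::thread;
use std::time::Duration;

/// Minimal counting semaphore (illustrative helper, not the server's type).
struct Semaphore {
    permits: Mutex<usize>,
    available: Condvar,
}

impl Semaphore {
    fn new(permits: usize) -> Self {
        Self { permits: Mutex::new(permits), available: Condvar::new() }
    }

    /// Block until a permit is free, then take it.
    fn acquire(&self) {
        let mut n = self.permits.lock().unwrap();
        while *n == 0 {
            n = self.available.wait(n).unwrap();
        }
        *n -= 1;
    }

    /// Return a permit and wake one waiter.
    fn release(&self) {
        *self.permits.lock().unwrap() += 1;
        self.available.notify_one();
    }
}

/// Run `jobs` simulated deliveries with at most `limit` in flight.
/// Returns the highest concurrency actually observed.
fn fan_out(jobs: usize, limit: usize) -> usize {
    let sem = Arc::new(Semaphore::new(limit));
    let in_flight = Arc::new(AtomicUsize::new(0));
    let peak = Arc::new(AtomicUsize::new(0));
    let handles: Vec<_> = (0..jobs)
        .map(|_| {
            let (sem, in_flight, peak) = (sem.clone(), in_flight.clone(), peak.clone());
            thread::spawn(move || {
                sem.acquire();
                let now = in_flight.fetch_add(1, Ordering::SeqCst) + 1;
                peak.fetch_max(now, Ordering::SeqCst);
                thread::sleep(Duration::from_millis(2)); // stand-in for the push call
                in_flight.fetch_sub(1, Ordering::SeqCst);
                sem.release();
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    peak.load(Ordering::SeqCst)
}
```

Calling `fan_out(64, 4)` never reports a peak above 4, however many jobs are queued. In the async server the same bound holds without one thread per job, since waiting tasks simply yield.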

Security review (Day 3 Option B — nostr ecosystem alignment)
  - NIP-65 (kind:10002): publish our relay list so other clients
    can discover where to find us.
  - NIP-42 AUTH: attachSigner / detachSigner wires the user's seed
    into the relay pool so AUTH-gated relays (damus.io DMs) deliver.
  - Minimal kind:0 baseline on first session unlock (unblocks
    writes on relays that reject unknown pubkeys).
  - Acks now include ["p", senderNostrPubkey] for NIP-25 / NIP-10
    routing convention.

Web Push end-to-end
  - Server-side nostr listener (nostr_listener.rs): the chat-server
    subscribes to relays for every registered handle's addr, so
    Web Push fires even when chat goes over nostr (the live default).
  - Push fanout from messages.rs spawned with bounded concurrency.
  - Empty payload (no recipient handle leaked to FCM).
  - Self-heal endpoint GET /v1/push/subscriptions/:handle —
    auto-re-registers a subscription if server lost it.
  - Auto-enable push on first unlock (was opt-in toggle hunt).
  - In-chat nudge banner + iOS PWA install hint.

Persistent sessions
  - persistent-session.ts: AES-GCM encrypt seed under a
    non-extractable IDB key, 30-day sliding-window TTL, restore on
    every boot.
  - Auto-fetch own kind:0 from nostr on a fresh device so the user
    sees their own avatar without re-setting it.

Profile pictures + visual encryption
  - Avatar component accepts a `picture` prop (data URL); falls
    back to the deterministic identicon when absent.
  - profile-store.ts: pick → resize to 256×256 JPEG → save locally
    + publish as NIP-01 kind:0.
  - Visual encryption (visual-crypto.ts): keyed Fisher-Yates pixel
    permutation + xoshiro256** PRNG. Output is a valid PNG with
    scrambled content. Salt embedded as a #kez-visual-v1:<hex>
    URL fragment.
  - Default ON for new pictures. Strangers see colored noise on
    public nostr; contacts see the real face.
  - Per-recipient AES wraps embedded in kind:0 content
    (kez_visual_keys map). The picture's symmetric key is wrapped
    via the same SealedEnvelope crypto our DMs use.
  - Self-wrap (sender wraps to their own primary too) so a fresh
    device of the same user can descramble its own picture.
  - Stranger-view preview thumbnail in Settings (the badge tucked
    into the avatar's bottom-right corner — "this is what the
    world sees").
  - Tap-to-zoom: header avatar in a thread opens a fullscreen
    overlay.
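
The scrambling step named above (a keyed Fisher-Yates permutation driven by xoshiro256**) can be sketched in Rust as follows. This is an illustration of the technique, not a port of visual-crypto.ts: the splitmix64 seeding, the 64-bit `seed` parameter, and the function names are assumptions, and how the app derives its seed from the key and salt is not specified here.

```rust
/// xoshiro256** generator (algorithm by Blackman and Vigna).
struct Xoshiro256 {
    s: [u64; 4],
}

impl Xoshiro256 {
    /// ASSUMPTION: expand a 64-bit seed into the 256-bit state with splitmix64.
    fn from_seed(mut seed: u64) -> Self {
        let mut s = [0u64; 4];
        for slot in &mut s {
            seed = seed.wrapping_add(0x9E37_79B9_7F4A_7C15);
            let mut z = seed;
            z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
            z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
            *slot = z ^ (z >> 31);
        }
        Self { s }
    }

    fn next_u64(&mut self) -> u64 {
        let result = self.s[1].wrapping_mul(5).rotate_left(7).wrapping_mul(9);
        let t = self.s[1] << 17;
        self.s[2] ^= self.s[0];
        self.s[3] ^= self.s[1];
        self.s[1] ^= self.s[2];
        self.s[0] ^= self.s[3];
        self.s[2] ^= t;
        self.s[3] = self.s[3].rotate_left(45);
        result
    }
}

/// Build the permutation of `0..len` selected by `seed` (Fisher-Yates).
fn keyed_permutation(len: usize, seed: u64) -> Vec<usize> {
    let mut rng = Xoshiro256::from_seed(seed);
    let mut perm: Vec<usize> = (0..len).collect();
    for i in (1..len).rev() {
        // Modulo reduction is very slightly biased; acceptable for a sketch.
        let j = (rng.next_u64() % (i as u64 + 1)) as usize;
        perm.swap(i, j);
    }
    perm
}

/// Output position `dst` receives the pixel at `perm[dst]`.
fn scramble<T: Copy>(pixels: &[T], seed: u64) -> Vec<T> {
    keyed_permutation(pixels.len(), seed)
        .iter()
        .map(|&src| pixels[src])
        .collect()
}

/// Inverse of `scramble` for the same seed.
fn unscramble<T: Copy>(scrambled: &[T], seed: u64) -> Vec<T> {
    let perm = keyed_permutation(scrambled.len(), seed);
    let mut out = scrambled.to_vec();
    for (dst, &src) in perm.iter().enumerate() {
        out[src] = scrambled[dst];
    }
    out
}
```

Because the permutation is fully determined by the seed, anyone who can recover the seed can rebuild the same index table and invert it, which is what lets a contact holding the wrapped key descramble the picture.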

Peer-profile resolution
  - peer-profile-store.ts: IDB-cached one-shot kind:0 fetch +
    descramble.
  - peer-profile-cell.svelte.ts: reactive mirror for UI.
  - 6h bulk-scan staleness + force-refresh on thread open.
  - Avatar usages in Messages.svelte pass peer.picture through.

Local-echo + delivery receipts
  - Outbound messages render instantly with status="sending"; flip
    to "sent" when ≥1 relay accepts; "delivered" (check-in-circle)
    when recipient's client publishes an ack.
  - SVG status icons inside the bubble; "via X" footer on outbound.
  - Persistent pending-ack queue with retry on next session start.
  - Catch-up scan (fetchAcksForEventIds) self-heals delivered
    state on conversation open.
  - markDeliveredByEventId verifies the ack signature.

Active-relay tracking + reply preference
  - SimplePool.trackRelays = true. Capture first-to-accept on send
    via Promise.any over per-relay publish promises.
  - InboxMessage.via_relay set from pool.seenOn on receive.
  - Conversation.peer_via_relay persisted on every inbound DM.
  - sendMessage takes `preferRelay` and orders publish targets
    accordingly. Acks bias the same way.
  - "via relay.X" footer renders on outbound bubbles.

Conversation list polish
  - Per-conversation `unread_count` on the Conversation type.
  - Bumped on every genuinely-new inbound; reset on thread open.
  - Accent-color pill badge in the sidebar (rounds at "99+").

Server-independence fix
  - sendMessage skips the /v1/u/:handle lookup when the caller
    passes recipientPrimary (which Messages.svelte does, from the
    cached peer_primary on the conversation row). Chat over nostr
    no longer breaks when the chat-server is down — only brand-new
    conversations still need the directory lookup.

Relay set
  - Added wss://relay.snort.social and wss://nostr.wine to the
    default pool (was 3, now 5).

Fast-deploy infrastructure (new in this batch)
  - Dockerfile gains an `export` scratch stage (extracts binary +
    web/dist only).
  - Dockerfile.runtime: tiny runtime image that COPYs prebuilt
    artifacts — no rust/npm on the remote.
  - docker-compose.fast.yml: compose override pointing chat-server
    build at Dockerfile.runtime.
  - .dockerignore: excludes target/, node_modules/, prebuilt/,
    .buildx-cache/, .git, *.db. Critical: without this, an earlier
    bug had the buildx cache nested under the build context and
    blew up to 17GB by feeding itself into itself.
  - Old: ~10 min remote build. New: 3–5 min local + 5s remote
    runtime swap. Cache lives at ~/.cache/kez-chat-buildx
    (outside any project tree).

UI polish (margins, layout, banners)
  - Authenticated routes (Welcome / Settings / Identity / Dashboard
    / Claims / AddClaim) wrapped in max-w-2xl mx-auto px-4 py-6.
  - WhatsApp-style chat bubbles: shrink-wrap to content, asymmetric
    rounded corners, inline bottom-right timestamp.
  - Push-notification nudge banner at top of /chats with iOS
    install hint.
  - Relay state popover off the "● live (N)" indicator.

WebAuthn biometric fix
  - user.id now uses the raw 32-byte ed25519 pubkey (was the
    72-byte "ed25519:<hex>" identity string, which exceeded
    WebAuthn's 64-byte limit — Android Chrome rejected it with
    "user handle exceeds 64 bytes").

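The byte counts in the WebAuthn fix above can be checked with a few lines of Rust. This is only an arithmetic sketch: the helper names and the constant are hypothetical, and the real check lives in the browser's WebAuthn implementation, not in this codebase.

```rust
/// WebAuthn limits `user.id` (the user handle) to 64 bytes.
const MAX_USER_HANDLE_LEN: usize = 64;

/// Lowercase-hex encode bytes (std-only helper for this sketch).
fn to_hex(bytes: &[u8]) -> String {
    bytes.iter().map(|b| format!("{b:02x}")).collect()
}

/// Identity string in the "ed25519:<hex>" form named in the commit message.
fn identity_string(pubkey: &[u8; 32]) -> String {
    format!("ed25519:{}", to_hex(pubkey))
}

/// True when `candidate` is short enough to be used as `user.id`.
fn fits_user_handle(candidate: &[u8]) -> bool {
    candidate.len() <= MAX_USER_HANDLE_LEN
}
```

The prefix `ed25519:` is 8 bytes and the hex form of a 32-byte key is 64 bytes, giving 72 bytes in total, which is over the limit. The raw key is 32 bytes and fits.
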
Documentation
  - kez-chat/TODO.md tracks every reviewer finding with status,
    file:line references, and a phased plan. All Day 1-3 items
    marked DONE; remaining roadmap items (Double Ratchet,
    WebAuthn-gated rehydrate, addr rotation, NIP-65 peer-relay
    fetch on send) documented for future sprints.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-06-08 05:11:24 -06:00

207 lines
7.0 KiB
Rust

//! Tiny per-IP token-bucket rate limiter.
//!
//! Currently used by `POST /v1/messages` to keep a single source from
//! filling everyone's mailbox until disk fills (the v0.1 spam concern
//! called out in messages.rs). Default rate: 60 messages/min per IP.
//!
//! Why not pull in `tower_governor` or `governor`? They're great
//! crates but each adds 10+ transitive deps for what's structurally
//! ~50 lines of code. We're already shipping nostr-sdk's dep tree;
//! restraint here keeps the build snappy.
//!
//! Client IP resolution priority:
//!   1. `CF-Connecting-IP` header: Cloudflare sets this to the real
//!      client IP and overwrites any client-supplied value, so it is
//!      trustworthy as long as the origin is reachable only through
//!      our tunnel.
//!   2. `X-Forwarded-For` (leftmost entry): fallback for
//!      non-Cloudflare deployments. A client can set this header
//!      itself, so only rely on it behind a proxy that rewrites it.
//!   3. None (direct curl / loopback): rate-limit by `0.0.0.0` so
//!      noisy test traffic still gets bucketed instead of bypassing.

use std::collections::HashMap;
use std::net::IpAddr;
use std::sync::Arc;
use std::time::{Duration, Instant};

use axum::http::HeaderMap;
use tokio::sync::Mutex;

/// One bucket per client. We store the residual token count + the
/// last refill timestamp; on each `try_acquire` we compute how many
/// tokens to add based on elapsed time, then either decrement or fail.
#[derive(Debug, Clone)]
struct Bucket {
    /// Tokens currently available.
    tokens: f64,
    /// Last time we refilled.
    last_refill: Instant,
    /// Most recent activity; used by the eviction sweep to drop
    /// long-cold buckets so the HashMap doesn't grow forever.
    last_seen: Instant,
}

#[derive(Debug, Clone, Copy)]
pub struct RateLimitConfig {
    /// Bucket capacity (max burst). Once exhausted, callers fail
    /// fast until enough time passes to refill ≥1 token.
    pub capacity: u32,
    /// Refill rate in tokens per second.
    pub refill_per_sec: f64,
    /// Buckets idle longer than this get evicted on next sweep so
    /// short-lived clients don't pile up in memory.
    pub idle_ttl: Duration,
}

impl Default for RateLimitConfig {
    fn default() -> Self {
        Self {
            capacity: 60,
            refill_per_sec: 1.0, // = 60/min steady-state
            idle_ttl: Duration::from_secs(15 * 60),
        }
    }
}

/// Process-shared rate limiter. Cheap to clone (Arc inside).
#[derive(Clone)]
pub struct RateLimiter {
    inner: Arc<Mutex<HashMap<IpAddr, Bucket>>>,
    config: RateLimitConfig,
}

impl RateLimiter {
    pub fn new(config: RateLimitConfig) -> Self {
        Self {
            inner: Arc::new(Mutex::new(HashMap::new())),
            config,
        }
    }

    /// Drain one token for `ip` if available. Returns `true` on
    /// success (caller may proceed) or `false` if rate-limited
    /// (caller should respond 429).
    pub async fn try_acquire(&self, ip: IpAddr) -> bool {
        let mut map = self.inner.lock().await;
        let now = Instant::now();
        let bucket = map.entry(ip).or_insert(Bucket {
            tokens: self.config.capacity as f64,
            last_refill: now,
            last_seen: now,
        });
        // Refill since last touch.
        let elapsed = now.saturating_duration_since(bucket.last_refill).as_secs_f64();
        bucket.tokens = (bucket.tokens + elapsed * self.config.refill_per_sec)
            .min(self.config.capacity as f64);
        bucket.last_refill = now;
        bucket.last_seen = now;
        if bucket.tokens >= 1.0 {
            bucket.tokens -= 1.0;
            true
        } else {
            false
        }
    }

    /// Periodically called by a background sweep to drop buckets
    /// for clients we haven't heard from in `idle_ttl`. Returns the
    /// number of buckets removed (diagnostic).
    pub async fn sweep(&self) -> usize {
        let now = Instant::now();
        let mut map = self.inner.lock().await;
        let before = map.len();
        map.retain(|_, b| now.saturating_duration_since(b.last_seen) < self.config.idle_ttl);
        before - map.len()
    }
}

/// Resolve the client IP from the request headers, with the
/// Cloudflare-first priority documented above. Falls back to
/// `0.0.0.0` if we can't extract anything sensible; that way
/// direct curl traffic still gets rate-limited as a single
/// "anonymous" client instead of bypassing entirely.
pub fn client_ip_from_headers(headers: &HeaderMap) -> IpAddr {
    if let Some(v) = headers.get("CF-Connecting-IP").and_then(|h| h.to_str().ok()) {
        if let Ok(ip) = v.trim().parse::<IpAddr>() {
            return ip;
        }
    }
    if let Some(v) = headers.get("X-Forwarded-For").and_then(|h| h.to_str().ok()) {
        // X-Forwarded-For is a comma-separated list; the leftmost
        // value is the original client.
        if let Some(first) = v.split(',').next() {
            if let Ok(ip) = first.trim().parse::<IpAddr>() {
                return ip;
            }
        }
    }
    "0.0.0.0".parse().expect("0.0.0.0 is a valid IpAddr")
}

#[cfg(test)]
mod tests {
    use super::*;

    fn cfg_for_test() -> RateLimitConfig {
        RateLimitConfig {
            capacity: 3,
            refill_per_sec: 10.0,
            idle_ttl: Duration::from_secs(1),
        }
    }

    #[tokio::test]
    async fn within_capacity_succeeds() {
        let rl = RateLimiter::new(cfg_for_test());
        let ip: IpAddr = "1.2.3.4".parse().unwrap();
        for _ in 0..3 {
            assert!(rl.try_acquire(ip).await);
        }
    }

    #[tokio::test]
    async fn exhausting_capacity_fails_then_recovers() {
        let rl = RateLimiter::new(cfg_for_test());
        let ip: IpAddr = "1.2.3.4".parse().unwrap();
        for _ in 0..3 {
            assert!(rl.try_acquire(ip).await);
        }
        assert!(!rl.try_acquire(ip).await, "4th request should be rate-limited");
        // Refill rate is 10 tokens/sec → 1 token in 100ms.
        tokio::time::sleep(Duration::from_millis(150)).await;
        assert!(rl.try_acquire(ip).await);
    }

    #[tokio::test]
    async fn separate_ips_have_separate_buckets() {
        let rl = RateLimiter::new(cfg_for_test());
        let a: IpAddr = "1.2.3.4".parse().unwrap();
        let b: IpAddr = "5.6.7.8".parse().unwrap();
        for _ in 0..3 {
            assert!(rl.try_acquire(a).await);
        }
        assert!(rl.try_acquire(b).await, "different IP should still have full bucket");
    }

    #[test]
    fn cf_header_wins_over_xff() {
        let mut h = HeaderMap::new();
        h.insert("CF-Connecting-IP", "9.9.9.9".parse().unwrap());
        h.insert("X-Forwarded-For", "8.8.8.8, 7.7.7.7".parse().unwrap());
        assert_eq!(client_ip_from_headers(&h).to_string(), "9.9.9.9");
    }

    #[test]
    fn xff_first_hop() {
        let mut h = HeaderMap::new();
        h.insert("X-Forwarded-For", "8.8.8.8, 7.7.7.7".parse().unwrap());
        assert_eq!(client_ip_from_headers(&h).to_string(), "8.8.8.8");
    }

    #[test]
    fn fallback_when_no_headers() {
        let h = HeaderMap::new();
        assert_eq!(client_ip_from_headers(&h).to_string(), "0.0.0.0");
    }
}