Jason Tudisco 6ebe02ad56 Initial commit: local-first browser sync library experiment
Four variants of the same sync library (IndexedDB, NeDB, SQLite WASM, sql.js)
plus a paste-bin demo app for testing multi-browser sync via shared folders.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-17 22:04:08 -06:00

# IndexSyncFile — NeDB Variant
Local-first key-value and document store using **NeDB** (`@seald-io/nedb`, the maintained fork) as the local cache, syncing with a user-selected folder via the File System Access API.
## Why NeDB
- **MongoDB-style queries** — `$gt`, `$lt`, `$in`, `$nin`, `$regex`, `$exists`, `$elemMatch`, and more
- **Native indexing** — `ensureIndex` handles index creation and maintenance automatically
- **Dot-notation queries** — query nested fields like `data.user.name` naturally
- **Promise-based API** — `@seald-io/nedb` provides native `*Async` methods, no callback wrappers needed
## Architecture
```
Write:    app ---> NeDB in-memory (immediate) ---> /events/timestamp_hash.json
          ---> /data/*.db (debounced NDJSON snapshot)
Sync:     /events/*.json ---> sort by timestamp ---> skip applied ---> apply to NeDB
          ---> persist to /data/*.db
Startup:  load /data/*.db (instant) ---> sync only NEW events from /events/
```
### Two persistence layers
| Layer | Purpose | Format |
|-------|---------|--------|
| `/events/` | **Sync mechanism** — immutable event files enable multi-browser sync without data loss | One JSON file per mutation |
| `/data/` | **Fast reload** — NDJSON snapshots so startup doesn't replay the entire event history | One NDJSON file per datastore |
The event log is essential for sync. The data files are a startup optimization — without them, every page load would replay all events from scratch.
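The sync path in the diagram (sort by timestamp, skip applied, apply) can be sketched roughly as follows. This is an illustration only: the `SyncEvent` shape, the `timestampOf` and `applyNewEvents` helpers, and the use of a plain `Map` in place of a NeDB datastore are assumptions, not the library's actual internals.

```ts
// Hypothetical shape of one parsed event file; the real format may differ.
interface SyncEvent {
  filename: string; // assumed to look like "<timestamp>_<hash>.json"
  key: string;
  value: unknown;
}

// Read the leading timestamp out of a "<timestamp>_<hash>.json" filename.
function timestampOf(filename: string): number {
  return Number(filename.split('_')[0]);
}

// Apply events that have not been seen yet, oldest first.
// `applied` plays the role of appliedDB, `state` stands in for a NeDB datastore.
function applyNewEvents(
  events: SyncEvent[],
  applied: Set<string>,
  state: Map<string, unknown>,
): string[] {
  const pending = events
    .filter((e) => !applied.has(e.filename))
    .sort((a, b) => {
      const byTime = timestampOf(a.filename) - timestampOf(b.filename);
      if (byTime !== 0) return byTime;
      // Tie-break on filename so every client applies events in the same order.
      return a.filename < b.filename ? -1 : a.filename > b.filename ? 1 : 0;
    });

  for (const e of pending) {
    state.set(e.key, e.value);
    applied.add(e.filename);
  }
  return pending.map((e) => e.filename);
}
```

Because already-applied filenames are skipped, running the same sync pass twice leaves the state unchanged, which is what allows startup to load the snapshot and then process only new events.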
### Data files (NDJSON)
```
your-folder/data/
  kv.db        # all key-value records
  docs.db      # all collection documents
  applied.db   # which event files have been processed
  meta.db      # client ID and other metadata
```
Each file is newline-delimited JSON (one record per line), so it is human-readable and diffable. Writes are debounced by 250 ms to avoid excessive disk writes during batch operations.
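A minimal sketch of what an NDJSON snapshot and a debounced write could look like. The helper names (`toNdjson`, `fromNdjson`, `debounce`) are hypothetical and are not taken from the library's source.

```ts
type JsonRecord = Record<string, unknown>;

// Serialize records as newline-delimited JSON: one object per line.
function toNdjson(records: JsonRecord[]): string {
  if (records.length === 0) return '';
  return records.map((r) => JSON.stringify(r)).join('\n') + '\n';
}

// Parse NDJSON back into records, ignoring blank lines.
function fromNdjson(text: string): JsonRecord[] {
  return text
    .split('\n')
    .filter((line) => line.trim() !== '')
    .map((line) => JSON.parse(line) as JsonRecord);
}

// Collapse a burst of calls into a single call after `delayMs` of quiet.
function debounce(fn: () => void, delayMs: number): () => void {
  let timer: ReturnType<typeof setTimeout> | undefined;
  return () => {
    if (timer !== undefined) clearTimeout(timer);
    timer = setTimeout(fn, delayMs);
  };
}
```

With a helper like this, a batch of mutations could each call `debounce(writeSnapshot, 250)`'s returned function (where `writeSnapshot` is whatever actually writes the file) and still produce a single write.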
### NeDB datastores (4 in-memory)
| Datastore | `_id` | Purpose |
|-----------|-------|---------|
| `kvDB` | KV key | Key-value records |
| `docsDB` | `"store\0id"` | All collection documents (with `_store`, `_docId`, `_deleted`, `data` fields) |
| `appliedDB` | event filename | Tracks processed events |
| `metaDB` | meta key | Client ID, etc. |
Document data is stored in a `data` field to avoid field name conflicts. NeDB indexes are created on `data.<fieldName>` using dot notation.
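To illustrate the layout described above, here is a hedged sketch of how a collection document might be wrapped before being stored in `docsDB`. The field names come from the table; the helper functions (`makeDocKey`, `wrapDoc`, `indexFieldName`) are hypothetical.

```ts
// Stored shape, following the field names listed for docsDB.
interface StoredDoc {
  _id: string; // "store\0id"
  _store: string;
  _docId: string;
  _deleted: boolean;
  data: Record<string, unknown>;
}

// NUL separator keeps store name and document id unambiguous in one string.
const SEP = '\0';

function makeDocKey(store: string, id: string): string {
  return `${store}${SEP}${id}`;
}

// Wrap user data under `data` so it cannot collide with the bookkeeping fields.
function wrapDoc(store: string, id: string, data: Record<string, unknown>): StoredDoc {
  return { _id: makeDocKey(store, id), _store: store, _docId: id, _deleted: false, data };
}

// Name of the NeDB field to index for a user-facing field.
function indexFieldName(field: string): string {
  return `data.${field}`;
}
```

A user field named `_deleted`, for example, would end up at `data._deleted` and leave the top-level tombstone flag untouched.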
## Install and build
```bash
npm install
npm run build
```
**Dependency:** `@seald-io/nedb` ^4.1.0 — actively maintained fork of NeDB with native async/await support. Zero npm vulnerabilities.
## Usage
```ts
import { FolderSyncDB } from './dist/index.js';

const db = await FolderSyncDB.open({
  dbName: 'MyApp',
  autoSyncIntervalMs: 5000,
});
await db.selectFolder();

// KV
await db.kv.set('user', { name: 'Alice', role: 'admin' });

// Collections with indexes
const logs = db.collection({
  name: 'logs',
  indexes: [
    { name: 'byLevel', fields: ['level'] },
    { name: 'byTimestamp', fields: ['ts'] },
  ],
});
await logs.put({ id: 'log1', level: 'error', ts: Date.now(), msg: 'disk full' });
await logs.put({ id: 'log2', level: 'info', ts: Date.now(), msg: 'started' });

// Exact match (uses NeDB index)
const errors = await logs.findByIndex('byLevel', 'error');

// Range query (uses NeDB $gte/$lte operators)
const recent = await logs.queryByIndex('byTimestamp', {
  gte: Date.now() - 3600000,
});

// Multi-field indexes work too
const items = db.collection({
  name: 'items',
  indexes: [{ name: 'byTypeAndStatus', fields: ['type', 'status'] }],
});
const match = await items.findByIndex('byTypeAndStatus', ['widget', 'active']);
```
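As a rough illustration of how a range option such as `{ gte: ... }` could be translated into NeDB's comparison operators on a `data.<fieldName>` path, consider the sketch below. The `RangeOptions` type and `rangeToQuery` helper are assumptions; only `gte` appears in the usage example, and the other bounds are included here for symmetry.

```ts
// Hypothetical range options; only `gte` is confirmed by the usage example.
interface RangeOptions {
  gt?: number;
  gte?: number;
  lt?: number;
  lte?: number;
}

type NedbRangeQuery = Record<string, Record<string, number>>;

// Build a NeDB-style query object for a single indexed field.
function rangeToQuery(field: string, range: RangeOptions): NedbRangeQuery {
  const ops: Record<string, number> = {};
  if (range.gt !== undefined) ops['$gt'] = range.gt;
  if (range.gte !== undefined) ops['$gte'] = range.gte;
  if (range.lt !== undefined) ops['$lt'] = range.lt;
  if (range.lte !== undefined) ops['$lte'] = range.lte;
  // User fields live under `data`, so the query targets a dot-notation path.
  return { [`data.${field}`]: ops };
}
```

The resulting object has the same shape as a query passed to a NeDB `findAsync` call, for example `{ 'data.ts': { $gte: 1000 } }`.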
## Variant-specific notes
- NeDB runs in-memory only in the browser (no Node.js `fs` access)
- Persistence comes from two sources: the event log (for sync) and the NDJSON snapshots (for fast reload)
- Directory handle is stored in a tiny IDB sidecar (NeDB can't store DOM objects)
- Range queries are only supported on single-field indexes (NeDB limitation)
- The consuming app's bundler must handle NeDB's CommonJS-to-ESM conversion and honor the `browser` field in NeDB's `package.json`
- `getAllData()` is used for synchronous zero-copy dumps (no async overhead for persistence writes)