CAN Service: content-addressable storage with HTTP API, SQLite metadata, file-based blob storage, thumbnail generation, and integrity verification. can-sync v1: P2P sync sidecar using iroh-docs for encrypted peer-to-peer replication with library/filter-based selective sync. Fully builds but being superseded by v2 (simplified full-mirror approach). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
# CAN (Containerized Asset Network) Service Specification

**Version:** 1.0 (Final MVP)

**Target Language:** Rust

## 1. System Overview

The CAN service is a self-healing local network daemon that behaves like a fast, append-oriented file system. It exposes an HTTP REST API, with JSON or Protobuf payloads, to ingest, manage, and retrieve assets (files and data).

Because traditional OS file searches are slow, it uses an embedded SQLite database for millisecond-scale queries. To support disaster recovery, critical metadata is also written redundantly to the host's native OS file attributes, so the database can be rebuilt from the files alone.

**MVP Scope:** This version supports a single, default container. To future-proof the API, all routes require a `{can_id}` parameter, which **must always be `0`**. Physically, all data is stored flat under the configured storage root.

---

## 2. Directory Structure & Configuration
The system uses a flat directory structure within the configured root folder.

**Physical Structure:**
```text
/var/lib/can_data/            # Defined by storage_root
├── .can.db                   # Master SQLite Index (Hidden)
├── .trash/                   # Soft-deleted physical assets (Hidden)
├── .thumbs/                  # Cached thumbnail images (Hidden, if enabled)
├── 1773014400123_a3b2...     # Physical Asset
└── 1773014405999_f8c9...     # Physical Asset
```

**Configuration (`config.yaml`):**
```yaml
storage_root: "/var/lib/can_data"      # Absolute path to the storage folder
admin_token: "super_secret_rebuild"    # Bearer token for admin operations
enable_thumbnail_cache: true           # Toggle caching in .thumbs/
rebuild_error_threshold: 50            # Tolerance before triggering a hard rebuild
verify_interval_hours: 12              # Frequency of full background hash verification
```
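
As an illustration of how the five configuration keys above might be loaded, here is a minimal std-only sketch. The names `CanConfig`, `parse_config`, and `verify_interval` are hypothetical and not part of the spec, and the parser only handles flat `key: value # comment` lines. It is not a general YAML parser; a real service would more likely use a YAML library.

```rust
use std::collections::HashMap;
use std::time::Duration;

/// Hypothetical in-memory form of `config.yaml` (field names mirror the keys above).
#[derive(Debug, Clone, PartialEq)]
pub struct CanConfig {
    pub storage_root: String,
    pub admin_token: String,
    pub enable_thumbnail_cache: bool,
    pub rebuild_error_threshold: u32,
    pub verify_interval_hours: u64,
}

impl CanConfig {
    /// Interval between periodic scrubs, derived from `verify_interval_hours`.
    pub fn verify_interval(&self) -> Duration {
        Duration::from_secs(self.verify_interval_hours * 3600)
    }
}

/// Parses flat `key: value  # comment` lines.
/// ASSUMPTION: no nesting, no multi-line values, no `#` inside quoted values.
pub fn parse_config(text: &str) -> Result<CanConfig, String> {
    let mut map: HashMap<String, String> = HashMap::new();
    for raw in text.lines() {
        // Drop the trailing comment, then surrounding whitespace.
        let line = raw.split('#').next().unwrap_or("").trim();
        if line.is_empty() {
            continue;
        }
        let (key, value) = line
            .split_once(':')
            .ok_or_else(|| format!("expected `key: value`, got `{}`", line))?;
        map.insert(
            key.trim().to_string(),
            value.trim().trim_matches('"').to_string(),
        );
    }
    // Every key is required in this sketch; the spec does not state defaults.
    let get = |k: &str| map.get(k).cloned().ok_or_else(|| format!("missing key `{}`", k));
    Ok(CanConfig {
        storage_root: get("storage_root")?,
        admin_token: get("admin_token")?,
        enable_thumbnail_cache: get("enable_thumbnail_cache")?
            .parse::<bool>()
            .map_err(|e| format!("enable_thumbnail_cache: {}", e))?,
        rebuild_error_threshold: get("rebuild_error_threshold")?
            .parse::<u32>()
            .map_err(|e| format!("rebuild_error_threshold: {}", e))?,
        verify_interval_hours: get("verify_interval_hours")?
            .parse::<u64>()
            .map_err(|e| format!("verify_interval_hours: {}", e))?,
    })
}
```

Requiring every key, rather than silently defaulting, makes a missing `storage_root` or `admin_token` fail at startup instead of at first use.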

---

## 3. Storage Mechanics & Disaster Recovery

### 3.1 Cryptographic Naming Convention
Files are written with a strict physical naming format so that their integrity can be verified offline from the filename alone.
**Format:** `{timestamp}_{sha256}_{truncated_tags}.{extension}`

* `timestamp`: Unix epoch timestamp in milliseconds (e.g., `1773014400123`).
* `sha256`: A SHA-256 hash calculated exactly as: `SHA256([timestamp_bytes] + [raw_file_content_bytes])`.
* `truncated_tags`: Tags joined by underscores (`_`), with non-alphanumeric characters stripped. Truncated so that the total filename stays under OS filename length limits (~255 chars). Omitted if no tags are provided.
* `extension`: Derived from the `mime_type` or magic bytes (e.g., `.pdf`, `.json`).
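
The naming rules above can be sketched as a small pure function. This is an illustration under stated assumptions, not the reference implementation: `physical_filename` and `MAX_FILENAME_LEN` are hypothetical names, "alphanumeric" is taken to mean ASCII letters and digits, the hash is passed in as an already-computed hex string, and a trailing underscore left by truncation is trimmed.

```rust
/// ASSUMPTION: a 255-byte cap, matching the "~255 chars" limit mentioned in the spec.
const MAX_FILENAME_LEN: usize = 255;

/// Builds `{timestamp}_{sha256}_{truncated_tags}.{extension}`.
/// `sha256_hex` is the hex digest computed elsewhere; `extension` has no leading dot.
pub fn physical_filename(
    timestamp_ms: u64,
    sha256_hex: &str,
    tags: &[&str],
    extension: &str,
) -> String {
    let prefix = format!("{}_{}", timestamp_ms, sha256_hex);
    let suffix = if extension.is_empty() {
        String::new()
    } else {
        format!(".{}", extension)
    };

    // Strip non-alphanumeric characters (ASCII only, by assumption) and drop empty tags.
    let cleaned: Vec<String> = tags
        .iter()
        .map(|t| t.chars().filter(|c| c.is_ascii_alphanumeric()).collect::<String>())
        .filter(|t| !t.is_empty())
        .collect();
    let mut tag_part = cleaned.join("_");

    // Bytes left for the tags after the fixed parts and the joining underscore.
    let budget = MAX_FILENAME_LEN.saturating_sub(prefix.len() + suffix.len() + 1);
    // The tag string is pure ASCII here, so any byte index is a valid char boundary.
    tag_part.truncate(budget);
    let tag_part = tag_part.trim_end_matches('_');

    if tag_part.is_empty() {
        format!("{}{}", prefix, suffix)
    } else {
        format!("{}_{}{}", prefix, tag_part, suffix)
    }
}
```

Because the complete tag list is preserved separately in the `can.tags` attribute, truncating tags in the filename loses no information.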

### 3.2 Native OS File Attributes
To guarantee the SQLite database can be rebuilt from scratch, critical metadata is bound directly to the file using OS-level attributes (Extended Attributes / `xattr` on Linux/macOS; NTFS Alternate Data Streams on Windows).

**Required Attributes:**
* `can.application`: Software that ingested the file.
* `can.user`: User identity.
* `can.tags`: **The complete, unbounded, comma-separated list of tags.**
* `can.description`: Human-readable description.
* `can.human_filename`: The logical filename provided during ingestion.
* `can.human_path`: The logical folder path provided during ingestion.
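
One practical detail when mapping these attribute names onto each platform: Linux only lets unprivileged processes write extended attributes in the `user.` namespace, and NTFS alternate data streams are addressed as `filename:streamname`. The sketch below shows one possible mapping. The `Platform` enum, `storage_key` function, and the choice to prefix `user.` on Linux are assumptions for illustration; the spec itself does not define them.

```rust
/// The six attribute names listed in the spec.
pub const REQUIRED_ATTRIBUTES: [&str; 6] = [
    "can.application",
    "can.user",
    "can.tags",
    "can.description",
    "can.human_filename",
    "can.human_path",
];

/// Hypothetical platform selector used only by this sketch.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Platform {
    Linux,
    MacOs,
    Windows,
}

/// Returns the name under which `attr` would be stored for `file_name`.
/// ASSUMPTION: `user.` prefix on Linux, bare name on macOS, ADS path on Windows.
pub fn storage_key(platform: Platform, file_name: &str, attr: &str) -> String {
    match platform {
        // Linux xattr names must carry a namespace; `user.` is the unprivileged one.
        Platform::Linux => format!("user.{}", attr),
        // macOS accepts the bare attribute name.
        Platform::MacOs => attr.to_string(),
        // NTFS alternate data streams are opened as "file:stream".
        Platform::Windows => format!("{}:{}", file_name, attr),
    }
}
```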

---

## 4. Metadata Indexing (`.can.db`)
A fully normalized SQLite database located at `{storage_root}/.can.db`.

**Schema:**
```sql
-- SQLite enforces foreign keys (and ON DELETE CASCADE) only when they are
-- enabled on each connection: PRAGMA foreign_keys = ON;

CREATE TABLE assets (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    timestamp INTEGER NOT NULL,
    hash TEXT NOT NULL UNIQUE,
    mime_type TEXT NOT NULL,
    application TEXT,
    user_identity TEXT,
    description TEXT,
    actual_filename TEXT NOT NULL,
    human_filename TEXT,
    human_path TEXT,
    is_trashed BOOLEAN NOT NULL DEFAULT 0,
    is_corrupted BOOLEAN NOT NULL DEFAULT 0
);

CREATE TABLE tags (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    name TEXT NOT NULL UNIQUE
);

CREATE TABLE asset_tags (
    asset_id INTEGER NOT NULL,
    tag_id INTEGER NOT NULL,
    PRIMARY KEY (asset_id, tag_id),
    FOREIGN KEY (asset_id) REFERENCES assets(id) ON DELETE CASCADE,
    FOREIGN KEY (tag_id) REFERENCES tags(id) ON DELETE CASCADE
);

-- Optimization Indexes
-- Note: the UNIQUE constraints on assets(hash) and tags(name) already create
-- implicit indexes, so idx_hash and idx_tag_name are redundant but harmless.
CREATE INDEX idx_hash ON assets(hash);
CREATE INDEX idx_timestamp ON assets(timestamp);
CREATE INDEX idx_application ON assets(application);
CREATE INDEX idx_user ON assets(user_identity);
CREATE INDEX idx_trashed ON assets(is_trashed);
CREATE INDEX idx_tag_name ON tags(name);
```

---

## 5. Background Verifier Subsystem
A low-priority background thread dedicated to data integrity.

1. **Initial Scrub:** Runs on startup. Verifies `SHA256(timestamp + content)` for all files against their filenames.
2. **Continuous Monitoring:** Hooks into OS file system events (e.g., `inotify`). If a file is touched or altered by an external program, the verifier immediately rescans it.
3. **Periodic Scrub:** Runs every `verify_interval_hours` to catch silent bit rot.
4. **Corruption Handling:** If a hash mismatch is found, it flags `is_corrupted = 1` in `.can.db`. Corrupted files are explicitly marked in API responses and excluded from standard operations.
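
The per-file check behind each scrub can be sketched as follows. Rust's standard library has no SHA-256, so the digest function is injected as a parameter; a real build would plug in a hashing crate. The names `Verdict`, `parse_name`, and `verify` are hypothetical, and the encoding of `timestamp_bytes` as ASCII decimal digits is an assumption, since the spec does not pin it down.

```rust
/// Outcome of checking one physical file against its own filename.
#[derive(Debug, PartialEq)]
pub enum Verdict {
    Ok,
    Corrupted,
    UnparseableName,
}

/// Extracts `(timestamp, hash)` from `{timestamp}_{sha256}[_{tags}][.{ext}]`.
pub fn parse_name(file_name: &str) -> Option<(u64, &str)> {
    let mut parts = file_name.splitn(3, '_');
    let timestamp = parts.next()?.parse::<u64>().ok()?;
    // Without tags the hash is followed directly by the extension dot.
    let hash = parts.next()?.split('.').next()?;
    if hash.len() == 64 && hash.chars().all(|c| c.is_ascii_hexdigit()) {
        Some((timestamp, hash))
    } else {
        None
    }
}

/// Recomputes the digest over timestamp + content and compares it to the filename.
/// `sha256_hex` must return the hex digest of its input.
pub fn verify<F>(file_name: &str, content: &[u8], sha256_hex: F) -> Verdict
where
    F: Fn(&[u8]) -> String,
{
    let (timestamp, expected) = match parse_name(file_name) {
        Some(parsed) => parsed,
        None => return Verdict::UnparseableName,
    };
    // ASSUMPTION: timestamp_bytes are the ASCII decimal digits of the timestamp.
    let mut input = timestamp.to_string().into_bytes();
    input.extend_from_slice(content);
    if sha256_hex(&input).eq_ignore_ascii_case(expected) {
        Verdict::Ok
    } else {
        Verdict::Corrupted
    }
}
```

A `Verdict::Corrupted` result is the point at which the verifier would set `is_corrupted = 1` for the matching row.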

---

## 6. API Endpoints

**Protocol Negotiation:**
All endpoints communicate in JSON by default. Clients can receive or send **Protocol Buffers** by providing the HTTP headers:
* `Accept: application/x-protobuf` (to receive Protobuf responses)
* `Content-Type: application/x-protobuf` (to send Protobuf request bodies)

*(Note: Endpoint paths below use `{can_id}`, which must be passed as `0`.)*
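
The negotiation rule above (JSON unless the Protobuf media type is named) could be implemented along these lines. `WireFormat` and `wire_format` are hypothetical names, and the sketch deliberately ignores `q=` quality weights, which a stricter implementation may want to honor.

```rust
/// Payload encodings the API can speak.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum WireFormat {
    Json,
    Protobuf,
}

const PROTOBUF_MIME: &str = "application/x-protobuf";

/// Chooses the format from a raw `Accept` or `Content-Type` header value.
/// Falls back to JSON when the header is absent or does not name Protobuf.
pub fn wire_format(header_value: Option<&str>) -> WireFormat {
    let value = match header_value {
        Some(v) => v,
        None => return WireFormat::Json,
    };
    // A header may list several media types, each with optional parameters,
    // e.g. "text/html, application/x-protobuf; q=0.9".
    let wants_protobuf = value
        .split(',')
        .map(|item| item.split(';').next().unwrap_or("").trim())
        .any(|mime| mime.eq_ignore_ascii_case(PROTOBUF_MIME));
    if wants_protobuf {
        WireFormat::Protobuf
    } else {
        WireFormat::Json
    }
}
```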

### 6.1 Ingest Data
* **Method:** `POST`
* **Path:** `/api/v1/can/0/ingest`
* **Content-Type:** `multipart/form-data`
* **Form Payload:**
    * `file` (Binary File) - **Required**
    * `mime_type` (String) - *Optional*
    * `human_file_name` (String) - *Optional*
    * `human_readable_path` (String) - *Optional*
    * `application` (String) - *Optional*
    * `user` (String) - *Optional*
    * `tags` (String) - *Optional* (comma-separated)
    * `description` (String) - *Optional*
* **Action:** Hashes the file, writes it to `{storage_root}`, attaches OS attributes, and logs the asset to the DB.
* **Response (JSON):**
  `{ "status": "success", "data": { "timestamp": 1773014400123, "hash": "abc...", "filename": "1773014400123_abc_tag.pdf" } }`

### 6.2 Retrieve Physical Asset
* **Method:** `GET`
* **Path:** `/api/v1/can/0/asset/{hash}`
* **Action:** Streams the physical file. Sets `Content-Type` from the `mime_type` stored in the DB and `Content-Disposition` using `human_filename`. Returns a `500` error or warning if `is_corrupted = 1`.

### 6.3 Retrieve Asset Metadata
* **Method:** `GET`
* **Path:** `/api/v1/can/0/asset/{hash}/meta`
* **Action:** Returns the DB record.
* **Response (JSON):**
  `{ "status": "success", "data": { "hash": "abc...", "mime_type": "image/jpeg", "application": "WebUI", "user": "Jason", "tags": ["tag1", "tag2"], "description": "...", "human_filename": "photo.jpg", "human_path": "/img/", "timestamp": 1773014400123, "is_trashed": false, "is_corrupted": false } }`

### 6.4 Retrieve Thumbnail
* **Method:** `GET`
* **Path:** `/api/v1/can/0/asset/{hash}/thumb/{max_width}/{max_height}`
* **Action:** Resizes the image to fit within `{max_width}` x `{max_height}` while strictly preserving its aspect ratio. Falls back to a static icon (SVG/PNG) for non-images. If `enable_thumbnail_cache=true`, reads/writes to `{storage_root}/.thumbs/{hash}_{max_width}x{max_height}.jpg`. Streams the resulting bytes.
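
The sizing and cache-path rules for thumbnails can be sketched without any image library. `fit_within` and `thumb_cache_path` are hypothetical helper names; the decision never to upscale images that already fit is an assumption, since the spec only requires that the aspect ratio be preserved.

```rust
use std::path::{Path, PathBuf};

/// Scales `(width, height)` to fit inside `(max_width, max_height)`,
/// preserving aspect ratio. ASSUMPTION: images that already fit are not upscaled.
pub fn fit_within(width: u32, height: u32, max_width: u32, max_height: u32) -> (u32, u32) {
    if width == 0 || height == 0 || max_width == 0 || max_height == 0 {
        return (0, 0);
    }
    if width <= max_width && height <= max_height {
        return (width, height);
    }
    // Use u64 so the cross-multiplication below cannot overflow.
    let (w, h) = (width as u64, height as u64);
    let (mw, mh) = (max_width as u64, max_height as u64);
    if w * mh >= h * mw {
        // Width is the limiting dimension: pin it and derive the height.
        let new_h = (h * mw / w).max(1);
        (max_width, new_h as u32)
    } else {
        // Height is the limiting dimension: pin it and derive the width.
        let new_w = (w * mh / h).max(1);
        (new_w as u32, max_height)
    }
}

/// Builds `{storage_root}/.thumbs/{hash}_{max_width}x{max_height}.jpg`.
pub fn thumb_cache_path(storage_root: &Path, hash: &str, max_width: u32, max_height: u32) -> PathBuf {
    storage_root
        .join(".thumbs")
        .join(format!("{}_{}x{}.jpg", hash, max_width, max_height))
}
```

Keying the cache on the requested bounds rather than the computed size means the lookup needs no image decoding at all.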

### 6.5 Modify Metadata
* **Method:** `PATCH`
* **Path:** `/api/v1/can/0/asset/{hash}`
* **Body (JSON/Protobuf):**
  `{ "tags": ["new_tag1", "new_tag2"], "description": "New description" }`
* **Action:** Updates the `can.tags` and `can.description` OS attributes. Updates the SQLite `assets`, `tags`, and `asset_tags` tables inside a transaction. The physical filename remains unchanged.

### 6.6 List Assets
* **Method:** `GET`
* **Path:** `/api/v1/can/0/list`
* **Query Parameters:**
    * `limit` (Integer) - Default `50`
    * `offset` (Integer) - Default `0`
    * `offset_time` (Integer) - *Optional*. Epoch ms. High-speed cursor. Lists items strictly after this timestamp when `order` is `asc`, or strictly before it when `order` is `desc`.
    * `order` (String) - `asc` or `desc`. Default `desc`.
    * `application` (String) - *Optional*. Scopes the list exclusively to files ingested by this Application ID.
    * `include_trashed` (Boolean) - Default `false`.
    * `include_corrupted` (Boolean) - Default `false`.
* **Response:** Paginated array of metadata objects (matching 6.3 output) + `pagination` block.
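
To make the interaction between these parameters concrete, the following sketch turns them into a parameterized query against the `assets` table from Section 4. The types `ListParams`, `Order`, and `Bind` and the function `build_list_query` are hypothetical, and the query omits the join needed to return tags. Values are returned separately for binding rather than spliced into the SQL string.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Order {
    Asc,
    Desc,
}

/// A value to bind to a `?` placeholder, kept typed so integers stay integers.
#[derive(Debug, Clone, PartialEq)]
pub enum Bind {
    Int(i64),
    Text(String),
}

pub struct ListParams {
    pub limit: u32,
    pub offset: u32,
    pub offset_time: Option<i64>,
    pub order: Order,
    pub application: Option<String>,
    pub include_trashed: bool,
    pub include_corrupted: bool,
}

impl Default for ListParams {
    /// Defaults as documented for the endpoint.
    fn default() -> Self {
        ListParams {
            limit: 50,
            offset: 0,
            offset_time: None,
            order: Order::Desc,
            application: None,
            include_trashed: false,
            include_corrupted: false,
        }
    }
}

/// Returns the SQL text plus the values to bind, in placeholder order.
pub fn build_list_query(p: &ListParams) -> (String, Vec<Bind>) {
    let mut clauses: Vec<&str> = Vec::new();
    let mut binds: Vec<Bind> = Vec::new();
    if !p.include_trashed {
        clauses.push("is_trashed = 0");
    }
    if !p.include_corrupted {
        clauses.push("is_corrupted = 0");
    }
    if let Some(app) = &p.application {
        clauses.push("application = ?");
        binds.push(Bind::Text(app.clone()));
    }
    if let Some(ts) = p.offset_time {
        // Cursor: strictly after for ascending order, strictly before for descending.
        clauses.push(match p.order {
            Order::Asc => "timestamp > ?",
            Order::Desc => "timestamp < ?",
        });
        binds.push(Bind::Int(ts));
    }
    let mut sql = String::from("SELECT * FROM assets");
    if !clauses.is_empty() {
        sql.push_str(" WHERE ");
        sql.push_str(&clauses.join(" AND "));
    }
    let direction = match p.order {
        Order::Asc => "ASC",
        Order::Desc => "DESC",
    };
    sql.push_str(&format!(" ORDER BY timestamp {} LIMIT ? OFFSET ?", direction));
    binds.push(Bind::Int(p.limit as i64));
    binds.push(Bind::Int(p.offset as i64));
    (sql, binds)
}
```

The `offset_time` cursor lets the `idx_timestamp` index seek directly to the starting row, which is why it scales better than a large `offset`.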

### 6.7 Search Assets
* **Method:** `GET`
* **Path:** `/api/v1/can/0/search`
* **Query Parameters:**
    * `hash` (String) - Exact or partial prefix.
    * `start_time` (Integer) - Epoch ms.
    * `end_time` (Integer) -