Jason Tudisco 360ecbdad0 Initial commit: CAN Service + examples (can-sync v1, canfs, filemanager, paste)
CAN Service: content-addressable storage with HTTP API, SQLite metadata,
file-based blob storage, thumbnail generation, and integrity verification.

can-sync v1: P2P sync sidecar using iroh-docs for encrypted peer-to-peer
replication with library/filter-based selective sync. Fully builds but
being superseded by v2 (simplified full-mirror approach).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-12 10:32:04 -06:00


CAN (Containerized Asset Network) Service Specification

Version: 1.0 (Final MVP)
Target Language: Rust

1. System Overview

The CAN service is a robust, self-healing local network daemon that emulates a high-speed, append-oriented file system. It exposes an HTTP REST API, with JSON or Protobuf payloads, for ingesting, managing, and retrieving assets (files and data).

Traditional OS file searches are slow, so the service indexes all metadata in an embedded SQLite database for millisecond-latency queries. To ensure the index can always be rebuilt after a failure, critical metadata is also written redundantly to the host's native OS file attributes.

MVP Scope: This version supports a single, default container. To future-proof the API, every route includes a {can_id} path parameter, which must always be 0. Physically, all data is stored flat in the configured storage root.


2. Directory Structure & Configuration

The system uses a flat directory structure within the configured root folder.

Physical Structure:

/var/lib/can_data/           # Defined by storage_root
├── .can.db                  # Master SQLite Index (Hidden)
├── .trash/                  # Soft-deleted physical assets (Hidden)
├── .thumbs/                 # Cached thumbnail images (Hidden, if enabled)
├── 1773014400123_a3b2...    # Physical Asset
└── 1773014405999_f8c9...    # Physical Asset

Configuration (config.yaml):

storage_root: "/var/lib/can_data"       # Absolute path to the storage folder
admin_token: "super_secret_rebuild"     # Bearer token for admin operations
enable_thumbnail_cache: true            # Toggle caching in .thumbs/
rebuild_error_threshold: 50             # Tolerance before triggering a hard rebuild
verify_interval_hours: 12               # Frequency of full background hash verification
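The following minimal Rust sketch shows how these settings might be loaded and validated. To stay dependency-free, it parses only flat "key: value" lines with "#" comments rather than full YAML, so a value containing "#" would be cut off. A real service would more likely use a YAML crate. The names CanConfig and parse_config are hypothetical.

```rust
use std::collections::HashMap;
use std::path::Path;

/// Settings from config.yaml; field names mirror the spec's keys.
#[derive(Debug, Clone, PartialEq)]
pub struct CanConfig {
    pub storage_root: String,
    pub admin_token: String,
    pub enable_thumbnail_cache: bool,
    pub rebuild_error_threshold: u32,
    pub verify_interval_hours: u64,
}

/// Hypothetical loader for a flat `key: value` file (not a general YAML parser).
pub fn parse_config(text: &str) -> Result<CanConfig, String> {
    let mut kv: HashMap<&str, &str> = HashMap::new();
    for raw in text.lines() {
        // Drop any trailing "# comment", then surrounding whitespace.
        let line = raw.split('#').next().unwrap_or("").trim();
        if line.is_empty() {
            continue;
        }
        // Split at the first ':' only, so values may themselves contain ':'.
        let (key, value) = line
            .split_once(':')
            .ok_or_else(|| format!("malformed line: {line}"))?;
        // Strip optional double quotes around the value.
        kv.insert(key.trim(), value.trim().trim_matches('"'));
    }
    let get = |k: &str| kv.get(k).copied().ok_or_else(|| format!("missing key: {k}"));

    // The spec requires an absolute storage path.
    let storage_root = get("storage_root")?;
    if !Path::new(storage_root).is_absolute() {
        return Err(format!("storage_root must be absolute: {storage_root}"));
    }
    Ok(CanConfig {
        storage_root: storage_root.to_string(),
        admin_token: get("admin_token")?.to_string(),
        enable_thumbnail_cache: get("enable_thumbnail_cache")?
            .parse::<bool>()
            .map_err(|e| format!("enable_thumbnail_cache: {e}"))?,
        rebuild_error_threshold: get("rebuild_error_threshold")?
            .parse::<u32>()
            .map_err(|e| format!("rebuild_error_threshold: {e}"))?,
        verify_interval_hours: get("verify_interval_hours")?
            .parse::<u64>()
            .map_err(|e| format!("verify_interval_hours: {e}"))?,
    })
}
```

Every key is treated as required here. Whether the real service supplies defaults for missing keys is not stated in the spec.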

3. Storage Mechanics & Disaster Recovery

3.1 Cryptographic Naming Convention

Files are written with a strict physical naming format so that integrity can be verified offline, by recomputing the hash embedded in each filename. Format: {timestamp}_{sha256}_{truncated_tags}.{extension}

  • timestamp: Unix epoch timestamp in milliseconds (e.g., 1773014400123).
  • sha256: A SHA-256 hash calculated exactly as: SHA256([timestamp_bytes] + [raw_file_content_bytes]).
  • truncated_tags: Tags joined by underscores (_), with non-alphanumeric characters stripped. Truncated so that the total filename stays under typical OS filename-length limits (~255 characters). Omitted if no tags are provided.
  • extension: Derived from the mime_type or magic bytes (e.g., .pdf, .json).
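The following Rust sketch assembles a filename under these rules. It makes several assumptions not fixed by the spec. The hash is passed in already hex-encoded. "Alphanumeric" means ASCII letters and digits, which keeps every byte index a valid cut point. The budget is 255 bytes. The "_" before the tag segment is dropped when no tags survive. The name build_filename is hypothetical.

```rust
/// Assumed byte budget for one filename (NAME_MAX on most Linux filesystems).
const MAX_FILENAME_BYTES: usize = 255;

/// Hypothetical builder for `{timestamp}_{sha256}_{truncated_tags}.{extension}`.
pub fn build_filename(timestamp_ms: u64, hash_hex: &str, tags: &[&str], extension: &str) -> String {
    let base = format!("{timestamp_ms}_{hash_hex}");
    let suffix = if extension.is_empty() { String::new() } else { format!(".{extension}") };

    // Strip non-alphanumeric characters from each tag; drop tags that become empty.
    let cleaned: Vec<String> = tags
        .iter()
        .map(|t| t.chars().filter(|c| c.is_ascii_alphanumeric()).collect::<String>())
        .filter(|t| !t.is_empty())
        .collect();
    let mut tag_part = cleaned.join("_");

    // Bytes left for "_" + tags once the fixed parts are accounted for.
    let budget = MAX_FILENAME_BYTES.saturating_sub(base.len() + suffix.len() + 1);
    tag_part.truncate(budget); // safe: only ASCII remains, so any index is a char boundary
    let tag_part = tag_part.trim_end_matches('_'); // avoid a dangling separator after the cut

    if tag_part.is_empty() {
        format!("{base}{suffix}")
    } else {
        format!("{base}_{tag_part}{suffix}")
    }
}
```

For example, build_filename(1773014400123, "abc", &["tag"], "pdf") yields "1773014400123_abc_tag.pdf", the same shape as the ingest response example in 6.1.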

3.2 Native OS File Attributes

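To guarantee that the SQLite database can be rebuilt from scratch, critical metadata is bound directly to each file using OS-level attributes (extended attributes / xattr on Linux and macOS; NTFS Alternate Data Streams on Windows). On Linux, extended-attribute names must carry a namespace prefix, so these names would need the user. prefix there (e.g., user.can.tags).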

Required Attributes:

  • can.application: Software that ingested the file.
  • can.user: User identity.
  • can.tags: The complete, unbounded, comma-separated list of tags.
  • can.description: Human-readable description.
  • can.human_filename: The logical filename provided during ingestion.
  • can.human_path: The logical folder path provided during ingestion.
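The following Rust sketch shows the attribute set written at ingestion and how can.tags might be read back during a rebuild. Actually setting xattrs or ADS streams needs platform APIs outside the standard library, so that step is omitted. AssetMeta, can_attributes, and parse_tags are hypothetical. The sketch makes two assumptions: missing optional fields are stored as empty strings so that all six required attributes always exist, and ns_prefix is "user." on Linux and "" elsewhere.

```rust
/// Hypothetical ingestion metadata; fields mirror the required attributes.
pub struct AssetMeta<'a> {
    pub application: Option<&'a str>,
    pub user: Option<&'a str>,
    pub tags: &'a [&'a str],
    pub description: Option<&'a str>,
    pub human_filename: Option<&'a str>,
    pub human_path: Option<&'a str>,
}

/// Returns (attribute name, value) pairs for all six required attributes.
pub fn can_attributes(meta: &AssetMeta, ns_prefix: &str) -> Vec<(String, String)> {
    // Assumption: absent optional values are written as empty strings.
    let text = |v: Option<&str>| v.unwrap_or("").to_string();
    vec![
        (format!("{ns_prefix}can.application"), text(meta.application)),
        (format!("{ns_prefix}can.user"), text(meta.user)),
        // Complete, untruncated tag list, comma-separated.
        (format!("{ns_prefix}can.tags"), meta.tags.join(",")),
        (format!("{ns_prefix}can.description"), text(meta.description)),
        (format!("{ns_prefix}can.human_filename"), text(meta.human_filename)),
        (format!("{ns_prefix}can.human_path"), text(meta.human_path)),
    ]
}

/// Splits a can.tags value back into tag names for the tags/asset_tags tables.
/// Trims whitespace, skips empty entries, and removes duplicates (first occurrence wins).
pub fn parse_tags(raw: &str) -> Vec<String> {
    let mut out: Vec<String> = Vec::new();
    for tag in raw.split(',').map(str::trim).filter(|t| !t.is_empty()) {
        if !out.iter().any(|t| t.as_str() == tag) {
            out.push(tag.to_string());
        }
    }
    out
}
```

Because can.tags is comma-separated, this scheme implies that individual tags cannot contain commas.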

4. Metadata Indexing (.can.db)

A fully normalized SQLite database located at {storage_root}/.can.db.

Schema:

CREATE TABLE assets (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  timestamp INTEGER NOT NULL,
  hash TEXT NOT NULL UNIQUE,
  mime_type TEXT NOT NULL,
  application TEXT,
  user_identity TEXT,
  description TEXT,
  actual_filename TEXT NOT NULL,
  human_filename TEXT,
  human_path TEXT,
  is_trashed BOOLEAN NOT NULL DEFAULT 0,
  is_corrupted BOOLEAN NOT NULL DEFAULT 0
);

CREATE TABLE tags (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  name TEXT NOT NULL UNIQUE
);

CREATE TABLE asset_tags (
  asset_id INTEGER NOT NULL,
  tag_id INTEGER NOT NULL,
  PRIMARY KEY (asset_id, tag_id),
  FOREIGN KEY (asset_id) REFERENCES assets(id) ON DELETE CASCADE,
  FOREIGN KEY (tag_id) REFERENCES tags(id) ON DELETE CASCADE
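);
-- Note: SQLite enforces FOREIGN KEY constraints (including ON DELETE CASCADE)
-- only when PRAGMA foreign_keys = ON is set on each connection.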

-- Optimization Indexes
-- (assets.hash and tags.name need no extra index: their UNIQUE constraints
--  already create implicit indexes in SQLite.)
CREATE INDEX idx_timestamp ON assets(timestamp);
CREATE INDEX idx_application ON assets(application);
CREATE INDEX idx_user ON assets(user_identity);
CREATE INDEX idx_trashed ON assets(is_trashed);

5. Background Verifier Subsystem

A low-priority background thread dedicated to data integrity.

  1. Initial Scrub: Runs on startup. Recomputes SHA256(timestamp + content) for every file and compares it with the hash embedded in the filename.
  2. Continuous Monitoring: Subscribes to OS file-system change notifications (e.g., inotify on Linux). If an external program modifies a file, the verifier immediately rescans it.
  3. Periodic Scrub: Runs every verify_interval_hours to catch silent bit rot.
  4. Corruption Handling: If a hash mismatch is found, it flags is_corrupted = 1 in .can.db. Corrupted files are explicitly marked in API responses and excluded from standard operations.
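The per-file check used by the scrubs can be sketched in Rust as follows. The hash function is injected so that the sketch stays independent of any SHA-256 library. The sketch assumes that timestamp_bytes are the decimal digits exactly as they appear in the filename; the spec leaves this encoding open. Verdict, parse_name, verify_file, and scrub_interval are hypothetical names. A caller would set is_corrupted = 1 in .can.db on a Corrupted verdict.

```rust
use std::time::Duration;

#[derive(Debug, PartialEq)]
pub enum Verdict {
    Intact,
    Corrupted,
    Unparseable,
}

/// Splits "{timestamp}_{sha256}[_{tags}][.{ext}]" into (timestamp digits, hash).
pub fn parse_name(filename: &str) -> Option<(&str, &str)> {
    // Drop the extension first; timestamp, hash, and tags contain no '.'.
    let stem = filename.rsplit_once('.').map_or(filename, |(s, _)| s);
    let mut parts = stem.splitn(3, '_');
    let ts = parts.next()?;
    let hash = parts.next()?;
    if ts.is_empty() || !ts.bytes().all(|b| b.is_ascii_digit()) || hash.is_empty() {
        return None;
    }
    Some((ts, hash))
}

/// Recomputes hash(timestamp_bytes + content) and compares it with the filename's hash.
pub fn verify_file<F>(filename: &str, content: &[u8], hash_fn: F) -> Verdict
where
    F: Fn(&[u8]) -> String,
{
    let (ts, expected) = match parse_name(filename) {
        Some(parts) => parts,
        None => return Verdict::Unparseable,
    };
    // Assumption: timestamp bytes are the ASCII digits as they appear in the name.
    let mut input = ts.as_bytes().to_vec();
    input.extend_from_slice(content);
    if hash_fn(&input).eq_ignore_ascii_case(expected) {
        Verdict::Intact
    } else {
        Verdict::Corrupted
    }
}

/// Period between full scrubs, from verify_interval_hours.
pub fn scrub_interval(verify_interval_hours: u64) -> Duration {
    Duration::from_secs(verify_interval_hours.saturating_mul(3600))
}
```

Injecting the hash function also allows the check to be unit-tested with a trivial stand-in, as the test below does with a byte-length "hash".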

6. API Endpoints

Protocol Negotiation: All endpoints use JSON by default. Clients can request Protobuf responses and send Protobuf request bodies by setting these HTTP headers:

  • Accept: application/x-protobuf
  • Content-Type: application/x-protobuf
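The following Rust sketch shows a simplified version of this negotiation. The response format comes from Accept, and the request body format comes from Content-Type. It assumes that any listed media type equal to application/x-protobuf selects Protobuf and ignores q-values. Wire, response_format, and request_format are hypothetical names. Multipart ingest bodies (6.1) would need separate handling.

```rust
#[derive(Debug, PartialEq, Clone, Copy)]
pub enum Wire {
    Json,
    Protobuf,
}

const PROTOBUF_MIME: &str = "application/x-protobuf";

/// Media type without parameters (e.g. "; q=0.9" or "; charset=utf-8").
fn media_type(item: &str) -> &str {
    item.split(';').next().unwrap_or("").trim()
}

/// Chooses the response encoding from the Accept header; JSON unless Protobuf is listed.
pub fn response_format(accept: Option<&str>) -> Wire {
    let wants_pb = accept.map_or(false, |v| {
        v.split(',').any(|item| media_type(item).eq_ignore_ascii_case(PROTOBUF_MIME))
    });
    if wants_pb { Wire::Protobuf } else { Wire::Json }
}

/// Chooses how to decode a request body from its Content-Type header.
pub fn request_format(content_type: Option<&str>) -> Wire {
    match content_type.map(media_type) {
        Some(m) if m.eq_ignore_ascii_case(PROTOBUF_MIME) => Wire::Protobuf,
        _ => Wire::Json,
    }
}
```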

(Note: The paths below show the {can_id} segment already set to 0, its only valid value in this MVP.)

6.1 Ingest Data

  • Method: POST
  • Path: /api/v1/can/0/ingest
  • Content-Type: multipart/form-data
  • Form Payload:
    • file (Binary File) - Required
    • mime_type (String) - Optional
    • human_file_name (String) - Optional
    • human_readable_path (String) - Optional
    • application (String) - Optional
    • user (String) - Optional
    • tags (String) - Optional (comma-separated)
    • description (String) - Optional
  • Action: Hashes file, writes to {storage_root}, attaches OS attributes, logs to DB.
  • Response (JSON): { "status": "success", "data": { "timestamp": 1773014400123, "hash": "abc...", "filename": "1773014400123_abc_tag.pdf" } }
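The following dependency-free Rust sketch shows the hashing step of ingestion, using the rule from 3.1. The spec does not define how timestamp_bytes are encoded, so this sketch assumes the decimal ASCII digits used in the filename. The SHA-256 routine is a compact textbook implementation of FIPS 180-4, included only so the example runs without crates. Production code would more likely use a vetted crate such as sha2 and stream the upload instead of buffering it. The names can_hash, to_hex, and now_ms are hypothetical.

```rust
use std::time::{SystemTime, UNIX_EPOCH};

// SHA-256 round constants (FIPS 180-4).
const K: [u32; 64] = [
    0x428a2f98, 0x71374491, 0xb5c0fbcf, 0xe9b5dba5, 0x3956c25b, 0x59f111f1, 0x923f82a4, 0xab1c5ed5,
    0xd807aa98, 0x12835b01, 0x243185be, 0x550c7dc3, 0x72be5d74, 0x80deb1fe, 0x9bdc06a7, 0xc19bf174,
    0xe49b69c1, 0xefbe4786, 0x0fc19dc6, 0x240ca1cc, 0x2de92c6f, 0x4a7484aa, 0x5cb0a9dc, 0x76f988da,
    0x983e5152, 0xa831c66d, 0xb00327c8, 0xbf597fc7, 0xc6e00bf3, 0xd5a79147, 0x06ca6351, 0x14292967,
    0x27b70a85, 0x2e1b2138, 0x4d2c6dfc, 0x53380d13, 0x650a7354, 0x766a0abb, 0x81c2c92e, 0x92722c85,
    0xa2bfe8a1, 0xa81a664b, 0xc24b8b70, 0xc76c51a3, 0xd192e819, 0xd6990624, 0xf40e3585, 0x106aa070,
    0x19a4c116, 0x1e376c08, 0x2748774c, 0x34b0bcb5, 0x391c0cb3, 0x4ed8aa4a, 0x5b9cca4f, 0x682e6ff3,
    0x748f82ee, 0x78a5636f, 0x84c87814, 0x8cc70208, 0x90befffa, 0xa4506ceb, 0xbef9a3f7, 0xc67178f2,
];

/// SHA-256 over an in-memory buffer.
pub fn sha256(data: &[u8]) -> [u8; 32] {
    let mut h: [u32; 8] = [
        0x6a09e667, 0xbb67ae85, 0x3c6ef372, 0xa54ff53a, 0x510e527f, 0x9b05688c, 0x1f83d9ab, 0x5be0cd19,
    ];
    // Padding: 0x80, zeros up to 56 mod 64, then the bit length as a big-endian u64.
    let mut msg = data.to_vec();
    msg.push(0x80);
    while msg.len() % 64 != 56 {
        msg.push(0);
    }
    msg.extend_from_slice(&(data.len() as u64).wrapping_mul(8).to_be_bytes());

    for block in msg.chunks_exact(64) {
        // Message schedule.
        let mut w = [0u32; 64];
        for i in 0..16 {
            w[i] = u32::from_be_bytes([block[4 * i], block[4 * i + 1], block[4 * i + 2], block[4 * i + 3]]);
        }
        for i in 16..64 {
            let s0 = w[i - 15].rotate_right(7) ^ w[i - 15].rotate_right(18) ^ (w[i - 15] >> 3);
            let s1 = w[i - 2].rotate_right(17) ^ w[i - 2].rotate_right(19) ^ (w[i - 2] >> 10);
            w[i] = w[i - 16].wrapping_add(s0).wrapping_add(w[i - 7]).wrapping_add(s1);
        }
        // 64 compression rounds.
        let [mut a, mut b, mut c, mut d, mut e, mut f, mut g, mut hh] = h;
        for i in 0..64 {
            let s1 = e.rotate_right(6) ^ e.rotate_right(11) ^ e.rotate_right(25);
            let ch = (e & f) ^ (!e & g);
            let t1 = hh.wrapping_add(s1).wrapping_add(ch).wrapping_add(K[i]).wrapping_add(w[i]);
            let s0 = a.rotate_right(2) ^ a.rotate_right(13) ^ a.rotate_right(22);
            let maj = (a & b) ^ (a & c) ^ (b & c);
            hh = g; g = f; f = e; e = d.wrapping_add(t1);
            d = c; c = b; b = a; a = t1.wrapping_add(s0.wrapping_add(maj));
        }
        for (slot, v) in h.iter_mut().zip([a, b, c, d, e, f, g, hh]) {
            *slot = slot.wrapping_add(v);
        }
    }
    let mut out = [0u8; 32];
    for (i, word) in h.iter().enumerate() {
        out[4 * i..4 * i + 4].copy_from_slice(&word.to_be_bytes());
    }
    out
}

/// Lowercase hex encoding.
pub fn to_hex(bytes: &[u8]) -> String {
    bytes.iter().map(|b| format!("{b:02x}")).collect()
}

/// SHA256(timestamp_bytes + content), hex-encoded.
/// Assumption: timestamp_bytes are the decimal ASCII digits used in the filename.
pub fn can_hash(timestamp_ms: u64, content: &[u8]) -> String {
    let mut input = timestamp_ms.to_string().into_bytes();
    input.extend_from_slice(content);
    to_hex(&sha256(&input))
}

/// Current Unix epoch time in milliseconds.
pub fn now_ms() -> u64 {
    SystemTime::now().duration_since(UNIX_EPOCH).expect("clock before 1970").as_millis() as u64
}
```

Because the timestamp is mixed into the hash, two ingestions of identical content at different milliseconds produce different hashes, and therefore different filenames.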

6.2 Retrieve Physical Asset

  • Method: GET
  • Path: /api/v1/can/0/asset/{hash}
  • Action: Streams the physical file. Sets Content-Type from the stored mime_type and Content-Disposition from human_filename. Returns 500/Warning if is_corrupted = 1.
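The following Rust sketch builds the Content-Disposition value from human_filename in the RFC 6266 style. It pairs a plain-ASCII filename fallback with a percent-encoded UTF-8 filename* parameter, so names such as "résumé.pdf" survive. The "inline" disposition is an assumption, since the spec does not say whether assets should open in place or download. The name content_disposition is hypothetical.

```rust
/// Hypothetical Content-Disposition builder (assumes "inline").
pub fn content_disposition(human_filename: &str) -> String {
    // ASCII fallback: keep printable ASCII except '"' and '\', replace everything else with '_'.
    let fallback: String = human_filename
        .chars()
        .map(|c| match c {
            '"' | '\\' => '_',
            ' '..='~' => c,
            _ => '_',
        })
        .collect();
    // filename* value: keep RFC 5987 attr-chars, percent-encode every other UTF-8 byte.
    let encoded: String = human_filename
        .bytes()
        .map(|b| {
            if b.is_ascii_alphanumeric() || b"!#$&+-.^_`|~".contains(&b) {
                (b as char).to_string()
            } else {
                format!("%{b:02X}")
            }
        })
        .collect();
    format!("inline; filename=\"{fallback}\"; filename*=UTF-8''{encoded}")
}
```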

6.3 Retrieve Asset Metadata

  • Method: GET
  • Path: /api/v1/can/0/asset/{hash}/meta
  • Action: Returns the asset's DB record, including its tags.
  • Response (JSON): { "status": "success", "data": { "hash": "abc...", "mime_type": "image/jpeg", "application": "WebUI", "user": "Jason", "tags": ["tag1", "tag2"], "description": "...", "human_filename": "photo.jpg", "human_path": "/img/", "timestamp": 1773014400123, "is_trashed": false, "is_corrupted": false } }

6.4 Retrieve Thumbnail

  • Method: GET
  • Path: /api/v1/can/0/asset/{hash}/thumb/{max_width}/{max_height}
  • Action: Resizes the image to fit within max_width × max_height while strictly preserving its aspect ratio. Falls back to a static icon (SVG/PNG) for non-images. If enable_thumbnail_cache = true, reads from and writes to {storage_root}/.thumbs/{hash}_{max_width}x{max_height}.jpg. Streams the resulting image bytes.
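The following Rust sketch shows the sizing arithmetic and the cache path. The target size is the largest one that fits inside the requested box with the source aspect ratio intact. Integer cross-multiplication avoids floating-point drift. Two assumptions are not in the spec: images smaller than the box are never upscaled, and each side is at least 1 pixel. The names fit_within and thumb_cache_path are hypothetical, and actual decoding and resizing would need an image library.

```rust
use std::path::{Path, PathBuf};

/// Largest (w, h) inside max_w x max_h with the source aspect ratio (no upscaling).
pub fn fit_within(src_w: u32, src_h: u32, max_w: u32, max_h: u32) -> (u32, u32) {
    if src_w == 0 || src_h == 0 || max_w == 0 || max_h == 0 {
        return (0, 0);
    }
    if src_w <= max_w && src_h <= max_h {
        return (src_w, src_h); // assumption: never upscale
    }
    let (sw, sh, mw, mh) = (src_w as u64, src_h as u64, max_w as u64, max_h as u64);
    // Compare sw/sh with mw/mh by cross-multiplying.
    if sw * mh >= sh * mw {
        // Width is the binding side; round height to nearest (never exceeds max_h).
        let h = ((sh * mw + sw / 2) / sw).max(1);
        (max_w, h as u32)
    } else {
        // Height is the binding side.
        let w = ((sw * mh + sh / 2) / sh).max(1);
        (w as u32, max_h)
    }
}

/// {storage_root}/.thumbs/{hash}_{max_width}x{max_height}.jpg
pub fn thumb_cache_path(storage_root: &Path, hash: &str, max_w: u32, max_h: u32) -> PathBuf {
    storage_root.join(".thumbs").join(format!("{hash}_{max_w}x{max_h}.jpg"))
}
```

The cache key uses the requested box rather than the computed size, matching the path pattern in the spec.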

6.5 Modify Metadata

  • Method: PATCH
  • Path: /api/v1/can/0/asset/{hash}
  • Body (JSON/Protobuf): { "tags": ["new_tag1", "new_tag2"], "description": "New description" }
  • Action: Updates the can.tags and can.description OS attributes, and updates the SQLite assets, tags, and asset_tags tables inside a single transaction. The physical filename remains unchanged, so its truncated_tags segment keeps reflecting the tags given at ingestion.
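The following Rust sketch shows one way to prepare the tag part of this transaction. It computes which asset_tags links to add and remove, plus the new can.tags attribute value. TagDiff and diff_tags are hypothetical names. The requested tags are trimmed, deduplicated, and sorted, and the sorting is an assumption: it discards the client's original order.

```rust
use std::collections::BTreeSet;

/// Changes to apply for one asset (hypothetical helper type).
#[derive(Debug, PartialEq)]
pub struct TagDiff {
    pub to_link: Vec<String>,   // insert into tags (if new) and asset_tags
    pub to_unlink: Vec<String>, // delete from asset_tags
    pub can_tags: String,       // new value for the can.tags attribute
}

pub fn diff_tags(current: &[&str], requested: &[&str]) -> TagDiff {
    let cur: BTreeSet<&str> = current.iter().copied().collect();
    // Trim, drop empties, and dedupe; BTreeSet also sorts for a stable order.
    let req: BTreeSet<&str> = requested
        .iter()
        .map(|t| t.trim())
        .filter(|t| !t.is_empty())
        .collect();
    TagDiff {
        to_link: req.difference(&cur).map(|t| t.to_string()).collect(),
        to_unlink: cur.difference(&req).map(|t| t.to_string()).collect(),
        can_tags: req.iter().copied().collect::<Vec<_>>().join(","),
    }
}
```

Computing the diff first keeps the transaction small: unchanged tags are left untouched, and only the changed links are written.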

6.6 List Assets

  • Method: GET
  • Path: /api/v1/can/0/list
  • Query Parameters:
    • limit (Integer) - Default 50
    • offset (Integer) - Default 0
    • offset_time (Integer) - Optional. Epoch ms. High-speed cursor: lists items strictly after this timestamp when order=asc, or strictly before it when order=desc.
    • order (String) - asc or desc. Default desc.
    • application (String) - Optional. Restricts the list to assets ingested by this application.
    • include_trashed (Boolean) - Default false.
    • include_corrupted (Boolean) - Default false.
  • Response: Paginated array of metadata objects (matching 6.3 output) + pagination block.
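The following Rust sketch translates these query parameters into parameterized SQL against the assets table. ListQuery, Param, and build_list_sql are hypothetical names. Tags would be fetched separately through asset_tags. Because the offset_time cursor is strict, assets that share a millisecond with the cursor could be skipped. The id tiebreak here only makes the ordering deterministic and does not fix that gap.

```rust
#[derive(Debug, Clone, PartialEq)]
pub enum Param {
    Int(i64),
    Text(String),
}

pub struct ListQuery<'a> {
    pub limit: u32,
    pub offset: u32,
    pub offset_time: Option<i64>,
    pub descending: bool,
    pub application: Option<&'a str>,
    pub include_trashed: bool,
    pub include_corrupted: bool,
}

impl Default for ListQuery<'_> {
    // Defaults from the spec: limit 50, offset 0, order desc, trashed/corrupted excluded.
    fn default() -> Self {
        ListQuery { limit: 50, offset: 0, offset_time: None, descending: true,
                    application: None, include_trashed: false, include_corrupted: false }
    }
}

/// Returns the SQL text and its positional parameters, in order.
pub fn build_list_sql(q: &ListQuery) -> (String, Vec<Param>) {
    let mut clauses: Vec<&str> = Vec::new();
    let mut params = Vec::new();
    if !q.include_trashed {
        clauses.push("is_trashed = 0");
    }
    if !q.include_corrupted {
        clauses.push("is_corrupted = 0");
    }
    if let Some(app) = q.application {
        clauses.push("application = ?");
        params.push(Param::Text(app.to_string()));
    }
    if let Some(t) = q.offset_time {
        // Keyset cursor: strictly after in asc order, strictly before in desc order.
        clauses.push(if q.descending { "timestamp < ?" } else { "timestamp > ?" });
        params.push(Param::Int(t));
    }
    let mut sql = String::from("SELECT * FROM assets");
    if !clauses.is_empty() {
        sql.push_str(" WHERE ");
        sql.push_str(&clauses.join(" AND "));
    }
    let dir = if q.descending { "DESC" } else { "ASC" };
    sql.push_str(&format!(" ORDER BY timestamp {dir}, id {dir} LIMIT ? OFFSET ?"));
    params.push(Param::Int(q.limit as i64));
    params.push(Param::Int(q.offset as i64));
    (sql, params)
}
```

The timestamp cursor can use idx_timestamp to seek directly to the next page. With a large offset, SQLite still has to walk past every skipped row.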

6.7 Search Assets

  • Method: GET
  • Path: /api/v1/can/0/search
  • Query Parameters:
    • hash (String) - Exact or partial prefix.
    • start_time (Integer) - Epoch ms.
    • end_time (Integer) -