CanMan/API.md
Jason Tudisco 360ecbdad0 Initial commit: CAN Service + examples (can-sync v1, canfs, filemanager, paste)
CAN Service: content-addressable storage with HTTP API, SQLite metadata,
file-based blob storage, thumbnail generation, and integrity verification.

can-sync v1: P2P sync sidecar using iroh-docs for encrypted peer-to-peer
replication with library/filter-based selective sync. Fully builds but
being superseded by v2 (simplified full-mirror approach).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-12 10:32:04 -06:00


# CAN Service API Reference
**Base URL:** `http://localhost:3210`

**API Prefix:** `/api/v1/can/0`

**Content-Type:** `application/json` (all responses are JSON, except asset downloads and thumbnails, which return raw bytes)

> The `0` in the path is the `can_id`. The MVP supports only container `0`.
---
## Quick Start
```bash
# Start the server
cargo run
# Store a file
curl -X POST http://localhost:3210/api/v1/can/0/ingest \
-F "file=@photo.jpg" \
-F "tags=vacation,beach" \
-F "description=Summer trip"
# Store data (no file needed)
curl -X POST http://localhost:3210/api/v1/can/0/ingest/data \
-H "Content-Type: application/json" \
-d '{"data": {"sensor": "temp", "value": 22.5}, "tags": "iot,sensor"}'
# List everything
curl http://localhost:3210/api/v1/can/0/list
# Search by tag
curl "http://localhost:3210/api/v1/can/0/search?tags=vacation"
```
---
## Response Envelope
All JSON responses use a standard wrapper (asset downloads and thumbnails return raw bytes instead):
```json
// Success
{
"status": "success",
"data": { ... }
}
// Error
{
"status": "error",
"error": "Human-readable error message"
}
```
**HTTP Status Codes:**
| Code | Meaning |
|------|---------|
| 200 | Success |
| 400 | Bad request (missing/invalid parameters) |
| 404 | Asset not found |
| 500 | Internal error / corrupted asset |
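
A client usually wants the `data` payload on success and an exception otherwise. The following is a minimal Python sketch of that unwrapping, based only on the envelope shape shown above. The names `CanApiError` and `unwrap` are hypothetical and not part of the service.

```python
from typing import Any


class CanApiError(Exception):
    """Raised for an error envelope (hypothetical name, not part of the service)."""


def unwrap(envelope: dict[str, Any]) -> Any:
    """Return the `data` payload of a success envelope, or raise on error."""
    status = envelope.get("status")
    if status == "success":
        return envelope.get("data")
    if status == "error":
        # The `error` field carries the human-readable message.
        raise CanApiError(envelope.get("error", "unknown error"))
    raise CanApiError(f"unexpected envelope status: {status!r}")
```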
---
## Endpoints
### 1. Ingest File (Multipart)
Upload a binary file with optional metadata.
```
POST /api/v1/can/0/ingest
Content-Type: multipart/form-data
```
**Form Fields:**
| Field | Type | Required | Description |
|-------|------|----------|-------------|
| `file` | Binary | **Yes** | The file to store |
| `mime_type` | String | No | Override MIME type (auto-detected from filename if omitted) |
| `human_file_name` | String | No | Logical filename (e.g. `report.pdf`) |
| `human_readable_path` | String | No | Logical folder path (e.g. `/docs/reports/`) |
| `application` | String | No | Application that produced this file |
| `user` | String | No | User or agent identity |
| `tags` | String | No | Comma-separated tags (e.g. `finance,Q4,report`) |
| `description` | String | No | Human-readable description |
**Response:**
```json
{
"status": "success",
"data": {
"timestamp": 1773014400123,
"hash": "a3b2c4d5e6f7...",
"filename": "1773014400123_a3b2c4d5e6f7_finance_Q4.pdf"
}
}
```
**Example:**
```bash
curl -X POST http://localhost:3210/api/v1/can/0/ingest \
-F "file=@quarterly_report.pdf" \
-F "application=WebUI" \
-F "user=jason" \
-F "tags=finance,Q4,report" \
-F "description=Q4 2025 financial report" \
-F "human_file_name=quarterly_report.pdf" \
-F "human_readable_path=/finance/reports/"
```
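
For clients without curl, the same upload can be built with the Python standard library. This is a sketch, not the service's own client: `build_multipart` is a hypothetical helper, and sending a per-part `Content-Type` guessed from the filename is an assumption that mirrors what curl does.

```python
import mimetypes
import uuid


def build_multipart(fields, filename, content, boundary=None):
    """Encode a file plus text fields as a multipart/form-data body.

    Returns (body_bytes, content_type_header_value).
    """
    boundary = boundary or uuid.uuid4().hex
    # Assumption: guess the part's type from the filename, roughly as curl does.
    part_type = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    chunks = [
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        f"Content-Type: {part_type}\r\n\r\n".encode("utf-8"),
        content,
        b"\r\n",
    ]
    for name, value in fields.items():
        chunks.append(
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f"{value}\r\n".encode("utf-8")
        )
    chunks.append(f"--{boundary}--\r\n".encode("utf-8"))
    return b"".join(chunks), f"multipart/form-data; boundary={boundary}"


# To send (requires a running service):
#   import urllib.request
#   body, ctype = build_multipart({"tags": "finance,Q4"}, "report.pdf", pdf_bytes)
#   req = urllib.request.Request(
#       "http://localhost:3210/api/v1/can/0/ingest",
#       data=body, headers={"Content-Type": ctype}, method="POST")
#   urllib.request.urlopen(req)
```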
---
### 2. Ingest Data (JSON)
Store any JSON value directly -- no multipart needed. Designed for agents and programmatic use.
```
POST /api/v1/can/0/ingest/data
Content-Type: application/json
```
**JSON Body:**
| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| `data` | Any JSON | **Yes** | -- | The payload to store. Object, array, string, number, boolean, or null. |
| `mime_type` | String | No | `application/json` | Override MIME type (e.g. `text/plain` to store as `.txt`) |
| `human_file_name` | String | No | | Logical filename |
| `human_readable_path` | String | No | | Logical folder path |
| `application` | String | No | | Application/agent that produced this data |
| `user` | String | No | | User or agent identity |
| `tags` | String | No | | Comma-separated tags |
| `description` | String | No | | Human-readable description |
The `data` field is serialized to pretty-printed JSON and stored as a `.json` file (or `.txt` etc. if you override `mime_type`).
**Response:** Same as file ingest.
**Examples:**
```bash
# Minimal -- just dump an object
curl -X POST http://localhost:3210/api/v1/can/0/ingest/data \
-H "Content-Type: application/json" \
-d '{"data": {"key": "value"}}'
# Agent saving structured output
curl -X POST http://localhost:3210/api/v1/can/0/ingest/data \
-H "Content-Type: application/json" \
-d '{
"data": {
"agent_id": "planner-v2",
"session": "abc-123",
"steps": ["research", "outline", "draft"]
},
"application": "AgentOrchestrator",
"user": "planner_agent",
"tags": "agent,plan,session",
"description": "Planning output for session abc-123",
"human_file_name": "plan_output.json",
"human_readable_path": "/agents/planner/"
}'
# Store a plain string
curl -X POST http://localhost:3210/api/v1/can/0/ingest/data \
-H "Content-Type: application/json" \
-d '{"data": "Log: task completed at 14:30", "tags": "log"}'
# Store an array
curl -X POST http://localhost:3210/api/v1/can/0/ingest/data \
-H "Content-Type: application/json" \
-d '{"data": [1, 2, 3, "four"], "tags": "test"}'
# Store as plain text instead of JSON
curl -X POST http://localhost:3210/api/v1/can/0/ingest/data \
-H "Content-Type: application/json" \
-d '{"data": "plain text content", "mime_type": "text/plain"}'
```
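
An agent written in Python could build the same request with the standard library. The sketch below only constructs the request and leaves sending to the caller. `build_ingest_data_request` is a hypothetical helper name, and dropping unset optional fields is a client-side choice, not a documented requirement.

```python
import json
import urllib.request

BASE = "http://localhost:3210/api/v1/can/0"  # base URL + prefix from this document


def build_ingest_data_request(data, **metadata):
    """Build (but do not send) a POST to /ingest/data.

    `data` may be any JSON-serializable value, including None (JSON null).
    """
    body = {"data": data}
    # Drop unset optional fields so the server applies its own defaults.
    body.update({k: v for k, v in metadata.items() if v is not None})
    return urllib.request.Request(
        f"{BASE}/ingest/data",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# To send (requires a running service):
#   with urllib.request.urlopen(build_ingest_data_request([1, 2, 3], tags="test")) as resp:
#       envelope = json.load(resp)
```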
---
### 3. Retrieve Asset
Download the physical file by its hash.
```
GET /api/v1/can/0/asset/{hash}
```
**Path Parameters:**
| Param | Type | Description |
|-------|------|-------------|
| `hash` | String | The SHA-256 hash returned from ingest |
**Response:** Raw file bytes with headers:
- `Content-Type` set to the asset's MIME type
- `Content-Disposition: attachment; filename="<human_filename>"`

Returns `500` if the asset is flagged as corrupted.
**Example:**
```bash
curl -o output.pdf http://localhost:3210/api/v1/can/0/asset/a3b2c4d5e6f7...
```
---
### 4. Get Asset Metadata
Retrieve all metadata for an asset without downloading the file.
```
GET /api/v1/can/0/asset/{hash}/meta
```
**Response:**
```json
{
"status": "success",
"data": {
"hash": "a3b2c4d5e6f7...",
"mime_type": "application/pdf",
"application": "WebUI",
"user": "jason",
"tags": ["finance", "Q4", "report"],
"description": "Q4 2025 financial report",
"human_filename": "quarterly_report.pdf",
"human_path": "/finance/reports/",
"timestamp": 1773014400123,
"is_trashed": false,
"is_corrupted": false
}
}
```
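
Note that the response keys `human_filename` and `human_path` differ from the ingest field names `human_file_name` and `human_readable_path`. A typed client might model the metadata object as follows. This is a sketch: the class name `AssetMeta` is hypothetical, and treating the descriptive fields as optional is an assumption, since the document does not say which keys can be absent or null.

```python
from dataclasses import dataclass, fields
from typing import Any, Optional


@dataclass
class AssetMeta:
    hash: str
    mime_type: str
    timestamp: int  # epoch milliseconds
    tags: list[str]
    # Assumption: these may be missing or null when not supplied at ingest.
    application: Optional[str] = None
    user: Optional[str] = None
    description: Optional[str] = None
    human_filename: Optional[str] = None
    human_path: Optional[str] = None
    is_trashed: bool = False
    is_corrupted: bool = False

    @classmethod
    def from_dict(cls, raw: dict[str, Any]) -> "AssetMeta":
        """Build from the `data` object of a metadata response."""
        known = {f.name for f in fields(cls)}
        # Ignore unknown keys so the sketch tolerates fields added later.
        return cls(**{k: v for k, v in raw.items() if k in known})
```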
---
### 5. Update Metadata
Modify tags and/or description for an existing asset. The physical file is unchanged.
```
PATCH /api/v1/can/0/asset/{hash}
Content-Type: application/json
```
**JSON Body:**
| Field | Type | Required | Description |
|-------|------|----------|-------------|
| `tags` | String[] | No | New tag list (replaces all existing tags) |
| `description` | String | No | New description |
Both fields are optional. Only provided fields are updated.
**Example:**
```bash
curl -X PATCH http://localhost:3210/api/v1/can/0/asset/a3b2c4d5e6f7... \
-H "Content-Type: application/json" \
-d '{
"tags": ["finance", "Q4", "report", "reviewed"],
"description": "Q4 2025 report - reviewed by CFO"
}'
```
**Response:**
```json
{
"status": "success",
"data": "updated"
}
```
---
### 6. List Assets
Paginated listing of all assets with optional filtering.
```
GET /api/v1/can/0/list
```
**Query Parameters:**
| Param | Type | Default | Description |
|-------|------|---------|-------------|
| `limit` | Integer | `50` | Page size |
| `offset` | Integer | `0` | Starting position |
| `offset_time` | Integer | -- | Epoch ms cursor. Lists items strictly before/after this timestamp (based on `order`). Faster than offset for large datasets. |
| `order` | String | `desc` | Sort direction: `asc` or `desc` (by timestamp) |
| `application` | String | -- | Filter to assets from this application |
| `include_trashed` | Boolean | `false` | Include soft-deleted assets |
| `include_corrupted` | Boolean | `false` | Include corrupted assets |
**Response:**
```json
{
"status": "success",
"data": {
"items": [
{
"hash": "a3b2...",
"mime_type": "application/pdf",
"application": "WebUI",
"user": "jason",
"tags": ["finance"],
"description": "...",
"human_filename": "report.pdf",
"human_path": "/docs/",
"timestamp": 1773014400123,
"is_trashed": false,
"is_corrupted": false
}
],
"pagination": {
"limit": 50,
"offset": 0,
"total": 142
}
}
}
```
**Examples:**
```bash
# First page, 10 items
curl "http://localhost:3210/api/v1/can/0/list?limit=10"
# Second page
curl "http://localhost:3210/api/v1/can/0/list?limit=10&offset=10"
# Only assets from a specific app, oldest first
curl "http://localhost:3210/api/v1/can/0/list?application=IoTAgent&order=asc"
# Include trashed assets
curl "http://localhost:3210/api/v1/can/0/list?include_trashed=true"
```
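
To walk the whole listing, a client can advance `offset` until it reaches `pagination.total`. The sketch below takes the page fetcher as a parameter so it makes no assumptions about the HTTP layer. `iter_assets` and `fetch_page` are hypothetical names, and `fetch_page(limit, offset)` is expected to return the unwrapped `data` object of a list response.

```python
from typing import Any, Callable, Iterator


def iter_assets(
    fetch_page: Callable[[int, int], dict[str, Any]],
    limit: int = 50,
) -> Iterator[dict[str, Any]]:
    """Yield every item by walking offset pages until `total` is reached."""
    offset = 0
    while True:
        data = fetch_page(limit, offset)
        items = data["items"]
        yield from items
        offset += len(items)
        # Stop on an empty page as well, in case `total` changes mid-walk.
        if not items or offset >= data["pagination"]["total"]:
            return
```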
---
### 7. Search Assets
Search with multiple filters. All filters are AND-combined.
```
GET /api/v1/can/0/search
```
**Query Parameters:**
| Param | Type | Default | Description |
|-------|------|---------|-------------|
| `hash` | String | -- | Exact hash or prefix match (e.g. `a3b2` matches `a3b2c4d5...`) |
| `start_time` | Integer | -- | Epoch ms lower bound (inclusive) |
| `end_time` | Integer | -- | Epoch ms upper bound (inclusive) |
| `tags` | String | -- | Comma-separated. AND logic: asset must have **all** specified tags. |
| `mime_type` | String | -- | Exact MIME type match (e.g. `image/jpeg`) |
| `user` | String | -- | Exact user/agent identity match |
| `application` | String | -- | Exact application match |
| `limit` | Integer | `50` | Page size |
| `offset` | Integer | `0` | Starting position |
| `order` | String | `desc` | Sort direction: `asc` or `desc` |
| `include_trashed` | Boolean | `false` | Include soft-deleted assets |
| `include_corrupted` | Boolean | `false` | Include corrupted assets |
**Response:** Same structure as List (items + pagination).
**Examples:**
```bash
# Find by hash prefix
curl "http://localhost:3210/api/v1/can/0/search?hash=a3b2"
# All JPEG images with a timestamp at or after start_time (epoch ms; use now minus 24h for "last 24 hours")
curl "http://localhost:3210/api/v1/can/0/search?mime_type=image/jpeg&start_time=1773014400000"
# Assets tagged with BOTH "sensor" AND "temperature"
curl "http://localhost:3210/api/v1/can/0/search?tags=sensor,temperature"
# Everything from a specific agent
curl "http://localhost:3210/api/v1/can/0/search?application=PlannerAgent&user=agent_v2"
# Combine filters
curl "http://localhost:3210/api/v1/can/0/search?tags=report&application=WebUI&order=asc&limit=5"
```
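
Building search URLs by hand gets error-prone once several filters are combined. The following sketch assembles the query string from the parameters in the table above and computes a relative `start_time`. The helper names `search_url` and `since_hours_ago` are hypothetical, and lowercase `true`/`false` for booleans follows the examples in this document.

```python
import time
from urllib.parse import urlencode

BASE = "http://localhost:3210/api/v1/can/0"  # base URL + prefix from this document


def since_hours_ago(hours, now_ms=None):
    """Epoch-ms timestamp `hours` before now (or before `now_ms` if given)."""
    now_ms = int(time.time() * 1000) if now_ms is None else now_ms
    return now_ms - int(hours * 3600 * 1000)


def search_url(tags=None, **filters):
    """Build a /search URL; `tags` is a list joined with commas (AND logic)."""
    params = {}
    if tags:
        params["tags"] = ",".join(tags)
    for key, value in filters.items():
        if value is None:
            continue  # omit unset filters entirely
        # Booleans are sent as lowercase true/false, as in the curl examples.
        params[key] = str(value).lower() if isinstance(value, bool) else value
    # Keep "," and "/" readable instead of percent-encoding them.
    return f"{BASE}/search?{urlencode(params, safe=',/')}"
```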
---
### 8. Get Thumbnail
Generate a resized thumbnail for image assets. Non-image assets return a placeholder SVG.
```
GET /api/v1/can/0/asset/{hash}/thumb/{max_width}/{max_height}
```
**Path Parameters:**
| Param | Type | Description |
|-------|------|-------------|
| `hash` | String | Asset hash |
| `max_width` | Integer | Maximum width in pixels |
| `max_height` | Integer | Maximum height in pixels |
Aspect ratio is always preserved. The image fits within the `max_width x max_height` box.
**Response:**
- **Image assets:** JPEG bytes (`Content-Type: image/jpeg`). Cached in `.thumbs/` when `enable_thumbnail_cache` is `true`.
- **Non-image assets:** SVG placeholder icon (`Content-Type: image/svg+xml`).

**Example:**
```bash
# 200x200 thumbnail
curl -o thumb.jpg http://localhost:3210/api/v1/can/0/asset/a3b2.../thumb/200/200
```
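
To predict the size of a returned thumbnail, a client can apply the same fit-within-a-box rule. This sketch shows one common way to compute it. Whether the service upscales small images, and how it rounds, is not stated in this document, so both choices here (never upscale, round to nearest) are assumptions. `fit_within` is a hypothetical name.

```python
def fit_within(width, height, max_width, max_height):
    """Largest size with the same aspect ratio that fits inside the box.

    Assumption: images smaller than the box are left at their original size.
    """
    scale = min(max_width / width, max_height / height, 1.0)
    # Clamp to at least 1 pixel so extreme aspect ratios stay valid.
    return max(1, round(width * scale)), max(1, round(height * scale))
```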
---
## Configuration
The service reads `config.yaml` from the project root. To use a different file, pass its path as the first CLI argument:
```yaml
storage_root: "/var/lib/can_data" # Where files are stored
admin_token: "super_secret_rebuild" # Bearer token for admin operations
enable_thumbnail_cache: true # Cache thumbnails in .thumbs/
rebuild_error_threshold: 50 # Error tolerance before hard rebuild
verify_interval_hours: 12 # Hours between integrity scrubs
```
---
## Concepts
### Hash
Every asset gets a unique SHA-256 hash computed as `SHA256(timestamp_be_bytes + file_content)`. This hash is the primary identifier used in all API calls. Because the timestamp is mixed in, even identical file content produces different hashes at different times.
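
The formula can be reproduced outside the service. The sketch below is one reading of it, not the service's code: it assumes the timestamp is the epoch-millisecond value encoded as an 8-byte unsigned big-endian integer, which the document does not confirm (it only says big-endian bytes). `can_hash` is a hypothetical name.

```python
import hashlib
import struct


def can_hash(timestamp_ms: int, content: bytes) -> str:
    """SHA256(timestamp_be_bytes + file_content) as a hex string."""
    # ASSUMPTION: 8-byte unsigned big-endian encoding of epoch milliseconds.
    timestamp_be_bytes = struct.pack(">Q", timestamp_ms)
    return hashlib.sha256(timestamp_be_bytes + content).hexdigest()
```

Calling `can_hash` twice with the same bytes but different timestamps yields two different hashes, which is why re-ingesting identical content creates a new asset rather than deduplicating.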
### Physical Filename
Files are stored as: `{timestamp}_{hash}_{tags}.{extension}`
For example: `1773014400123_a3b2c4d5e6f7_finance_Q4.pdf`
This naming allows offline integrity verification -- you can recompute the hash from the timestamp and file contents and compare it to the filename.
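
As an illustration of that offline check, the sketch below parses a physical filename and compares the recomputed hash to the one embedded in the name. Several details are assumptions: tags are taken to be joined with `_` (as in the example above), the timestamp is encoded as 8 big-endian bytes, and the comparison is a prefix match because the example filename shows a shortened hash. `PhysicalName`, `parse_physical_name`, and `verify` are hypothetical names.

```python
import hashlib
import struct
from typing import NamedTuple


class PhysicalName(NamedTuple):
    timestamp: int
    hash: str
    tags: list[str]
    extension: str


def parse_physical_name(filename: str) -> PhysicalName:
    """Split `{timestamp}_{hash}_{tags}.{extension}` into its parts.

    Raises ValueError if the timestamp or hash segment is missing.
    """
    stem, _, extension = filename.rpartition(".")
    # Assumption: everything after the hash is a "_"-separated tag list,
    # so tags that themselves contain "_" cannot be recovered exactly.
    timestamp, hash_part, *tags = stem.split("_")
    return PhysicalName(int(timestamp), hash_part, tags, extension)


def verify(filename: str, content: bytes) -> bool:
    """Recompute the hash from the name's timestamp and the file bytes."""
    name = parse_physical_name(filename)
    # ASSUMPTION: 8-byte unsigned big-endian encoding of epoch milliseconds.
    digest = hashlib.sha256(struct.pack(">Q", name.timestamp) + content).hexdigest()
    return digest.startswith(name.hash)
```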
### Tags
Tags are plain strings. On ingest and in search, pass them as a single comma-separated string: `"finance,Q4,report"`. In metadata responses, they come back as an array: `["finance", "Q4", "report"]`.
When searching, tag matching uses AND logic: `?tags=finance,Q4` finds only assets that have **both** tags.
When patching, pass tags as a JSON array. The new list **replaces** the existing tags entirely (it is not merged).
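
Because a patch replaces the whole tag list, adding a single tag means reading the current tags, merging on the client, and sending the full list back. The sketch below shows that merge along with a converter between the comma-separated and array forms. `parse_tags` and `merged_tags` are hypothetical helpers, and trimming whitespace around tags is a client-side choice, not documented server behavior.

```python
def parse_tags(csv: str) -> list[str]:
    """Turn "finance,Q4,report" into ["finance", "Q4", "report"]."""
    # Client-side choice: trim whitespace and skip empty entries.
    return [tag.strip() for tag in csv.split(",") if tag.strip()]


def merged_tags(existing: list[str], additions: list[str]) -> list[str]:
    """Order-preserving union, suitable as the `tags` array of a PATCH body."""
    return list(dict.fromkeys([*existing, *additions]))
```

For example, `merged_tags(["finance", "Q4"], ["Q4", "reviewed"])` gives `["finance", "Q4", "reviewed"]`, which can be sent as the `tags` value of the PATCH body.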
### Integrity Verification
A background verifier runs automatically:
1. **On startup:** Scrubs all assets against their hashes.
2. **On file change:** Watches the storage directory for modifications.
3. **Periodically:** Every `verify_interval_hours`.

Corrupted assets are flagged (`is_corrupted: true`) and excluded from standard list/search results unless `include_corrupted=true` is passed. Retrieving a corrupted asset via `GET /api/v1/can/0/asset/{hash}` returns a `500` error.
### OS File Attributes
Critical metadata is written to the host OS as file attributes for disaster recovery:
- **Linux/macOS:** Extended attributes (`xattr`) under the `user.can.*` namespace
- **Windows:** NTFS Alternate Data Streams (`:can.*`)

Because this metadata travels with each file, the SQLite database can be rebuilt from scratch by scanning the storage directory.