# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Project Overview

**pngx-controller** is a Paperless-ngx Central Sync Controller: a single-container FastAPI service that reads from a master paperless-ngx instance and syncs documents/metadata to one or more replicas using only the public paperless REST API. Master always wins; replicas are read-only by convention.

The full specification is in `pngx-controller-prd.md`.

## Tech Stack

| Layer | Choice |
|---|---|
| Backend | Python / FastAPI |
| Scheduler | APScheduler (runs inside the FastAPI event loop) |
| Frontend | Jinja2 templates + HTMX + Pico CSS (no JS build step) |
| Database | SQLite via SQLModel |
| Auth | Authentik forward auth via `X-authentik-*` headers (no app-level auth code) |
| Transport | Tailscale IPs (bypasses Traefik/public internet) |

## Architecture

Single process, single container. APScheduler runs the sync job inside the FastAPI event loop. An `asyncio.Lock` prevents concurrent sync runs.

```
FastAPI app
├── Web UI (Jinja2 + HTMX) — /, /replicas, /logs, /settings
├── REST API — /api/*
└── APScheduler — sync job every N minutes (default: 15)

SQLite (bind-mounted at /data/db.sqlite3)
├── replicas  — configured instances (URL + encrypted API token)
├── sync_map  — master_doc_id ↔ replica_doc_id mapping per replica
├── sync_runs — audit log of sync cycles
├── logs      — per-document sync events
└── settings  — master_url, master_token, sync_interval_seconds, log_retention_days
```

API tokens are encrypted at rest using **Fernet** with `SECRET_KEY` from the environment.

## Sync Engine Logic

1. Acquire `asyncio.Lock` (skip if a sync is already running)
2. For each enabled replica:
   - `ensure_schema_parity`: sync tags, correspondents, document types, and custom fields **by name** (IDs differ per instance; a name→id mapping is built and used locally)
   - Fetch master docs modified since the last sync (`?modified__gte=...`)
   - For each doc: download the file + metadata from the master, then create (`POST /api/documents/post_document/`) or update (`PUT /api/documents/{replica_id}/`) on the replica
   - Skip the file re-upload if the SHA256 checksum matches `sync_map.file_checksum`
3. Advance `last_sync_ts`, close the `sync_run` record, release the lock

Schema parity must be established **before** document sync so custom fields exist on the replica.

## Key Design Constraints

- **No consume directory** — sync via the REST API only; the consume path causes re-OCR, ID collisions, and breaks on live instances
- **No changes to paperless containers** — the controller is fully external
- **SPOF accepted** — if the controller is down, the paperless instances run normally; sync resumes on recovery
- Live log tail uses **SSE** at `/api/logs/stream`; dashboard sync progress uses HTMX polling of `/api/sync/running`

## Environment Variables

| Variable | Purpose |
|---|---|
| `SECRET_KEY` | Fernet key for encrypting API tokens at rest |
| `DATABASE_URL` | SQLite path, e.g. `sqlite:////data/db.sqlite3` |

## Implementation Phases (from PRD)

- **Phase 1 (MVP):** SQLModel schema, settings/replica CRUD, sync engine, APScheduler, basic dashboard, log table
- **Phase 2:** SSE log stream, sync progress indicator, manual trigger with live feedback
- **Phase 3:** Full resync per replica, deletion propagation (tombstone table), file checksum skip, alert webhooks