# CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
## Project Overview
pngx-controller is a Paperless-ngx Central Sync Controller: a single-container FastAPI service that reads from a master paperless-ngx instance and syncs documents/metadata to one or more replicas using only the public paperless REST API. Master always wins; replicas are read-only by convention.
The full specification is in `pngx-controller-prd.md`.
## Tech Stack
| Layer | Choice |
|---|---|
| Backend | Python / FastAPI |
| Scheduler | APScheduler (runs inside FastAPI event loop) |
| Frontend | Jinja2 templates + HTMX + Pico CSS (no JS build step) |
| Database | SQLite via SQLModel |
| Auth | Authentik forward auth via X-authentik-* headers (no app-level auth code) |
| Transport | Tailscale IPs (bypasses Traefik/public internet) |
## Architecture
Single process, single container. APScheduler runs the sync job inside the FastAPI event loop. An asyncio.Lock prevents concurrent sync runs.
```
FastAPI app
├── Web UI (Jinja2 + HTMX) — /, /replicas, /logs, /settings
├── REST API — /api/*
└── APScheduler — sync job every N minutes (default: 15)

SQLite (bind-mounted at /data/db.sqlite3)
├── replicas  — configured instances (URL + encrypted API token)
├── sync_map  — master_doc_id ↔ replica_doc_id mapping per replica
├── sync_runs — audit log of sync cycles
├── logs      — per-document sync events
└── settings  — master_url, master_token, sync_interval_seconds, log_retention_days
```
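The skip-if-already-running behaviour can be sketched with a plain `asyncio.Lock` (names here are illustrative, not the project's actual ones):

```python
import asyncio

sync_lock = asyncio.Lock()  # module-level: shared by scheduler and manual triggers

async def run_sync() -> str:
    """Run one sync cycle unless a cycle is already in flight."""
    if sync_lock.locked():      # another run holds the lock -> skip, don't queue
        return "skipped"
    async with sync_lock:
        await asyncio.sleep(0)  # placeholder for the real sync work
        return "completed"

async def demo() -> list[str]:
    # Hold the lock to simulate an in-flight run, then try a second trigger.
    async with sync_lock:
        second = await run_sync()   # sees the lock held -> "skipped"
    first = await run_sync()        # lock free again -> "completed"
    return [second, first]
```

Checking `locked()` and returning early (instead of awaiting the lock) is what makes overlapping triggers skip rather than pile up behind each other.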
API tokens are encrypted at rest using Fernet with `SECRET_KEY` from the environment.
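Token encryption at rest can be sketched with `cryptography.fernet` (the standard Fernet implementation in Python; the helper names are illustrative):

```python
from cryptography.fernet import Fernet

def make_cipher(secret_key: bytes) -> Fernet:
    # SECRET_KEY must be a urlsafe-base64-encoded 32-byte key,
    # e.g. generated once with Fernet.generate_key().
    return Fernet(secret_key)

def encrypt_token(cipher: Fernet, token: str) -> bytes:
    """Encrypt a replica API token before storing it in SQLite."""
    return cipher.encrypt(token.encode())

def decrypt_token(cipher: Fernet, blob: bytes) -> str:
    """Decrypt a stored token just before making a request to the replica."""
    return cipher.decrypt(blob).decode()
```

Fernet is symmetric, so the same `SECRET_KEY` that encrypted a token is needed to decrypt it; losing the key means re-entering all replica tokens.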
## Sync Engine Logic
- Acquire `asyncio.Lock` (skip if a sync is already running)
- For each enabled replica:
  - `ensure_schema_parity`: sync tags, correspondents, document types, and custom fields by name (IDs differ per instance; a name→id mapping is built and used locally)
  - Fetch master docs modified since last sync (`?modified__gte=...`)
  - For each doc: download file + metadata from master, then create (`POST /api/documents/post_document/`) or update (`PUT /api/documents/{replica_id}/`) on the replica
  - Skip file re-upload if the SHA256 checksum matches `sync_map.file_checksum`
- Advance `last_sync_ts`, close the `sync_run` record, release the lock
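The per-document create-vs-update decision above reduces to a lookup in the `sync_map` table. A sketch as pure functions (the endpoint paths come from the steps above; the helpers themselves are illustrative):

```python
from urllib.parse import urlencode

def modified_since_url(master_url: str, last_sync_ts: str) -> str:
    """Build the paperless query for docs modified since the last sync."""
    return f"{master_url}/api/documents/?" + urlencode({"modified__gte": last_sync_ts})

def plan_action(master_doc_id: int, sync_map: dict[int, int]) -> tuple[str, str]:
    """Decide create vs update for one master document.

    sync_map maps master_doc_id -> replica_doc_id (the sync_map table).
    Returns (HTTP verb, path) against the replica's API.
    """
    replica_id = sync_map.get(master_doc_id)
    if replica_id is None:
        # Never synced to this replica -> upload as a new document.
        return ("POST", "/api/documents/post_document/")
    # Already mapped -> update the existing replica document in place.
    return ("PUT", f"/api/documents/{replica_id}/")
```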
Schema parity must be established before document sync so custom fields exist on the replica.
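The name-based matching that schema parity relies on can be sketched as: build a name→id index of the replica's entities, create anything missing, and return a master-id→replica-id translation (an illustrative helper, not the project's actual code):

```python
from typing import Callable

def build_id_translation(
    master_items: list[dict],             # e.g. master tags: [{"id": 1, "name": "invoice"}, ...]
    replica_items: list[dict],            # the same entity type fetched from the replica
    create_on_replica: Callable[[str], int],  # POSTs the entity, returns the new replica id
) -> dict[int, int]:
    """Return master_id -> replica_id for one entity type, creating missing names."""
    replica_by_name = {item["name"]: item["id"] for item in replica_items}
    translation: dict[int, int] = {}
    for item in master_items:
        name = item["name"]
        if name not in replica_by_name:
            # Entity exists on master only -> create it on the replica first.
            replica_by_name[name] = create_on_replica(name)
        translation[item["id"]] = replica_by_name[name]
    return translation
```

Running this for tags, correspondents, document types, and custom fields before the document loop is what guarantees every ID a document references can be translated.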
## Key Design Constraints
- No consume directory — sync via REST API only; consume causes re-OCR, ID collisions, and breaks on live instances
- No changes to paperless containers — controller is fully external
- SPOF accepted — if controller is down, paperless instances run normally; sync resumes on recovery
- Live log tail uses SSE at `/api/logs/stream`; dashboard sync progress uses HTMX polling of `/api/sync/running`
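The SSE wire format the log stream emits is plain text: `event:` and `data:` lines terminated by a blank line. A minimal frame formatter (in the real app FastAPI would wrap a generator of such frames in a `StreamingResponse`; the function name is illustrative):

```python
import json

def sse_event(payload: dict, event: str = "log") -> str:
    """Serialize one log record as a Server-Sent Events frame."""
    # A frame is "event:" + "data:" lines followed by a blank line;
    # the blank line is what tells the browser the frame is complete.
    return f"event: {event}\ndata: {json.dumps(payload)}\n\n"
```

On the client side, HTMX's SSE extension (or a plain `EventSource`) subscribes to the named event and appends each `data` payload to the log view.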
## Environment Variables
| Variable | Purpose |
|---|---|
| `SECRET_KEY` | Fernet key for encrypting API tokens at rest |
| `DATABASE_URL` | SQLite path, e.g. `sqlite:////data/db.sqlite3` |
## Implementation Phases (from PRD)
- Phase 1 (MVP): SQLModel schema, settings/replica CRUD, sync engine, APScheduler, basic dashboard, log table
- Phase 2: SSE log stream, sync progress indicator, manual trigger with live feedback
- Phase 3: Full resync per replica, deletion propagation (tombstone table), file checksum skip, alert webhooks