pngx-sync/AGENTS.md
domverse b99dbf694d
feat: implement pngx-controller with Gitea CI/CD deployment
- Full FastAPI sync engine: master→replica document sync via paperless REST API
- Web UI: dashboard, replicas, logs, settings (Jinja2 + HTMX + Pico CSS)
- APScheduler background sync, SSE live log stream, Prometheus metrics
- Fernet encryption for API tokens at rest
- pngx.env credential file: written on save, pre-fills forms on load
- Dockerfile with layer-cached uv build, Python healthcheck
- docker-compose with host networking for Tailscale access
- Gitea Actions workflow: version bump, secret injection, docker compose deploy

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-22 17:59:25 +01:00


CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

Project Overview

pngx-controller is a Paperless-ngx Central Sync Controller: a single-container FastAPI service that reads from a master paperless-ngx instance and syncs documents/metadata to one or more replicas using only the public paperless REST API. Master always wins; replicas are read-only by convention.

The full specification is in pngx-controller-prd.md.

Tech Stack

Layer      Choice
Backend    Python / FastAPI
Scheduler  APScheduler (runs inside the FastAPI event loop)
Frontend   Jinja2 templates + HTMX + Pico CSS (no JS build step)
Database   SQLite via SQLModel
Auth       Authentik forward auth via X-authentik-* headers (no app-level auth code)
Transport  Tailscale IPs (bypasses Traefik and the public internet)

Architecture

Single process, single container. APScheduler runs the sync job inside the FastAPI event loop. An asyncio.Lock prevents concurrent sync runs.

FastAPI app
├── Web UI (Jinja2 + HTMX) — /, /replicas, /logs, /settings
├── REST API — /api/*
└── APScheduler — sync job every N minutes (default: 15)
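The skip-if-running behavior can be sketched with plain asyncio; in the real service the scheduler would invoke this job, but the pattern stands alone (the function name and sleep are illustrative stand-ins):

```python
import asyncio

sync_lock = asyncio.Lock()

async def run_sync() -> str:
    # If a sync cycle is already in flight, skip this trigger entirely
    # rather than queueing a second run behind the lock.
    if sync_lock.locked():
        return "skipped"
    async with sync_lock:
        await asyncio.sleep(0.05)  # stand-in for the real sync work
        return "completed"
```

Checking `locked()` before `async with` is what turns the lock into a "skip" rather than a "wait", which is the behavior you want for a periodic job.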

SQLite (bind-mounted at /data/db.sqlite3)
├── replicas       — configured instances (URL + encrypted API token)
├── sync_map       — master_doc_id ↔ replica_doc_id mapping per replica
├── sync_runs      — audit log of sync cycles
├── logs           — per-document sync events
└── settings       — master_url, master_token, sync_interval_seconds, log_retention_days
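A plain-SQL sketch of the sync_map table using stdlib sqlite3 (the project uses SQLModel; the column names here mirror the description above and are assumptions, not the actual schema):

```python
import sqlite3

# One row per (replica, master document) pair; file_checksum enables
# the skip-re-upload optimization described in the sync engine logic.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE sync_map (
        replica_id     INTEGER NOT NULL,
        master_doc_id  INTEGER NOT NULL,
        replica_doc_id INTEGER NOT NULL,
        file_checksum  TEXT,
        PRIMARY KEY (replica_id, master_doc_id)
    )
""")
conn.execute("INSERT INTO sync_map VALUES (1, 42, 7, 'abc123')")
row = conn.execute(
    "SELECT replica_doc_id FROM sync_map"
    " WHERE replica_id = 1 AND master_doc_id = 42"
).fetchone()
```

The composite primary key reflects that one master document maps to a different document ID on each replica.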

API tokens are encrypted at rest using Fernet with SECRET_KEY from the environment.
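A minimal sketch of the token encryption round trip with `cryptography`'s Fernet (the key is generated inline here so the sketch is self-contained; the real service reads SECRET_KEY from the environment, and the helper names are illustrative):

```python
from cryptography.fernet import Fernet

# In the service, this would be Fernet(os.environ["SECRET_KEY"]).
secret_key = Fernet.generate_key()
fernet = Fernet(secret_key)

def encrypt_token(plain: str) -> bytes:
    # The ciphertext, not the plaintext token, is what gets persisted.
    return fernet.encrypt(plain.encode())

def decrypt_token(stored: bytes) -> str:
    return fernet.decrypt(stored).decode()
```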

Sync Engine Logic

  1. Acquire asyncio.Lock (skip if already running)
  2. For each enabled replica:
    • ensure_schema_parity: sync tags, correspondents, document types, custom fields by name (IDs differ per instance; name→id mapping is built and used locally)
    • Fetch master docs modified since last sync (?modified__gte=...)
    • For each doc: download file + metadata from master, then create (POST /api/documents/post_document/) or update (PUT /api/documents/{replica_id}/) on replica
    • Skip file re-upload if SHA256 checksum matches sync_map.file_checksum
  3. Advance last_sync_ts, close sync_run record, release lock

Schema parity must be established before document sync so custom fields exist on the replica.

Key Design Constraints

  • No consume directory — sync via the REST API only; the consume path triggers re-OCR, creates ID collisions, and breaks on live instances
  • No changes to paperless containers — controller is fully external
  • SPOF accepted — if controller is down, paperless instances run normally; sync resumes on recovery
  • Live log tail uses SSE at /api/logs/stream; dashboard sync progress uses HTMX polling /api/sync/running
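The SSE side of the log tail can be sketched without the web framework: the wire format is just `data:` lines terminated by a blank line, fed from an async generator. FastAPI would serve the generator via a StreamingResponse with media type text/event-stream (the queue and function names here are assumptions):

```python
import asyncio
import json

log_queue: asyncio.Queue = asyncio.Queue()

def sse_frame(record: dict) -> str:
    # Server-Sent Events framing: "data: <payload>" plus a blank line.
    return f"data: {json.dumps(record)}\n\n"

async def log_stream():
    # The /api/logs/stream endpoint would wrap this generator in a
    # StreamingResponse(media_type="text/event-stream").
    while True:
        record = await log_queue.get()
        yield sse_frame(record)
```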

Environment Variables

Variable      Purpose
SECRET_KEY    Fernet key for encrypting API tokens at rest
DATABASE_URL  SQLite path, e.g. sqlite:////data/db.sqlite3

Implementation Phases (from PRD)

  • Phase 1 (MVP): SQLModel schema, settings/replica CRUD, sync engine, APScheduler, basic dashboard, log table
  • Phase 2: SSE log stream, sync progress indicator, manual trigger with live feedback
  • Phase 3: Full resync per replica, deletion propagation (tombstone table), file checksum skip, alert webhooks