# PRD — pngx-controller
> Paperless-ngx Central Sync Controller with Web UI
**Status:** Draft
**Author:** Alex
**Last updated:** 2026-03-20
---
## Table of Contents
1. [Problem Statement](#1-problem-statement)
2. [Goals](#2-goals)
3. [Non-Goals](#3-non-goals)
4. [Design Principles](#4-design-principles)
5. [Architecture](#5-architecture)
6. [Web UI](#6-web-ui)
7. [Data Model](#7-data-model)
8. [API](#8-api)
9. [Sync Engine](#9-sync-engine)
10. [Deployment](#10-deployment)
11. [Implementation Phases](#11-implementation-phases)
12. [Open Questions](#12-open-questions)
---
## 1. Problem Statement
Paperless-ngx has no native multi-instance sync. The built-in export/import workflow works well for snapshots, but the import via the consume directory requires a blank instance without existing users — making it unusable for continuous sync against a live replica. The goal is a single central controller that reads from a designated master and writes to one or more replicas, using nothing but the public paperless REST API.
---
## 2. Goals
- **High availability / failover** — replicas stay current enough to serve as a fallback if the master goes down; they are fully user-facing and exposed via Traefik + Authentik
- **Backup** — at least one replica acts as a verified, importable backup at all times
- **No paperless fork** — all sync logic lives outside the paperless-ngx containers, using only its public API
- **Conflict resolution** — master always wins; replicas never push changes back
- **Web UI** — all configuration, monitoring, and operations happen through a browser interface
- **Minimal ops overhead** — one additional container in the existing Docker/Traefik/Authentik stack
- **Observable by default** — health, metrics, and structured logs available without requiring the web UI
---
## 3. Non-Goals
- Bidirectional sync or multi-master
- Real-time sync (eventual consistency is acceptable, target: ≤15 min lag)
- Replacing Authentik SSO or Traefik routing on paperless instances
- Syncing user accounts, passwords, or sessions across instances
- Automatic failover (manual failover procedure only in v1)
- **Deletion propagation in v1** — replicas are strictly additive; documents deleted on the master are not removed from replicas. This is safe and well-defined behaviour for a first version.
---
## 4. Design Principles
- **No sidecars, no changes to paperless containers** — the controller is a separate service that talks to all instances from outside via their existing REST APIs
- **Central single point of control** — one place to configure, one place to check logs, one place to restart when something breaks
- **SPOF accepted** — if the controller is down, all paperless instances keep running normally; they just don't sync until it recovers. This is acceptable because paperless does not depend on the controller in any way
- **Tailscale-native transport** — all API calls go over Tailscale IPs directly, bypassing public internet entirely
- **Fail fast and visibly** — misconfiguration (missing env vars, bad credentials, unreachable replicas) surfaces immediately as hard errors, not silent no-ops
---
## 5. Architecture
```
┌─ domverse ─────────────────────────────────────────────────┐
│                                                            │
│ ┌─ pngx-controller ──────────────────────────────────────┐ │
│ │                                                        │ │
│ │  FastAPI app                APScheduler                │ │
│ │  ├── Web UI (HTMX)          └── sync job (every N min, │ │
│ │  ├── REST API (/api)            max 30 min timeout)    │ │
│ │  ├── /healthz (no auth)                                │ │
│ │  └── /metrics (no auth, Prometheus)                    │ │
│ │                                                        │ │
│ │  SQLite (WAL mode) — replicas, sync_map, logs          │ │
│ └────────────────────────────────────────────────────────┘ │
│          │ Tailscale (direct, host network)                │
└──────────┼─────────────────────────────────────────────────┘
           ├── GET/POST → paperless #1 (master, 100.x.x.x:8000)
           ├── GET/POST → paperless #2 (replica, 100.y.y.y:8000)
           └── GET/POST → paperless #3 (replica, 100.z.z.z:8000)
```
Single process, single container. APScheduler runs a single global sync job at the configured base interval. Each replica has an optional `sync_interval_seconds` override; the job checks per replica whether enough time has elapsed since `last_sync_ts` before including it in the cycle. An `asyncio.Lock` prevents concurrent runs.
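The gating logic described above can be sketched in a few lines (names such as `replica_is_due` and `scheduled_sync` are illustrative, not from an existing codebase):

```python
import asyncio
from datetime import datetime, timedelta, timezone

sync_lock = asyncio.Lock()  # one global lock — at most one sync cycle at a time

def replica_is_due(last_sync_ts, override_seconds, global_seconds, now):
    """True if enough time has elapsed since this replica's last successful sync."""
    interval = override_seconds if override_seconds is not None else global_seconds
    if last_sync_ts is None:
        return True  # never synced — always due
    return (now - last_sync_ts) >= timedelta(seconds=interval)

async def scheduled_sync(replicas, global_seconds):
    """APScheduler entry point: skip the whole tick if a cycle is still running."""
    if sync_lock.locked():
        return  # previous cycle still in progress — skip this tick
    async with sync_lock:
        now = datetime.now(timezone.utc)
        eligible = [r for r in replicas
                    if replica_is_due(r.get("last_sync_ts"),
                                      r.get("sync_interval_seconds"),
                                      global_seconds, now)]
        # ... sync each eligible replica here ...
        return eligible
```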
**Startup sequence:**
1. Validate `SECRET_KEY` is present and a valid Fernet key — exit with error if not
2. Verify DB file path is writable — exit with error if not
3. Run SQLite migrations; set `PRAGMA journal_mode=WAL`
4. Seed `settings` table from env vars if not already populated (`MASTER_URL`, `MASTER_TOKEN`)
5. Close any orphaned `sync_runs` (records where `finished_at IS NULL`) left by a previous unclean shutdown — set `finished_at = now()`, `timed_out = true`, log a warning per record
6. Start APScheduler; register sync job
7. Start FastAPI
All structured logs are emitted as JSON to stdout (`{"ts": ..., "level": ..., "replica": ..., "doc_id": ..., "msg": ...}`) in addition to being written to the `logs` DB table. DB logs drive the web UI; stdout logs are for external aggregators (Loki, etc.).
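A minimal sketch of the stdout half of that dual-sink logging (the real implementation would also insert the same record into the `logs` table):

```python
import json
import sys
from datetime import datetime, timezone

def log_event(level, msg, replica=None, doc_id=None):
    """Emit one structured log line to stdout; return it so the caller
    can also persist the record to the logs DB table."""
    line = json.dumps({
        "ts": datetime.now(timezone.utc).isoformat(),
        "level": level,
        "replica": replica,
        "doc_id": doc_id,
        "msg": msg,
    })
    print(line, file=sys.stdout)
    return line
```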
### Tech Stack
| Layer | Choice | Notes |
|---|---|---|
| Backend | Python / FastAPI | Sync engine as APScheduler background job |
| Frontend | Jinja2 + HTMX | No JS build step; partial page updates via HTMX |
| Styling | Pico CSS | Minimal, semantic, no build required |
| Database | SQLite via SQLModel | WAL mode; single file, bind-mounted volume |
| Auth | Authentik forward auth | `X-authentik-*` headers; no app-level auth code needed |
---
## 6. Web UI
### Pages
#### Dashboard (`/`)
- Master instance: URL, connection status (green/red indicator), last successful contact
- Per-replica row: name, URL, lag (time since last successful sync), runtime-computed status badge (`synced` / `syncing` / `error` / `unreachable` / `suspended`), and the error rate from the last run (e.g. `47 synced · 3 failed`, linking to a filtered log view)
- Global **Sync now** button — triggers an immediate full cycle across all enabled replicas; returns 202 immediately; client polls `/api/sync/running` which returns progress detail, not just a boolean
- Last sync: timestamp, duration, documents synced, documents failed
#### Replicas (`/replicas`)
- Table of configured replicas: name, Tailscale URL, enabled toggle, per-replica sync interval (blank = global), last sync timestamp, action buttons (edit, delete, **Test connection**)
- Suspended replicas show a `suspended — N consecutive failures` badge with a distinct **Re-enable** button (separate from the enabled toggle)
- **Add replica** form: name, Tailscale URL, API token (masked input), sync interval override (optional), enabled checkbox — form runs a connection test before saving and shows the result inline; if the replica already contains documents, a **Reconcile** button appears after successful save to populate `sync_map` without re-uploading files
- Replica detail view (`/replicas/{id}`):
- Cumulative sync stats
- Sync run history: table of the last 20 sync runs — timestamp, duration, docs synced, docs failed, triggered by
- Per-document sync map (paginated)
- Error history
#### Logs (`/logs`)
- Live log stream via SSE — HTMX connects to `/api/logs/stream` using the `hx-ext="sse"` extension
- Filter by: replica, level (`info` / `warning` / `error`), date range
- Full-text search on message (SQLite FTS5)
- Each log entry: timestamp, level, replica name, document ID (if applicable), message
- **Clear logs** action: remove entries older than N days
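The SSE wire format is simple enough to hand-roll; below is a sketch in which `log_queue` is a hypothetical `asyncio.Queue` that the sync engine pushes freshly written log rows into:

```python
import json

def sse_format(record: dict) -> str:
    """Serialise one log record as a Server-Sent Events frame for /api/logs/stream."""
    return f"data: {json.dumps(record)}\n\n"

# FastAPI wiring (sketch):
#
#   @app.get("/api/logs/stream")
#   async def stream_logs():
#       async def gen():
#           while True:
#               yield sse_format(await log_queue.get())
#       return StreamingResponse(gen(), media_type="text/event-stream")
```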
#### Settings (`/settings`)
- Master instance URL + API token (editable in place, with connection test on save)
- Global sync interval (minutes)
- Log retention (days)
- Sync cycle timeout (minutes; default 30)
- Task poll timeout (minutes; default 10)
- Replica suspend threshold (consecutive failed cycles; default 5)
- Max concurrent requests per target (default 4)
- **Notifications:** alert target (Gotify URL + token, or generic webhook URL + optional auth header), error threshold (docs failed per run to trigger alert), alert cooldown (minutes; default 60)
- **Danger zone:** Full resync button per replica — available in Phase 3; wipes the sync map for that replica and re-syncs everything from scratch
---
## 7. Data Model
```sql
-- Enable WAL mode on every connection open:
-- PRAGMA journal_mode=WAL;
CREATE TABLE replicas (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    url TEXT NOT NULL,                       -- Tailscale URL, e.g. http://100.y.y.y:8000
    api_token TEXT NOT NULL,                 -- Fernet-encrypted (key from SECRET_KEY env)
    enabled BOOLEAN DEFAULT TRUE,
    sync_interval_seconds INTEGER,           -- NULL = use global setting
    last_sync_ts DATETIME,                   -- per-replica; advanced only on successful sync
    consecutive_failures INTEGER DEFAULT 0,  -- reset to 0 on any successful sync cycle
    suspended_at DATETIME,                   -- NULL = active; set when consecutive_failures >= threshold
    last_alert_at DATETIME,                  -- used to enforce alert cooldown per replica
    created_at DATETIME DEFAULT CURRENT_TIMESTAMP
);

CREATE TABLE sync_map (
    id INTEGER PRIMARY KEY,
    replica_id INTEGER REFERENCES replicas(id) ON DELETE CASCADE,
    master_doc_id INTEGER NOT NULL,
    replica_doc_id INTEGER,                  -- NULL while post_document task is pending
    task_id TEXT,                            -- Celery task UUID returned by post_document; cleared once resolved
    last_synced DATETIME,
    file_checksum TEXT,                      -- SHA256 of original file; populated but not used for skipping until Phase 3
    status TEXT DEFAULT 'pending',           -- pending | ok | error
    error_msg TEXT,
    retry_count INTEGER DEFAULT 0,           -- incremented each time this doc is retried from error state
    UNIQUE(replica_id, master_doc_id)
);

-- Recommended indexes:
-- CREATE INDEX idx_sync_map_replica ON sync_map(replica_id);
-- CREATE INDEX idx_sync_map_status ON sync_map(replica_id, status);

CREATE TABLE sync_runs (
    id INTEGER PRIMARY KEY,
    replica_id INTEGER REFERENCES replicas(id) ON DELETE SET NULL,  -- NULL = all-replica run
    started_at DATETIME,
    finished_at DATETIME,
    triggered_by TEXT,                       -- 'scheduler' | 'manual' | 'reconcile'
    docs_synced INTEGER DEFAULT 0,
    docs_failed INTEGER DEFAULT 0,
    timed_out BOOLEAN DEFAULT FALSE
);

CREATE TABLE logs (
    id INTEGER PRIMARY KEY,
    run_id INTEGER REFERENCES sync_runs(id) ON DELETE SET NULL,
    replica_id INTEGER REFERENCES replicas(id) ON DELETE SET NULL,
    level TEXT,                              -- info | warning | error
    message TEXT,
    doc_id INTEGER,
    created_at DATETIME DEFAULT CURRENT_TIMESTAMP
);

-- FTS5 index for full-text search on message:
-- CREATE VIRTUAL TABLE logs_fts USING fts5(message, content=logs, content_rowid=id);

CREATE TABLE settings (
    key TEXT PRIMARY KEY,
    value TEXT                               -- master_token value is Fernet-encrypted, same as replicas.api_token
);
-- Keys:
-- master_url (seeded from MASTER_URL env var on first boot)
-- master_token (encrypted; seeded from MASTER_TOKEN env var on first boot)
-- sync_interval_seconds (default 900)
-- log_retention_days (default 90)
-- sync_cycle_timeout_seconds (default 1800)
-- task_poll_timeout_seconds (default 600)
-- replica_suspend_threshold (default 5)
-- max_concurrent_requests (default 4; applies independently per target instance)
-- alert_target_type ('gotify' | 'webhook' | '')
-- alert_target_url (Gotify or webhook URL)
-- alert_target_token (encrypted; Gotify token or webhook auth header value)
-- alert_error_threshold (docs failed per run to trigger; default 5)
-- alert_cooldown_seconds (minimum seconds between alerts per replica; default 3600)
```
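A sketch of how the per-connection pragmas would be applied when opening this database. Note that SQLite only enforces the `REFERENCES ... ON DELETE` clauses above when `foreign_keys` is switched on, and both pragmas apply per connection:

```python
import sqlite3

def open_db(path: str) -> sqlite3.Connection:
    """Open the controller DB; WAL mode and FK enforcement are per-connection
    pragmas in SQLite, so they are set on every open."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA journal_mode=WAL")
    conn.execute("PRAGMA foreign_keys=ON")  # make the ON DELETE clauses effective
    return conn
```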
---
## 8. API
All endpoints return JSON unless noted. HTMX partial updates use the HTML page routes (those without the `/api` prefix), which return rendered template fragments.
**Authentication:** All `/api/*` and UI routes go through Authentik forward auth. `/healthz` and `/metrics` are excluded — configured via a separate Traefik router without the Authentik middleware.
| Method | Path | Auth | Description |
|---|---|---|---|
| `GET` | `/healthz` | None | `{"status":"ok","db":"ok"}` or 503. For Docker `HEALTHCHECK` and uptime monitors. |
| `GET` | `/metrics` | None | Prometheus text format — see metrics list below |
| `GET` | `/api/status` | Authentik | Dashboard data: master health, per-replica lag, last-run error rates (runtime-computed) |
| `POST` | `/api/sync` | Authentik | Trigger immediate sync — returns **202** immediately. Accepts `?replica_id=N`. |
| `GET` | `/api/sync/running` | Authentik | `{"running": bool, "phase": str, "docs_done": int, "docs_total": int}` — drives UI spinner |
| `GET` | `/api/replicas` | Authentik | List all replicas |
| `POST` | `/api/replicas` | Authentik | Add a replica — runs connection test before saving; returns 422 if test fails |
| `PUT` | `/api/replicas/{id}` | Authentik | Update a replica — re-runs connection test if URL or token changed |
| `DELETE` | `/api/replicas/{id}` | Authentik | Remove a replica and its sync_map entries |
| `POST` | `/api/replicas/{id}/test` | Authentik | Test connection; returns `{"ok": bool, "error": str\|null, "latency_ms": int, "doc_count": int}` |
| `POST` | `/api/replicas/{id}/reconcile` | Authentik | Match existing replica documents to master by ASN / (title + date); populate sync_map without re-uploading |
| `POST` | `/api/replicas/{id}/resync` | Authentik | Wipe sync_map for this replica, trigger full resync *(Phase 3)* |
| `POST` | `/api/replicas/{id}/unsuspend` | Authentik | Clear `suspended_at` and `consecutive_failures`, re-enable replica |
| `GET` | `/api/logs` | Authentik | Paginated log query (`?replica_id`, `?level`, `?from`, `?to`, `?q` for FTS) |
| `GET` | `/api/logs/stream` | Authentik | SSE endpoint for live log tail |
| `GET` | `/api/settings` | Authentik | Read all settings |
| `PUT` | `/api/settings` | Authentik | Update settings; validate master connection before saving master_url/master_token |
### Prometheus metrics (`/metrics`)
| Metric | Type | Labels |
|---|---|---|
| `pngx_sync_docs_total` | Counter | `replica`, `status` (`ok`/`error`) |
| `pngx_sync_duration_seconds` | Histogram | `triggered_by` |
| `pngx_replica_lag_seconds` | Gauge | `replica` |
| `pngx_replica_pending_tasks` | Gauge | `replica` |
| `pngx_replica_consecutive_failures` | Gauge | `replica` |
| `pngx_sync_running` | Gauge | — |
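In practice a library such as `prometheus_client` would likely back this endpoint; as a dependency-free illustration, here are two of the metrics rendered by hand in the Prometheus text exposition format:

```python
def render_metrics(replicas: list, sync_running: bool) -> str:
    """Render a minimal Prometheus text-format exposition for two of the
    metrics listed above; each replica dict carries 'name' and 'lag_seconds'."""
    lines = [
        "# HELP pngx_replica_lag_seconds Time since last successful sync",
        "# TYPE pngx_replica_lag_seconds gauge",
    ]
    for r in replicas:
        lines.append(f'pngx_replica_lag_seconds{{replica="{r["name"]}"}} {r["lag_seconds"]}')
    lines += [
        "# HELP pngx_sync_running 1 while a sync cycle is in progress",
        "# TYPE pngx_sync_running gauge",
        f"pngx_sync_running {1 if sync_running else 0}",
    ]
    return "\n".join(lines) + "\n"
```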
---
## 9. Sync Engine
### Why not the consume directory
The consume directory triggers paperless's full ingestion pipeline: re-OCR, re-classification, ID reassignment. It also assumes no prior documents exist on the target, so syncing via consume into a live instance with users causes ID collisions and duplicate processing. The controller instead uses the REST API's `POST /api/documents/post_document/` (create) and `PATCH /api/documents/{id}/` (update metadata) endpoints with explicit metadata.
**Important:** `post_document` still goes through paperless's Celery consumption pipeline — OCR will run on replicas for newly uploaded documents. This adds processing overhead but the metadata supplied at upload time (title, tags, dates, etc.) takes precedence. This is an accepted cost of using the public API without modifying paperless containers.
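A sketch of the create path. Field names follow the paperless-ngx REST API, but the exact set of metadata fields accepted by `post_document` should be verified against the target version; `maps` is the in-memory name→id mapping built at the start of each cycle:

```python
def translate_metadata(meta: dict, maps: dict) -> dict:
    """Rewrite master-side entity IDs as replica-side IDs using the
    in-memory name→id maps built at the start of each sync cycle."""
    form = {"title": meta["title"], "created": meta["created"]}
    form["tags"] = [maps["tags"][t] for t in meta.get("tags", [])]
    if meta.get("correspondent") is not None:
        form["correspondent"] = maps["correspondents"][meta["correspondent"]]
    if meta.get("document_type") is not None:
        form["document_type"] = maps["document_types"][meta["document_type"]]
    return form

async def push_new_document(replica_url, token, filename, file_bytes, form):
    """POST the original file plus translated metadata to a replica;
    paperless responds with a Celery task UUID, resolved on a later cycle."""
    import httpx  # imported lazily so the pure helper above stands alone
    async with httpx.AsyncClient(timeout=60) as client:
        resp = await client.post(
            f"{replica_url}/api/documents/post_document/",
            headers={"Authorization": f"Token {token}"},
            files={"document": (filename, file_bytes, "application/pdf")},
            data=form,
        )
        resp.raise_for_status()
        return resp.json()  # the task UUID string
```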
### What gets synced
Replicas are HA and fully user-facing; both original and archived files are synced.
| Entity | Method | Notes |
|---|---|---|
| Documents (original file) | Binary download/upload | Always synced |
| Documents (archived/OCR'd file) | Binary download/upload | Always synced — replicas are HA |
| Document metadata | JSON via API | Title, dates, notes, custom fields, ASN |
| Tags | API + name-based dedup | IDs differ per instance; mapped by name |
| Correspondents | API + name-based dedup | Same |
| Document types | API + name-based dedup | Same |
| Custom field schemas | API, synced before docs | Schema must exist on replica before document data |
| Users / groups | Not synced | Managed independently per instance |
Replicas are **strictly additive in v1**: documents deleted on the master are not removed from replicas.
### Resilience primitives
**Concurrency throttle:** An `asyncio.Semaphore` with `max_concurrent_requests` (default 4) is created per target instance (one for master, one per replica) at the start of each sync cycle. All HTTP calls acquire the relevant semaphore before executing. This prevents the controller from overwhelming any single paperless instance with concurrent requests, especially during a full initial sync.
**Retry with exponential backoff:** All individual HTTP calls to master and replicas are wrapped in a retry decorator — 3 attempts with 2 s / 4 s / 8 s delays. Only network-level and 5xx errors are retried; 4xx errors (auth, not found) fail immediately. Each retry is logged at `warning` level. A document is only marked `error` in `sync_map` after all retries are exhausted.
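One way to implement that wrapper, with `RETRYABLE` standing in for the httpx network-error types (a sketch, not the final error taxonomy; a 5xx response would be mapped into the retryable set as well):

```python
import asyncio

RETRYABLE = (ConnectionError, TimeoutError)  # stand-ins for network-level errors

async def with_retry(call, attempts=3, base_delay=2.0, sleep=asyncio.sleep):
    """Retry `call` with exponential backoff, doubling from base_delay.
    Non-retryable errors (e.g. 4xx auth / not-found) propagate immediately."""
    for attempt in range(attempts):
        try:
            return await call()
        except RETRYABLE:
            if attempt == attempts - 1:
                raise  # retries exhausted — caller marks the document 'error'
            await sleep(base_delay * 2 ** attempt)
```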
**Task poll timeout:** After `POST /api/documents/post_document/` returns a task UUID, the controller polls `/api/tasks/?task_id=<uuid>` on the next sync cycle (step 5b below). If a task has been pending for longer than `task_poll_timeout_seconds` (default 600 s / 10 min), it is marked `error` with message `"task timed out"` and `replica_doc_id` remains NULL. The document will be retried from scratch on a full resync.
**Sync cycle timeout:** The entire sync cycle (all replicas combined) has a hard timeout of `sync_cycle_timeout_seconds` (default 1800 s / 30 min). If exceeded, the cycle is cancelled, the `asyncio.Lock` released, `sync_run.timed_out` set to `true`, and a `warning` log emitted. The next scheduled run starts fresh.
**Auto-suspend:** After `replica_suspend_threshold` (default 5) consecutive sync cycles where a replica fails entirely (the replica itself is unreachable or auth fails — not individual document errors), the controller sets `suspended_at = now()` and stops including that replica in future sync cycles. A prominent `error` log is emitted. The UI shows a `suspended` badge and a **Re-enable** button (`POST /api/replicas/{id}/unsuspend`). `consecutive_failures` resets to 0 on any successful sync cycle for that replica.
**SQLite backup:** On every successful sync run completion, `sqlite3.connect(db_path).backup(sqlite3.connect(backup_path))` is called to produce `/data/db.sqlite3.bak`. This is safe while the DB is open and provides one-cycle-lag recovery from DB corruption.
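The backup call, sketched with the stdlib `sqlite3` backup API (`source.backup(target)` copies the source into the target page by page):

```python
import sqlite3

def backup_db(db_path: str, backup_path: str) -> None:
    """Snapshot the live DB; the sqlite3 backup API is safe while the DB is open."""
    src = sqlite3.connect(db_path)
    dst = sqlite3.connect(backup_path)
    try:
        src.backup(dst)  # consistent copy even with concurrent readers/writers
    finally:
        src.close()
        dst.close()
```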
**Alert / notification:** After each sync run, if `docs_failed >= alert_error_threshold` OR a replica was just suspended, and `now() - replica.last_alert_at > alert_cooldown_seconds`, the controller sends an alert and updates `replica.last_alert_at`. Two target types are supported:
- **Gotify:** `POST {alert_target_url}/message` with `{"title": "pngx-controller alert", "message": "...", "priority": 7}`
- **Generic webhook:** `POST {alert_target_url}` with JSON payload and optional `Authorization` header
Alert payload:
```json
{
  "event": "sync_failures_threshold" | "replica_suspended",
  "replica": "backup",
  "replica_url": "http://100.y.y.y:8000",
  "consecutive_failures": 5,
  "docs_failed": 12,
  "docs_synced": 3,
  "timestamp": "2026-03-20T14:00:00Z",
  "controller_url": "https://pngx.domverse.de"
}
```
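The threshold-plus-cooldown gate is a pure function; a sketch with illustrative parameter names:

```python
from datetime import timedelta

def should_alert(docs_failed, threshold, just_suspended,
                 last_alert_at, cooldown_seconds, now):
    """Fire on threshold breach or suspension, at most once per cooldown window."""
    if docs_failed < threshold and not just_suspended:
        return False
    if (last_alert_at is not None
            and (now - last_alert_at) < timedelta(seconds=cooldown_seconds)):
        return False  # still inside the per-replica cooldown window
    return True
```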
### Reconcile mode
Used when adding a replica that already contains documents to avoid creating duplicates. Triggered via `POST /api/replicas/{id}/reconcile`. The reconcile process:
1. Paginate through all documents on the replica; build a map of `asn → replica_doc` and `(title, created_date) → replica_doc`
2. Paginate through all documents on the master; for each master doc:
- Match by ASN first (most reliable); fall back to (title + created_date)
- If matched: insert `sync_map` row with both IDs, `status='ok'`, compute `file_checksum` from master download
- If unmatched: leave for the normal sync cycle to handle (will be created on replica)
3. Replica documents with no master match are left untouched
4. Reconcile is **non-destructive and idempotent** — safe to run multiple times
Reconcile is a one-time operation per replica. After it completes, normal sync cycles take over.
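The matching step (2) can be sketched as a pure function over the two paginated document lists (the dict keys are illustrative stand-ins for the API's response fields):

```python
def match_documents(master_docs, replica_docs):
    """Match master docs to existing replica docs: ASN first (most reliable),
    then fall back to (title, created_date). Returns (matches, unmatched)."""
    by_asn = {d["asn"]: d for d in replica_docs if d.get("asn")}
    by_title_date = {(d["title"], d["created_date"]): d for d in replica_docs}
    matches, unmatched = {}, []
    for m in master_docs:
        hit = (by_asn.get(m.get("asn"))
               or by_title_date.get((m["title"], m["created_date"])))
        if hit:
            matches[m["id"]] = hit["id"]   # insert sync_map row with status='ok'
        else:
            unmatched.append(m)            # left for the normal sync cycle to create
    return matches, unmatched
```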
### Sync cycle
The name→id mapping for tags, correspondents, document types, and custom fields is built **in memory** at the start of each sync run by querying both master and replica. It is not persisted to the DB; it is rebuilt every cycle to avoid stale mappings.
The APScheduler job fires at `sync_interval_seconds` (global setting). At the start of each run, each replica is checked: if `replica.sync_interval_seconds IS NOT NULL` and `now() - replica.last_sync_ts < replica.sync_interval_seconds`, that replica is skipped this cycle. This allows per-replica intervals without multiple scheduler jobs.
```
Every N minutes (global base interval), with sync_cycle_timeout_seconds hard limit:

1. acquire asyncio.Lock — skip cycle if already running
2. create sync_run record (triggered_by = 'scheduler' | 'manual')
3. determine eligible replicas:
     enabled AND NOT suspended AND (sync_interval_seconds IS NULL
       OR now() - last_sync_ts >= sync_interval_seconds)
4. fetch changed_docs from master with pagination (outside the replica loop):
     page = 1
     all_changed_docs = []
     loop:
       response = GET master /api/documents/
                    ?modified__gte={min(last_sync_ts across eligible replicas)}
                    &ordering=modified&page_size=100&page={page}
                  (with retry/backoff, inside master semaphore)
       all_changed_docs += response.results
       if response.next is None: break
       page += 1
5. for each eligible replica:
   a. ensure_schema_parity(master, replica)
      → paginate and query all tags / correspondents / doc types / custom fields
        from master and replica (inside respective semaphores, with retry/backoff)
      → create missing entities on replica
      → build in-memory name→id maps:
          master_tag_id → replica_tag_id
          master_cf_id  → replica_cf_id
          (same for correspondents, document types)
   b. resolve pending sync_map entries (status='pending', replica_doc_id IS NULL):
      → for each: GET replica /api/tasks/?task_id={task_id}
        (inside replica semaphore, with retry/backoff)
      → if complete: update replica_doc_id, clear task_id, set status='ok'
      → if failed: set status='error', increment retry_count, store error_msg
      → if age > task_poll_timeout_seconds: set status='error', msg='task timed out'
   c. collect docs to process:
      - changed_docs filtered to those modified since replica.last_sync_ts
      - UNION sync_map entries for this replica where status='error'
        (capped at 50 per cycle to avoid starving new documents)
   d. for each doc in docs_to_process:
        (all HTTP calls inside respective semaphores, with retry/backoff)
        file_orig     = GET master /api/documents/{id}/download/
        file_archived = GET master /api/documents/{id}/download/?original=false
        meta          = GET master /api/documents/{id}/
        translate metadata using in-memory name→id maps:
          tag_ids       → [replica_tag_id for each master_tag_id]
          correspondent → replica_correspondent_id
          document_type → replica_document_type_id
          custom_fields → {replica_cf_id: value for each master_cf_id}
        if master_doc_id in sync_map[replica] AND replica_doc_id IS NOT NULL:
          PATCH metadata → replica /api/documents/{replica_doc_id}/
          if sha256(file_orig) != sync_map.file_checksum:
            re-upload original file → replica
            upload archived file → replica
          update sync_map (last_synced, file_checksum, status='ok', retry_count reset)
        else if master_doc_id NOT in sync_map[replica]:
          POST file_orig + translated metadata → replica /api/documents/post_document/
            → response: {task_id: "<uuid>"}
          insert sync_map row (status='pending', task_id=<uuid>, replica_doc_id=NULL)
            → task resolution and archived file upload handled in step 5b of next cycle
        log result to logs table (DB + stdout JSON)
   e. on full success for this replica:
      → set replica.last_sync_ts = start of this cycle
      → reset replica.consecutive_failures = 0
      → emit metrics update
      → send alert if docs_failed >= alert_error_threshold and cooldown elapsed
   f. on replica-level failure (unreachable, auth error):
      → increment replica.consecutive_failures
      → if consecutive_failures >= replica_suspend_threshold:
          set replica.suspended_at = now()
          log error "replica suspended after N consecutive failures"
          send alert if cooldown elapsed
6. if all eligible replicas completed without timeout:
   → call sqlite3 backup: db.sqlite3 → db.sqlite3.bak
7. close sync_run record with stats (docs_synced, docs_failed, timed_out)
8. release lock
```
### Conflict resolution
Master always wins. If a document was modified on the replica directly, the master's version overwrites it on the next sync cycle. Replicas should be treated as read-only by convention; there is no enforcement mechanism in v1.
---
## 10. Deployment
```yaml
services:
  pngx-controller:
    image: ghcr.io/yourname/pngx-controller:latest
    restart: unless-stopped
    network_mode: host  # required for Tailscale IP access
    environment:
      SECRET_KEY: ${PNGX_SECRET_KEY}      # Fernet key for encrypting API tokens at rest (required)
      DATABASE_URL: sqlite:////data/db.sqlite3
      MASTER_URL: ${PNGX_MASTER_URL}      # optional: seeds settings.master_url on first boot
      MASTER_TOKEN: ${PNGX_MASTER_TOKEN}  # optional: seeds settings.master_token on first boot
    volumes:
      - /srv/docker/pngx-controller/data:/data
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8000/healthz"]
      interval: 30s
      timeout: 5s
      retries: 3
      start_period: 10s
```
**Why `network_mode: host`:** The controller makes HTTP requests to Tailscale IPs (100.x.x.x). Inside a bridged Docker network, these are unreachable without additional routing. Host networking gives the container direct access to the host's Tailscale interface. Traefik can still proxy to `localhost:8000` on the host.
**Unauthenticated routes (`/healthz`, `/metrics`):** Configure a second Traefik router for these paths without the `authentik@file` middleware. Both paths are read-only and expose no user data.
**SECRET_KEY rotation:** If `SECRET_KEY` must be replaced, run the bundled CLI command before restarting with the new key:
```
docker run --rm -v /srv/docker/pngx-controller/data:/data \
  -e OLD_SECRET_KEY=<old> -e NEW_SECRET_KEY=<new> \
  ghcr.io/yourname/pngx-controller:latest rotate-key
```
This decrypts all stored tokens with the old key and re-encrypts them with the new key in a single transaction. The container must be stopped before running this command.
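The core of the rotation, sketched with the cipher functions injected as callables. In the real command they would wrap `Fernet(old_key).decrypt` and `Fernet(new_key).encrypt`, and the CLI would also re-encrypt the encrypted `settings` values; this sketch covers only `replicas.api_token`:

```python
def rotate_tokens(conn, decrypt, encrypt):
    """Re-encrypt all replicas.api_token rows in a single transaction:
    `with conn:` commits on success and rolls back on any exception."""
    with conn:
        rows = conn.execute("SELECT id, api_token FROM replicas").fetchall()
        for row_id, token in rows:
            conn.execute("UPDATE replicas SET api_token = ? WHERE id = ?",
                         (encrypt(decrypt(token)), row_id))
```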
`SECRET_KEY` is the only required env var at startup. `MASTER_URL` / `MASTER_TOKEN` are optional conveniences — if omitted, they are entered through the Settings UI on first run. All credentials are stored Fernet-encrypted in SQLite.
### Directory structure
```
/srv/docker/pngx-controller/
└── data/
    ├── db.sqlite3
    └── db.sqlite3.bak   # written after each successful sync run
```
---
## 11. Implementation Phases
### Phase 1 — Working sync (MVP)
- Startup validation: check `SECRET_KEY` validity and DB writability; exit with error if either fails
- Startup cleanup: close orphaned `sync_runs` left by unclean shutdown
- SQLite schema + SQLModel models; enable WAL mode on startup
- Env var seeding: populate `settings` from `MASTER_URL` / `MASTER_TOKEN` on first boot if not set
- Settings page: configure master URL + token (with connection test on save), sync interval, timeouts, suspend threshold, max concurrent requests
- Replica CRUD with per-replica sync interval override; connection test on add/edit (`POST /api/replicas/{id}/test`)
- Reconcile mode: `POST /api/replicas/{id}/reconcile`; UI button appears on replica add if replica has existing documents
- Sync engine:
- Paginated master document query
- In-memory name→id mapping; schema parity
- `asyncio.Semaphore` per target instance (`max_concurrent_requests`)
- Document push (original + archived files) with retry/backoff (3 attempts, 2/4/8 s)
- Error-status document retry (up to 50 per cycle per replica)
- Async task polling with `task_poll_timeout_seconds`
- Sync cycle timeout (`sync_cycle_timeout_seconds`)
- Auto-suspend after `replica_suspend_threshold` consecutive failures
- Per-replica interval check inside global scheduler job
- APScheduler integration with `asyncio.Lock`
- Structured JSON logs to stdout on every sync event
- Basic dashboard: last sync time, per-replica status badge, error rate (N synced · N failed)
- `/api/sync/running` returns progress detail (`phase`, `docs_done`, `docs_total`)
- Log table view (paginated, filterable, FTS search)
- `/healthz` endpoint (unauthenticated)
- `rotate-key` CLI command
### Phase 2 — Live feedback and observability
- SSE log stream on `/api/logs/stream` with HTMX `hx-ext="sse"` integration
- Sync progress indicator on dashboard (HTMX polls `/api/sync/running`, displays phase + count)
- Per-replica document count + lag calculation
- Live feedback on manual sync trigger
- Sync run history on replica detail page (last 20 runs: timestamp, duration, docs synced/failed)
- `/metrics` Prometheus endpoint (unauthenticated)
- SQLite backup to `db.sqlite3.bak` after each successful sync run
- `POST /api/replicas/{id}/unsuspend` + Re-enable UI button
- Alert / notification: Gotify and generic webhook support with configurable threshold and cooldown
### Phase 3 — Resilience and operations
- Full resync per replica (wipe sync_map, rebuild from scratch) — UI button enabled
- File checksum comparison to skip unchanged file re-uploads (`file_checksum` column already exists in Phase 1 schema)
- Deletion propagation via tombstone table (or remain strictly additive — decision deferred)
- Export sync_map as CSV for debugging
---
## 12. Open Questions
1. **Deletion propagation** — resolved for v1: replicas are strictly additive. Revisit in Phase 3: options are tombstone tracking (propagate deletes) or leave as-is (backup semantics, never delete).
2. **File versions** — resolved: both original and archived files are synced. Replicas are HA and must serve users the same experience as the master (archived/OCR'd version is what users download by default).
3. **Replica read access** — resolved: replicas are fully user-facing HA instances with Traefik + Authentik exposure. They are not backup-only.
4. **Sync webhooks** — paperless-ngx supports outgoing webhooks on document events. Phase 3+ could use webhook-triggered sync for near-real-time replication. **Constraint:** the webhook receiver on the controller would need an unauthenticated route (Authentik forward auth blocks unauthenticated POSTs), requiring a separate `/webhook/paperless` route excluded from the Authentik middleware — evaluate security implications before implementing.