feat: add scheduled scans (cron-like recurring scans)

- New `scheduled_scans` table with daily/weekly/monthly frequencies
- asyncio background scheduler loop checks for due schedules every 60s
- 6 REST endpoints: CRUD + toggle enabled + run-now
- `scheduled_scan_id` FK added to scans table; migrated automatically
- Frontend: Schedules page (list + create form), Schedules nav link,
  "Scheduled" badge on ScanDetails when scan was triggered by a schedule

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-28 10:48:43 +01:00
parent ef5a27097d
commit 836c8474eb
9 changed files with 1666 additions and 10 deletions
--- a/flight-comparator/PRD_SCHEDULED_SCANS.md
+++ b/flight-comparator/PRD_SCHEDULED_SCANS.md
@@ -0,0 +1,409 @@
+# PRD: Scheduled Scans
+
+**Status:** Draft
+**Date:** 2026-02-27
+**Verdict:** Fully feasible — no new dependencies required
+
+---
+
+## 1. Problem
+
+Every scan is triggered manually. If you want to track prices for a route over time (e.g. BDS → Germany every Monday) you have to remember to click "Re-run" yourself. Price trends are only discoverable by comparing scan history manually.
+
+---
+
+## 2. Goal
+
+Let users define a recurring schedule for any scan configuration. The server runs the scan automatically at the defined cadence, building a historical record of price data over time.
+
+---
+
+## 3. User Stories
+
+- **As a user**, I want to schedule a weekly scan of BDS → Germany so I can see how prices change without manually re-running it.
+- **As a user**, I want to enable/disable a schedule without deleting it.
+- **As a user**, I want to see which scans were created by a schedule and navigate to that schedule from a scan.
+- **As a user**, I want to trigger a scheduled scan immediately without waiting for the next interval.
+
+---
+
+## 4. Scheduling Options
+
+Three frequencies are sufficient for flight price tracking:
+
+| Frequency | Parameters | Example |
+|-----------|-----------|---------|
+| `daily`   | hour, minute | Every day at 06:00 |
+| `weekly`  | day_of_week (0=Mon–6=Sun), hour, minute | Every Monday at 06:00 |
+| `monthly` | day_of_month (1–28), hour, minute | 1st of every month at 06:00 |
+
+Day of month is capped at 28 so that the date exists in every month (February has no 29th in non-leap years, and several months lack a 30th or 31st). All times are stored and executed in UTC.
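
A quick way to see why the cap matters, as a minimal sketch: `datetime.replace` raises `ValueError` when the target day does not exist in the month, which is what an uncapped `day_of_month` would run into. The helper name `day_exists` is hypothetical and not part of the PRD.

```python
from datetime import datetime


def day_exists(year: int, month: int, day: int) -> bool:
    """Return True if `day` exists in the given month (hypothetical helper)."""
    try:
        datetime(year, month, 1).replace(day=day)
        return True
    except ValueError:
        return False


# Day 28 exists in every month of 2026 (a non-leap year).
always_valid = all(day_exists(2026, m, 28) for m in range(1, 13))

# Day 31 is missing in February, April, June, September and November.
months_without_31 = [m for m in range(1, 13) if not day_exists(2026, m, 31)]
```

With the cap in place, the monthly branch never needs a "last day of month" fallback.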
+
+---
+
+## 5. Architecture
+
+### 5.1 Scheduler Design
+
+No new dependencies. A simple asyncio background task wakes every 60 seconds, queries the DB for due schedules, and fires a scan for each.
+
+```
+lifespan startup
+    └── asyncio.create_task(_scheduler_loop())
+            └── while True:
+                    _check_and_run_due_schedules()   # queries DB
+                    await asyncio.sleep(60)
+```
+
+`_check_and_run_due_schedules()`:
+1. `SELECT * FROM scheduled_scans WHERE enabled=1 AND next_run_at <= :now`, with `:now` bound to the current UTC time (SQLite has no `NOW()` function)
+2. For each result, skip the run if the previous scan for this schedule is still `pending` or `running`, but still advance `next_run_at` (see §9.3)
+3. Create a new scan row (same INSERT as `POST /scans`)
+4. Call `start_scan_processor(scan_id)`
+5. Update `last_run_at` to the current UTC time, then compute and store `next_run_at`
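
The first step can be sketched against an in-memory SQLite database. This is an illustration under assumptions, not the project's code: the helper name `find_due_schedules` is hypothetical, the table is reduced to the three columns the query touches, and the current time is bound as a text parameter in the same `YYYY-MM-DD HH:MM:SS` format as the stored values.

```python
import sqlite3
from datetime import datetime, timedelta

FMT = "%Y-%m-%d %H:%M:%S"  # assumed storage format for next_run_at


def find_due_schedules(conn: sqlite3.Connection, now: datetime) -> list[int]:
    """Return ids of enabled schedules whose next_run_at is at or before `now`."""
    rows = conn.execute(
        "SELECT id FROM scheduled_scans "
        "WHERE enabled = 1 AND next_run_at <= ? ORDER BY next_run_at",
        (now.strftime(FMT),),
    ).fetchall()
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE scheduled_scans "
    "(id INTEGER PRIMARY KEY, enabled INTEGER, next_run_at TIMESTAMP)"
)
now = datetime(2026, 3, 2, 6, 0, 30)
conn.executemany(
    "INSERT INTO scheduled_scans VALUES (?, ?, ?)",
    [
        (1, 1, (now - timedelta(minutes=1)).strftime(FMT)),  # due
        (2, 1, (now + timedelta(hours=1)).strftime(FMT)),    # not yet due
        (3, 0, (now - timedelta(days=1)).strftime(FMT)),     # overdue but disabled
    ],
)
due = find_due_schedules(conn, now)
```

Because all timestamps share one fixed-width format, the text comparison orders them chronologically.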
+
+### 5.2 `next_run_at` Computation
+
+Precomputed in Python after every run (and on create/update). Stored as a TIMESTAMP column with an index — scheduler lookup is a single indexed range query.
+
+```python
+from datetime import datetime, timedelta
+
+def compute_next_run(frequency, hour, minute,
+                     day_of_week=None, day_of_month=None,
+                     after=None) -> datetime:
+    """Return the first scheduled slot strictly after `after` (default: now, naive UTC)."""
+    now = after or datetime.utcnow()
+    base = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
+
+    if frequency == 'daily':
+        return base if base > now else base + timedelta(days=1)
+
+    elif frequency == 'weekly':
+        days_ahead = (day_of_week - now.weekday()) % 7
+        if days_ahead == 0 and base <= now:
+            days_ahead = 7  # today's slot has passed; wait a full week
+        return base + timedelta(days=days_ahead)
+
+    elif frequency == 'monthly':
+        # day_of_month <= 28, so replace() is valid in every month
+        candidate = base.replace(day=day_of_month)
+        if candidate <= now:
+            # roll over to the next month (and year, after December)
+            m, y = (now.month % 12) + 1, now.year + (1 if now.month == 12 else 0)
+            candidate = candidate.replace(year=y, month=m)
+        return candidate
+
+    raise ValueError(f"unknown frequency: {frequency!r}")
+```
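
To make the weekly branch concrete, here is a small self-contained check of the `days_ahead` arithmetic on a known date. It is a sketch only: `next_weekly` is a hypothetical name that isolates the weekly case of `compute_next_run`, and the sample dates are chosen for illustration.

```python
from datetime import datetime, timedelta


def next_weekly(now: datetime, day_of_week: int, hour: int, minute: int) -> datetime:
    """Weekly case only: next occurrence of day_of_week (0=Mon) at hour:minute."""
    base = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    days_ahead = (day_of_week - now.weekday()) % 7
    if days_ahead == 0 and base <= now:
        days_ahead = 7  # today's slot already passed, so wait a full week
    return base + timedelta(days=days_ahead)


# 2026-02-28 is a Saturday (weekday() == 5).
sat_noon = datetime(2026, 2, 28, 12, 0)

next_monday = next_weekly(sat_noon, 0, 6, 0)     # (0 - 5) % 7 == 2 days ahead
next_saturday = next_weekly(sat_noon, 5, 6, 0)   # 06:00 today has passed
later_today = next_weekly(sat_noon, 5, 18, 0)    # 18:00 today is still ahead
```

The modulo keeps `days_ahead` in the range 0 to 6, and the extra check handles the only ambiguous case, where the target weekday is today.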
+
+---
+
+## 6. Schema Changes
+
+### 6.1 New table: `scheduled_scans`
+
+```sql
+CREATE TABLE IF NOT EXISTS scheduled_scans (
+    id              INTEGER PRIMARY KEY AUTOINCREMENT,
+
+    -- Scan parameters
+    origin          TEXT NOT NULL CHECK(length(origin) = 3),
+    country         TEXT NOT NULL CHECK(length(country) >= 2),
+    window_months   INTEGER NOT NULL DEFAULT 1
+                        CHECK(window_months >= 1 AND window_months <= 12),
+    seat_class      TEXT NOT NULL DEFAULT 'economy',
+    adults          INTEGER NOT NULL DEFAULT 1
+                        CHECK(adults > 0 AND adults <= 9),
+
+    -- Schedule definition
+    frequency       TEXT NOT NULL
+                        CHECK(frequency IN ('daily', 'weekly', 'monthly')),
+    hour            INTEGER NOT NULL DEFAULT 6
+                        CHECK(hour >= 0 AND hour <= 23),
+    minute          INTEGER NOT NULL DEFAULT 0
+                        CHECK(minute >= 0 AND minute <= 59),
+    day_of_week     INTEGER CHECK(day_of_week >= 0 AND day_of_week <= 6),
+    day_of_month    INTEGER CHECK(day_of_month >= 1 AND day_of_month <= 28),
+
+    -- State
+    enabled         INTEGER NOT NULL DEFAULT 1,
+    label           TEXT,
+    last_run_at     TIMESTAMP,
+    next_run_at     TIMESTAMP NOT NULL,
+
+    created_at      TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
+    updated_at      TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
+
+    -- Frequency-specific constraints
+    CHECK(
+        (frequency = 'weekly'  AND day_of_week  IS NOT NULL) OR
+        (frequency = 'monthly' AND day_of_month IS NOT NULL) OR
+        (frequency = 'daily')
+    )
+);
+
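+-- Explicit unique index on the key referenced by scans.scheduled_scan_id
+-- (redundant with INTEGER PRIMARY KEY, which is already unique)
+CREATE UNIQUE INDEX IF NOT EXISTS uq_scheduled_scans_id
+    ON scheduled_scans(id);
+
+-- Fast lookup of due schedules (partial index: enabled rows only)
+CREATE INDEX IF NOT EXISTS idx_scheduled_scans_next_run
+    ON scheduled_scans(next_run_at)
+    WHERE enabled = 1;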
+
+-- Auto-update updated_at
+CREATE TRIGGER IF NOT EXISTS update_scheduled_scans_timestamp
+AFTER UPDATE ON scheduled_scans
+FOR EACH ROW BEGIN
+    UPDATE scheduled_scans SET updated_at = CURRENT_TIMESTAMP WHERE id = NEW.id;
+END;
+
+-- Insert schema version bump
+INSERT OR IGNORE INTO schema_version (version, description)
+VALUES (2, 'Add scheduled_scans table');
+```
+
+### 6.2 Add FK column to `scans`
+
+```sql
+-- Migration: add scheduled_scan_id to scans
+ALTER TABLE scans ADD COLUMN scheduled_scan_id INTEGER
+    REFERENCES scheduled_scans(id) ON DELETE SET NULL;
+
+CREATE INDEX IF NOT EXISTS idx_scans_scheduled_scan_id
+    ON scans(scheduled_scan_id)
+    WHERE scheduled_scan_id IS NOT NULL;
+```
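
One caveat worth checking: SQLite only enforces foreign keys, including `ON DELETE SET NULL`, when `PRAGMA foreign_keys = ON` has been set on the connection. The PRD does not show whether the project's connection helper sets it. The sketch below uses reduced versions of both tables to show the behaviour with the pragma enabled.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # per connection; off by default in SQLite

conn.execute("CREATE TABLE scheduled_scans (id INTEGER PRIMARY KEY AUTOINCREMENT)")
conn.execute(
    "CREATE TABLE scans ("
    " id INTEGER PRIMARY KEY AUTOINCREMENT,"
    " scheduled_scan_id INTEGER"
    "   REFERENCES scheduled_scans(id) ON DELETE SET NULL)"
)

conn.execute("INSERT INTO scheduled_scans (id) VALUES (1)")
conn.execute("INSERT INTO scans (id, scheduled_scan_id) VALUES (10, 1)")

# Deleting the schedule keeps the scan row and nulls its FK column.
conn.execute("DELETE FROM scheduled_scans WHERE id = 1")
remaining = conn.execute("SELECT id, scheduled_scan_id FROM scans").fetchall()
```

Without the pragma, the delete still succeeds but the scan keeps a dangling `scheduled_scan_id`.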
+
+---
+
+## 7. Migration (`database/init_db.py`)
+
+Add a migration function, called before `executescript(schema_sql)`:
+
+```python
+import sqlite3
+
+def _migrate_add_scheduled_scans(conn, verbose=True):
+    """Migration: create scheduled_scans table and add FK to scans."""
+    cursor = conn.execute(
+        "SELECT name FROM sqlite_master WHERE type='table' AND name='scheduled_scans'"
+    )
+    if cursor.fetchone():
+        return  # Already exists
+
+    if verbose:
+        print("   🔄 Migrating: adding scheduled_scans table...")
+
+    conn.execute("""
+        CREATE TABLE scheduled_scans (
+            id INTEGER PRIMARY KEY AUTOINCREMENT, ...
+        )
+    """)
+
+    # Add scheduled_scan_id to existing scans table
+    try:
+        conn.execute("ALTER TABLE scans ADD COLUMN scheduled_scan_id INTEGER REFERENCES scheduled_scans(id) ON DELETE SET NULL")
+    except sqlite3.OperationalError:
+        pass  # Column already exists
+
+    conn.execute("CREATE INDEX IF NOT EXISTS idx_scans_scheduled_scan_id ON scans(scheduled_scan_id) WHERE scheduled_scan_id IS NOT NULL")
+    conn.commit()
+
+    if verbose:
+        print("   ✅ Migration complete: scheduled_scans table created")
+```
+
+---
+
+## 8. API Endpoints
+
+All under `/api/v1/schedules`. Rate limit: 30 req/min per IP (same as scans list).
+
+| Method | Path | Description |
+|--------|------|-------------|
+| `GET`  | `/schedules` | List all schedules (paginated) |
+| `POST` | `/schedules` | Create a schedule |
+| `GET`  | `/schedules/{id}` | Schedule details + last 5 scan IDs |
+| `PATCH` | `/schedules/{id}` | Update (enable/disable, change frequency/params) |
+| `DELETE` | `/schedules/{id}` | Delete schedule (scans are kept, FK set to NULL) |
+| `POST` | `/schedules/{id}/run-now` | Trigger immediately (ignores next_run_at) |
+
+### Request model: `CreateScheduleRequest`
+
+```python
+class CreateScheduleRequest(BaseModel):
+    origin: str                          # 3-char IATA
+    country: Optional[str]               # 2-letter ISO country code
+    destinations: Optional[List[str]]    # Alternative: list of IATA codes
+    window_months: int = 1               # Months of data per scan run
+    seat_class: str = 'economy'
+    adults: int = 1
+    label: Optional[str]                 # Human-readable name
+    frequency: str                       # 'daily' | 'weekly' | 'monthly'
+    hour: int = 6                        # UTC hour (0–23)
+    minute: int = 0                      # UTC minute (0–59)
+    day_of_week: Optional[int]           # Required when frequency='weekly' (0=Mon)
+    day_of_month: Optional[int]          # Required when frequency='monthly' (1–28)
+```
+
+### Response model: `Schedule`
+
+```python
+class Schedule(BaseModel):
+    id: int
+    origin: str
+    country: str
+    window_months: int
+    seat_class: str
+    adults: int
+    label: Optional[str]
+    frequency: str
+    hour: int
+    minute: int
+    day_of_week: Optional[int]
+    day_of_month: Optional[int]
+    enabled: bool
+    last_run_at: Optional[str]
+    next_run_at: str
+    created_at: str
+    recent_scan_ids: List[int]           # Last 5 scans created by this schedule
+```
+
+---
+
+## 9. Scheduler Lifecycle (`api_server.py`)
+
+### 9.1 Startup
+
+In the existing `lifespan()` context manager, after existing startup code:
+
+```python
+scheduler_task = asyncio.create_task(_scheduler_loop())
+logger.info("Scheduled scan background task started")
+yield
+scheduler_task.cancel()
+try:
+    await scheduler_task
+except asyncio.CancelledError:
+    pass
+```
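
The start/cancel pattern can be tried in isolation with plain asyncio. This is a stand-in sketch, not the server code: the real `lifespan()` takes the FastAPI app, whereas this one takes a sleep interval, and the loop body only records ticks. It also wraps the `yield` in `try`/`finally`, an optional variation that cancels the task even if the application body raises.

```python
import asyncio
from contextlib import asynccontextmanager

ticks: list[int] = []


async def _scheduler_loop(interval: float) -> None:
    """Stand-in loop: record a tick, then sleep, forever."""
    while True:
        ticks.append(len(ticks))
        await asyncio.sleep(interval)


@asynccontextmanager
async def lifespan(interval: float = 0.01):
    task = asyncio.create_task(_scheduler_loop(interval))
    try:
        yield task
    finally:
        task.cancel()
        try:
            await task
        except asyncio.CancelledError:
            pass  # expected on shutdown


async def main() -> bool:
    async with lifespan() as task:
        await asyncio.sleep(0.05)  # the "application" runs here
    return task.cancelled()


task_was_cancelled = asyncio.run(main())
```

Awaiting the cancelled task before returning ensures the loop has fully stopped by the time shutdown completes.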
+
+### 9.2 Missed runs on restart
+
+When the server starts, `_check_and_run_due_schedules()` fires immediately (before the 60-second sleep), catching any schedules that were due while the server was down. Each overdue schedule runs exactly once — `next_run_at` is then advanced to the next future interval. Multiple missed intervals are not caught up.
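
A worked example of the "run once, no catch-up" rule, using only the daily case. `next_daily` is a hypothetical name that mirrors the daily branch of `compute_next_run`; the dates are illustrative.

```python
from datetime import datetime, timedelta


def next_daily(hour: int, minute: int, after: datetime) -> datetime:
    """Daily case only: first hour:minute slot strictly after `after`."""
    base = after.replace(hour=hour, minute=minute, second=0, microsecond=0)
    return base if base > after else base + timedelta(days=1)


# The schedule was due on 2026-02-25 06:00; the server restarts three days later.
stale_next_run = datetime(2026, 2, 25, 6, 0)
restart_time = datetime(2026, 2, 28, 10, 48)

# One overdue run fires because next_run_at <= now.
runs_fired = 1 if stale_next_run <= restart_time else 0

# next_run_at is recomputed from the restart time, not from the stale slot,
# so the missed 06:00 slots on the 26th, 27th and 28th are skipped.
new_next_run = next_daily(6, 0, after=restart_time)
```

Computing from the current time rather than from the stale `next_run_at` is what prevents a burst of back-to-back catch-up scans.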
+
+### 9.3 Concurrency guard
+
+Before firing a scan for a schedule, check:
+
+```python
+running = conn.execute("""
+    SELECT id FROM scans
+    WHERE scheduled_scan_id = ? AND status IN ('pending', 'running')
+""", (schedule_id,)).fetchone()
+
+if running:
+    logger.info(f"Schedule {schedule_id}: previous scan {running[0]} still active, skipping this run")
+    # NOTE: next_run_at must still be advanced before this `continue`
+    # (update not shown here) so the schedule is retried at the next interval
+    continue
+```
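
The guard query can be exercised on its own against an in-memory database. This is a sketch under assumptions: `has_active_scan` is a hypothetical wrapper, the `scans` table is reduced to three columns, and `'completed'` is an assumed status value (the PRD only names `pending` and `running`).

```python
import sqlite3


def has_active_scan(conn: sqlite3.Connection, schedule_id: int) -> bool:
    """True if the schedule already has a scan that is pending or running."""
    row = conn.execute(
        "SELECT id FROM scans "
        "WHERE scheduled_scan_id = ? AND status IN ('pending', 'running') "
        "LIMIT 1",
        (schedule_id,),
    ).fetchone()
    return row is not None


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE scans "
    "(id INTEGER PRIMARY KEY, scheduled_scan_id INTEGER, status TEXT)"
)
conn.executemany(
    "INSERT INTO scans VALUES (?, ?, ?)",
    [
        (1, 7, "completed"),   # assumed terminal status
        (2, 7, "running"),     # schedule 7 is busy
        (3, 8, "completed"),   # schedule 8 is idle
        (4, None, "running"),  # manual scan, not tied to any schedule
    ],
)
busy = has_active_scan(conn, 7)
idle = has_active_scan(conn, 8)
```

Manual scans have a NULL `scheduled_scan_id`, so they never block a schedule.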
+
+---
+
+## 10. Frontend Changes
+
+### 10.1 New page: `Schedules.tsx`
+
+**List view:**
+- Table of all schedules: label, origin → country, frequency, next run (local time), last run, enabled toggle
+- "New Schedule" button opens create form (same airport search component as Scans)
+- Inline enable/disable toggle (PATCH request, optimistic update)
+- "Run now" button per row
+
+**Create form fields (below existing scan form fields):**
+- Frequency selector: Daily / Weekly / Monthly (segmented button)
+- Time of day: hour:minute picker (UTC, with note)
+- Day of week (shown only for Weekly): Mon–Sun selector
+- Day of month (shown only for Monthly): 1–28 number input
+- Optional label field
+
+### 10.2 Modified: `ScanDetails.tsx`
+
+When a scan has `scheduled_scan_id`, show a small "Scheduled" chip in the header with a link to `/schedules/{scheduled_scan_id}`.
+
+### 10.3 Navigation (`Layout.tsx`)
+
+Add "Schedules" link to sidebar between Scans and Airports.
+
+### 10.4 API client (`api.ts`)
+
+```typescript
+export interface Schedule {
+  id: number;
+  origin: string;
+  country: string;
+  window_months: number;
+  seat_class: string;
+  adults: number;
+  label?: string;
+  frequency: 'daily' | 'weekly' | 'monthly';
+  hour: number;
+  minute: number;
+  day_of_week?: number;
+  day_of_month?: number;
+  enabled: boolean;
+  last_run_at?: string;
+  next_run_at: string;
+  created_at: string;
+  recent_scan_ids: number[];
+}
+
+export const scheduleApi = {
+  list: (page = 1, limit = 20) =>
+    api.get<PaginatedResponse<Schedule>>('/schedules', { params: { page, limit } }),
+  get: (id: number) =>
+    api.get<Schedule>(`/schedules/${id}`),
+  create: (data: CreateScheduleRequest) =>
+    api.post<Schedule>('/schedules', data),
+  update: (id: number, data: Partial<CreateScheduleRequest> & { enabled?: boolean }) =>
+    api.patch<Schedule>(`/schedules/${id}`, data),
+  delete: (id: number) =>
+    api.delete(`/schedules/${id}`),
+  runNow: (id: number) =>
+    api.post<{ scan_id: number }>(`/schedules/${id}/run-now`),
+};
+```
+
+---
+
+## 11. Edge Cases
+
+| Case | Handling |
+|------|----------|
+| Previous scan still running at next interval | Skip this interval's run, advance `next_run_at`, log the skip |
+| Server down when schedule is due | On startup, runs any overdue schedule once; does not catch up multiple missed intervals |
+| Schedule deleted while scan is running | `ON DELETE SET NULL` on FK — scan continues, `scheduled_scan_id` becomes NULL |
+| `window_months` covers past dates | Scan start date is always "tomorrow" at creation time, same as manual scans |
+| Monthly with day_of_month=29..31 | Capped at 28 in validation — avoids invalid dates in all months |
+| Simultaneous due schedules | Each creates an independent asyncio task; existing `max_workers=3` semaphore in scan_processor limits total API concurrency across all running scans |
+| Schedule created at 05:59 for a 06:00 UTC slot | `next_run_at` is computed at creation time: 06:00 today is still ahead, so it fires today; had 06:00 already passed, it would fire tomorrow |
+
+---
+
+## 12. Files Changed
+
+| File | Change |
+|------|--------|
+| `database/schema.sql` | Add `scheduled_scans` table, trigger, indexes, schema_version bump |
+| `database/init_db.py` | `_migrate_add_scheduled_scans()` + call in `initialize_database()` |
+| `api_server.py` | `compute_next_run()`, `_scheduler_loop()`, `_check_and_run_due_schedules()`, 6 new endpoints, lifespan update, new Pydantic models |
+| `frontend/src/api.ts` | `Schedule` type, `CreateScheduleRequest` type, `scheduleApi` object |
+| `frontend/src/pages/Schedules.tsx` | New page (list + inline create form) |
+| `frontend/src/pages/ScanDetails.tsx` | "Scheduled" badge + link when `scheduled_scan_id` present |
+| `frontend/src/components/Layout.tsx` | Schedules nav link |
+
+Total: 7 files. Estimated ~500 new lines (backend ~250, frontend ~250).
+
+---
+
+## 13. Out of Scope
+
+- Notifications / alerts when a scheduled scan completes (email, webhook)
+- Per-schedule price change detection / diffing between runs
+- Timezone-aware scheduling (all times UTC for now)
+- Pause/resume of scheduled scans (separate PRD)
+- Rate limiting across simultaneous scheduled scans (existing semaphore provides soft protection)
+- Dashboard widgets for upcoming scheduled runs