# PRD: Scheduled Scans

**Status:** Draft

**Date:** 2026-02-27

**Verdict:** Fully feasible, no new dependencies required

---

## 1. Problem

Every scan is triggered manually. If you want to track prices for a route over time (e.g. BDS → Germany every Monday), you have to remember to click "Re-run" yourself. Price trends can only be discovered by comparing scan history by hand.

---

## 2. Goal

Let users define a recurring schedule for any scan configuration. The server runs the scan automatically at the defined cadence, building a historical record of price data over time.

---

## 3. User Stories

- **As a user**, I want to schedule a weekly scan of BDS → Germany so I can see how prices change without manually re-running it.
- **As a user**, I want to enable/disable a schedule without deleting it.
- **As a user**, I want to see which scans were created by a schedule and navigate to that schedule from a scan.
- **As a user**, I want to trigger a scheduled scan immediately without waiting for the next interval.

---

## 4. Scheduling Options

Three frequencies are sufficient for flight price tracking:

| Frequency | Parameters | Example |
|-----------|-----------|---------|
| `daily` | hour, minute | Every day at 06:00 |
| `weekly` | day_of_week (0=Mon–6=Sun), hour, minute | Every Monday at 06:00 |
| `monthly` | day_of_month (1–28), hour, minute | 1st of every month at 06:00 |

Day of month is limited to 1–28 so that the chosen day exists in every month (no 29/30/31 edge cases). All times are stored and executed in UTC.

---

## 5. Architecture

### 5.1 Scheduler Design

No new dependencies. A simple asyncio background task wakes every 60 seconds, queries the DB for due schedules, and fires a scan for each.

```
lifespan startup
  └── asyncio.create_task(_scheduler_loop())
        └── while True:
              _check_and_run_due_schedules()   # queries DB
              await asyncio.sleep(60)
```

`_check_and_run_due_schedules()`:

1. `SELECT * FROM scheduled_scans WHERE enabled = 1 AND next_run_at <= CURRENT_TIMESTAMP` (SQLite has no `NOW()`)
2. For each result, skip the run if the previous scan for this schedule is still `pending` or `running` (`next_run_at` is still advanced, see 9.3)
3. Create a new scan row (same INSERT as `POST /scans`)
4. Call `start_scan_processor(scan_id)`
5. Set `last_run_at` to the current UTC time, then compute and store `next_run_at`

### 5.2 `next_run_at` Computation

Precomputed in Python after every run (and on create/update). Stored as a TIMESTAMP column with an index, so the scheduler lookup is a single indexed range query.

```python
from datetime import datetime, timedelta


def compute_next_run(frequency, hour, minute,
                     day_of_week=None, day_of_month=None,
                     after=None) -> datetime:
    """Return the first run time strictly after `after` (default: now, naive UTC)."""
    now = after or datetime.utcnow()
    base = now.replace(hour=hour, minute=minute, second=0, microsecond=0)

    if frequency == 'daily':
        return base if base > now else base + timedelta(days=1)

    elif frequency == 'weekly':
        days_ahead = (day_of_week - now.weekday()) % 7
        # Same weekday but the time has already passed: wait a full week
        if days_ahead == 0 and base <= now:
            days_ahead = 7
        return (now + timedelta(days=days_ahead)).replace(
            hour=hour, minute=minute, second=0, microsecond=0)

    elif frequency == 'monthly':
        # day_of_month is at most 28, so this date exists in every month
        candidate = now.replace(day=day_of_month, hour=hour, minute=minute,
                                second=0, microsecond=0)
        if candidate <= now:
            # Roll over to the next month (and the next year after December)
            m, y = (now.month % 12) + 1, now.year + (1 if now.month == 12 else 0)
            candidate = candidate.replace(year=y, month=m)
        return candidate

    # Previously fell through and returned None for an unknown frequency
    raise ValueError(f"Unknown frequency: {frequency!r}")
```

---

## 6. Schema Changes

### 6.1 New table: `scheduled_scans`

```sql
CREATE TABLE IF NOT EXISTS scheduled_scans (
    id              INTEGER PRIMARY KEY AUTOINCREMENT,

    -- Scan parameters
    origin          TEXT NOT NULL CHECK(length(origin) = 3),
    country         TEXT NOT NULL CHECK(length(country) >= 2),
    window_months   INTEGER NOT NULL DEFAULT 1
                        CHECK(window_months >= 1 AND window_months <= 12),
    seat_class      TEXT NOT NULL DEFAULT 'economy',
    adults          INTEGER NOT NULL DEFAULT 1
                        CHECK(adults > 0 AND adults <= 9),

    -- Schedule definition
    frequency       TEXT NOT NULL
                        CHECK(frequency IN ('daily', 'weekly', 'monthly')),
    hour            INTEGER NOT NULL DEFAULT 6
                        CHECK(hour >= 0 AND hour <= 23),
    minute          INTEGER NOT NULL DEFAULT 0
                        CHECK(minute >= 0 AND minute <= 59),
    day_of_week     INTEGER CHECK(day_of_week >= 0 AND day_of_week <= 6),
    day_of_month    INTEGER CHECK(day_of_month >= 1 AND day_of_month <= 28),

    -- State
    enabled         INTEGER NOT NULL DEFAULT 1,
    label           TEXT,
    last_run_at     TIMESTAMP,
    next_run_at     TIMESTAMP NOT NULL,
    created_at      TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
    updated_at      TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,

    -- Frequency-specific constraints
    CHECK(
        (frequency = 'weekly'  AND day_of_week  IS NOT NULL) OR
        (frequency = 'monthly' AND day_of_month IS NOT NULL) OR
        (frequency = 'daily')
    )
);

-- Fast lookup of due schedules
CREATE UNIQUE INDEX IF NOT EXISTS uq_scheduled_scans_id
    ON scheduled_scans(id);

CREATE INDEX IF NOT EXISTS idx_scheduled_scans_next_run
    ON scheduled_scans(next_run_at)
    WHERE enabled = 1;

-- Auto-update updated_at
CREATE TRIGGER IF NOT EXISTS update_scheduled_scans_timestamp
AFTER UPDATE ON scheduled_scans
FOR EACH ROW BEGIN
    UPDATE scheduled_scans SET updated_at = CURRENT_TIMESTAMP WHERE id = NEW.id;
END;

-- Insert schema version bump
INSERT OR IGNORE INTO schema_version (version, description)
VALUES (2, 'Add scheduled_scans table');
```

### 6.2 Add FK column to `scans`

```sql
-- Migration: add scheduled_scan_id to scans
ALTER TABLE scans
    ADD COLUMN scheduled_scan_id INTEGER
    REFERENCES scheduled_scans(id) ON DELETE SET NULL;

CREATE INDEX IF NOT EXISTS idx_scans_scheduled_scan_id
    ON scans(scheduled_scan_id)
    WHERE scheduled_scan_id IS NOT NULL;
```

---

## 7. Migration (`database/init_db.py`)

Add one migration function, called before `executescript(schema_sql)`. It creates the new table and adds the FK column to `scans` in a single step:

```python
import sqlite3


def _migrate_add_scheduled_scans(conn, verbose=True):
    """Migration: create scheduled_scans table and add FK to scans."""
    cursor = conn.execute(
        "SELECT name FROM sqlite_master WHERE type='table' AND name='scheduled_scans'"
    )
    if cursor.fetchone():
        return  # Already exists

    if verbose:
        print("  🔄 Migrating: adding scheduled_scans table...")

    conn.execute("""
        CREATE TABLE scheduled_scans (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            ...
        )
    """)

    # Add scheduled_scan_id to existing scans table
    try:
        conn.execute(
            "ALTER TABLE scans ADD COLUMN scheduled_scan_id INTEGER "
            "REFERENCES scheduled_scans(id) ON DELETE SET NULL"
        )
    except sqlite3.OperationalError:
        pass  # Column already exists

    conn.execute(
        "CREATE INDEX IF NOT EXISTS idx_scans_scheduled_scan_id "
        "ON scans(scheduled_scan_id) WHERE scheduled_scan_id IS NOT NULL"
    )
    conn.commit()

    if verbose:
        print("  ✅ Migration complete: scheduled_scans table created")
```

---

## 8. API Endpoints

All under `/api/v1/schedules`. Rate limit: 30 req/min per IP (same as scans list).
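
Create and update requests to these endpoints have to satisfy the same cross-field rules that the `scheduled_scans` CHECK constraints enforce (section 6.1). The block below is a minimal sketch of such a check in plain Python, not the project's actual code; `validate_schedule_params` and `FREQUENCIES` are hypothetical names introduced only for illustration.

```python
from typing import List, Optional

# Hypothetical constant mirroring the CHECK on scheduled_scans.frequency
FREQUENCIES = ("daily", "weekly", "monthly")


def validate_schedule_params(frequency: str,
                             hour: int = 6,
                             minute: int = 0,
                             day_of_week: Optional[int] = None,
                             day_of_month: Optional[int] = None) -> List[str]:
    """Return a list of error messages; an empty list means the input is valid."""
    errors: List[str] = []

    if frequency not in FREQUENCIES:
        errors.append("frequency must be one of: " + ", ".join(FREQUENCIES))
    if not 0 <= hour <= 23:
        errors.append("hour must be in 0..23 (UTC)")
    if not 0 <= minute <= 59:
        errors.append("minute must be in 0..59")

    # Frequency-specific requirements, as in the table-level CHECK
    if frequency == "weekly" and (day_of_week is None or not 0 <= day_of_week <= 6):
        errors.append("weekly schedules need day_of_week in 0..6 (0=Mon)")
    if frequency == "monthly" and (day_of_month is None or not 1 <= day_of_month <= 28):
        errors.append("monthly schedules need day_of_month in 1..28")

    return errors


if __name__ == "__main__":
    print(validate_schedule_params("weekly"))                    # one error
    print(validate_schedule_params("monthly", day_of_month=15))  # no errors
```

Collecting all messages instead of failing on the first one would let a handler report every invalid field in a single error response; how the real handlers surface errors is not specified in this PRD.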

| Method | Path | Description |
|--------|------|-------------|
| `GET` | `/schedules` | List all schedules (paginated) |
| `POST` | `/schedules` | Create a schedule |
| `GET` | `/schedules/{id}` | Schedule details + last 5 scan IDs |
| `PATCH` | `/schedules/{id}` | Update (enable/disable, change frequency/params) |
| `DELETE` | `/schedules/{id}` | Delete schedule (scans are kept, FK set to NULL) |
| `POST` | `/schedules/{id}/run-now` | Trigger immediately (ignores next_run_at) |

### Request model: `CreateScheduleRequest`

```python
from typing import List, Optional

from pydantic import BaseModel


class CreateScheduleRequest(BaseModel):
    origin: str                               # 3-char IATA
    country: Optional[str] = None             # 2-letter ISO country code
    destinations: Optional[List[str]] = None  # Alternative: list of IATA codes
    window_months: int = 1                    # Months of data per scan run
    seat_class: str = 'economy'
    adults: int = 1
    label: Optional[str] = None               # Human-readable name
    frequency: str                            # 'daily' | 'weekly' | 'monthly'
    hour: int = 6                             # UTC hour (0–23)
    minute: int = 0                           # UTC minute (0–59)
    day_of_week: Optional[int] = None         # Required when frequency='weekly' (0=Mon)
    day_of_month: Optional[int] = None        # Required when frequency='monthly' (1–28)
```

### Response model: `Schedule`

```python
class Schedule(BaseModel):
    id: int
    origin: str
    country: str
    window_months: int
    seat_class: str
    adults: int
    label: Optional[str]
    frequency: str
    hour: int
    minute: int
    day_of_week: Optional[int]
    day_of_month: Optional[int]
    enabled: bool
    last_run_at: Optional[str]
    next_run_at: str
    created_at: str
    recent_scan_ids: List[int]   # Last 5 scans created by this schedule
```

---

## 9. Scheduler Lifecycle (`api_server.py`)

### 9.1 Startup

In the existing `lifespan()` context manager, after the existing startup code:

```python
scheduler_task = asyncio.create_task(_scheduler_loop())
logger.info("Scheduled scan background task started")

yield

# Shutdown: stop the scheduler loop and wait for it to exit
scheduler_task.cancel()
try:
    await scheduler_task
except asyncio.CancelledError:
    pass
```

### 9.2 Missed runs on restart

When the server starts, `_check_and_run_due_schedules()` fires immediately (before the first 60-second sleep), catching any schedules that came due while the server was down. Each overdue schedule runs exactly once; `next_run_at` is then advanced to the next future interval. Multiple missed intervals are not caught up.

### 9.3 Concurrency guard

Before firing a scan for a schedule, check whether its previous scan is still active:

```python
running = conn.execute("""
    SELECT id FROM scans
    WHERE scheduled_scan_id = ?
    AND status IN ('pending', 'running')
""", (schedule_id,)).fetchone()

if running:
    logger.warning(
        f"Schedule {schedule_id}: previous scan {running[0]} still active, skipping this run"
    )
    # next_run_at is still advanced (before this `continue`), so the
    # schedule is retried at the following interval
    continue
```

---

## 10. Frontend Changes

### 10.1 New page: `Schedules.tsx`

**List view:**

- Table of all schedules: label, origin → country, frequency, next run (local time), last run, enabled toggle
- "New Schedule" button opens the create form (same airport search component as Scans)
- Inline enable/disable toggle (PATCH request, optimistic update)
- "Run now" button per row

**Create form fields (below existing scan form fields):**

- Frequency selector: Daily / Weekly / Monthly (segmented button)
- Time of day: hour:minute picker (UTC, with note)
- Day of week (shown only for Weekly): Mon–Sun selector
- Day of month (shown only for Monthly): 1–28 number input
- Optional label field

### 10.2 Modified: `ScanDetails.tsx`

When a scan has a `scheduled_scan_id`, show a small "Scheduled" chip in the header with a link to `/schedules/{scheduled_scan_id}`.

### 10.3 Navigation (`Layout.tsx`)

Add a "Schedules" link to the sidebar between Scans and Airports.

### 10.4 API client (`api.ts`)

```typescript
export interface Schedule {
  id: number;
  origin: string;
  country: string;
  window_months: number;
  seat_class: string;
  adults: number;
  label?: string;
  frequency: 'daily' | 'weekly' | 'monthly';
  hour: number;
  minute: number;
  day_of_week?: number;
  day_of_month?: number;
  enabled: boolean;
  last_run_at?: string;
  next_run_at: string;
  created_at: string;
  recent_scan_ids: number[];
}

export const scheduleApi = {
  // `PaginatedResponse` is assumed to be the client's existing paginated wrapper type
  list: (page = 1, limit = 20) =>
    api.get<PaginatedResponse<Schedule>>('/schedules', { params: { page, limit } }),

  get: (id: number) =>
    api.get<Schedule>(`/schedules/${id}`),

  create: (data: CreateScheduleRequest) =>
    api.post<Schedule>('/schedules', data),

  update: (id: number, data: Partial<CreateScheduleRequest> & { enabled?: boolean }) =>
    api.patch<Schedule>(`/schedules/${id}`, data),

  delete: (id: number) =>
    api.delete(`/schedules/${id}`),

  runNow: (id: number) =>
    api.post<{ scan_id: number }>(`/schedules/${id}/run-now`),
};
```

---

## 11. Edge Cases

| Case | Handling |
|------|----------|
| Previous scan still running at next interval | Skip this interval's run, advance `next_run_at`, log a warning |
| Server down when schedule is due | On startup, runs any overdue schedule once; does not catch up multiple missed intervals |
| Schedule deleted while scan is running | `ON DELETE SET NULL` on the FK: the scan continues and its `scheduled_scan_id` becomes NULL |
| `window_months` covers past dates | Scan start date is always "tomorrow" at creation time, same as manual scans |
| Monthly with day_of_month=29..31 | Rejected by validation (maximum is 28), so the chosen day exists in every month |
| Simultaneous due schedules | Each creates an independent asyncio task; the existing `max_workers=3` semaphore in scan_processor limits total API concurrency across all running scans |
| Schedule created at 05:59, fires at 06:00 UTC | `next_run_at` is computed at creation time: 06:00 today if that is still in the future, otherwise 06:00 tomorrow |

---

## 12. Files Changed

| File | Change |
|------|--------|
| `database/schema.sql` | Add `scheduled_scans` table, trigger, indexes, schema_version bump |
| `database/init_db.py` | `_migrate_add_scheduled_scans()` + call in `initialize_database()` |
| `api_server.py` | `compute_next_run()`, `_scheduler_loop()`, `_check_and_run_due_schedules()`, 6 new endpoints, lifespan update, new Pydantic models |
| `frontend/src/api.ts` | `Schedule` type, `CreateScheduleRequest` type, `scheduleApi` object |
| `frontend/src/pages/Schedules.tsx` | New page (list + inline create form) |
| `frontend/src/pages/ScanDetails.tsx` | "Scheduled" badge + link when `scheduled_scan_id` present |
| `frontend/src/components/Layout.tsx` | Schedules nav link |

Total: 7 files. Estimated ~500 new lines (backend ~250, frontend ~250).

---

## 13. Out of Scope

- Notifications / alerts when a scheduled scan completes (email, webhook)
- Per-schedule price change detection / diffing between runs
- Timezone-aware scheduling (all times UTC for now)
- Pause/resume of scheduled scans (separate PRD)
- Rate limiting across simultaneous scheduled scans (existing semaphore provides soft protection)
- Dashboard widgets for upcoming scheduled runs