ciaovolo/flight-comparator/PRD_SCHEDULED_SCANS.md
domverse 836c8474eb feat: add scheduled scans (cron-like recurring scans)
- New `scheduled_scans` table with daily/weekly/monthly frequencies
- asyncio background scheduler loop checks for due schedules every 60s
- 6 REST endpoints: CRUD + toggle enabled + run-now
- `scheduled_scan_id` FK added to scans table; migrated automatically
- Frontend: Schedules page (list + create form), Schedules nav link,
  "Scheduled" badge on ScanDetails when scan was triggered by a schedule

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-28 10:48:43 +01:00


PRD: Scheduled Scans

Status: Draft
Date: 2026-02-27
Verdict: Fully feasible; no new dependencies required


1. Problem

Every scan is triggered manually. If you want to track prices for a route over time (e.g. BDS → Germany every Monday) you have to remember to click "Re-run" yourself. Price trends are only discoverable by comparing scan history manually.


2. Goal

Let users define a recurring schedule for any scan configuration. The server runs the scan automatically at the defined cadence, building a historical record of price data over time.


3. User Stories

  • As a user, I want to schedule a weekly scan of BDS → Germany so I can see how prices change without manually re-running it.
  • As a user, I want to enable/disable a schedule without deleting it.
  • As a user, I want to see which scans were created by a schedule and navigate to that schedule from a scan.
  • As a user, I want to trigger a scheduled scan immediately without waiting for the next interval.

4. Scheduling Options

Three frequencies are sufficient for flight price tracking:

| Frequency | Parameters | Example |
|---|---|---|
| daily | hour, minute | Every day at 06:00 |
| weekly | day_of_week (0=Mon to 6=Sun), hour, minute | Every Monday at 06:00 |
| monthly | day_of_month (1-28), hour, minute | 1st of every month at 06:00 |

Day of month is capped at 28 so that the chosen day exists in every month, including February. All times are stored and executed in UTC.


5. Architecture

5.1 Scheduler Design

No new dependencies. A simple asyncio background task wakes every 60 seconds, queries the DB for due schedules, and fires a scan for each.

lifespan startup
    └── asyncio.create_task(_scheduler_loop())
            └── while True:
                    _check_and_run_due_schedules()   # queries DB
                    await asyncio.sleep(60)

_check_and_run_due_schedules():

  1. SELECT * FROM scheduled_scans WHERE enabled = 1 AND next_run_at <= CURRENT_TIMESTAMP
  2. For each result, check whether the previous scan for this schedule is still pending or running; if so, skip steps 3 and 4 but still advance next_run_at (see 9.3)
  3. Create a new scan row (same INSERT as POST /scans)
  4. Call start_scan_processor(scan_id)
  5. Set last_run_at to the current time, then compute and store next_run_at
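
The five steps above can be sketched against an in-memory SQLite database. This is a minimal sketch rather than the project's implementation: the trimmed table layout, the `start_scan` callback (standing in for start_scan_processor), and the fixed one-day advance of next_run_at (standing in for compute_next_run) are all assumptions made to keep the example self-contained.

```python
import sqlite3
from datetime import datetime, timedelta

FMT = "%Y-%m-%d %H:%M:%S"  # same text layout as SQLite's CURRENT_TIMESTAMP


def check_and_run_due_schedules(conn, now, start_scan):
    """Run one scheduler pass and return the ids of the scans it created."""
    created = []
    due = conn.execute(
        "SELECT id, origin, country FROM scheduled_scans "
        "WHERE enabled = 1 AND next_run_at <= ?",
        (now.strftime(FMT),),
    ).fetchall()
    for schedule_id, origin, country in due:
        active = conn.execute(
            "SELECT id FROM scans WHERE scheduled_scan_id = ? "
            "AND status IN ('pending', 'running')",
            (schedule_id,),
        ).fetchone()
        if active is None:
            cur = conn.execute(
                "INSERT INTO scans (origin, country, status, scheduled_scan_id) "
                "VALUES (?, ?, 'pending', ?)",
                (origin, country, schedule_id),
            )
            start_scan(cur.lastrowid)
            created.append(cur.lastrowid)
            conn.execute(
                "UPDATE scheduled_scans SET last_run_at = ? WHERE id = ?",
                (now.strftime(FMT), schedule_id),
            )
        # Assumption: a fixed daily step stands in for compute_next_run().
        # It is applied even when the run was skipped.
        next_run = (now + timedelta(days=1)).strftime(FMT)
        conn.execute(
            "UPDATE scheduled_scans SET next_run_at = ? WHERE id = ?",
            (next_run, schedule_id),
        )
    conn.commit()
    return created


def make_demo_db():
    """Build a throwaway database with one schedule due on 2026-03-02 06:00."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE scheduled_scans (
            id INTEGER PRIMARY KEY, origin TEXT, country TEXT,
            enabled INTEGER DEFAULT 1, last_run_at TEXT, next_run_at TEXT);
        CREATE TABLE scans (
            id INTEGER PRIMARY KEY, origin TEXT, country TEXT,
            status TEXT, scheduled_scan_id INTEGER);
        INSERT INTO scheduled_scans (origin, country, next_run_at)
            VALUES ('BDS', 'DE', '2026-03-02 06:00:00');
    """)
    return conn
```

Storing timestamps in the same text layout that CURRENT_TIMESTAMP produces keeps the `<=` comparison a plain string comparison, which is what lets the indexed range query work.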

5.2 next_run_at Computation

Precomputed in Python after every run (and on create/update). Stored as a TIMESTAMP column with an index — scheduler lookup is a single indexed range query.

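from datetime import datetime, timedelta

def compute_next_run(frequency, hour, minute,
                     day_of_week=None, day_of_month=None,
                     after=None) -> datetime:
    """Return the first run time strictly after `after` (default: now, naive UTC)."""
    now = after or datetime.utcnow()
    base = now.replace(hour=hour, minute=minute, second=0, microsecond=0)

    if frequency == 'daily':
        # Today at hour:minute if still ahead, otherwise tomorrow
        return base if base > now else base + timedelta(days=1)

    elif frequency == 'weekly':
        # Days until the next matching weekday (0 = today)
        days_ahead = (day_of_week - now.weekday()) % 7
        if days_ahead == 0 and base <= now:
            days_ahead = 7
        return (now + timedelta(days=days_ahead)).replace(
            hour=hour, minute=minute, second=0, microsecond=0)

    elif frequency == 'monthly':
        # day_of_month <= 28, so replace() is valid in every month
        candidate = now.replace(day=day_of_month, hour=hour, minute=minute, second=0, microsecond=0)
        if candidate <= now:
            # Roll over to next month, wrapping December to January
            m, y = (now.month % 12) + 1, now.year + (1 if now.month == 12 else 0)
            candidate = candidate.replace(year=y, month=m)
        return candidate

    raise ValueError(f"Unknown frequency: {frequency!r}")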
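
Two pieces of arithmetic in compute_next_run() are easy to misread: the modulo that finds the next matching weekday, and the rollover from December to January. The sketch below isolates them in two helpers (`days_until_weekday` and `next_month`, names chosen here for illustration and not part of the PRD) and evaluates them on known calendar dates.

```python
from datetime import datetime


def days_until_weekday(now: datetime, day_of_week: int) -> int:
    """Days from `now` to the next `day_of_week` (0=Mon); 0 means today."""
    return (day_of_week - now.weekday()) % 7


def next_month(year: int, month: int) -> tuple:
    """Return (year, month) of the following calendar month."""
    return (year + (1 if month == 12 else 0), (month % 12) + 1)


friday = datetime(2026, 2, 27, 10, 0)   # 2026-02-27 is a Friday (weekday 4)
print(days_until_weekday(friday, 0))    # 3: the next Monday is 2026-03-02
print(days_until_weekday(friday, 4))    # 0: same weekday, time of day decides
print(next_month(2026, 12))             # (2027, 1)
print(next_month(2026, 2))              # (2026, 3)
```

When the weekday difference is 0, compute_next_run() compares the target time with the current time and pushes the run a full week out if that time has already passed.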

6. Schema Changes

6.1 New table: scheduled_scans

CREATE TABLE IF NOT EXISTS scheduled_scans (
    id              INTEGER PRIMARY KEY AUTOINCREMENT,

    -- Scan parameters
    origin          TEXT NOT NULL CHECK(length(origin) = 3),
    country         TEXT NOT NULL CHECK(length(country) >= 2),
    window_months   INTEGER NOT NULL DEFAULT 1
                        CHECK(window_months >= 1 AND window_months <= 12),
    seat_class      TEXT NOT NULL DEFAULT 'economy',
    adults          INTEGER NOT NULL DEFAULT 1
                        CHECK(adults > 0 AND adults <= 9),

    -- Schedule definition
    frequency       TEXT NOT NULL
                        CHECK(frequency IN ('daily', 'weekly', 'monthly')),
    hour            INTEGER NOT NULL DEFAULT 6
                        CHECK(hour >= 0 AND hour <= 23),
    minute          INTEGER NOT NULL DEFAULT 0
                        CHECK(minute >= 0 AND minute <= 59),
    day_of_week     INTEGER CHECK(day_of_week >= 0 AND day_of_week <= 6),
    day_of_month    INTEGER CHECK(day_of_month >= 1 AND day_of_month <= 28),

    -- State
    enabled         INTEGER NOT NULL DEFAULT 1,
    label           TEXT,
    last_run_at     TIMESTAMP,
    next_run_at     TIMESTAMP NOT NULL,

    created_at      TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
    updated_at      TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,

    -- Frequency-specific constraints
    CHECK(
        (frequency = 'weekly'  AND day_of_week  IS NOT NULL) OR
        (frequency = 'monthly' AND day_of_month IS NOT NULL) OR
        (frequency = 'daily')
    )
);

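-- Note: INTEGER PRIMARY KEY already makes id unique and indexed,
-- so this index is redundant and could be dropped
CREATE UNIQUE INDEX IF NOT EXISTS uq_scheduled_scans_id
    ON scheduled_scans(id);

-- Fast lookup of due schedules (partial index: enabled rows only)
CREATE INDEX IF NOT EXISTS idx_scheduled_scans_next_run
    ON scheduled_scans(next_run_at)
    WHERE enabled = 1;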

-- Auto-update updated_at
CREATE TRIGGER IF NOT EXISTS update_scheduled_scans_timestamp
AFTER UPDATE ON scheduled_scans
FOR EACH ROW BEGIN
    UPDATE scheduled_scans SET updated_at = CURRENT_TIMESTAMP WHERE id = NEW.id;
END;

-- Insert schema version bump
INSERT OR IGNORE INTO schema_version (version, description)
VALUES (2, 'Add scheduled_scans table');

6.2 Add FK column to scans

-- Migration: add scheduled_scan_id to scans
ALTER TABLE scans ADD COLUMN scheduled_scan_id INTEGER
    REFERENCES scheduled_scans(id) ON DELETE SET NULL;

CREATE INDEX IF NOT EXISTS idx_scans_scheduled_scan_id
    ON scans(scheduled_scan_id)
    WHERE scheduled_scan_id IS NOT NULL;
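
One SQLite detail is worth checking here: foreign key actions such as ON DELETE SET NULL only take effect on connections where `PRAGMA foreign_keys` is ON, and in standard builds it is OFF by default. The sketch below uses stripped-down versions of both tables (an assumption for brevity) to show the difference. Whether the project's connection helper already sets the pragma is not stated in this PRD.

```python
import sqlite3


def scan_fk_after_schedule_delete(enforce_foreign_keys: bool):
    """Delete a schedule and return the scheduled_scan_id left on its scan."""
    conn = sqlite3.connect(":memory:")
    # OFF is SQLite's default unless the library was compiled otherwise.
    setting = "ON" if enforce_foreign_keys else "OFF"
    conn.execute(f"PRAGMA foreign_keys = {setting}")
    conn.executescript("""
        CREATE TABLE scheduled_scans (id INTEGER PRIMARY KEY);
        CREATE TABLE scans (
            id INTEGER PRIMARY KEY,
            scheduled_scan_id INTEGER
                REFERENCES scheduled_scans(id) ON DELETE SET NULL);
        INSERT INTO scheduled_scans (id) VALUES (1);
        INSERT INTO scans (id, scheduled_scan_id) VALUES (10, 1);
        DELETE FROM scheduled_scans WHERE id = 1;
    """)
    row = conn.execute(
        "SELECT scheduled_scan_id FROM scans WHERE id = 10"
    ).fetchone()
    return row[0]


print(scan_fk_after_schedule_delete(True))   # None: the FK was set to NULL
print(scan_fk_after_schedule_delete(False))  # 1: a dangling reference remains
```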

7. Migration (database/init_db.py)

Add a migration function, called before executescript(schema_sql):

def _migrate_add_scheduled_scans(conn, verbose=True):
    """Migration: create scheduled_scans table and add FK to scans."""
    table_exists = conn.execute(
        "SELECT name FROM sqlite_master WHERE type='table' AND name='scheduled_scans'"
    ).fetchone()

    if not table_exists:
        if verbose:
            print("   🔄 Migrating: adding scheduled_scans table...")

        conn.execute("""
            CREATE TABLE scheduled_scans (
                id INTEGER PRIMARY KEY AUTOINCREMENT, ...
            )
        """)

    # Add scheduled_scan_id to an existing scans table. This is checked
    # separately from the table creation so that an interrupted earlier
    # migration is still completed. An empty column set means scans does
    # not exist yet (fresh database), so there is nothing to alter.
    scan_columns = {row[1] for row in conn.execute("PRAGMA table_info(scans)")}
    if scan_columns:
        if "scheduled_scan_id" not in scan_columns:
            conn.execute("ALTER TABLE scans ADD COLUMN scheduled_scan_id INTEGER REFERENCES scheduled_scans(id) ON DELETE SET NULL")
        conn.execute("CREATE INDEX IF NOT EXISTS idx_scans_scheduled_scan_id ON scans(scheduled_scan_id) WHERE scheduled_scan_id IS NOT NULL")
    conn.commit()

    if verbose and not table_exists:
        print("   ✅ Migration complete: scheduled_scans table created")

8. API Endpoints

All under /api/v1/schedules. Rate limit: 30 req/min per IP (same as scans list).

| Method | Path | Description |
|---|---|---|
| GET | /schedules | List all schedules (paginated) |
| POST | /schedules | Create a schedule |
| GET | /schedules/{id} | Schedule details + last 5 scan IDs |
| PATCH | /schedules/{id} | Update (enable/disable, change frequency/params) |
| DELETE | /schedules/{id} | Delete schedule (scans are kept, FK set to NULL) |
| POST | /schedules/{id}/run-now | Trigger immediately (ignores next_run_at) |

Request model: CreateScheduleRequest

from typing import List, Optional

from pydantic import BaseModel

class CreateScheduleRequest(BaseModel):
    origin: str                                 # 3-char IATA
    country: Optional[str] = None               # 2-letter ISO country code
    destinations: Optional[List[str]] = None    # Alternative: list of IATA codes
    window_months: int = 1                      # Months of data per scan run
    seat_class: str = 'economy'
    adults: int = 1
    label: Optional[str] = None                 # Human-readable name
    frequency: str                              # 'daily' | 'weekly' | 'monthly'
    hour: int = 6                               # UTC hour (0-23)
    minute: int = 0                             # UTC minute (0-59)
    day_of_week: Optional[int] = None           # Required when frequency='weekly' (0=Mon)
    day_of_month: Optional[int] = None          # Required when frequency='monthly' (1-28)
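
The "required when" comments on day_of_week and day_of_month mirror the CHECK constraint in section 6.1, but the model itself does not enforce them. A minimal sketch of that rule in plain Python follows. The function name `validate_schedule_fields` and its error messages are hypothetical, and a real implementation might express the same checks as a Pydantic validator instead.

```python
from typing import Optional


def validate_schedule_fields(frequency: str, hour: int = 6, minute: int = 0,
                             day_of_week: Optional[int] = None,
                             day_of_month: Optional[int] = None) -> None:
    """Raise ValueError if the schedule definition is inconsistent."""
    if frequency not in ("daily", "weekly", "monthly"):
        raise ValueError(f"unknown frequency: {frequency!r}")
    if not 0 <= hour <= 23:
        raise ValueError("hour must be between 0 and 23")
    if not 0 <= minute <= 59:
        raise ValueError("minute must be between 0 and 59")
    if frequency == "weekly":
        if day_of_week is None or not 0 <= day_of_week <= 6:
            raise ValueError("weekly schedules need day_of_week in 0..6")
    if frequency == "monthly":
        if day_of_month is None or not 1 <= day_of_month <= 28:
            raise ValueError("monthly schedules need day_of_month in 1..28")


validate_schedule_fields("weekly", hour=6, minute=0, day_of_week=0)  # passes
try:
    validate_schedule_fields("monthly", day_of_month=29)
except ValueError as exc:
    print(exc)  # monthly schedules need day_of_month in 1..28
```

Rejecting bad input at the API layer gives the client a readable error instead of a database constraint failure.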

Response model: Schedule

class Schedule(BaseModel):
    id: int
    origin: str
    country: str
    window_months: int
    seat_class: str
    adults: int
    label: Optional[str]
    frequency: str
    hour: int
    minute: int
    day_of_week: Optional[int]
    day_of_month: Optional[int]
    enabled: bool
    last_run_at: Optional[str]
    next_run_at: str
    created_at: str
    recent_scan_ids: List[int]           # Last 5 scans created by this schedule
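
The recent_scan_ids field can be filled with a single query against the scheduled_scan_id column from section 6.2. The sketch below assumes that a higher scan id means a more recent scan (the PRD does not say how "last 5" is ordered) and uses a hypothetical helper name.

```python
import sqlite3


def recent_scan_ids(conn: sqlite3.Connection, schedule_id: int,
                    limit: int = 5) -> list:
    """Return the ids of the newest scans created by one schedule."""
    rows = conn.execute(
        "SELECT id FROM scans WHERE scheduled_scan_id = ? "
        "ORDER BY id DESC LIMIT ?",
        (schedule_id, limit),
    ).fetchall()
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE scans (id INTEGER PRIMARY KEY, scheduled_scan_id INTEGER)"
)
# Seven scans from schedule 1, then one from schedule 2.
conn.executemany(
    "INSERT INTO scans (scheduled_scan_id) VALUES (?)",
    [(1,)] * 7 + [(2,)],
)
print(recent_scan_ids(conn, 1))  # [7, 6, 5, 4, 3]
```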

9. Scheduler Lifecycle (api_server.py)

9.1 Startup

In the existing lifespan() context manager, after existing startup code:

scheduler_task = asyncio.create_task(_scheduler_loop())
logger.info("Scheduled scan background task started")
yield
scheduler_task.cancel()
try:
    await scheduler_task
except asyncio.CancelledError:
    pass
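
The start-and-cancel pattern can be tried without FastAPI. The sketch below is an illustration only: it uses contextlib.asynccontextmanager in place of the real lifespan(), a counter in place of _check_and_run_due_schedules(), and a shortened interval so that it finishes quickly.

```python
import asyncio
from contextlib import asynccontextmanager


async def scheduler_loop(state: dict, interval: float) -> None:
    """Stand-in for _scheduler_loop(): do the work first, then sleep."""
    while True:
        state["ticks"] += 1  # placeholder for _check_and_run_due_schedules()
        await asyncio.sleep(interval)


@asynccontextmanager
async def lifespan(state: dict, interval: float = 60.0):
    task = asyncio.create_task(scheduler_loop(state, interval))
    try:
        yield
    finally:
        # Shutdown: cancel the loop and wait until it has actually stopped.
        task.cancel()
        try:
            await task
        except asyncio.CancelledError:
            pass
        state["stopped"] = task.cancelled()


async def demo() -> dict:
    state = {"ticks": 0, "stopped": False}
    async with lifespan(state, interval=0.01):
        await asyncio.sleep(0.05)  # the application "serves requests" here
    return state


print(asyncio.run(demo()))
```

Because the loop body runs before the first sleep, the first check happens as soon as the task is scheduled, which matches the restart behaviour described in 9.2.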

9.2 Missed runs on restart

When the server starts, _check_and_run_due_schedules() fires immediately (before the 60-second sleep), catching any schedules that were due while the server was down. Each overdue schedule runs exactly once — next_run_at is then advanced to the next future interval. Multiple missed intervals are not caught up.
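
The "exactly once" behaviour follows from computing the next run relative to the current time instead of stepping forward from the stale next_run_at. A small sketch for the daily case, using a hypothetical helper `next_daily_run` that mirrors the daily branch of compute_next_run():

```python
from datetime import datetime, timedelta


def next_daily_run(hour: int, minute: int, now: datetime) -> datetime:
    """Next occurrence of hour:minute strictly after `now`."""
    base = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    return base if base > now else base + timedelta(days=1)


# The schedule was due on 2026-03-01 06:00, but the server was down
# until 2026-03-04 09:30, so three daily intervals were missed.
stale_next_run = datetime(2026, 3, 1, 6, 0)
restart = datetime(2026, 3, 4, 9, 30)

is_overdue = stale_next_run <= restart        # True: one catch-up run fires
new_next_run = next_daily_run(6, 0, restart)  # then jump straight to the future
print(is_overdue, new_next_run)               # True 2026-03-05 06:00:00
```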

9.3 Concurrency guard

Before firing a scan for a schedule, check:

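running = conn.execute("""
    SELECT id FROM scans
    WHERE scheduled_scan_id = ? AND status IN ('pending', 'running')
""", (schedule_id,)).fetchone()

if running:
    logger.warning(f"Schedule {schedule_id}: previous scan {running[0]} still active, skipping this run")
    # Still advance next_run_at so we try again next interval.
    # Assumes `schedule` is the current row of the due-schedules query.
    next_run = compute_next_run(
        schedule["frequency"], schedule["hour"], schedule["minute"],
        schedule["day_of_week"], schedule["day_of_month"])
    conn.execute("UPDATE scheduled_scans SET next_run_at = ? WHERE id = ?",
                 (next_run.isoformat(sep=" "), schedule_id))
    continue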

10. Frontend Changes

10.1 New page: Schedules.tsx

List view:

  • Table of all schedules: label, origin → country, frequency, next run (local time), last run, enabled toggle
  • "New Schedule" button opens create form (same airport search component as Scans)
  • Inline enable/disable toggle (PATCH request, optimistic update)
  • "Run now" button per row

Create form fields (below existing scan form fields):

  • Frequency selector: Daily / Weekly / Monthly (segmented button)
  • Time of day: hour:minute picker (UTC, with note)
  • Day of week (shown only for Weekly): Mon-Sun selector
  • Day of month (shown only for Monthly): 1-28 number input
  • Optional label field

10.2 Modified: ScanDetails.tsx

When a scan has scheduled_scan_id, show a small "Scheduled" chip in the header with a link to /schedules/{scheduled_scan_id}.

10.3 Navigation (Layout.tsx)

Add "Schedules" link to sidebar between Scans and Airports.

10.4 API client (api.ts)

export interface Schedule {
  id: number;
  origin: string;
  country: string;
  window_months: number;
  seat_class: string;
  adults: number;
  label?: string;
  frequency: 'daily' | 'weekly' | 'monthly';
  hour: number;
  minute: number;
  day_of_week?: number;
  day_of_month?: number;
  enabled: boolean;
  last_run_at?: string;
  next_run_at: string;
  created_at: string;
  recent_scan_ids: number[];
}

export const scheduleApi = {
  list: (page = 1, limit = 20) =>
    api.get<PaginatedResponse<Schedule>>('/schedules', { params: { page, limit } }),
  get: (id: number) =>
    api.get<Schedule>(`/schedules/${id}`),
  create: (data: CreateScheduleRequest) =>
    api.post<Schedule>('/schedules', data),
  update: (id: number, data: Partial<CreateScheduleRequest> & { enabled?: boolean }) =>
    api.patch<Schedule>(`/schedules/${id}`, data),
  delete: (id: number) =>
    api.delete(`/schedules/${id}`),
  runNow: (id: number) =>
    api.post<{ scan_id: number }>(`/schedules/${id}/run-now`),
};

11. Edge Cases

| Case | Handling |
|---|---|
| Previous scan still running at next interval | Skip this interval's run, advance next_run_at, log warning |
| Server down when schedule is due | On startup, runs any overdue schedule once; does not catch up multiple missed intervals |
| Schedule deleted while scan is running | ON DELETE SET NULL on FK: scan continues, scheduled_scan_id becomes NULL |
| window_months covers past dates | Scan start date is always "tomorrow" at creation time, same as manual scans |
| Monthly with day_of_month=29..31 | Capped at 28 in validation, which avoids invalid dates in all months |
| Simultaneous due schedules | Each creates an independent asyncio task; existing max_workers=3 semaphore in scan_processor limits total API concurrency across all running scans |
| Schedule created at 05:59, fires at 06:00 UTC | next_run_at is computed at creation time; if 06:00 today already passed, fires tomorrow |
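
The "simultaneous due schedules" case relies on a shared limit inside scan_processor. How such a limit behaves can be sketched with asyncio.Semaphore. This is an assumption about the mechanism, since the PRD only names a max_workers=3 semaphore, and `fake_scan` is a stand-in for real scan work.

```python
import asyncio

MAX_WORKERS = 3


async def fake_scan(scan_id: int, semaphore: asyncio.Semaphore, stats: dict) -> None:
    """Pretend to scan while recording how many scans run at once."""
    async with semaphore:
        stats["active"] += 1
        stats["peak"] = max(stats["peak"], stats["active"])
        await asyncio.sleep(0.01)  # placeholder for the upstream API calls
        stats["active"] -= 1
        stats["done"].append(scan_id)


async def run_due_scans(count: int) -> dict:
    semaphore = asyncio.Semaphore(MAX_WORKERS)  # created inside the running loop
    stats = {"active": 0, "peak": 0, "done": []}
    # Every due schedule gets its own task; the semaphore caps concurrency.
    await asyncio.gather(*(fake_scan(i, semaphore, stats) for i in range(count)))
    return stats


stats = asyncio.run(run_due_scans(8))
print(stats["peak"], len(stats["done"]))  # 3 8
```

All eight tasks start immediately, but at most three hold the semaphore at any moment; the rest wait their turn and still finish.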

12. Files Changed

| File | Change |
|---|---|
| database/schema.sql | Add scheduled_scans table, trigger, indexes, schema_version bump |
| database/init_db.py | _migrate_add_scheduled_scans() + call in initialize_database() |
| api_server.py | compute_next_run(), _scheduler_loop(), _check_and_run_due_schedules(), 6 new endpoints, lifespan update, new Pydantic models |
| frontend/src/api.ts | Schedule type, CreateScheduleRequest type, scheduleApi object |
| frontend/src/pages/Schedules.tsx | New page (list + inline create form) |
| frontend/src/pages/ScanDetails.tsx | "Scheduled" badge + link when scheduled_scan_id present |
| frontend/src/components/Layout.tsx | Schedules nav link |

Total: 7 files. Estimated ~500 new lines (backend ~250, frontend ~250).


13. Out of Scope

  • Notifications / alerts when a scheduled scan completes (email, webhook)
  • Per-schedule price change detection / diffing between runs
  • Timezone-aware scheduling (all times UTC for now)
  • Pause/resume of scheduled scans (separate PRD)
  • Rate limiting across simultaneous scheduled scans (existing semaphore provides soft protection)
  • Dashboard widgets for upcoming scheduled runs