
domverse 836c8474eb feat: add scheduled scans (cron-like recurring scans)

- New `scheduled_scans` table with daily/weekly/monthly frequencies
- asyncio background scheduler loop checks for due schedules every 60s
- 6 REST endpoints: CRUD + toggle enabled + run-now
- `scheduled_scan_id` FK added to scans table; migrated automatically
- Frontend: Schedules page (list + create form), Schedules nav link,
  "Scheduled" badge on ScanDetails when scan was triggered by a schedule

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

2026-02-28 10:48:43 +01:00

PRD: Scheduled Scans

Status: Draft
Date: 2026-02-27
Verdict: Fully feasible; no new dependencies required

1. Problem

Every scan is triggered manually. If you want to track prices for a route over time (e.g. BDS → Germany every Monday) you have to remember to click "Re-run" yourself. Price trends are only discoverable by comparing scan history manually.

2. Goal

Let users define a recurring schedule for any scan configuration. The server runs the scan automatically at the defined cadence, building a historical record of price data over time.

3. User Stories

As a user, I want to schedule a weekly scan of BDS → Germany so I can see how prices change without manually re-running it.
As a user, I want to enable/disable a schedule without deleting it.
As a user, I want to see which scans were created by a schedule and navigate to that schedule from a scan.
As a user, I want to trigger a scheduled scan immediately without waiting for the next interval.

4. Scheduling Options

Three frequencies are sufficient for flight price tracking:

| Frequency | Parameters | Example |
|---|---|---|
| `daily` | hour, minute | Every day at 06:00 |
| `weekly` | day_of_week (0=Mon–6=Sun), hour, minute | Every Monday at 06:00 |
| `monthly` | day_of_month (1–28), hour, minute | 1st of every month at 06:00 |

Day of month is capped at 28 so that the scheduled date exists in every month, including February. All times are stored and executed in UTC.
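The parameter rules above can be checked before a schedule is stored. The following is a minimal sketch under stated assumptions, not the project's actual validator: `validate_schedule_params` is a hypothetical helper name, and the error messages are illustrative.

```python
from typing import Optional


def validate_schedule_params(frequency: str, hour: int, minute: int,
                             day_of_week: Optional[int] = None,
                             day_of_month: Optional[int] = None) -> None:
    """Raise ValueError if the parameters do not match the frequency rules."""
    if frequency not in ("daily", "weekly", "monthly"):
        raise ValueError(f"unsupported frequency: {frequency!r}")
    if not 0 <= hour <= 23:
        raise ValueError("hour must be 0-23 (UTC)")
    if not 0 <= minute <= 59:
        raise ValueError("minute must be 0-59")
    # Weekly schedules need a weekday, with 0 meaning Monday
    if frequency == "weekly" and (day_of_week is None or not 0 <= day_of_week <= 6):
        raise ValueError("weekly schedules need day_of_week in 0-6 (0=Mon)")
    # Monthly schedules need a day that exists in every month
    if frequency == "monthly" and (day_of_month is None or not 1 <= day_of_month <= 28):
        raise ValueError("monthly schedules need day_of_month in 1-28")


validate_schedule_params("weekly", 6, 0, day_of_week=0)  # accepted, returns None
try:
    validate_schedule_params("monthly", 6, 0, day_of_month=31)
except ValueError as exc:
    print(exc)  # monthly schedules need day_of_month in 1-28
```

Doing this in Python as well as in the database CHECK constraints lets the API return a readable error instead of a raw constraint failure.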

5. Architecture

5.1 Scheduler Design

No new dependencies. A simple asyncio background task wakes every 60 seconds, queries the DB for due schedules, and fires a scan for each.

lifespan startup
    └── asyncio.create_task(_scheduler_loop())
            └── while True:
                    _check_and_run_due_schedules()   # queries DB
                    await asyncio.sleep(60)

_check_and_run_due_schedules():

SELECT * FROM scheduled_scans WHERE enabled = 1 AND next_run_at <= CURRENT_TIMESTAMP (SQLite has no NOW(); CURRENT_TIMESTAMP is UTC)
For each result, skip it if the previous scan for this schedule is still pending or running
Create a new scan row (same INSERT as POST /scans) with scheduled_scan_id set to the schedule's id
Call start_scan_processor(scan_id)
Set last_run_at to the current UTC time, then compute and store next_run_at
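The first step, selecting due schedules, can be sketched against an in-memory SQLite database. This is an illustration only: `find_due_schedules` is a hypothetical name, the table is reduced to the three columns the query needs, and the current time is passed in explicitly so the result is reproducible.

```python
import sqlite3
from datetime import datetime


def find_due_schedules(conn: sqlite3.Connection, now: datetime) -> list:
    """Return ids of enabled schedules whose next_run_at is at or before `now`."""
    rows = conn.execute(
        "SELECT id FROM scheduled_scans "
        "WHERE enabled = 1 AND next_run_at <= ? "
        "ORDER BY next_run_at",
        # Same text format that SQLite's CURRENT_TIMESTAMP produces
        (now.strftime("%Y-%m-%d %H:%M:%S"),),
    ).fetchall()
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE scheduled_scans ("
    "id INTEGER PRIMARY KEY, enabled INTEGER NOT NULL, next_run_at TIMESTAMP NOT NULL)"
)
conn.executemany("INSERT INTO scheduled_scans VALUES (?, ?, ?)", [
    (1, 1, "2026-03-02 06:00:00"),  # due
    (2, 1, "2026-03-03 06:00:00"),  # still in the future
    (3, 0, "2026-03-01 06:00:00"),  # overdue but disabled
])

due = find_due_schedules(conn, datetime(2026, 3, 2, 6, 0, 30))
print(due)  # [1]
```

Because timestamps are stored as text, the comparison is lexicographic; it only matches chronological order if every writer uses the same `YYYY-MM-DD HH:MM:SS` format.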

5.2 `next_run_at` Computation

Precomputed in Python after every run (and on create/update). Stored as a TIMESTAMP column with an index — scheduler lookup is a single indexed range query.

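from datetime import datetime, timedelta, timezone

def compute_next_run(frequency, hour, minute,
                     day_of_week=None, day_of_month=None,
                     after=None) -> datetime:
    """Return the next run time (naive UTC) strictly after `after`."""
    # datetime.utcnow() is deprecated since Python 3.12; keep the value naive UTC
    now = after or datetime.now(timezone.utc).replace(tzinfo=None)
    base = now.replace(hour=hour, minute=minute, second=0, microsecond=0)

    if frequency == 'daily':
        return base if base > now else base + timedelta(days=1)

    elif frequency == 'weekly':
        # Days until the target weekday (0=Mon); 0 means "today"
        days_ahead = (day_of_week - now.weekday()) % 7
        if days_ahead == 0 and base <= now:
            days_ahead = 7  # today's slot has passed, use next week
        return base + timedelta(days=days_ahead)

    elif frequency == 'monthly':
        # day_of_month is capped at 28, so replace() is valid in every month
        candidate = base.replace(day=day_of_month)
        if candidate <= now:
            # Roll over to next month (and to next year after December)
            m, y = (now.month % 12) + 1, now.year + (1 if now.month == 12 else 0)
            candidate = candidate.replace(year=y, month=m)
        return candidate

    # Previously fell through and returned None for an unknown frequency
    raise ValueError(f"Unknown frequency: {frequency!r}")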
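The two pieces of modular arithmetic in the function are easy to get wrong, so here they are isolated as a worked example. `weekly_days_ahead` and `next_month_and_year` are hypothetical helper names that mirror the weekly and monthly branches; they are not part of the proposed module.

```python
from datetime import datetime


def weekly_days_ahead(day_of_week: int, now: datetime, hour: int, minute: int) -> int:
    """Days from `now` until the next occurrence of the weekly slot (0=Mon)."""
    days_ahead = (day_of_week - now.weekday()) % 7
    slot_today = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    if days_ahead == 0 and slot_today <= now:
        days_ahead = 7  # today's slot has already passed
    return days_ahead


def next_month_and_year(now: datetime) -> tuple:
    """(month, year) of the month after `now`, wrapping December to January."""
    return (now.month % 12) + 1, now.year + (1 if now.month == 12 else 0)


saturday = datetime(2026, 2, 28, 10, 0)             # weekday() == 5
print(weekly_days_ahead(0, saturday, 6, 0))          # 2, i.e. Monday 2026-03-02
monday_late = datetime(2026, 3, 2, 7, 0)             # Monday, after the 06:00 slot
print(weekly_days_ahead(0, monday_late, 6, 0))       # 7
print(next_month_and_year(datetime(2026, 12, 15)))   # (1, 2027)
```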

6. Schema Changes

6.1 New table: `scheduled_scans`

CREATE TABLE IF NOT EXISTS scheduled_scans (
    id              INTEGER PRIMARY KEY AUTOINCREMENT,

    -- Scan parameters
    origin          TEXT NOT NULL CHECK(length(origin) = 3),
    country         TEXT NOT NULL CHECK(length(country) >= 2),
    window_months   INTEGER NOT NULL DEFAULT 1
                        CHECK(window_months >= 1 AND window_months <= 12),
    seat_class      TEXT NOT NULL DEFAULT 'economy',
    adults          INTEGER NOT NULL DEFAULT 1
                        CHECK(adults > 0 AND adults <= 9),

    -- Schedule definition
    frequency       TEXT NOT NULL
                        CHECK(frequency IN ('daily', 'weekly', 'monthly')),
    hour            INTEGER NOT NULL DEFAULT 6
                        CHECK(hour >= 0 AND hour <= 23),
    minute          INTEGER NOT NULL DEFAULT 0
                        CHECK(minute >= 0 AND minute <= 59),
    day_of_week     INTEGER CHECK(day_of_week >= 0 AND day_of_week <= 6),
    day_of_month    INTEGER CHECK(day_of_month >= 1 AND day_of_month <= 28),

    -- State
    enabled         INTEGER NOT NULL DEFAULT 1,
    label           TEXT,
    last_run_at     TIMESTAMP,
    next_run_at     TIMESTAMP NOT NULL,

    created_at      TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
    updated_at      TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,

    -- Frequency-specific constraints
    CHECK(
        (frequency = 'weekly'  AND day_of_week  IS NOT NULL) OR
        (frequency = 'monthly' AND day_of_month IS NOT NULL) OR
        (frequency = 'daily')
    )
);

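-- Note: id is already the INTEGER PRIMARY KEY, so this unique index is redundant
CREATE UNIQUE INDEX IF NOT EXISTS uq_scheduled_scans_id
    ON scheduled_scans(id);

-- Fast lookup of due schedules (partial index over enabled rows only)
CREATE INDEX IF NOT EXISTS idx_scheduled_scans_next_run
    ON scheduled_scans(next_run_at)
    WHERE enabled = 1;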

-- Auto-update updated_at
CREATE TRIGGER IF NOT EXISTS update_scheduled_scans_timestamp
AFTER UPDATE ON scheduled_scans
FOR EACH ROW BEGIN
    UPDATE scheduled_scans SET updated_at = CURRENT_TIMESTAMP WHERE id = NEW.id;
END;

-- Insert schema version bump
INSERT OR IGNORE INTO schema_version (version, description)
VALUES (2, 'Add scheduled_scans table');
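To see how the frequency-specific CHECK behaves, the constraint can be exercised against an in-memory database. This sketch uses a reduced copy of the table with only the columns the constraint touches; `try_insert` is a hypothetical helper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Reduced version of scheduled_scans: only the columns involved in the CHECKs
conn.execute("""
    CREATE TABLE scheduled_scans (
        id           INTEGER PRIMARY KEY AUTOINCREMENT,
        frequency    TEXT NOT NULL CHECK(frequency IN ('daily', 'weekly', 'monthly')),
        day_of_week  INTEGER CHECK(day_of_week >= 0 AND day_of_week <= 6),
        day_of_month INTEGER CHECK(day_of_month >= 1 AND day_of_month <= 28),
        CHECK(
            (frequency = 'weekly'  AND day_of_week  IS NOT NULL) OR
            (frequency = 'monthly' AND day_of_month IS NOT NULL) OR
            (frequency = 'daily')
        )
    )
""")


def try_insert(frequency, day_of_week=None, day_of_month=None) -> bool:
    """Return True if SQLite accepts the row, False if a CHECK rejects it."""
    try:
        conn.execute(
            "INSERT INTO scheduled_scans (frequency, day_of_week, day_of_month) "
            "VALUES (?, ?, ?)",
            (frequency, day_of_week, day_of_month),
        )
        return True
    except sqlite3.IntegrityError:
        return False


print(try_insert("daily"))                     # True
print(try_insert("weekly", day_of_week=0))     # True
print(try_insert("weekly"))                    # False: day_of_week is missing
print(try_insert("monthly", day_of_month=31))  # False: above the cap of 28
```

Note that the constraint does not reject a stray `day_of_week` on a daily schedule; it only requires the parameter that the chosen frequency needs.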

6.2 Add FK column to `scans`

-- Migration: add scheduled_scan_id to scans
ALTER TABLE scans ADD COLUMN scheduled_scan_id INTEGER
    REFERENCES scheduled_scans(id) ON DELETE SET NULL;

CREATE INDEX IF NOT EXISTS idx_scans_scheduled_scan_id
    ON scans(scheduled_scan_id)
    WHERE scheduled_scan_id IS NOT NULL;
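One caveat worth checking: SQLite only enforces foreign keys, including `ON DELETE SET NULL`, on connections where `PRAGMA foreign_keys = ON` has been issued. Whether the existing connection code already does this is not stated here, so treat the following as a sketch of the expected behaviour on reduced tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Foreign key enforcement is off by default in SQLite and is set per connection
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE scheduled_scans (id INTEGER PRIMARY KEY AUTOINCREMENT);
    CREATE TABLE scans (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        status TEXT NOT NULL,
        scheduled_scan_id INTEGER
            REFERENCES scheduled_scans(id) ON DELETE SET NULL
    );
    INSERT INTO scheduled_scans (id) VALUES (1);
    INSERT INTO scans (status, scheduled_scan_id) VALUES ('running', 1);
""")

# Deleting the schedule keeps the scan and clears its link
conn.execute("DELETE FROM scheduled_scans WHERE id = 1")
row = conn.execute("SELECT status, scheduled_scan_id FROM scans").fetchone()
print(row)  # ('running', None)
```

Without the pragma the delete still succeeds, but the scan row keeps a dangling `scheduled_scan_id`.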

7. Migration (`database/init_db.py`)

Add a migration function that creates the table and adds the FK column, called before executescript(schema_sql):

def _migrate_add_scheduled_scans(conn, verbose=True):
    """Migration: create scheduled_scans table and add FK to scans."""
    cursor = conn.execute(
        "SELECT name FROM sqlite_master WHERE type='table' AND name='scheduled_scans'"
    )
    if cursor.fetchone():
        return  # Already exists

    if verbose:
        print("   🔄 Migrating: adding scheduled_scans table...")

    conn.execute("""
        CREATE TABLE scheduled_scans (
            id INTEGER PRIMARY KEY AUTOINCREMENT, ...
        )
    """)

    # Add scheduled_scan_id to existing scans table
    try:
        conn.execute("ALTER TABLE scans ADD COLUMN scheduled_scan_id INTEGER REFERENCES scheduled_scans(id) ON DELETE SET NULL")
    except sqlite3.OperationalError as exc:
        # Ignore only "duplicate column name" (column already exists); re-raise anything else
        if "duplicate column name" not in str(exc):
            raise

    conn.execute("CREATE INDEX IF NOT EXISTS idx_scans_scheduled_scan_id ON scans(scheduled_scan_id) WHERE scheduled_scan_id IS NOT NULL")
    conn.commit()

    if verbose:
        print("   ✅ Migration complete: scheduled_scans table created")
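An alternative to catching the exception is to check for the column first. The sketch below is one possible approach, not the project's code: `column_exists` is a hypothetical helper, and the `ALTER TABLE` is simplified (no `REFERENCES` clause) to keep the example short.

```python
import sqlite3


def column_exists(conn: sqlite3.Connection, table: str, column: str) -> bool:
    """True if `table` already has `column` (index 1 of each table_info row is the name)."""
    # PRAGMA arguments cannot be bound as parameters, so `table` must be a trusted name
    rows = conn.execute(f"PRAGMA table_info({table})").fetchall()
    return any(row[1] == column for row in rows)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scans (id INTEGER PRIMARY KEY, status TEXT)")

print(column_exists(conn, "scans", "scheduled_scan_id"))  # False
if not column_exists(conn, "scans", "scheduled_scan_id"):
    conn.execute("ALTER TABLE scans ADD COLUMN scheduled_scan_id INTEGER")
print(column_exists(conn, "scans", "scheduled_scan_id"))  # True
```

An explicit check makes the migration idempotent without hiding unrelated `OperationalError`s such as a locked database.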

8. API Endpoints

All under /api/v1/schedules. Rate limit: 30 req/min per IP (same as scans list).

| Method | Path | Description |
|---|---|---|
| `GET` | `/schedules` | List all schedules (paginated) |
| `POST` | `/schedules` | Create a schedule |
| `GET` | `/schedules/{id}` | Schedule details + last 5 scan IDs |
| `PATCH` | `/schedules/{id}` | Update (enable/disable, change frequency/params) |
| `DELETE` | `/schedules/{id}` | Delete schedule (scans are kept, FK set to NULL) |
| `POST` | `/schedules/{id}/run-now` | Trigger immediately (ignores next_run_at) |

Request model: `CreateScheduleRequest`

from typing import List, Optional
from pydantic import BaseModel

class CreateScheduleRequest(BaseModel):
    origin: str                                # 3-char IATA
    # Optional fields default to None; without a default, Pydantic v2 treats them as required
    country: Optional[str] = None              # 2-letter ISO country code
    destinations: Optional[List[str]] = None   # Alternative: list of IATA codes
    window_months: int = 1                     # Months of data per scan run
    seat_class: str = 'economy'
    adults: int = 1
    label: Optional[str] = None                # Human-readable name
    frequency: str                             # 'daily' | 'weekly' | 'monthly'
    hour: int = 6                              # UTC hour (0–23)
    minute: int = 0                            # UTC minute (0–59)
    day_of_week: Optional[int] = None          # Required when frequency='weekly' (0=Mon)
    day_of_month: Optional[int] = None         # Required when frequency='monthly' (1–28)
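For illustration, a request body for a weekly Monday 06:00 UTC schedule could look like the payload below. The base URL is an assumption (a local development server), and the request is only constructed, not sent, so the sketch runs without a server.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000/api/v1"  # assumption: local dev server address

payload = {
    "origin": "BDS",
    "country": "DE",
    "window_months": 1,
    "seat_class": "economy",
    "adults": 1,
    "label": "BDS to Germany, weekly",
    "frequency": "weekly",
    "hour": 6,
    "minute": 0,
    "day_of_week": 0,  # Monday
}

request = urllib.request.Request(
    f"{BASE_URL}/schedules",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)
# urllib.request.urlopen(request) would send it; omitted so the sketch runs offline
print(request.get_method(), request.full_url)
```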

Response model: `Schedule`

class Schedule(BaseModel):
    id: int
    origin: str
    country: str
    window_months: int
    seat_class: str
    adults: int
    label: Optional[str] = None
    frequency: str
    hour: int
    minute: int
    day_of_week: Optional[int] = None    # Set only for weekly schedules
    day_of_month: Optional[int] = None   # Set only for monthly schedules
    enabled: bool
    last_run_at: Optional[str] = None    # None until the first run
    next_run_at: str
    created_at: str
    recent_scan_ids: List[int]           # Last 5 scans created by this schedule

9. Scheduler Lifecycle (`api_server.py`)

9.1 Startup

In the existing lifespan() context manager, after existing startup code:

scheduler_task = asyncio.create_task(_scheduler_loop())
logger.info("Scheduled scan background task started")
yield
scheduler_task.cancel()
try:
    await scheduler_task
except asyncio.CancelledError:
    pass
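The start/cancel pattern above can be tried in isolation. The sketch below is a stand-alone approximation: `scheduler_loop` takes the check function and interval as parameters (an assumption made so it can run with a short interval), and the per-tick `try/except` is a suggested safeguard rather than something the PRD specifies.

```python
import asyncio
import logging

logger = logging.getLogger("scheduler")


async def scheduler_loop(check, interval: float) -> None:
    """Call `check()` immediately, then once per `interval` seconds, until cancelled."""
    while True:
        try:
            check()
        except Exception:
            # Log and keep looping; one failed tick should not end the scheduler
            logger.exception("scheduler tick failed")
        await asyncio.sleep(interval)


async def main():
    ticks = []
    task = asyncio.create_task(scheduler_loop(lambda: ticks.append(1), interval=0.01))
    await asyncio.sleep(0.05)  # stands in for the application's lifetime (the `yield`)
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        pass  # expected on shutdown
    return len(ticks), task.cancelled()


tick_count, was_cancelled = asyncio.run(main())
print(tick_count >= 1, was_cancelled)  # True True
```

`except Exception` does not swallow `asyncio.CancelledError`, which derives from `BaseException`, so cancellation still ends the loop cleanly.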

9.2 Missed runs on restart

When the server starts, _check_and_run_due_schedules() fires immediately (before the 60-second sleep), catching any schedules that were due while the server was down. Each overdue schedule runs exactly once — next_run_at is then advanced to the next future interval. Multiple missed intervals are not caught up.
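A small worked example of the "run once, no catch-up" rule, using a daily 06:00 schedule. `next_daily_run` is a hypothetical helper that mirrors the daily case of `compute_next_run`; the dates are made up for illustration.

```python
from datetime import datetime, timedelta


def next_daily_run(hour: int, minute: int, after: datetime) -> datetime:
    """Next daily slot strictly after `after`."""
    base = after.replace(hour=hour, minute=minute, second=0, microsecond=0)
    return base if base > after else base + timedelta(days=1)


stale_next_run = datetime(2026, 3, 1, 6, 0)  # became due while the server was down
restart_time = datetime(2026, 3, 4, 9, 30)   # several daily slots were missed

runs_fired = 0
if stale_next_run <= restart_time:
    runs_fired += 1  # exactly one scan, regardless of how many slots were missed
    new_next_run = next_daily_run(6, 0, after=restart_time)

print(runs_fired, new_next_run)  # 1 2026-03-05 06:00:00
```

Computing the next run from the restart time, rather than from the stale `next_run_at`, is what prevents a burst of catch-up scans.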

9.3 Concurrency guard

Before firing a scan for a schedule, check:

running = conn.execute("""
    SELECT id FROM scans
    WHERE scheduled_scan_id = ? AND status IN ('pending', 'running')
""", (schedule_id,)).fetchone()

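if running:
    logger.info(f"Schedule {schedule_id}: previous scan {running[0]} still active, skipping this run")
    # Advance next_run_at before skipping; otherwise the schedule stays due and
    # is re-checked on every 60-second tick. _advance_next_run() is a
    # hypothetical helper that stores compute_next_run(...) for this schedule.
    _advance_next_run(conn, schedule_id)
    continue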
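The guard query can be checked on its own with a reduced `scans` table. `active_scan_id` is a hypothetical wrapper name, and the `'completed'` status is an assumption; only `'pending'` and `'running'` appear in this document.

```python
import sqlite3


def active_scan_id(conn: sqlite3.Connection, schedule_id: int):
    """Return the id of a pending or running scan for this schedule, or None."""
    row = conn.execute(
        "SELECT id FROM scans "
        "WHERE scheduled_scan_id = ? AND status IN ('pending', 'running') "
        "LIMIT 1",
        (schedule_id,),
    ).fetchone()
    return row[0] if row else None


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE scans (id INTEGER PRIMARY KEY, status TEXT, scheduled_scan_id INTEGER)"
)
conn.executemany("INSERT INTO scans VALUES (?, ?, ?)", [
    (10, "completed", 1),  # 'completed' is an assumed terminal status
    (11, "running", 1),
    (12, "completed", 2),
])

print(active_scan_id(conn, 1))  # 11
print(active_scan_id(conn, 2))  # None
```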

10. Frontend Changes

10.1 New page: `Schedules.tsx`

List view:

Table of all schedules: label, origin → country, frequency, next run (local time), last run, enabled toggle
"New Schedule" button opens create form (same airport search component as Scans)
Inline enable/disable toggle (PATCH request, optimistic update)
"Run now" button per row

Create form fields (below existing scan form fields):

Frequency selector: Daily / Weekly / Monthly (segmented button)
Time of day: hour:minute picker (UTC, with note)
Day of week (shown only for Weekly): Mon–Sun selector
Day of month (shown only for Monthly): 1–28 number input
Optional label field

10.2 Modified: `ScanDetails.tsx`

When a scan has scheduled_scan_id, show a small "Scheduled" chip in the header with a link to /schedules/{scheduled_scan_id}.

10.3 Navigation (`Layout.tsx`)

Add "Schedules" link to sidebar between Scans and Airports.

10.4 API client (`api.ts`)

export interface Schedule {
  id: number;
  origin: string;
  country: string;
  window_months: number;
  seat_class: string;
  adults: number;
  label?: string;
  frequency: 'daily' | 'weekly' | 'monthly';
  hour: number;
  minute: number;
  day_of_week?: number;
  day_of_month?: number;
  enabled: boolean;
  last_run_at?: string;
  next_run_at: string;
  created_at: string;
  recent_scan_ids: number[];
}

export const scheduleApi = {
  list: (page = 1, limit = 20) =>
    api.get<PaginatedResponse<Schedule>>('/schedules', { params: { page, limit } }),
  get: (id: number) =>
    api.get<Schedule>(`/schedules/${id}`),
  create: (data: CreateScheduleRequest) =>
    api.post<Schedule>('/schedules', data),
  update: (id: number, data: Partial<CreateScheduleRequest> & { enabled?: boolean }) =>
    api.patch<Schedule>(`/schedules/${id}`, data),
  delete: (id: number) =>
    api.delete(`/schedules/${id}`),
  runNow: (id: number) =>
    api.post<{ scan_id: number }>(`/schedules/${id}/run-now`),
};

11. Edge Cases

| Case | Handling |
|---|---|
| Previous scan still running at next interval | Skip this interval's run, advance `next_run_at`, log warning |
| Server down when schedule is due | On startup, runs any overdue schedule once; does not catch up multiple missed intervals |
| Schedule deleted while scan is running | `ON DELETE SET NULL` on FK: the scan continues and its `scheduled_scan_id` becomes NULL |
| `window_months` covers past dates | Scan start date is always "tomorrow" at scan creation time, same as manual scans |
| Monthly with day_of_month=29..31 | Rejected by validation, which caps the value at 28 so the date exists in every month |
| Simultaneous due schedules | Each creates an independent asyncio task; existing `max_workers=3` semaphore in scan_processor limits total API concurrency across all running scans |
| Schedule created at 05:59, fires at 06:00 UTC | `next_run_at` is computed at creation time: 06:00 today is still ahead, so it fires today; had 06:00 already passed, it would fire tomorrow |

12. Files Changed

| File | Change |
|---|---|
| `database/schema.sql` | Add `scheduled_scans` table, trigger, indexes, schema_version bump |
| `database/init_db.py` | `_migrate_add_scheduled_scans()` + call in `initialize_database()` |
| `api_server.py` | `compute_next_run()`, `_scheduler_loop()`, `_check_and_run_due_schedules()`, 6 new endpoints, lifespan update, new Pydantic models |
| `frontend/src/api.ts` | `Schedule` type, `CreateScheduleRequest` type, `scheduleApi` object |
| `frontend/src/pages/Schedules.tsx` | New page (list + inline create form) |
| `frontend/src/pages/ScanDetails.tsx` | "Scheduled" badge + link when `scheduled_scan_id` present |
| `frontend/src/components/Layout.tsx` | Schedules nav link |

Total: 7 files. Estimated ~500 new lines (backend ~250, frontend ~250).

13. Out of Scope

Notifications / alerts when a scheduled scan completes (email, webhook)
Per-schedule price change detection / diffing between runs
Timezone-aware scheduling (all times UTC for now)
Pause/resume of scheduled scans (separate PRD)
Rate limiting across simultaneous scheduled scans (existing semaphore provides soft protection)
Dashboard widgets for upcoming scheduled runs
