ciaovolo/flight-comparator/PRD_SCHEDULED_SCANS.md
domverse 836c8474eb feat: add scheduled scans (cron-like recurring scans)
- New `scheduled_scans` table with daily/weekly/monthly frequencies
- asyncio background scheduler loop checks for due schedules every 60s
- 6 REST endpoints: CRUD + toggle enabled + run-now
- `scheduled_scan_id` FK added to scans table; migrated automatically
- Frontend: Schedules page (list + create form), Schedules nav link,
  "Scheduled" badge on ScanDetails when scan was triggered by a schedule

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-28 10:48:43 +01:00

# PRD: Scheduled Scans
**Status:** Draft
**Date:** 2026-02-27
**Verdict:** Fully feasible — no new dependencies required
---
## 1. Problem
Every scan is triggered manually. If you want to track prices for a route over time (e.g. BDS → Germany every Monday) you have to remember to click "Re-run" yourself. Price trends are only discoverable by comparing scan history manually.
---
## 2. Goal
Let users define a recurring schedule for any scan configuration. The server runs the scan automatically at the defined cadence, building a historical record of price data over time.
---
## 3. User Stories
- **As a user**, I want to schedule a weekly scan of BDS → Germany so I can see how prices change without manually re-running it.
- **As a user**, I want to enable/disable a schedule without deleting it.
- **As a user**, I want to see which scans were created by a schedule and navigate to that schedule from a scan.
- **As a user**, I want to trigger a scheduled scan immediately without waiting for the next interval.
---
## 4. Scheduling Options
Three frequencies are sufficient for flight price tracking:
| Frequency | Parameters | Example |
|-----------|-----------|---------|
| `daily` | hour, minute | Every day at 06:00 |
| `weekly` | day_of_week (0=Mon–6=Sun), hour, minute | Every Monday at 06:00 |
| `monthly` | day_of_month (1–28), hour, minute | 1st of every month at 06:00 |
Day of month is capped at 28 so the chosen date exists in every month, including February (no 29/30/31 edge cases). All times are stored and executed in UTC.
---
## 5. Architecture
### 5.1 Scheduler Design
No new dependencies. A simple asyncio background task wakes every 60 seconds, queries the DB for due schedules, and fires a scan for each.
```
lifespan startup
└── asyncio.create_task(_scheduler_loop())
    └── while True:
            _check_and_run_due_schedules()  # queries DB
            await asyncio.sleep(60)
```
`_check_and_run_due_schedules()`:
1. `SELECT * FROM scheduled_scans WHERE enabled=1 AND next_run_at <= CURRENT_TIMESTAMP` (SQLite has no `NOW()` function)
2. For each result, skip the run if the previous scan for this schedule is still `pending` or `running`
3. Create a new scan row (same INSERT as `POST /scans`), with `scheduled_scan_id` set to the schedule's ID
4. Call `start_scan_processor(scan_id)`
5. Set `last_run_at` to the current UTC time, then compute and store `next_run_at`
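The five steps above can be sketched against an in-memory SQLite database. This is a minimal illustration, not the planned implementation: the function name `check_and_run_due`, the `next_run` and `start_scan` callables, the trimmed-down tables in `make_demo_db()`, and the text timestamp format are all assumptions made for the example.

```python
import sqlite3
from datetime import datetime, timedelta

FMT = "%Y-%m-%d %H:%M:%S"  # sorts lexicographically, so <= works on text


def check_and_run_due(conn, now, next_run, start_scan):
    """Fire each due schedule at most once; return the new scan IDs."""
    created = []
    due = conn.execute(
        "SELECT id, origin, country FROM scheduled_scans "
        "WHERE enabled = 1 AND next_run_at <= ?",
        (now.strftime(FMT),),
    ).fetchall()
    for schedule_id, origin, country in due:
        active = conn.execute(
            "SELECT id FROM scans WHERE scheduled_scan_id = ? "
            "AND status IN ('pending', 'running')",
            (schedule_id,),
        ).fetchone()
        if active is None:
            cur = conn.execute(
                "INSERT INTO scans (origin, country, status, scheduled_scan_id) "
                "VALUES (?, ?, 'pending', ?)",
                (origin, country, schedule_id),
            )
            start_scan(cur.lastrowid)  # stand-in for start_scan_processor
            created.append(cur.lastrowid)
            conn.execute(
                "UPDATE scheduled_scans SET last_run_at = ? WHERE id = ?",
                (now.strftime(FMT), schedule_id),
            )
        # Advance next_run_at even when the run was skipped, so a busy
        # schedule is retried at its next interval, not every 60 seconds.
        conn.execute(
            "UPDATE scheduled_scans SET next_run_at = ? WHERE id = ?",
            (next_run(now).strftime(FMT), schedule_id),
        )
    conn.commit()
    return created


def make_demo_db():
    """In-memory database with one schedule that is already overdue."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE scheduled_scans (
            id INTEGER PRIMARY KEY, origin TEXT, country TEXT,
            enabled INTEGER DEFAULT 1, last_run_at TEXT, next_run_at TEXT);
        CREATE TABLE scans (
            id INTEGER PRIMARY KEY, origin TEXT, country TEXT,
            status TEXT, scheduled_scan_id INTEGER);
        INSERT INTO scheduled_scans (origin, country, next_run_at)
            VALUES ('BDS', 'DE', '2026-02-27 06:00:00');
    """)
    return conn


if __name__ == "__main__":
    conn = make_demo_db()
    now = datetime(2026, 2, 28, 6, 0, 30)
    one_day_later = lambda t: t + timedelta(days=1)  # placeholder for compute_next_run
    print(check_and_run_due(conn, now, one_day_later, print))  # creates one scan
    print(check_and_run_due(conn, now, one_day_later, print))  # nothing is due any more
```

Passing `now` in as a parameter keeps the function testable without patching the clock; the real loop would pass the current UTC time.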
### 5.2 `next_run_at` Computation
Precomputed in Python after every run (and on create/update). Stored as a TIMESTAMP column with an index — scheduler lookup is a single indexed range query.
```python
from datetime import datetime, timedelta


def compute_next_run(frequency, hour, minute,
                     day_of_week=None, day_of_month=None,
                     after=None) -> datetime:
    """Return the first run time strictly after `after` (naive UTC; defaults to now)."""
    now = after or datetime.utcnow()
    base = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    if frequency == 'daily':
        return base if base > now else base + timedelta(days=1)
    elif frequency == 'weekly':
        days_ahead = (day_of_week - now.weekday()) % 7
        if days_ahead == 0 and base <= now:
            days_ahead = 7  # today's slot has already passed; use next week
        return base + timedelta(days=days_ahead)
    elif frequency == 'monthly':
        # day_of_month is capped at 28, so replace() is valid in every month
        candidate = base.replace(day=day_of_month)
        if candidate <= now:
            m, y = (now.month % 12) + 1, now.year + (1 if now.month == 12 else 0)
            candidate = candidate.replace(year=y, month=m)
        return candidate
    # Previously fell through and returned None for an unknown frequency
    raise ValueError(f"Unknown frequency: {frequency!r}")
```
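A few worked cases make the rollover rules concrete. The function is restated compactly here (under the hypothetical name `next_run`, with `after` made mandatory) only so the snippet runs on its own and gives deterministic results; the dates are illustrative.

```python
from datetime import datetime, timedelta


def next_run(frequency, hour, minute, after, day_of_week=None, day_of_month=None):
    """Compact restatement of compute_next_run with an explicit reference time."""
    base = after.replace(hour=hour, minute=minute, second=0, microsecond=0)
    if frequency == "daily":
        return base if base > after else base + timedelta(days=1)
    if frequency == "weekly":
        days_ahead = (day_of_week - after.weekday()) % 7
        if days_ahead == 0 and base <= after:
            days_ahead = 7
        return base + timedelta(days=days_ahead)
    if frequency == "monthly":
        candidate = base.replace(day=day_of_month)
        if candidate <= after:
            month = (after.month % 12) + 1
            year = after.year + (1 if after.month == 12 else 0)
            candidate = candidate.replace(year=year, month=month)
        return candidate
    raise ValueError(f"Unknown frequency: {frequency!r}")


friday = datetime(2026, 2, 27, 10, 0)  # a Friday, after the 06:00 slot

# Daily: 06:00 has passed today, so the next run is tomorrow.
print(next_run("daily", 6, 0, friday))                      # 2026-02-28 06:00:00
# Weekly on Monday (0): three days ahead of Friday.
print(next_run("weekly", 6, 0, friday, day_of_week=0))      # 2026-03-02 06:00:00
# Monthly on the 1st: February's slot has passed, so roll to March.
print(next_run("monthly", 6, 0, friday, day_of_month=1))    # 2026-03-01 06:00:00
# Monthly in December rolls over into January of the next year.
print(next_run("monthly", 6, 0, datetime(2026, 12, 15), day_of_month=1))  # 2027-01-01 06:00:00
```

The result depends only on the reference time, never on the previous `next_run_at`, so a schedule that is several intervals overdue moves straight to the next future slot.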
---
## 6. Schema Changes
### 6.1 New table: `scheduled_scans`
```sql
CREATE TABLE IF NOT EXISTS scheduled_scans (
    id INTEGER PRIMARY KEY AUTOINCREMENT,

    -- Scan parameters
    origin TEXT NOT NULL CHECK(length(origin) = 3),
    country TEXT NOT NULL CHECK(length(country) >= 2),
    window_months INTEGER NOT NULL DEFAULT 1
        CHECK(window_months >= 1 AND window_months <= 12),
    seat_class TEXT NOT NULL DEFAULT 'economy',
    adults INTEGER NOT NULL DEFAULT 1
        CHECK(adults > 0 AND adults <= 9),

    -- Schedule definition
    frequency TEXT NOT NULL
        CHECK(frequency IN ('daily', 'weekly', 'monthly')),
    hour INTEGER NOT NULL DEFAULT 6
        CHECK(hour >= 0 AND hour <= 23),
    minute INTEGER NOT NULL DEFAULT 0
        CHECK(minute >= 0 AND minute <= 59),
    day_of_week INTEGER CHECK(day_of_week >= 0 AND day_of_week <= 6),
    day_of_month INTEGER CHECK(day_of_month >= 1 AND day_of_month <= 28),

    -- State
    enabled INTEGER NOT NULL DEFAULT 1,
    label TEXT,
    last_run_at TIMESTAMP,
    next_run_at TIMESTAMP NOT NULL,
    created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,

    -- Frequency-specific constraints
    CHECK(
        (frequency = 'weekly' AND day_of_week IS NOT NULL) OR
        (frequency = 'monthly' AND day_of_month IS NOT NULL) OR
        (frequency = 'daily')
    )
);

-- Explicit unique index on id (redundant with INTEGER PRIMARY KEY, which is already unique)
CREATE UNIQUE INDEX IF NOT EXISTS uq_scheduled_scans_id
    ON scheduled_scans(id);

-- Fast lookup of due schedules
CREATE INDEX IF NOT EXISTS idx_scheduled_scans_next_run
    ON scheduled_scans(next_run_at)
    WHERE enabled = 1;

-- Auto-update updated_at
CREATE TRIGGER IF NOT EXISTS update_scheduled_scans_timestamp
AFTER UPDATE ON scheduled_scans
FOR EACH ROW BEGIN
    UPDATE scheduled_scans SET updated_at = CURRENT_TIMESTAMP WHERE id = NEW.id;
END;

-- Insert schema version bump
INSERT OR IGNORE INTO schema_version (version, description)
    VALUES (2, 'Add scheduled_scans table');
```
### 6.2 Add FK column to `scans`
```sql
-- Migration: add scheduled_scan_id to scans
ALTER TABLE scans ADD COLUMN scheduled_scan_id INTEGER
REFERENCES scheduled_scans(id) ON DELETE SET NULL;
CREATE INDEX IF NOT EXISTS idx_scans_scheduled_scan_id
ON scans(scheduled_scan_id)
WHERE scheduled_scan_id IS NOT NULL;
```
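One caveat worth checking during implementation: SQLite only enforces foreign keys, including `ON DELETE SET NULL`, on connections where `PRAGMA foreign_keys` is switched on. The sketch below uses stripped-down stand-ins for the two tables and a hypothetical helper `fk_after_delete` to show the difference; whether the project's connection helper already sets the pragma is not stated in this PRD.

```python
import sqlite3


def fk_after_delete(enforce: bool):
    """Delete a schedule and return the scan's scheduled_scan_id afterwards."""
    conn = sqlite3.connect(":memory:")
    # The pragma is per connection and must be set before the statements run.
    conn.execute(f"PRAGMA foreign_keys = {'ON' if enforce else 'OFF'}")
    conn.execute("CREATE TABLE scheduled_scans (id INTEGER PRIMARY KEY)")
    conn.execute(
        "CREATE TABLE scans ("
        " id INTEGER PRIMARY KEY,"
        " scheduled_scan_id INTEGER REFERENCES scheduled_scans(id) ON DELETE SET NULL)"
    )
    conn.execute("INSERT INTO scheduled_scans (id) VALUES (1)")
    conn.execute("INSERT INTO scans (id, scheduled_scan_id) VALUES (10, 1)")
    conn.execute("DELETE FROM scheduled_scans WHERE id = 1")
    value = conn.execute(
        "SELECT scheduled_scan_id FROM scans WHERE id = 10"
    ).fetchone()[0]
    conn.close()
    return value


print(fk_after_delete(enforce=True))   # None: the FK was cleared
print(fk_after_delete(enforce=False))  # 1: a dangling reference remains
```

With enforcement off, a deleted schedule would leave scans pointing at a non-existent row, and the "Scheduled" link in the UI would lead nowhere.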
---
## 7. Migration (`database/init_db.py`)
Add a migration function, called before `executescript(schema_sql)`, that creates the table and adds the FK column:
```python
import sqlite3


def _migrate_add_scheduled_scans(conn, verbose=True):
    """Migration: create scheduled_scans table and add FK to scans."""
    cursor = conn.execute(
        "SELECT name FROM sqlite_master WHERE type='table' AND name='scheduled_scans'"
    )
    if cursor.fetchone():
        return  # Already exists
    if verbose:
        print(" 🔄 Migrating: adding scheduled_scans table...")
    conn.execute("""
        CREATE TABLE scheduled_scans (
            id INTEGER PRIMARY KEY AUTOINCREMENT, ...
        )
    """)
    # Add scheduled_scan_id to existing scans table
    try:
        conn.execute(
            "ALTER TABLE scans ADD COLUMN scheduled_scan_id INTEGER "
            "REFERENCES scheduled_scans(id) ON DELETE SET NULL"
        )
    except sqlite3.OperationalError:
        pass  # Column already exists
    conn.execute(
        "CREATE INDEX IF NOT EXISTS idx_scans_scheduled_scan_id "
        "ON scans(scheduled_scan_id) WHERE scheduled_scan_id IS NOT NULL"
    )
    conn.commit()
    if verbose:
        print(" ✅ Migration complete: scheduled_scans table created")
```
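Catching `sqlite3.OperationalError` also hides unrelated failures, such as a missing `scans` table, which raises the same exception type. A possible alternative is to check for the column explicitly. The helper names `column_exists` and `add_scheduled_scan_fk` below are hypothetical, and this is a sketch of the idea rather than a drop-in replacement.

```python
import sqlite3


def column_exists(conn, table, column):
    """Return True if `table` has a column named `column`."""
    # `table` is interpolated into the statement, so pass only trusted,
    # hard-coded names here, never user input.
    rows = conn.execute(f"PRAGMA table_info({table})").fetchall()
    return any(row[1] == column for row in rows)  # row[1] is the column name


def add_scheduled_scan_fk(conn):
    """Add scans.scheduled_scan_id once; return True if the column was added."""
    if column_exists(conn, "scans", "scheduled_scan_id"):
        return False
    conn.execute(
        "ALTER TABLE scans ADD COLUMN scheduled_scan_id INTEGER "
        "REFERENCES scheduled_scans(id) ON DELETE SET NULL"
    )
    conn.commit()
    return True


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE scheduled_scans (id INTEGER PRIMARY KEY)")
    conn.execute("CREATE TABLE scans (id INTEGER PRIMARY KEY, origin TEXT)")
    print(add_scheduled_scan_fk(conn))  # True: column added
    print(add_scheduled_scan_fk(conn))  # False: second call is a no-op
```

Any `OperationalError` that still escapes then signals a real problem instead of being silently ignored.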
---
## 8. API Endpoints
All under `/api/v1/schedules`. Rate limit: 30 req/min per IP (same as scans list).
| Method | Path | Description |
|--------|------|-------------|
| `GET` | `/schedules` | List all schedules (paginated) |
| `POST` | `/schedules` | Create a schedule |
| `GET` | `/schedules/{id}` | Schedule details + last 5 scan IDs |
| `PATCH` | `/schedules/{id}` | Update (enable/disable, change frequency/params) |
| `DELETE` | `/schedules/{id}` | Delete schedule (scans are kept, FK set to NULL) |
| `POST` | `/schedules/{id}/run-now` | Trigger immediately (ignores next_run_at) |
### Request model: `CreateScheduleRequest`
```python
from typing import List, Optional

from pydantic import BaseModel


class CreateScheduleRequest(BaseModel):
    origin: str                               # 3-char IATA
    country: Optional[str] = None             # 2-letter ISO country code
    destinations: Optional[List[str]] = None  # Alternative: list of IATA codes
    window_months: int = 1                    # Months of data per scan run
    seat_class: str = 'economy'
    adults: int = 1
    label: Optional[str] = None               # Human-readable name
    frequency: str                            # 'daily' | 'weekly' | 'monthly'
    hour: int = 6                             # UTC hour (0–23)
    minute: int = 0                           # UTC minute (0–59)
    day_of_week: Optional[int] = None         # Required when frequency='weekly' (0=Mon)
    day_of_month: Optional[int] = None        # Required when frequency='monthly' (1–28)
```
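The "required when" rules in the comments mirror the table-level CHECK constraint, so it may be worth rejecting bad combinations before they reach the database. The following is a plain-Python sketch of that cross-field check; the function name `validate_schedule_fields` is hypothetical, and in the real model the same logic would more likely live in a Pydantic validator.

```python
def validate_schedule_fields(frequency, hour=6, minute=0,
                             day_of_week=None, day_of_month=None):
    """Raise ValueError if the combination would violate the table's CHECKs."""
    if frequency not in ("daily", "weekly", "monthly"):
        raise ValueError(f"frequency must be daily, weekly or monthly, got {frequency!r}")
    if not 0 <= hour <= 23:
        raise ValueError("hour must be in 0..23")
    if not 0 <= minute <= 59:
        raise ValueError("minute must be in 0..59")
    if frequency == "weekly" and (day_of_week is None or not 0 <= day_of_week <= 6):
        raise ValueError("weekly schedules need day_of_week in 0..6 (0=Mon)")
    if frequency == "monthly" and (day_of_month is None or not 1 <= day_of_month <= 28):
        raise ValueError("monthly schedules need day_of_month in 1..28")


if __name__ == "__main__":
    validate_schedule_fields("weekly", 6, 0, day_of_week=0)  # passes silently
    try:
        validate_schedule_fields("monthly", 6, 0, day_of_month=31)
    except ValueError as exc:
        print(exc)  # monthly schedules need day_of_month in 1..28
```

Validating in the API layer turns what would otherwise surface as a database constraint error into a clear client-facing message.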
### Response model: `Schedule`
```python
class Schedule(BaseModel):
    id: int
    origin: str
    country: str
    window_months: int
    seat_class: str
    adults: int
    label: Optional[str] = None
    frequency: str
    hour: int
    minute: int
    day_of_week: Optional[int] = None
    day_of_month: Optional[int] = None
    enabled: bool
    last_run_at: Optional[str] = None
    next_run_at: str
    created_at: str
    recent_scan_ids: List[int]  # Last 5 scans created by this schedule
```
---
## 9. Scheduler Lifecycle (`api_server.py`)
### 9.1 Startup
In the existing `lifespan()` context manager, after existing startup code:
```python
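# Startup: launch the scheduler after the existing startup code
scheduler_task = asyncio.create_task(_scheduler_loop())
logger.info("Scheduled scan background task started")

yield

# Shutdown: cancel the loop and wait for it to exit
scheduler_task.cancel()
try:
    await scheduler_task
except asyncio.CancelledError:
    pass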
```
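The start/cancel pattern can be exercised in isolation. In this sketch the names `scheduler_loop` and `run_for`, the `check` callable, and the short interval are assumptions for the demo. It also wraps the check in a `try/except`, a defensive choice not specified above, so that one failing pass does not end the loop.

```python
import asyncio
import logging

logger = logging.getLogger("scheduler-demo")


async def scheduler_loop(check, interval=60.0):
    """Call `check` immediately, then once per `interval` seconds."""
    while True:
        try:
            check()
        except Exception:
            # CancelledError is not an Exception subclass (Python 3.8+),
            # so cancellation still propagates out of the loop.
            logger.exception("Scheduler pass failed; retrying next interval")
        await asyncio.sleep(interval)


async def run_for(duration, interval):
    """Run the loop for `duration` seconds, then shut it down cleanly."""
    calls = []
    task = asyncio.create_task(scheduler_loop(lambda: calls.append(1), interval))
    await asyncio.sleep(duration)
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        pass
    return len(calls), task.cancelled()


if __name__ == "__main__":
    passes, cancelled = asyncio.run(run_for(duration=0.05, interval=0.01))
    print(passes >= 1, cancelled)  # True True
```

Because the check runs before the first sleep, at least one pass happens as soon as the task starts, which is the behaviour the missed-run handling relies on.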
### 9.2 Missed runs on restart
When the server starts, `_check_and_run_due_schedules()` fires immediately (before the 60-second sleep), catching any schedules that were due while the server was down. Each overdue schedule runs exactly once — `next_run_at` is then advanced to the next future interval. Multiple missed intervals are not caught up.
### 9.3 Concurrency guard
Before firing a scan for a schedule, check:
```python
running = conn.execute("""
    SELECT id FROM scans
    WHERE scheduled_scan_id = ? AND status IN ('pending', 'running')
""", (schedule_id,)).fetchone()
if running:
    logger.warning(f"Schedule {schedule_id}: previous scan {running[0]} still active, skipping this run")
    # next_run_at must still be advanced before this `continue` (update not
    # shown here), so the schedule is retried at its next interval
    continue
```
---
## 10. Frontend Changes
### 10.1 New page: `Schedules.tsx`
**List view:**
- Table of all schedules: label, origin → country, frequency, next run (local time), last run, enabled toggle
- "New Schedule" button opens create form (same airport search component as Scans)
- Inline enable/disable toggle (PATCH request, optimistic update)
- "Run now" button per row
**Create form fields (below existing scan form fields):**
- Frequency selector: Daily / Weekly / Monthly (segmented button)
- Time of day: hour:minute picker (UTC, with note)
- Day of week (shown only for Weekly): Mon–Sun selector
- Day of month (shown only for Monthly): 1–28 number input
- Optional label field
### 10.2 Modified: `ScanDetails.tsx`
When a scan has `scheduled_scan_id`, show a small "Scheduled" chip in the header with a link to `/schedules/{scheduled_scan_id}`.
### 10.3 Navigation (`Layout.tsx`)
Add "Schedules" link to sidebar between Scans and Airports.
### 10.4 API client (`api.ts`)
```typescript
export interface Schedule {
  id: number;
  origin: string;
  country: string;
  window_months: number;
  seat_class: string;
  adults: number;
  label?: string;
  frequency: 'daily' | 'weekly' | 'monthly';
  hour: number;
  minute: number;
  day_of_week?: number;
  day_of_month?: number;
  enabled: boolean;
  last_run_at?: string;
  next_run_at: string;
  created_at: string;
  recent_scan_ids: number[];
}

export const scheduleApi = {
  list: (page = 1, limit = 20) =>
    api.get<PaginatedResponse<Schedule>>('/schedules', { params: { page, limit } }),
  get: (id: number) =>
    api.get<Schedule>(`/schedules/${id}`),
  create: (data: CreateScheduleRequest) =>
    api.post<Schedule>('/schedules', data),
  update: (id: number, data: Partial<CreateScheduleRequest> & { enabled?: boolean }) =>
    api.patch<Schedule>(`/schedules/${id}`, data),
  delete: (id: number) =>
    api.delete(`/schedules/${id}`),
  runNow: (id: number) =>
    api.post<{ scan_id: number }>(`/schedules/${id}/run-now`),
};
```
---
## 11. Edge Cases
| Case | Handling |
|------|----------|
| Previous scan still running at next interval | Skip this interval's run, advance `next_run_at`, log warning |
| Server down when schedule is due | On startup, runs any overdue schedule once; does not catch up multiple missed intervals |
| Schedule deleted while scan is running | `ON DELETE SET NULL` on FK — scan continues, `scheduled_scan_id` becomes NULL |
| `window_months` covers past dates | Scan start date is always "tomorrow" at creation time, same as manual scans |
| Monthly with day_of_month=29..31 | Capped at 28 in validation — avoids invalid dates in all months |
| Simultaneous due schedules | Each creates an independent asyncio task; existing `max_workers=3` semaphore in scan_processor limits total API concurrency across all running scans |
| Schedule created at 05:59 for 06:00 UTC | `next_run_at` is computed at creation time: 06:00 today is still in the future, so it fires today. Had 06:00 already passed, it would fire tomorrow |
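
The "Simultaneous due schedules" row depends on a shared limit capping concurrency no matter how many scans start at once. The sketch below illustrates that idea with `asyncio.Semaphore`. It is an assumption that the existing `max_workers=3` limit in scan_processor behaves this way, and the names `run_all` and `job` exist only for the demo.

```python
import asyncio


async def run_all(n_jobs, limit=3):
    """Start n_jobs at once and return the peak number running concurrently."""
    sem = asyncio.Semaphore(limit)
    active = 0
    peak = 0

    async def job():
        nonlocal active, peak
        async with sem:  # at most `limit` jobs get past this point together
            active += 1
            peak = max(peak, active)
            await asyncio.sleep(0.01)  # stand-in for one flight API call
            active -= 1

    await asyncio.gather(*(job() for _ in range(n_jobs)))
    return peak


if __name__ == "__main__":
    print(asyncio.run(run_all(n_jobs=10)))  # 3
```

Ten schedules falling due in the same minute therefore create ten tasks, but only three units of work proceed at any moment.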
---
## 12. Files Changed
| File | Change |
|------|--------|
| `database/schema.sql` | Add `scheduled_scans` table, trigger, indexes, schema_version bump |
| `database/init_db.py` | `_migrate_add_scheduled_scans()` + call in `initialize_database()` |
| `api_server.py` | `compute_next_run()`, `_scheduler_loop()`, `_check_and_run_due_schedules()`, 6 new endpoints, lifespan update, new Pydantic models |
| `frontend/src/api.ts` | `Schedule` type, `CreateScheduleRequest` type, `scheduleApi` object |
| `frontend/src/pages/Schedules.tsx` | New page (list + inline create form) |
| `frontend/src/pages/ScanDetails.tsx` | "Scheduled" badge + link when `scheduled_scan_id` present |
| `frontend/src/components/Layout.tsx` | Schedules nav link |
Total: 7 files. Estimated ~500 new lines (backend ~250, frontend ~250).
---
## 13. Out of Scope
- Notifications / alerts when a scheduled scan completes (email, webhook)
- Per-schedule price change detection / diffing between runs
- Timezone-aware scheduling (all times UTC for now)
- Pause/resume of an in-progress scheduled scan (separate PRD); enabling/disabling a schedule itself is in scope
- Rate limiting across simultaneous scheduled scans (existing semaphore provides soft protection)
- Dashboard widgets for upcoming scheduled runs