Flight Search Caching System
Overview
The Flight Airport Comparator now includes a SQLite-based caching system that reduces API calls, lowers the risk of rate limiting, and returns instant results for repeated queries.
How It Works
Automatic Caching
- Every flight search is automatically saved to data/flight_cache.db
- Includes: origin, destination, date, seat class, adults, timestamp
- Stores all flight results: airline, price, times, duration, etc.
Cache Lookup
Before making an API call, the tool:
- Generates a unique cache key (a SHA256 hash of the query parameters)
- Checks whether results for that key exist in the database
- Verifies that the cached results are younger than the threshold (default: 24 hours)
- Returns the cached data if valid; otherwise it queries the API
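The following is a minimal sketch of that lookup. It uses only Python's standard `sqlite3` and `hashlib` modules and assumes the `flight_searches`/`flight_results` tables and the pipe-joined SHA256 key documented in this file. The helper names `make_key` and `lookup_cached` are hypothetical, and the real `get_cached_results()` in `cache.py` may differ. The age check can be done in SQL with `datetime('now', '-N hours')`. This works because `CURRENT_TIMESTAMP` stores UTC text in the same `YYYY-MM-DD HH:MM:SS` format, so the two values compare correctly as strings.

```python
import hashlib
import sqlite3

def make_key(origin, destination, date, seat_class, adults):
    # SHA256 over the pipe-joined query parameters ("BER|BRI|2026-03-23|economy|1")
    raw = f"{origin}|{destination}|{date}|{seat_class}|{adults}"
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()

def lookup_cached(conn, origin, destination, date, seat_class="economy",
                  adults=1, threshold_hours=24):
    """Return cached (airline, price, currency) rows, or None on a miss.

    Hypothetical helper: an entry is valid only if it is younger than
    `threshold_hours`; stale entries are treated exactly like misses.
    """
    key = make_key(origin, destination, date, seat_class, adults)
    row = conn.execute(
        "SELECT id FROM flight_searches "
        "WHERE query_hash = ? AND query_timestamp >= datetime('now', ?)",
        (key, f"-{threshold_hours} hours"),
    ).fetchone()
    if row is None:
        return None  # miss or expired: the caller should query the API
    return conn.execute(
        "SELECT airline, price, currency FROM flight_results WHERE search_id = ?",
        (row[0],),
    ).fetchall()

# Demo: in-memory database with a subset of the documented schema and dummy data
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE flight_searches (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        query_hash TEXT NOT NULL UNIQUE,
        origin TEXT NOT NULL, destination TEXT NOT NULL,
        search_date TEXT NOT NULL, seat_class TEXT NOT NULL,
        adults INTEGER NOT NULL,
        query_timestamp DATETIME DEFAULT CURRENT_TIMESTAMP);
    CREATE TABLE flight_results (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        search_id INTEGER NOT NULL REFERENCES flight_searches(id) ON DELETE CASCADE,
        airline TEXT, price REAL, currency TEXT);
""")
cur = conn.execute(
    "INSERT INTO flight_searches (query_hash, origin, destination, search_date, seat_class, adults) "
    "VALUES (?, ?, ?, ?, ?, ?)",
    (make_key("BER", "BRI", "2026-03-23", "economy", 1), "BER", "BRI", "2026-03-23", "economy", 1),
)
conn.execute(
    "INSERT INTO flight_results (search_id, airline, price, currency) VALUES (?, ?, ?, ?)",
    (cur.lastrowid, "ExampleAir", 49.99, "EUR"),
)
hit = lookup_cached(conn, "BER", "BRI", "2026-03-23")   # fresh entry: returns its rows
miss = lookup_cached(conn, "BER", "BRI", "2026-03-24")  # different date, different key: None
```

A cached search that found no flights returns an empty list rather than `None`. This keeps "searched, nothing found" distinct from "never searched".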
Cache Indicators
💾 Cache hit: BER->BRI on 2026-03-23 (1 flights) # Instant result (0.0s)
No indicator: cache miss; a fresh API query was made (~2-3s per route)
Usage
CLI Options
Use the default cache threshold (24 hours):
python main.py --to JFK --country DE
Custom cache threshold (48 hours):
python main.py --to JFK --country DE --cache-threshold 48
Disable cache (force fresh queries):
python main.py --to JFK --country DE --no-cache
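For illustration, these flags could be wired up with the standard `argparse` module roughly as follows. This is a hypothetical sketch, not the actual `main.py` parser. Only `--to`, `--country`, `--cache-threshold` and `--no-cache` appear in this document, and the help texts and defaults are inferred from it.

```python
import argparse

def build_parser():
    # Hypothetical: only the flags shown in this section; the real main.py has more
    parser = argparse.ArgumentParser(description="Flight search (cache flags sketch)")
    parser.add_argument("--to", required=True, help="destination airport (IATA code)")
    parser.add_argument("--country", help="origin country code, e.g. DE")
    parser.add_argument("--cache-threshold", type=int, default=24, metavar="HOURS",
                        help="maximum age of cached results in hours (default: 24)")
    parser.add_argument("--no-cache", action="store_true",
                        help="ignore the cache and always query the API")
    return parser

args = build_parser().parse_args(["--to", "JFK", "--country", "DE", "--cache-threshold", "48"])
use_cache = not args.no_cache           # True unless --no-cache is given
threshold_hours = args.cache_threshold  # hours; passed on to the searcher
```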
Cache Management
View statistics:
python cache_admin.py stats
# Output:
# Flight Search Cache Statistics
# ==================================================
# Database location: /Users/.../flight_cache.db
# Total searches cached: 42
# Total flight results: 156
# Database size: 0.15 MB
# Oldest entry: 2026-02-20 10:30:00
# Newest entry: 2026-02-21 18:55:50
Clean old entries:
# Delete entries older than 30 days
python cache_admin.py clean --days 30
# Delete entries older than 7 days
python cache_admin.py clean --days 7 --confirm
Clear entire cache:
python cache_admin.py clear-all
# ⚠️ WARNING: Requires confirmation
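The age-based cleanup behind `clean --days N` can be expressed as a single `DELETE`. This hypothetical `clear_old_cache` sketch may differ from the real function in `cache.py`. It relies on the `ON DELETE CASCADE` foreign key to remove the matching `flight_results` rows. SQLite only enforces foreign keys on connections that have run `PRAGMA foreign_keys = ON`; without it, orphaned result rows would be left behind.

```python
import sqlite3

def clear_old_cache(conn, days):
    """Delete cached searches older than `days` days (hypothetical sketch).

    Relies on ON DELETE CASCADE to drop the matching flight_results rows,
    which only happens if PRAGMA foreign_keys = ON was set on `conn`.
    Returns the number of searches deleted (cascaded rows are not counted).
    """
    cur = conn.execute(
        "DELETE FROM flight_searches WHERE query_timestamp < datetime('now', ?)",
        (f"-{int(days)} days",),
    )
    conn.commit()
    return cur.rowcount

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
conn.executescript("""
    CREATE TABLE flight_searches (
        id INTEGER PRIMARY KEY,
        query_timestamp DATETIME DEFAULT CURRENT_TIMESTAMP);
    CREATE TABLE flight_results (
        id INTEGER PRIMARY KEY,
        search_id INTEGER NOT NULL REFERENCES flight_searches(id) ON DELETE CASCADE);
    INSERT INTO flight_searches (id, query_timestamp) VALUES (1, '2000-01-01 00:00:00');
    INSERT INTO flight_searches (id) VALUES (2);
    INSERT INTO flight_results (search_id) VALUES (1), (1), (2);
""")
deleted = clear_old_cache(conn, days=30)  # removes search 1 and, via cascade, its two results
remaining = conn.execute("SELECT COUNT(*) FROM flight_results").fetchone()[0]
```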
Database Schema
flight_searches table
CREATE TABLE flight_searches (
id INTEGER PRIMARY KEY AUTOINCREMENT,
query_hash TEXT NOT NULL UNIQUE, -- SHA256 of query params
origin TEXT NOT NULL,
destination TEXT NOT NULL,
search_date TEXT NOT NULL, -- YYYY-MM-DD
seat_class TEXT NOT NULL,
adults INTEGER NOT NULL,
query_timestamp DATETIME DEFAULT CURRENT_TIMESTAMP
);
flight_results table
CREATE TABLE flight_results (
id INTEGER PRIMARY KEY AUTOINCREMENT,
search_id INTEGER NOT NULL, -- FK to flight_searches
airline TEXT,
departure_time TEXT,
arrival_time TEXT,
duration_minutes INTEGER,
price REAL,
currency TEXT,
plane_type TEXT,
FOREIGN KEY (search_id) REFERENCES flight_searches(id) ON DELETE CASCADE
);
Indexes
- idx_query_hash on flight_searches(query_hash): fast cache lookup
- idx_query_timestamp on flight_searches(query_timestamp): fast expiry checks
- idx_search_id on flight_results(search_id): fast result retrieval
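The tables and indexes above can be created in one `executescript` call. This is a minimal sketch with a hypothetical `init_db` helper. Two SQLite details are worth knowing:
- Foreign-key enforcement, and therefore `ON DELETE CASCADE`, stays disabled unless each connection runs `PRAGMA foreign_keys = ON`.
- The `UNIQUE` constraint on `query_hash` already creates an automatic index, so `idx_query_hash` is likely redundant.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS flight_searches (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    query_hash TEXT NOT NULL UNIQUE,          -- SHA256 of query params
    origin TEXT NOT NULL,
    destination TEXT NOT NULL,
    search_date TEXT NOT NULL,                -- YYYY-MM-DD
    seat_class TEXT NOT NULL,
    adults INTEGER NOT NULL,
    query_timestamp DATETIME DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE IF NOT EXISTS flight_results (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    search_id INTEGER NOT NULL,
    airline TEXT,
    departure_time TEXT,
    arrival_time TEXT,
    duration_minutes INTEGER,
    price REAL,
    currency TEXT,
    plane_type TEXT,
    FOREIGN KEY (search_id) REFERENCES flight_searches(id) ON DELETE CASCADE
);
-- UNIQUE already gives query_hash an automatic index; kept to match the documented list
CREATE INDEX IF NOT EXISTS idx_query_hash ON flight_searches(query_hash);
CREATE INDEX IF NOT EXISTS idx_query_timestamp ON flight_searches(query_timestamp);
CREATE INDEX IF NOT EXISTS idx_search_id ON flight_results(search_id);
"""

def init_db(path):
    conn = sqlite3.connect(path)
    # Foreign keys (and therefore ON DELETE CASCADE) are off by default
    # and must be enabled on every new connection.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn

conn = init_db(":memory:")
indexes = {row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'index' AND name LIKE 'idx_%'")}
```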
Benefits
⚡ Speed
- Cache hit: 0.0s (instant)
- Cache miss: ~2-3s (API call + save to cache)
- Example: 95 airports × 3 dates = 285 queries
  - First run: ~226s (fresh API calls)
  - Second run: ~0.1s (all cache hits!)
🛡️ Rate Limit Protection
- Avoids sending identical queries repeatedly
- Especially useful for:
  - Testing and development
  - Re-running seasonal scans
  - Comparing different output formats
  - Experimenting with sort orders
💰 Reduced API Load
- Fewer requests to Google Flights
- Lower risk of being rate-limited or blocked
- Respectful of Google's infrastructure
📊 Historical Data
- Cache preserves price snapshots over time
- Can compare prices from different query times
- Useful for tracking price trends
Performance Example
First Query (Cache Miss):
$ python main.py --to BDS --country DE --window 3
# Searching 285 routes (95 airports × 3 dates)...
# Done in 226.2s
Second Query (Cache Hit):
$ python main.py --to BDS --country DE --window 3
# 💾 Cache hit: FMM->BDS on 2026-04-15 (1 flights)
# Done in 0.0s
Savings: 226.2s → 0.0s (100% cache hit rate)
Cache Key Generation
Cache keys are SHA256 hashes of query parameters:
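import hashlib
# Example query
origin = "BER"
destination = "BRI"
date = "2026-03-23"
seat_class = "economy"
adults = 1
# Cache key: SHA256 hex digest of the pipe-joined parameters
query_string = f"{origin}|{destination}|{date}|{seat_class}|{adults}"  # "BER|BRI|2026-03-23|economy|1"
cache_key = hashlib.sha256(query_string.encode("utf-8")).hexdigest()  # 64 hex chars, e.g. "a7f3c8d2..."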
Different parameters = different cache key:
- BER->BRI, 2026-03-23, economy, 1 ≠ BER->BRI, 2026-03-24, economy, 1 (different date)
- BER->BRI, 2026-03-23, economy, 1 ≠ BER->BRI, 2026-03-23, business, 1 (different seat class)
Maintenance
Recommended Cache Cleaning Schedule
For regular users:
# Clean monthly (keep last 30 days)
python cache_admin.py clean --days 30 --confirm
For developers/testers:
# Clean weekly (keep last 7 days)
python cache_admin.py clean --days 7 --confirm
For one-time users:
# Clear all after use
python cache_admin.py clear-all --confirm
Database Growth
Typical sizes:
- 1 search = ~1 KB
- 100 searches = ~100 KB
- 1000 searches = ~1 MB
- 10,000 searches = ~10 MB
Most users will stay under 1 MB, which at ~1 KB per search corresponds to roughly 1,000 cached searches.
Testing
Test cache functionality:
python test_cache.py
# Output:
# ======================================================================
# TESTING CACHE OPERATIONS
# ======================================================================
#
# 1. Clearing old cache...
# ✓ Cache cleared
# 2. Testing cache miss (first query)...
# ✓ Cache miss (as expected)
# 3. Saving flight results to cache...
# ✓ Results saved
# 4. Testing cache hit (second query)...
# ✓ Cache hit: Found 1 flight(s)
# ...
# ✅ ALL CACHE TESTS PASSED!
Architecture
Integration Points
- searcher_v3.py:
  - search_direct_flights() checks the cache before making an API call
  - Saves results after a successful query
- main.py:
  - --cache-threshold CLI option
  - --no-cache flag
  - Passes cache settings to the searcher
- cache.py:
  - get_cached_results(): check for valid cached data
  - save_results(): store flight results
  - clear_old_cache(): maintenance operations
  - get_cache_stats(): database statistics
- cache_admin.py:
  - CLI for cache management
  - Human-readable statistics
  - Safe deletion with confirmations
Implementation Details
Thread Safety
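SQLite allows multiple concurrent readers. Writes are serialized by SQLite's locking mechanism: only one connection can write at a time. Other writers wait, up to the connection's busy timeout, before failing with "database is locked".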
Error Handling
- Database errors are caught and logged
- Failed cache operations fall through to API queries
- No crash on corrupted database (graceful degradation)
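This fall-through behaviour amounts to a cache-aside wrapper in which every cache operation is guarded by `except sqlite3.Error`. That covers the `DatabaseError` raised for a corrupted file. The sketch below simplifies the storage to a single `cache(key, payload)` table. The names `cached_search` and `fetch_from_api` are illustrative only, not the project's actual schema or API.

```python
import logging
import sqlite3

log = logging.getLogger("flight_cache")

def cached_search(conn, key, fetch_from_api, use_cache=True):
    """Cache-aside lookup in which a cache failure never blocks the search.

    Hypothetical simplification: results live in a single cache(key, payload)
    table and `fetch_from_api` is any zero-argument callable.
    """
    if use_cache:
        try:
            row = conn.execute("SELECT payload FROM cache WHERE key = ?", (key,)).fetchone()
            if row is not None:
                return row[0]  # cache hit
        except sqlite3.Error as exc:  # includes DatabaseError for a corrupted file
            log.warning("cache read failed, querying API instead: %s", exc)
    result = fetch_from_api()  # cache miss, or cache unavailable
    if use_cache:
        try:
            with conn:  # commit on success, roll back on error
                conn.execute("INSERT OR REPLACE INTO cache (key, payload) VALUES (?, ?)",
                             (key, result))
        except sqlite3.Error as exc:
            log.warning("cache write failed, result not cached: %s", exc)
    return result

api_calls = []

def fake_api():
    api_calls.append(1)
    return "dummy result"

good = sqlite3.connect(":memory:")
good.execute("CREATE TABLE cache (key TEXT PRIMARY KEY, payload TEXT)")
first = cached_search(good, "k", fake_api)    # miss: calls the API and stores the result
second = cached_search(good, "k", fake_api)   # hit: served from SQLite
broken = sqlite3.connect(":memory:")          # no cache table, so every query fails
third = cached_search(broken, "k", fake_api)  # falls through to the API
```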
Data Persistence
- Cache survives program restarts
- Located in data/flight_cache.db
- Can be backed up, copied, or shared
Future Enhancements
Potential improvements:
- Cache invalidation based on flight departure time
- Compression for large result sets
- Export cache to CSV for analysis
- Cache warming (pre-populate common routes)
- Distributed cache (Redis/Memcached)
- Cache analytics (hit rate, popular routes)
Troubleshooting
Cache not working:
# Check if cache module is available
python -c "import cache; print('✓ Cache available')"
# Initialize database manually
python cache_admin.py init
Database locked:
# Close all running instances
# Or delete and reinitialize
rm data/flight_cache.db
python cache_admin.py init
Disk space issues:
# Check database size
python cache_admin.py stats
# Clean aggressively
python cache_admin.py clean --days 1 --confirm
Credits
Caching implementation by Claude Code, integrated with fast-flights v3.0rc1 SOCS cookie bypass.