ciaovolo/flight-comparator/docs/CACHING.md
domverse 6421f83ca7 Add flight comparator web app with full scan pipeline
Full-stack flight price scanner built on fast-flights v3 (SOCS cookie bypass):

Backend (FastAPI + SQLite):
- REST API with rate limiting, Pydantic v2 validation, paginated responses
- Scan pipeline: resolves airports, queries every day in the window, saves
  individual flights + aggregate route stats to SQLite
- Background async scan processor with real-time progress tracking
- Airport search endpoint backed by OpenFlights dataset
- Daily scan window (all dates, not monthly samples)

Frontend (React 19 + TypeScript + Tailwind CSS v4):
- Dashboard with live scan status and recent scans
- Create scan form: country mode or specific airports (searchable dropdown)
- Scan detail page with expandable route rows showing individual flights
  (date, airline, departure, arrival, price) loaded on demand
- AirportSearch component with debounced live search and multi-select

Database:
- scans → routes → flights schema with FK cascade and auto-update triggers
- Migrations for schema evolution (relaxed country constraint)

Tests:
- 74 tests: unit + integration, isolated per-test SQLite DB
- Confirmed flight fixtures in tests/confirmed_flights.json (50 real flights,
  BDS→FMM Ryanair + BDS→DUS Eurowings, scraped Feb 2026)
- Integration tests parametrized from confirmed routes

Docker:
- Multi-stage builds, Compose orchestration, Nginx reverse proxy

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-26 17:11:51 +01:00

Flight Search Caching System

Overview

The Flight Airport Comparator now includes a SQLite-based caching system that reduces API calls, lowers the risk of rate limiting, and returns instant results for repeated queries.

How It Works

Automatic Caching

  • Every flight search is automatically saved to data/flight_cache.db
  • Each entry records origin, destination, date, seat class, number of adults, and a timestamp
  • All flight results are stored with it: airline, price, times, duration, etc.

Cache Lookup

Before making an API call, the tool:

  1. Generates a unique cache key (SHA256 hash of query parameters)
  2. Checks whether results for that key exist in the database
  3. Verifies the results are younger than the cache threshold (default: 24 hours)
  4. Returns the cached data if it is still valid; otherwise queries the API
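The four lookup steps can be sketched with Python's built-in sqlite3 module. This is a minimal illustration, not the code in cache.py. make_key() and lookup() are hypothetical names, and the tables are trimmed to the columns the lookup needs. The age check also assumes query_timestamp holds SQLite's default UTC CURRENT_TIMESTAMP text format ("YYYY-MM-DD HH:MM:SS"), so it can be compared directly with datetime('now', ...). The flight data is dummy data.

```python
import hashlib
import sqlite3

# Hypothetical helpers for illustration; the real cache.py API may differ.


def make_key(origin, destination, date, seat_class, adults):
    # Step 1: SHA256 of the pipe-joined query parameters
    raw = f"{origin}|{destination}|{date}|{seat_class}|{adults}"
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()


def lookup(conn, key, threshold_hours=24):
    # Steps 2 and 3: find the search by hash, but only if it is newer than
    # now minus the threshold (CURRENT_TIMESTAMP and datetime('now') are UTC)
    row = conn.execute(
        "SELECT id FROM flight_searches "
        "WHERE query_hash = ? AND query_timestamp >= datetime('now', ?)",
        (key, f"-{threshold_hours} hours"),
    ).fetchone()
    if row is None:
        return None  # missing or stale: the caller queries the API
    # Step 4: return the cached flights for that search
    return conn.execute(
        "SELECT airline, price FROM flight_results WHERE search_id = ?",
        (row[0],),
    ).fetchall()


# Demo with an in-memory database and a trimmed-down schema
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE flight_searches (
    id INTEGER PRIMARY KEY,
    query_hash TEXT NOT NULL UNIQUE,
    query_timestamp DATETIME DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE flight_results (
    id INTEGER PRIMARY KEY,
    search_id INTEGER NOT NULL,
    airline TEXT,
    price REAL
);
""")
fresh = make_key("BER", "BRI", "2026-03-23", "economy", 1)
stale = make_key("BER", "BRI", "2026-03-24", "economy", 1)
conn.execute("INSERT INTO flight_searches (id, query_hash) VALUES (1, ?)", (fresh,))
conn.execute(
    "INSERT INTO flight_results (search_id, airline, price) "
    "VALUES (1, 'ExampleAir', 49.99)"
)
conn.execute(
    "INSERT INTO flight_searches (id, query_hash, query_timestamp) "
    "VALUES (2, ?, '2020-01-01 00:00:00')",
    (stale,),
)

print(lookup(conn, fresh))  # [('ExampleAir', 49.99)]
print(lookup(conn, stale))  # None (older than 24 hours)
```

Doing the age comparison in SQL keeps the check in a single indexed query. The trade-off is that it only works if every timestamp is stored in the same text format.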

Cache Indicators

💾 Cache hit: BER->BRI on 2026-03-23 (1 flights)  # Instant result (0.0s)

No indicator: cache miss, so a fresh API query was made (~2-3s per route).

Usage

CLI Options

Use default cache (24 hours):

python main.py --to JFK --country DE

Custom cache threshold (48 hours):

python main.py --to JFK --country DE --cache-threshold 48

Disable cache (force fresh queries):

python main.py --to JFK --country DE --no-cache
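Here is a minimal argparse sketch of how the two cache flags might be declared. Only --to, --country, --cache-threshold and --no-cache come from the examples above. Treating the threshold as an integer number of hours is an assumption, and main.py's real parser almost certainly defines more options.

```python
import argparse


def build_parser():
    # Sketch: only the flags shown in the examples above, not main.py's full CLI
    parser = argparse.ArgumentParser(description="Flight comparator (cache flags only)")
    parser.add_argument("--to", required=True)
    parser.add_argument("--country")
    parser.add_argument(
        "--cache-threshold",
        type=int,
        default=24,
        help="maximum age of cached results in hours (default: 24)",
    )
    parser.add_argument(
        "--no-cache",
        action="store_true",
        help="ignore the cache and force fresh queries",
    )
    return parser


args = build_parser().parse_args(
    ["--to", "JFK", "--country", "DE", "--cache-threshold", "48"]
)
use_cache = not args.no_cache  # argparse maps --no-cache to args.no_cache
print(args.cache_threshold, use_cache)  # 48 True
```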

Cache Management

View statistics:

python cache_admin.py stats

# Output:
# Flight Search Cache Statistics
# ==================================================
# Database location: /Users/.../flight_cache.db
# Total searches cached: 42
# Total flight results: 156
# Database size: 0.15 MB
# Oldest entry: 2026-02-20 10:30:00
# Newest entry: 2026-02-21 18:55:50

Clean old entries:

# Delete entries older than 30 days
python cache_admin.py clean --days 30

# Delete entries older than 7 days
python cache_admin.py clean --days 7 --confirm

Clear entire cache:

python cache_admin.py clear-all
# ⚠️ WARNING: Requires confirmation
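Cleaning by age amounts to deleting old rows from flight_searches and letting ON DELETE CASCADE remove their flight_results. One SQLite detail matters here. In standard builds, foreign key enforcement (and therefore cascading) is off unless each connection runs PRAGMA foreign_keys = ON outside a transaction. The sketch assumes cleanup works this way. clean_old_entries() is a hypothetical name, not necessarily what cache_admin.py uses, and the rows are dummy data.

```python
import sqlite3

SCHEMA = """
CREATE TABLE flight_searches (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    query_hash TEXT NOT NULL UNIQUE,
    query_timestamp DATETIME DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE flight_results (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    search_id INTEGER NOT NULL,
    price REAL,
    FOREIGN KEY (search_id) REFERENCES flight_searches(id) ON DELETE CASCADE
);
"""


def clean_old_entries(conn, days):
    # Hypothetical helper: delete searches older than `days`; their
    # flight_results rows are removed by ON DELETE CASCADE.
    cur = conn.execute(
        "DELETE FROM flight_searches WHERE query_timestamp < datetime('now', ?)",
        (f"-{days} days",),
    )
    conn.commit()
    return cur.rowcount  # counts only the directly deleted searches


conn = sqlite3.connect(":memory:")
# Enable FK enforcement per connection, before any transaction starts;
# without it, ON DELETE CASCADE is silently ignored.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(SCHEMA)

# Dummy data: one old search and one fresh search, one result each
conn.execute(
    "INSERT INTO flight_searches (id, query_hash, query_timestamp) "
    "VALUES (1, 'old', '2020-01-01 00:00:00')"
)
conn.execute("INSERT INTO flight_searches (id, query_hash) VALUES (2, 'new')")
conn.executemany(
    "INSERT INTO flight_results (search_id, price) VALUES (?, ?)",
    [(1, 80.0), (2, 95.0)],
)

deleted = clean_old_entries(conn, 30)
remaining = conn.execute("SELECT COUNT(*) FROM flight_results").fetchone()[0]
print(deleted, remaining)  # 1 1
```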

Database Schema

flight_searches table

CREATE TABLE flight_searches (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    query_hash TEXT NOT NULL UNIQUE,        -- SHA256 of query params
    origin TEXT NOT NULL,
    destination TEXT NOT NULL,
    search_date TEXT NOT NULL,              -- YYYY-MM-DD
    seat_class TEXT NOT NULL,
    adults INTEGER NOT NULL,
    query_timestamp DATETIME DEFAULT CURRENT_TIMESTAMP
);

flight_results table

CREATE TABLE flight_results (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    search_id INTEGER NOT NULL,             -- FK to flight_searches
    airline TEXT,
    departure_time TEXT,
    arrival_time TEXT,
    duration_minutes INTEGER,
    price REAL,
    currency TEXT,
    plane_type TEXT,
    FOREIGN KEY (search_id) REFERENCES flight_searches(id) ON DELETE CASCADE
);

Indexes

  • idx_query_hash on flight_searches(query_hash) - Fast cache lookup
  • idx_query_timestamp on flight_searches(query_timestamp) - Fast expiry checks
  • idx_search_id on flight_results(search_id) - Fast result retrieval
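The schema and indexes can be checked against an in-memory database. This sketch trims the tables to the indexed columns. It also assumes the indexes are created with CREATE INDEX IF NOT EXISTS so the DDL is safe to run on every start-up. Listing sqlite_master also shows a side effect of the schema: the UNIQUE constraint on query_hash already gives SQLite an automatic index, so idx_query_hash is likely redundant.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE flight_searches (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    query_hash TEXT NOT NULL UNIQUE,
    query_timestamp DATETIME DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE flight_results (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    search_id INTEGER NOT NULL,
    price REAL,
    FOREIGN KEY (search_id) REFERENCES flight_searches(id) ON DELETE CASCADE
);
-- IF NOT EXISTS makes the DDL safe to re-run on every start-up
CREATE INDEX IF NOT EXISTS idx_query_hash ON flight_searches(query_hash);
CREATE INDEX IF NOT EXISTS idx_query_timestamp ON flight_searches(query_timestamp);
CREATE INDEX IF NOT EXISTS idx_search_id ON flight_results(search_id);
""")

# List every index SQLite maintains, including automatic ones from UNIQUE
indexes = sorted(
    name
    for (name,) in conn.execute("SELECT name FROM sqlite_master WHERE type = 'index'")
)
print(indexes)
# ['idx_query_hash', 'idx_query_timestamp', 'idx_search_id',
#  'sqlite_autoindex_flight_searches_1']
```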

Benefits

Speed

  • Cache hit: 0.0s (instant)
  • Cache miss: ~2-3s (API call + save to cache)
  • Example: 95 airports × 3 dates = 285 queries
    • First run: ~226s (fresh API calls)
    • Second run: ~0.1s (all cache hits!)

🛡️ Rate Limit Protection

  • Avoids re-sending identical queries while their cached results are still valid
  • Especially useful for:
    • Testing and development
    • Re-running seasonal scans
    • Comparing different output formats
    • Experimenting with sort orders

💰 Reduced API Load

  • Fewer requests to Google Flights
  • Lower risk of being rate-limited or blocked
  • Respectful of Google's infrastructure

📊 Historical Data

  • Each cache entry is a timestamped price snapshot (query_hash is UNIQUE, so at most one snapshot per identical query is stored at a time)
  • Can compare prices from different query times
  • Useful for tracking price trends

Performance Example

First Query (Cache Miss):

$ python main.py --to BDS --country DE --window 3
# Searching 285 routes (95 airports × 3 dates)...
# Done in 226.2s

Second Query (Cache Hit):

$ python main.py --to BDS --country DE --window 3
# 💾 Cache hit: FMM->BDS on 2026-04-15 (1 flights)
# Done in 0.0s

Savings: 226.2s → 0.0s (100% cache hit rate)

Cache Key Generation

Cache keys are SHA256 hashes of query parameters:

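import hashlib

# Example query
origin = "BER"
destination = "BRI"
date = "2026-03-23"
seat_class = "economy"
adults = 1

# Cache key: SHA256 of the pipe-joined parameters
query_string = f"{origin}|{destination}|{date}|{seat_class}|{adults}"  # "BER|BRI|2026-03-23|economy|1"
cache_key = hashlib.sha256(query_string.encode("utf-8")).hexdigest()
# cache_key is a 64-character hex string, e.g. "a7f3c8d2..." (illustrative)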

Different parameters = different cache key:

  • BER->BRI, 2026-03-23, economy, 1 ≠ BER->BRI, 2026-03-24, economy, 1
  • BER->BRI, 2026-03-23, economy, 1 ≠ BER->BRI, 2026-03-23, business, 1
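Here is the same idea as a reusable function, with assertions for the two cases listed above. cache_key() is a hypothetical helper that uses the pipe-delimited format shown earlier. Hashing is exact, so "ber" and "BER" would produce different keys. Upper-casing airport codes before hashing is worth considering if inputs can vary in case.

```python
import hashlib


def cache_key(origin, destination, date, seat_class, adults):
    # Hypothetical helper mirroring the "BER|BRI|2026-03-23|economy|1" format
    query_string = "|".join([origin, destination, date, seat_class, str(adults)])
    return hashlib.sha256(query_string.encode("utf-8")).hexdigest()


base = cache_key("BER", "BRI", "2026-03-23", "economy", 1)
same = cache_key("BER", "BRI", "2026-03-23", "economy", 1)
next_day = cache_key("BER", "BRI", "2026-03-24", "economy", 1)
business = cache_key("BER", "BRI", "2026-03-23", "business", 1)

assert base == same       # identical query -> identical key (cache hit)
assert base != next_day   # different date -> different key
assert base != business   # different seat class -> different key
```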

Maintenance

For regular users:

# Clean monthly (keep last 30 days)
python cache_admin.py clean --days 30 --confirm

For developers/testers:

# Clean weekly (keep last 7 days)
python cache_admin.py clean --days 7 --confirm

For one-time users:

# Clear all after use
python cache_admin.py clear-all --confirm

Database Growth

Typical sizes:

  • 1 search = ~1 KB
  • 100 searches = ~100 KB
  • 1000 searches = ~1 MB
  • 10,000 searches = ~10 MB

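At ~1 KB per search, the cache stays under 1 MB until about 1,000 searches are stored (roughly three and a half 285-query scans); regular cleanup keeps it there.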
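To compare these estimates with a real cache, you can read the database size from SQLite itself: PRAGMA page_count multiplied by PRAGMA page_size gives the size of the main database in bytes. This is a sketch. cache_admin.py stats may compute its figure differently (for example from the file size on disk), and the demo rows are dummy data.

```python
import sqlite3


def db_size_mb(conn):
    # Pages in the main database times bytes per page = size in bytes
    (page_count,) = conn.execute("PRAGMA page_count").fetchone()
    (page_size,) = conn.execute("PRAGMA page_size").fetchone()
    return page_count * page_size / (1024 * 1024)


# Demo: measure an in-memory database holding 1,000 dummy search rows
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE flight_searches (id INTEGER PRIMARY KEY, query_hash TEXT)")
conn.executemany(
    "INSERT INTO flight_searches (query_hash) VALUES (?)",
    [(f"{i:064x}",) for i in range(1000)],
)
conn.commit()
print(f"{db_size_mb(conn):.3f} MB")
```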

Testing

Test cache functionality:

python test_cache.py

# Output:
# ======================================================================
# TESTING CACHE OPERATIONS
# ======================================================================
#
# 1. Clearing old cache...
#    ✓ Cache cleared
# 2. Testing cache miss (first query)...
#    ✓ Cache miss (as expected)
# 3. Saving flight results to cache...
#    ✓ Results saved
# 4. Testing cache hit (second query)...
#    ✓ Cache hit: Found 1 flight(s)
# ...
# ✅ ALL CACHE TESTS PASSED!

Architecture

Integration Points

  1. searcher_v3.py:

    • search_direct_flights() checks cache before API call
    • Saves results after successful query
  2. main.py:

    • --cache-threshold CLI option
    • --no-cache flag
    • Passes cache settings to searcher
  3. cache.py:

    • get_cached_results(): Check for valid cached data
    • save_results(): Store flight results
    • clear_old_cache(): Maintenance operations
    • get_cache_stats(): Database statistics
  4. cache_admin.py:

    • CLI for cache management
    • Human-readable statistics
    • Safe deletion with confirmations
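The flow between searcher_v3.py and cache.py follows the cache-aside pattern: check the cache, call the API on a miss, then store the result. The sketch below uses a plain dict in place of the SQLite cache and a dummy function in place of the fast-flights call. search_with_cache() and fake_api() are hypothetical. So is the choice that use_cache=False (the --no-cache case) skips both the read and the write.

```python
from typing import Callable, Dict, List, Tuple

Query = Tuple[str, str, str, str, int]  # origin, destination, date, seat class, adults


def search_with_cache(
    query: Query,
    fetch: Callable[[Query], List[dict]],
    cache: Dict[Query, List[dict]],
    use_cache: bool = True,
) -> List[dict]:
    # Hypothetical stand-in for search_direct_flights(); a dict replaces SQLite
    if use_cache and query in cache:
        origin, destination, date = query[:3]
        print(f"💾 Cache hit: {origin}->{destination} on {date} ({len(cache[query])} flights)")
        return cache[query]
    results = fetch(query)  # cache miss (or --no-cache): call the API
    if use_cache:
        cache[query] = results  # reached only if fetch() did not raise
    return results


calls: List[Query] = []


def fake_api(query: Query) -> List[dict]:
    # Dummy API that records each call so the cache effect is visible
    calls.append(query)
    return [{"airline": "ExampleAir", "price": 49.99}]


cache: Dict[Query, List[dict]] = {}
q: Query = ("BER", "BRI", "2026-03-23", "economy", 1)
search_with_cache(q, fake_api, cache)  # miss: calls fake_api
search_with_cache(q, fake_api, cache)  # hit: prints the 💾 line instead
print(len(calls))  # 1
```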

Implementation Details

Thread Safety

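SQLite allows concurrent readers, while its database-level lock serializes writers. A writer that cannot acquire the lock within the connection's busy timeout fails with a "database is locked" error.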

Error Handling

  • Database errors are caught and logged
  • Failed cache operations fall through to API queries
  • No crash on corrupted database (graceful degradation)
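Here is a sketch of that fall-through behavior, assuming the lookup is wrapped in a try/except on sqlite3.Error (the base class of DatabaseError, OperationalError, and the others). lookup_or_fetch() and its ("cache" | "api", result) return shape are hypothetical. Writing junk bytes to a temporary file is enough to simulate a corrupted database.

```python
import logging
import os
import sqlite3
import tempfile

log = logging.getLogger("flight_cache")


def lookup_or_fetch(db_path, key, fetch):
    # Hypothetical wrapper: any SQLite error is logged and the query falls
    # through to the API instead of aborting the scan.
    try:
        conn = sqlite3.connect(db_path)
        try:
            row = conn.execute(
                "SELECT id FROM flight_searches WHERE query_hash = ?", (key,)
            ).fetchone()
        finally:
            conn.close()
        if row is not None:
            return "cache", row[0]
    except sqlite3.Error as exc:
        log.warning("cache lookup failed (%s); falling back to the API", exc)
    return "api", fetch(key)


# Simulate a corrupted cache file
path = os.path.join(tempfile.mkdtemp(), "flight_cache.db")
with open(path, "wb") as f:
    f.write(b"this is not a SQLite database" * 100)

source, result = lookup_or_fetch(path, "a7f3c8d2", lambda key: ["fresh results"])
print(source, result)  # api ['fresh results']
```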

Data Persistence

  • Cache survives program restarts
  • Located in data/flight_cache.db
  • Can be backed up, copied, or shared
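For backups, Python's sqlite3 module exposes SQLite's online backup API as Connection.backup() (Python 3.7+). Copying the file while a scan is writing to it can capture a half-written state, whereas this API produces a consistent snapshot. Below is a minimal sketch using a throwaway database in a temporary directory. The file names and the dummy row are placeholders.

```python
import os
import sqlite3
import tempfile

workdir = tempfile.mkdtemp()
src_path = os.path.join(workdir, "flight_cache.db")
dst_path = os.path.join(workdir, "flight_cache.backup.db")

# Stand-in cache database with one dummy row
src = sqlite3.connect(src_path)
src.execute("CREATE TABLE flight_searches (id INTEGER PRIMARY KEY, query_hash TEXT)")
src.execute("INSERT INTO flight_searches (query_hash) VALUES ('demo-hash')")
src.commit()

# Copy the whole database into the destination via the online backup API
dst = sqlite3.connect(dst_path)
src.backup(dst)
dst.close()
src.close()

# The backup is an ordinary SQLite file that opens like the original
check = sqlite3.connect(dst_path)
rows = check.execute("SELECT COUNT(*) FROM flight_searches").fetchone()[0]
check.close()
print(rows)  # 1
```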

Future Enhancements

Potential improvements:

  • Cache invalidation based on flight departure time
  • Compression for large result sets
  • Export cache to CSV for analysis
  • Cache warming (pre-populate common routes)
  • Distributed cache (Redis/Memcached)
  • Cache analytics (hit rate, popular routes)

Troubleshooting

Cache not working:

# Check if cache module is available
python -c "import cache; print('✓ Cache available')"

# Initialize database manually
python cache_admin.py init

Database locked:

# Close all running instances first
# If the lock persists, delete and reinitialize (this discards all cached results)
rm data/flight_cache.db
python cache_admin.py init

Disk space issues:

# Check database size
python cache_admin.py stats

# Clean aggressively
python cache_admin.py clean --days 1 --confirm

Credits

Caching implementation by Claude Code, integrated with fast-flights v3.0rc1 SOCS cookie bypass.