# Flight Search Caching System

## Overview

The Flight Airport Comparator now includes a **SQLite-based caching system** to reduce API calls, prevent rate limiting, and provide instant results for repeated queries.

## How It Works

### Automatic Caching

- Every flight search is automatically saved to `data/flight_cache.db`
- Each entry records: origin, destination, date, seat class, adults, timestamp
- All flight results are stored: airline, price, times, duration, etc.

### Cache Lookup

Before making an API call, the tool:

1. Generates a unique cache key (SHA256 hash of the query parameters)
2. Checks whether results for that key exist in the database
3. Verifies the results are newer than the cache threshold (default: 24 hours)
4. Returns the cached data if valid, otherwise queries the API

### Cache Indicators

```
💾 Cache hit: BER->BRI on 2026-03-23 (1 flights)
# Instant result (0.0s)
```

No indicator = cache miss, fresh API query made (~2-3s per route)

## Usage

### CLI Options

**Use default cache (24 hours):**

```bash
python main.py --to JFK --country DE
```

**Custom cache threshold (48 hours):**

```bash
python main.py --to JFK --country DE --cache-threshold 48
```

**Disable cache (force fresh queries):**

```bash
python main.py --to JFK --country DE --no-cache
```

### Cache Management

**View statistics:**

```bash
python cache_admin.py stats

# Output:
# Flight Search Cache Statistics
# ==================================================
# Database location: /Users/.../flight_cache.db
# Total searches cached: 42
# Total flight results: 156
# Database size: 0.15 MB
# Oldest entry: 2026-02-20 10:30:00
# Newest entry: 2026-02-21 18:55:50
```

**Clean old entries:**

```bash
# Delete entries older than 30 days
python cache_admin.py clean --days 30

# Delete entries older than 7 days
python cache_admin.py clean --days 7 --confirm
```

**Clear entire cache:**

```bash
python cache_admin.py clear-all
# ⚠️ WARNING: Requires confirmation
```

## Database Schema

### flight_searches table

```sql
CREATE TABLE flight_searches (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    query_hash TEXT NOT NULL UNIQUE,  -- SHA256 of query params
    origin TEXT NOT NULL,
    destination TEXT NOT NULL,
    search_date TEXT NOT NULL,        -- YYYY-MM-DD
    seat_class TEXT NOT NULL,
    adults INTEGER NOT NULL,
    query_timestamp DATETIME DEFAULT CURRENT_TIMESTAMP
);
```

### flight_results table

```sql
CREATE TABLE flight_results (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    search_id INTEGER NOT NULL,       -- FK to flight_searches
    airline TEXT,
    departure_time TEXT,
    arrival_time TEXT,
    duration_minutes INTEGER,
    price REAL,
    currency TEXT,
    plane_type TEXT,
    FOREIGN KEY (search_id) REFERENCES flight_searches(id) ON DELETE CASCADE
);
```

Note: by default SQLite enforces foreign keys (and therefore `ON DELETE CASCADE`) only on connections that have run `PRAGMA foreign_keys = ON`.

### Indexes

- `idx_query_hash` on `flight_searches(query_hash)` - Fast cache lookup
- `idx_query_timestamp` on `flight_searches(query_timestamp)` - Fast expiry checks
- `idx_search_id` on `flight_results(search_id)` - Fast result retrieval

## Benefits

### ⚡ Speed

- **Cache hit**: 0.0s (instant)
- **Cache miss**: ~2-3s (API call + save to cache)
- Example: 95 airports × 3 dates = 285 queries
  - First run: ~226s (fresh API calls)
  - Second run: ~0.1s (all cache hits!)

### 🛡️ Rate Limit Protection

- Prevents identical repeated queries
- Especially useful for:
  - Testing and development
  - Re-running seasonal scans
  - Comparing different output formats
  - Experimenting with sort orders

### 💰 Reduced API Load

- Fewer requests to Google Flights
- Lower risk of being rate-limited or blocked
- Respectful of Google's infrastructure

### 📊 Historical Data

- Cache preserves price snapshots over time
- Can compare prices from different query times
- Useful for tracking price trends

## Performance Example

**First Query (Cache Miss):**

```bash
$ python main.py --to BDS --country DE --window 3
# Searching 285 routes (95 airports × 3 dates)...
# Done in 226.2s
```

**Second Query (Cache Hit):**

```bash
$ python main.py --to BDS --country DE --window 3
# 💾 Cache hit: FMM->BDS on 2026-04-15 (1 flights)
# Done in 0.0s
```

**Savings:** 226.2s → 0.0s (100% cache hit rate)

## Cache Key Generation

Cache keys are SHA256 hashes of the query parameters:

```python
import hashlib

# Example query
origin = "BER"
destination = "BRI"
date = "2026-03-23"
seat_class = "economy"
adults = 1

# Cache key: SHA256 hex digest of the pipe-joined parameters
query_string = f"{origin}|{destination}|{date}|{seat_class}|{adults}"  # "BER|BRI|2026-03-23|economy|1"
cache_key = hashlib.sha256(query_string.encode("utf-8")).hexdigest()  # hex digest, e.g. "a7f3c8d2..." (illustrative)
```

Different parameters = different cache key:

- `BER->BRI, 2026-03-23, economy, 1` ≠ `BER->BRI, 2026-03-24, economy, 1`
- `BER->BRI, 2026-03-23, economy, 1` ≠ `BER->BRI, 2026-03-23, business, 1`

## Maintenance

### Recommended Cache Cleaning Schedule

**For regular users:**

```bash
# Clean monthly (keep last 30 days)
python cache_admin.py clean --days 30 --confirm
```

**For developers/testers:**

```bash
# Clean weekly (keep last 7 days)
python cache_admin.py clean --days 7 --confirm
```

**For one-time users:**

```bash
# Clear all after use
python cache_admin.py clear-all --confirm
```

### Database Growth

**Typical sizes:**

- 1 search = ~1 KB
- 100 searches = ~100 KB
- 1000 searches = ~1 MB
- 10,000 searches = ~10 MB

Most users will stay under 1 MB even with heavy use.

## Testing

**Test cache functionality:**

```bash
python test_cache.py

# Output:
# ======================================================================
# TESTING CACHE OPERATIONS
# ======================================================================
#
# 1. Clearing old cache...
#    ✓ Cache cleared
# 2. Testing cache miss (first query)...
#    ✓ Cache miss (as expected)
# 3. Saving flight results to cache...
#    ✓ Results saved
# 4. Testing cache hit (second query)...
#    ✓ Cache hit: Found 1 flight(s)
# ...
# ✅ ALL CACHE TESTS PASSED!
```

## Architecture

### Integration Points

1. **searcher_v3.py**:
   - `search_direct_flights()` checks cache before API call
   - Saves results after successful query

2. **main.py**:
   - `--cache-threshold` CLI option
   - `--no-cache` flag
   - Passes cache settings to searcher

3. **cache.py**:
   - `get_cached_results()`: Check for valid cached data
   - `save_results()`: Store flight results
   - `clear_old_cache()`: Maintenance operations
   - `get_cache_stats()`: Database statistics

4. **cache_admin.py**:
   - CLI for cache management
   - Human-readable statistics
   - Safe deletion with confirmations

## Implementation Details

### Thread Safety

SQLite handles concurrent reads automatically. Writes are serialized by SQLite's locking mechanism.

### Error Handling

- Database errors are caught and logged
- Failed cache operations fall through to API queries
- No crash on corrupted database (graceful degradation)

### Data Persistence

- Cache survives program restarts
- Located in `data/flight_cache.db`
- Can be backed up, copied, or shared

## Future Enhancements

Potential improvements:

- [ ] Cache invalidation based on flight departure time
- [ ] Compression for large result sets
- [ ] Export cache to CSV for analysis
- [ ] Cache warming (pre-populate common routes)
- [ ] Distributed cache (Redis/Memcached)
- [ ] Cache analytics (hit rate, popular routes)

## Troubleshooting

**Cache not working:**

```bash
# Check if cache module is available
python -c "import cache; print('✓ Cache available')"

# Initialize database manually
python cache_admin.py init
```

**Database locked:**

```bash
# Close all running instances
# Or delete and reinitialize
rm data/flight_cache.db
python cache_admin.py init
```

**Disk space issues:**

```bash
# Check database size
python cache_admin.py stats

# Clean aggressively
python cache_admin.py clean --days 1 --confirm
```

## Credits

Caching implementation by Claude Code, integrated with fast-flights v3.0rc1 SOCS cookie bypass.
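
## Appendix: Cache Lookup Sketch

The lookup flow described under "How It Works" (hash the query, look it up, check its age, fall back to the API) can be illustrated with a minimal, self-contained sketch. This is not the code in `cache.py`: the function names `make_cache_key`, `open_cache`, `store`, and `lookup` are hypothetical, the `flight_results` table is trimmed to three columns for brevity, and replacing an existing entry on save is an assumption about how the `UNIQUE` constraint on `query_hash` might be handled.

```python
import hashlib
import sqlite3
from datetime import datetime, timedelta, timezone

# Trimmed version of the documented schema (flight_results keeps only 3 data columns).
SCHEMA = """
CREATE TABLE IF NOT EXISTS flight_searches (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    query_hash TEXT NOT NULL UNIQUE,
    origin TEXT NOT NULL,
    destination TEXT NOT NULL,
    search_date TEXT NOT NULL,
    seat_class TEXT NOT NULL,
    adults INTEGER NOT NULL,
    query_timestamp DATETIME DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE IF NOT EXISTS flight_results (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    search_id INTEGER NOT NULL,
    airline TEXT,
    price REAL,
    currency TEXT,
    FOREIGN KEY (search_id) REFERENCES flight_searches(id) ON DELETE CASCADE
);
"""


def make_cache_key(origin, destination, date, seat_class, adults):
    """SHA256 hex digest of the pipe-joined query parameters."""
    query_string = f"{origin}|{destination}|{date}|{seat_class}|{adults}"
    return hashlib.sha256(query_string.encode("utf-8")).hexdigest()


def open_cache(path=":memory:"):
    """Open the database and create the tables if they are missing."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")  # needed for ON DELETE CASCADE
    conn.executescript(SCHEMA)
    return conn


def store(conn, key, params, flights):
    """Replace any previous entry for this key (assumed policy), then save results."""
    with conn:  # one transaction: commit on success, roll back on error
        conn.execute("DELETE FROM flight_searches WHERE query_hash = ?", (key,))
        cur = conn.execute(
            "INSERT INTO flight_searches "
            "(query_hash, origin, destination, search_date, seat_class, adults) "
            "VALUES (?, ?, ?, ?, ?, ?)",
            (key, *params),
        )
        conn.executemany(
            "INSERT INTO flight_results (search_id, airline, price, currency) "
            "VALUES (?, ?, ?, ?)",
            [(cur.lastrowid, f["airline"], f["price"], f["currency"]) for f in flights],
        )


def lookup(conn, key, threshold_hours=24, now=None):
    """Return cached flights if the entry is within the threshold, else None."""
    now = now or datetime.now(timezone.utc)
    row = conn.execute(
        "SELECT id, query_timestamp FROM flight_searches WHERE query_hash = ?",
        (key,),
    ).fetchone()
    if row is None:
        return None  # cache miss: this query was never stored
    search_id, stamp = row
    # CURRENT_TIMESTAMP is stored as UTC text in 'YYYY-MM-DD HH:MM:SS' form.
    saved_at = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S").replace(tzinfo=timezone.utc)
    if now - saved_at > timedelta(hours=threshold_hours):
        return None  # cache miss: entry is older than the threshold
    rows = conn.execute(
        "SELECT airline, price, currency FROM flight_results WHERE search_id = ?",
        (search_id,),
    ).fetchall()
    return [{"airline": a, "price": p, "currency": c} for a, p, c in rows]


# Demo with made-up data against an in-memory database.
demo = open_cache()
demo_params = ("BER", "BRI", "2026-03-23", "economy", 1)
demo_key = make_cache_key(*demo_params)
print(lookup(demo, demo_key))  # cache miss
store(demo, demo_key, demo_params,
      [{"airline": "Example Air", "price": 59.0, "currency": "EUR"}])
print(lookup(demo, demo_key))  # cache hit
```

Passing `now` explicitly makes the expiry check easy to test without waiting. A caller that gets `None` back would query the API and then call `store`, which mirrors the fall-through behavior described under "Error Handling".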