# Flight Search Caching System

## Overview

The Flight Airport Comparator now includes a **SQLite-based caching system** to reduce API calls, prevent rate limiting, and provide instant results for repeated queries.

## How It Works

### Automatic Caching
- Every flight search is automatically saved to `data/flight_cache.db`
- Each entry records the query: origin, destination, date, seat class, adults, timestamp
- All flight results are stored with it: airline, price, times, duration, etc.

### Cache Lookup
Before making an API call, the tool:
1. Generates a unique cache key (SHA256 hash of query parameters)
2. Checks whether results for that key exist in the database
3. Verifies the results are newer than the cache threshold (default: 24 hours)
4. Returns cached data if valid, otherwise queries API
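
The lookup steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual `get_cached_results()`: the function name `lookup_cached_flights`, its signature, and the choice of returned columns are assumptions, and it presumes the tables described in the Database Schema section.

```python
import sqlite3


def lookup_cached_flights(conn, query_hash, threshold_hours=24):
    """Return cached flight rows for query_hash, or None on a miss or stale entry.

    Hypothetical helper for illustration; the real cache.py may differ.
    """
    # Steps 2 and 3: find a search with this key that is still fresh enough.
    row = conn.execute(
        """
        SELECT id FROM flight_searches
        WHERE query_hash = ?
          AND query_timestamp >= datetime('now', ?)
        """,
        (query_hash, f"-{threshold_hours} hours"),
    ).fetchone()
    if row is None:
        return None  # cache miss or expired: the caller should query the API

    # Step 4: return the flights stored for that search.
    return conn.execute(
        "SELECT airline, price, currency FROM flight_results WHERE search_id = ?",
        (row[0],),
    ).fetchall()
```

Returning `None` for a miss and a (possibly empty) list for a hit lets the caller distinguish "never searched" from "searched, but no flights found".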

### Cache Indicators
```
💾 Cache hit: BER->BRI on 2026-03-23 (1 flights)  # Instant result (0.0s)
```

No indicator means a cache miss: a fresh API query was made (~2-3s per route).

## Usage

### CLI Options

**Use default cache (24 hours):**
```bash
python main.py --to JFK --country DE
```

**Custom cache threshold (48 hours):**
```bash
python main.py --to JFK --country DE --cache-threshold 48
```

**Disable cache (force fresh queries):**
```bash
python main.py --to JFK --country DE --no-cache
```
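
As a sketch of how these flags might be wired up: the source does not say which argument parser `main.py` uses, so `argparse` and the helper name `build_parser` are assumptions. Only the flag names and the 24-hour default come from the examples above.

```python
import argparse


def build_parser():
    """Hypothetical parser covering only the flags shown in the examples above."""
    parser = argparse.ArgumentParser(description="Flight Airport Comparator (sketch)")
    parser.add_argument("--to", dest="destination", required=True,
                        help="destination airport code, e.g. JFK")
    parser.add_argument("--country", help="origin country code, e.g. DE")
    parser.add_argument("--cache-threshold", type=int, default=24, metavar="HOURS",
                        help="maximum age of cached results, in hours")
    parser.add_argument("--no-cache", action="store_true",
                        help="skip the cache and force fresh queries")
    return parser


args = build_parser().parse_args(
    ["--to", "JFK", "--country", "DE", "--cache-threshold", "48"]
)
print(args.cache_threshold, args.no_cache)  # 48 False
```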

### Cache Management

**View statistics:**
```bash
python cache_admin.py stats

# Output:
# Flight Search Cache Statistics
# ==================================================
# Database location: /Users/.../flight_cache.db
# Total searches cached: 42
# Total flight results: 156
# Database size: 0.15 MB
# Oldest entry: 2026-02-20 10:30:00
# Newest entry: 2026-02-21 18:55:50
```

**Clean old entries:**
```bash
# Delete entries older than 30 days
python cache_admin.py clean --days 30

# Delete entries older than 7 days
python cache_admin.py clean --days 7 --confirm
```
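
Under the hood, an age-based clean could boil down to a single `DELETE` on `flight_searches`, with the matching `flight_results` rows removed by the foreign-key cascade. The sketch below assumes the tables from the Database Schema section; the function name `delete_older_than` is hypothetical and the real `clear_old_cache()` may work differently.

```python
import sqlite3


def delete_older_than(conn, days):
    """Delete searches older than `days` days; return how many searches were removed.

    Hypothetical helper for illustration only.
    """
    # SQLite applies ON DELETE CASCADE only when foreign keys are enabled on the
    # connection. The pragma has no effect while a transaction is already open.
    conn.execute("PRAGMA foreign_keys = ON")
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "DELETE FROM flight_searches WHERE query_timestamp < datetime('now', ?)",
            (f"-{int(days)} days",),
        )
    return cur.rowcount
```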

**Clear entire cache:**
```bash
python cache_admin.py clear-all
# ⚠️ WARNING: Requires confirmation
```

## Database Schema

### flight_searches table
```sql
CREATE TABLE flight_searches (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    query_hash TEXT NOT NULL UNIQUE,  -- SHA256 of query params
    origin TEXT NOT NULL,
    destination TEXT NOT NULL,
    search_date TEXT NOT NULL,  -- YYYY-MM-DD
    seat_class TEXT NOT NULL,
    adults INTEGER NOT NULL,
    query_timestamp DATETIME DEFAULT CURRENT_TIMESTAMP
);
```

### flight_results table
```sql
CREATE TABLE flight_results (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    search_id INTEGER NOT NULL,  -- FK to flight_searches
    airline TEXT,
    departure_time TEXT,
    arrival_time TEXT,
    duration_minutes INTEGER,
    price REAL,
    currency TEXT,
    plane_type TEXT,
    FOREIGN KEY (search_id) REFERENCES flight_searches(id) ON DELETE CASCADE
);
```

### Indexes
- `idx_query_hash` on `flight_searches(query_hash)` - Fast cache lookup
- `idx_query_timestamp` on `flight_searches(query_timestamp)` - Fast expiry checks
- `idx_search_id` on `flight_results(search_id)` - Fast result retrieval
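
To see the lookup index at work, one can ask SQLite for its query plan. The sketch below builds a trimmed-down copy of `flight_searches` in an in-memory database; the `CREATE INDEX` statements are assumptions reconstructed from the index names and columns listed above, and the exact plan wording varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE flight_searches (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    query_hash TEXT NOT NULL UNIQUE,
    query_timestamp DATETIME DEFAULT CURRENT_TIMESTAMP
);
-- Assumed definitions, based on the index list above
CREATE INDEX idx_query_hash ON flight_searches(query_hash);
CREATE INDEX idx_query_timestamp ON flight_searches(query_timestamp);
""")

# Ask SQLite how it would execute the cache-lookup query.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT id FROM flight_searches WHERE query_hash = ?",
    ("a7f3c8d2",),  # placeholder value; any string works for planning
).fetchall()
for row in plan:
    print(row[-1])  # the last column holds the human-readable plan step
```

A plan step that mentions an index (rather than a full table scan) indicates the lookup does not need to read every cached search.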

## Benefits

### ⚡ Speed
- **Cache hit**: ~0.0s (served from the local database, no API call)
- **Cache miss**: ~2-3s (API call + save to cache)
- Example: 95 airports × 3 dates = 285 queries
  - First run: ~226s (fresh API calls)
  - Second run: ~0.1s (all cache hits!)

### 🛡️ Rate Limit Protection
- Avoids sending identical queries repeatedly
- Especially useful for:
  - Testing and development
  - Re-running seasonal scans
  - Comparing different output formats
  - Experimenting with sort orders

### 💰 Reduced API Load
- Fewer requests to Google Flights
- Lower risk of being rate-limited or blocked
- Respectful of Google's infrastructure

### 📊 Historical Data
- Cache preserves price snapshots over time
- Can compare prices from different query times
- Useful for tracking price trends

## Performance Example

**First Query (Cache Miss):**
```bash
$ python main.py --to BDS --country DE --window 3
# Searching 285 routes (95 airports × 3 dates)...
# Done in 226.2s
```

**Second Query (Cache Hit):**
```bash
$ python main.py --to BDS --country DE --window 3
# 💾 Cache hit: FMM->BDS on 2026-04-15 (1 flights)
# Done in 0.0s
```

**Savings:** 226.2s → 0.0s (100% cache hit rate)

## Cache Key Generation

Cache keys are SHA256 hashes of query parameters:

```python
import hashlib

# Example query
origin = "BER"
destination = "BRI"
date = "2026-03-23"
seat_class = "economy"
adults = 1

# Cache key: SHA256 hex digest of the "|"-joined parameters
query_string = f"{origin}|{destination}|{date}|{seat_class}|{adults}"  # "BER|BRI|2026-03-23|economy|1"
cache_key = hashlib.sha256(query_string.encode("utf-8")).hexdigest()  # 64 hex chars, e.g. "a7f3c8d2..." (illustrative)
```

Different parameters = different cache key:
- `BER->BRI, 2026-03-23, economy, 1` ≠ `BER->BRI, 2026-03-24, economy, 1`
- `BER->BRI, 2026-03-23, economy, 1` ≠ `BER->BRI, 2026-03-23, business, 1`
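
To make this concrete, the sketch below wraps the key derivation in a helper and checks that changing a single parameter changes the key. The helper name `make_cache_key` is hypothetical; the `|`-joined string format follows the example above.

```python
import hashlib


def make_cache_key(origin, destination, date, seat_class, adults):
    """Hypothetical helper: SHA256 hex digest of the '|'-joined query parameters."""
    query_string = f"{origin}|{destination}|{date}|{seat_class}|{adults}"
    return hashlib.sha256(query_string.encode("utf-8")).hexdigest()


base = make_cache_key("BER", "BRI", "2026-03-23", "economy", 1)
other_date = make_cache_key("BER", "BRI", "2026-03-24", "economy", 1)
other_class = make_cache_key("BER", "BRI", "2026-03-23", "business", 1)

print(len(base))                                      # 64
print(base != other_date and base != other_class)     # True
```

Because the hash is computed over the raw string, `"ber"` and `"BER"` would yield different keys; normalizing inputs before hashing avoids accidental cache misses.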

## Maintenance

### Recommended Cache Cleaning Schedule

**For regular users:**
```bash
# Clean monthly (keep last 30 days)
python cache_admin.py clean --days 30 --confirm
```

**For developers/testers:**
```bash
# Clean weekly (keep last 7 days)
python cache_admin.py clean --days 7 --confirm
```

**For one-time users:**
```bash
# Clear all after use
python cache_admin.py clear-all --confirm
```

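### Database Growth

**Typical sizes (rough estimates):**
- 1 search = ~1 KB
- 100 searches = ~100 KB
- 1000 searches = ~1 MB
- 10,000 searches = ~10 MB

Actual size depends on how many flights each search returns (the sample `stats` output above reports 0.15 MB for 42 searches). With occasional use the database should stay around 1 MB; large multi-airport scans grow it faster.
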
## Testing

**Test cache functionality:**
```bash
python test_cache.py

# Output:
# ======================================================================
# TESTING CACHE OPERATIONS
# ======================================================================
#
# 1. Clearing old cache...
# ✓ Cache cleared
# 2. Testing cache miss (first query)...
# ✓ Cache miss (as expected)
# 3. Saving flight results to cache...
# ✓ Results saved
# 4. Testing cache hit (second query)...
# ✓ Cache hit: Found 1 flight(s)
# ...
# ✅ ALL CACHE TESTS PASSED!
```

## Architecture

### Integration Points

1. **searcher_v3.py**:
   - `search_direct_flights()` checks cache before API call
   - Saves results after successful query

2. **main.py**:
   - `--cache-threshold` CLI option
   - `--no-cache` flag
   - Passes cache settings to searcher

3. **cache.py**:
   - `get_cached_results()`: Check for valid cached data
   - `save_results()`: Store flight results
   - `clear_old_cache()`: Maintenance operations
   - `get_cache_stats()`: Database statistics

4. **cache_admin.py**:
   - CLI for cache management
   - Human-readable statistics
   - Safe deletion with confirmations

## Implementation Details

### Thread Safety
SQLite handles concurrent reads automatically. Writes are serialized by SQLite's locking mechanism.
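
In practice, serialized writes mean a second writer waits for the first to finish and gives up with `sqlite3.OperationalError: database is locked` if the wait exceeds its connection timeout. The sketch below shows this with Python's standard `sqlite3` module and a temporary database file; the table and timeout value are illustrative only and say nothing about how `cache.py` opens its connections.

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "demo_cache.db")

# isolation_level=None puts both connections in autocommit mode, so
# transactions start and end only where we say so explicitly.
writer = sqlite3.connect(path, isolation_level=None)
writer.execute("CREATE TABLE t (x INTEGER)")
writer.execute("BEGIN IMMEDIATE")           # take the write lock and hold it
writer.execute("INSERT INTO t VALUES (1)")

# A second connection that waits at most 0.1 s for the lock.
other = sqlite3.connect(path, timeout=0.1, isolation_level=None)
try:
    other.execute("INSERT INTO t VALUES (2)")
    locked = False
except sqlite3.OperationalError as exc:
    locked = "locked" in str(exc)           # expected: "database is locked"

writer.execute("COMMIT")                    # release the write lock
other.execute("INSERT INTO t VALUES (2)")   # the retry can now proceed
count = other.execute("SELECT COUNT(*) FROM t").fetchone()[0]
print(locked, count)

writer.close()
other.close()
```

A longer `timeout` gives concurrent writers more time to take turns before an error is raised.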

### Error Handling
- Database errors are caught and logged
- Failed cache operations fall through to API queries
- No crash on corrupted database (graceful degradation)
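
The fall-through behavior can be sketched as a small wrapper. The names `search_with_cache`, `cache_lookup`, and `fetch_from_api` are placeholders rather than the project's real functions; the point is only that any `sqlite3.Error` from the cache is logged and the API path still runs.

```python
import logging
import sqlite3

logger = logging.getLogger("flight_cache_sketch")


def search_with_cache(cache_lookup, fetch_from_api, **query):
    """Try the cache first; on a miss or any SQLite error, fall back to the API.

    Hypothetical wrapper: both callables accept the query as keyword arguments.
    """
    try:
        cached = cache_lookup(**query)
        if cached is not None:
            return cached
    except sqlite3.Error as exc:  # also covers sqlite3.DatabaseError (e.g. corrupted file)
        logger.warning("Cache lookup failed, querying API instead: %s", exc)
    return fetch_from_api(**query)
```

Catching the base `sqlite3.Error` keeps the cache strictly optional: a broken database costs speed, not correctness.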

### Data Persistence
- Cache survives program restarts
- Located in `data/flight_cache.db`
- Can be backed up, copied, or shared

## Future Enhancements

Potential improvements:
- [ ] Cache invalidation based on flight departure time
- [ ] Compression for large result sets
- [ ] Export cache to CSV for analysis
- [ ] Cache warming (pre-populate common routes)
- [ ] Distributed cache (Redis/Memcached)
- [ ] Cache analytics (hit rate, popular routes)

## Troubleshooting

**Cache not working:**
```bash
# Check if cache module is available
python -c "import cache; print('✓ Cache available')"

# Initialize database manually
python cache_admin.py init
```

**Database locked:**
```bash
# Close all running instances first.
# If that does not help, delete and reinitialize (this discards all cached results)
rm data/flight_cache.db
python cache_admin.py init
```

**Disk space issues:**
```bash
# Check database size
python cache_admin.py stats

# Clean aggressively
python cache_admin.py clean --days 1 --confirm
```

## Credits

Caching implementation by Claude Code, integrated with fast-flights v3.0rc1 SOCS cookie bypass.