# Flight Search Caching System
## Overview
The Flight Airport Comparator now includes a **SQLite-based caching system** to reduce API calls, prevent rate limiting, and provide instant results for repeated queries.
## How It Works
### Automatic Caching
- Every flight search is automatically saved to `data/flight_cache.db`
- Includes: origin, destination, date, seat class, adults, timestamp
- Stores all flight results: airline, price, times, duration, etc.
### Cache Lookup
Before making an API call, the tool:
1. Generates a unique cache key (SHA256 hash of query parameters)
2. Checks if results exist in database
3. Verifies results are within threshold (default: 24 hours)
4. Returns cached data if valid, otherwise queries API
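The lookup steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in `cache.py`: the helper name `lookup_cached_search` and the trimmed-down table are hypothetical; only the `query_hash` and `query_timestamp` columns and the 24-hour default come from this document.

```python
import sqlite3
from datetime import datetime, timedelta, timezone


def lookup_cached_search(conn, query_hash, threshold_hours=24, now=None):
    """Return the cached search id if a fresh entry exists, else None (hypothetical helper)."""
    now = now or datetime.now(timezone.utc)
    row = conn.execute(
        "SELECT id, query_timestamp FROM flight_searches WHERE query_hash = ?",
        (query_hash,),
    ).fetchone()
    if row is None:
        return None  # step 2: nothing cached for this key
    search_id, stamp = row
    # CURRENT_TIMESTAMP is stored as UTC text in the form 'YYYY-MM-DD HH:MM:SS'
    cached_at = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S").replace(tzinfo=timezone.utc)
    if now - cached_at > timedelta(hours=threshold_hours):
        return None  # step 3: entry is older than the threshold
    return search_id  # step 4: valid cache hit


# Demo on an in-memory database with a reduced version of the flight_searches table
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE flight_searches ("
    "id INTEGER PRIMARY KEY AUTOINCREMENT, "
    "query_hash TEXT NOT NULL UNIQUE, "
    "query_timestamp DATETIME DEFAULT CURRENT_TIMESTAMP)"
)
conn.execute("INSERT INTO flight_searches (query_hash) VALUES ('fresh')")
conn.execute(
    "INSERT INTO flight_searches (query_hash, query_timestamp) "
    "VALUES ('stale', datetime('now', '-3 days'))"
)

hit = lookup_cached_search(conn, "fresh")        # cached just now
stale = lookup_cached_search(conn, "stale")      # older than 24 hours
missing = lookup_cached_search(conn, "unknown")  # never cached
```

A caller would treat `None` as a cache miss and query the API; any other value identifies the cached search whose results can be loaded.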
### Cache Indicators
```
💾 Cache hit: BER->BRI on 2026-03-23 (1 flights) # Instant result (0.0s)
```
No indicator means a cache miss: a fresh API query was made (~2-3s per route).
## Usage
### CLI Options
**Use default cache (24 hours):**
```bash
python main.py --to JFK --country DE
```
**Custom cache threshold (48 hours):**
```bash
python main.py --to JFK --country DE --cache-threshold 48
```
**Disable cache (force fresh queries):**
```bash
python main.py --to JFK --country DE --no-cache
```
### Cache Management
**View statistics:**
```bash
python cache_admin.py stats
# Output:
# Flight Search Cache Statistics
# ==================================================
# Database location: /Users/.../flight_cache.db
# Total searches cached: 42
# Total flight results: 156
# Database size: 0.15 MB
# Oldest entry: 2026-02-20 10:30:00
# Newest entry: 2026-02-21 18:55:50
```
**Clean old entries:**
```bash
# Delete entries older than 30 days
python cache_admin.py clean --days 30
# Delete entries older than 7 days
python cache_admin.py clean --days 7 --confirm
```
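How `clean --days N` works internally is not shown in this document. One plausible approach is a single `DELETE` keyed on `query_timestamp`, sketched below. The function name `delete_older_than` and the reduced table are assumptions made for illustration.

```python
import sqlite3


def delete_older_than(conn, days):
    """Delete searches older than `days` days and return how many rows were removed."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "DELETE FROM flight_searches WHERE query_timestamp < datetime('now', ?)",
            (f"-{int(days)} days",),
        )
    return cur.rowcount


# Demo on an in-memory database with one recent and one 45-day-old entry
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE flight_searches ("
    "id INTEGER PRIMARY KEY AUTOINCREMENT, "
    "query_hash TEXT NOT NULL UNIQUE, "
    "query_timestamp DATETIME DEFAULT CURRENT_TIMESTAMP)"
)
conn.execute("INSERT INTO flight_searches (query_hash) VALUES ('recent')")
conn.execute(
    "INSERT INTO flight_searches (query_hash, query_timestamp) "
    "VALUES ('old', datetime('now', '-45 days'))"
)

removed = delete_older_than(conn, 30)
remaining = [row[0] for row in conn.execute("SELECT query_hash FROM flight_searches")]
```

The comparison works on plain text because SQLite's `CURRENT_TIMESTAMP` and `datetime()` both produce `YYYY-MM-DD HH:MM:SS` strings, which sort chronologically.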
**Clear entire cache:**
```bash
python cache_admin.py clear-all
# ⚠️ WARNING: Requires confirmation
```
## Database Schema
### flight_searches table
```sql
CREATE TABLE flight_searches (
id INTEGER PRIMARY KEY AUTOINCREMENT,
query_hash TEXT NOT NULL UNIQUE, -- SHA256 of query params
origin TEXT NOT NULL,
destination TEXT NOT NULL,
search_date TEXT NOT NULL, -- YYYY-MM-DD
seat_class TEXT NOT NULL,
adults INTEGER NOT NULL,
query_timestamp DATETIME DEFAULT CURRENT_TIMESTAMP
);
```
### flight_results table
```sql
CREATE TABLE flight_results (
id INTEGER PRIMARY KEY AUTOINCREMENT,
search_id INTEGER NOT NULL, -- FK to flight_searches
airline TEXT,
departure_time TEXT,
arrival_time TEXT,
duration_minutes INTEGER,
price REAL,
currency TEXT,
plane_type TEXT,
FOREIGN KEY (search_id) REFERENCES flight_searches(id) ON DELETE CASCADE
);
```
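One detail worth checking when relying on `ON DELETE CASCADE`: in most builds SQLite does not enforce foreign keys unless `PRAGMA foreign_keys = ON` is issued on each connection. Whether `cache.py` does this is not stated here. The sketch below uses a trimmed version of the two tables and placeholder data to show the cascade once the pragma is set.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Foreign key enforcement is off by default in most SQLite builds;
# without this pragma the ON DELETE CASCADE clause is silently ignored.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(
    """
    CREATE TABLE flight_searches (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        query_hash TEXT NOT NULL UNIQUE
    );
    CREATE TABLE flight_results (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        search_id INTEGER NOT NULL,
        airline TEXT,
        price REAL,
        FOREIGN KEY (search_id) REFERENCES flight_searches(id) ON DELETE CASCADE
    );
    """
)

# Insert one search with one result (placeholder values)
cur = conn.execute("INSERT INTO flight_searches (query_hash) VALUES ('demo-hash')")
search_id = cur.lastrowid
conn.execute(
    "INSERT INTO flight_results (search_id, airline, price) VALUES (?, ?, ?)",
    (search_id, "Example Air", 49.99),
)

# Deleting the parent search removes its results as well
conn.execute("DELETE FROM flight_searches WHERE id = ?", (search_id,))
orphans = conn.execute("SELECT COUNT(*) FROM flight_results").fetchone()[0]
```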
### Indexes
- `idx_query_hash` on `flight_searches(query_hash)` - Fast cache lookup
- `idx_query_timestamp` on `flight_searches(query_timestamp)` - Fast expiry checks
- `idx_search_id` on `flight_results(search_id)` - Fast result retrieval
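The `CREATE INDEX` statements themselves are not listed in this document. Based on the names above they would presumably look like the following sketch, which builds reduced versions of both tables in memory and lists the resulting indexes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE flight_searches (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        query_hash TEXT NOT NULL UNIQUE,
        query_timestamp DATETIME DEFAULT CURRENT_TIMESTAMP
    );
    CREATE TABLE flight_results (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        search_id INTEGER NOT NULL REFERENCES flight_searches(id) ON DELETE CASCADE
    );
    CREATE INDEX IF NOT EXISTS idx_query_hash ON flight_searches(query_hash);
    CREATE INDEX IF NOT EXISTS idx_query_timestamp ON flight_searches(query_timestamp);
    CREATE INDEX IF NOT EXISTS idx_search_id ON flight_results(search_id);
    """
)

# List the explicitly created indexes (automatic ones use a different name prefix)
index_names = sorted(
    row[0]
    for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'index' AND name LIKE 'idx%'"
    )
)
```

Note that the `UNIQUE` constraint on `query_hash` already makes SQLite create an automatic index on that column, so `idx_query_hash` mainly serves as documentation of intent.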
## Benefits
### ⚡ Speed
- **Cache hit**: 0.0s (instant)
- **Cache miss**: ~2-3s (API call + save to cache)
- Example: 95 airports × 3 dates = 285 queries
  - First run: ~226s (fresh API calls)
  - Second run: ~0.1s (all cache hits!)
### 🛡️ Rate Limit Protection
- Prevents identical repeated queries
- Especially useful for:
  - Testing and development
  - Re-running seasonal scans
  - Comparing different output formats
  - Experimenting with sort orders
### 💰 Reduced API Load
- Fewer requests to Google Flights
- Lower risk of being rate-limited or blocked
- Respectful of Google's infrastructure
### 📊 Historical Data
- Cache preserves price snapshots over time
- Can compare prices from different query times
- Useful for tracking price trends
## Performance Example
**First Query (Cache Miss):**
```bash
$ python main.py --to BDS --country DE --window 3
# Searching 285 routes (95 airports × 3 dates)...
# Done in 226.2s
```
**Second Query (Cache Hit):**
```bash
$ python main.py --to BDS --country DE --window 3
# 💾 Cache hit: FMM->BDS on 2026-04-15 (1 flights)
# Done in 0.0s
```
**Savings:** 226.2s → 0.0s (100% cache hit rate)
## Cache Key Generation
Cache keys are SHA256 hashes of query parameters:
```python
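import hashlib

# Example query
origin = "BER"
destination = "BRI"
date = "2026-03-23"
seat_class = "economy"
adults = 1

# Cache key: SHA256 hex digest of the pipe-joined parameters
query_string = f"{origin}|{destination}|{date}|{seat_class}|{adults}"
cache_key = hashlib.sha256(query_string.encode("utf-8")).hexdigest()
# query_string == "BER|BRI|2026-03-23|economy|1"
# cache_key is a 64-character hex string (shown abbreviated, for illustration, as "a7f3c8d2...")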
```
Different parameters = different cache key:
- `BER->BRI, 2026-03-23, economy, 1` ≠ `BER->BRI, 2026-03-24, economy, 1`
- `BER->BRI, 2026-03-23, economy, 1` ≠ `BER->BRI, 2026-03-23, business, 1`
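The same idea can be wrapped in a small function to check that changing any one parameter changes the key. The name `make_cache_key` is hypothetical and the pipe-joined format follows the example above; the actual helper in `cache.py` may differ.

```python
import hashlib


def make_cache_key(origin, destination, date, seat_class, adults):
    """Hypothetical helper: SHA256 hex digest of the pipe-joined query parameters."""
    query_string = "|".join([origin, destination, date, seat_class, str(adults)])
    return hashlib.sha256(query_string.encode("utf-8")).hexdigest()


base = make_cache_key("BER", "BRI", "2026-03-23", "economy", 1)
next_day = make_cache_key("BER", "BRI", "2026-03-24", "economy", 1)
business = make_cache_key("BER", "BRI", "2026-03-23", "business", 1)
repeat = make_cache_key("BER", "BRI", "2026-03-23", "economy", 1)

# Same parameters give the same key; any difference gives a different key
same_query_matches = base == repeat
different_queries_differ = len({base, next_day, business}) == 3
```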
## Maintenance
### Recommended Cache Cleaning Schedule
**For regular users:**
```bash
# Clean monthly (keep last 30 days)
python cache_admin.py clean --days 30 --confirm
```
**For developers/testers:**
```bash
# Clean weekly (keep last 7 days)
python cache_admin.py clean --days 7 --confirm
```
**For one-time users:**
```bash
# Clear all after use
python cache_admin.py clear-all --confirm
```
### Database Growth
**Typical sizes:**
- 1 search = ~1 KB
- 100 searches = ~100 KB
- 1000 searches = ~1 MB
- 10,000 searches = ~10 MB
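At these rates, light use stays under 1 MB. A full country scan such as the 285-query example above adds roughly 0.3 MB, so frequent scanners should clean the cache periodically.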
## Testing
**Test cache functionality:**
```bash
python test_cache.py
# Output:
# ======================================================================
# TESTING CACHE OPERATIONS
# ======================================================================
#
# 1. Clearing old cache...
# ✓ Cache cleared
# 2. Testing cache miss (first query)...
# ✓ Cache miss (as expected)
# 3. Saving flight results to cache...
# ✓ Results saved
# 4. Testing cache hit (second query)...
# ✓ Cache hit: Found 1 flight(s)
# ...
# ✅ ALL CACHE TESTS PASSED!
```
## Architecture
### Integration Points
1. **searcher_v3.py**:
   - `search_direct_flights()` checks cache before API call
   - Saves results after successful query
2. **main.py**:
   - `--cache-threshold` CLI option
   - `--no-cache` flag
   - Passes cache settings to searcher
3. **cache.py**:
   - `get_cached_results()`: Check for valid cached data
   - `save_results()`: Store flight results
   - `clear_old_cache()`: Maintenance operations
   - `get_cache_stats()`: Database statistics
4. **cache_admin.py**:
   - CLI for cache management
   - Human-readable statistics
   - Safe deletion with confirmations
## Implementation Details
### Thread Safety
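SQLite allows any number of concurrent readers. Writes are serialized by SQLite's file locking, so a writer that cannot obtain the lock within its busy timeout fails with a "database is locked" error.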
### Error Handling
- Database errors are caught and logged
- Failed cache operations fall through to API queries
- No crash on corrupted database (graceful degradation)
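The fall-through behaviour described above can be sketched as a thin wrapper around the cache lookup. All names here (`search_with_fallback`, `broken_cache`, `fake_api`) are hypothetical stand-ins; the real logic lives in `searcher_v3.py` and `cache.py` and may be structured differently.

```python
import logging
import sqlite3

logger = logging.getLogger("flight_cache")


def search_with_fallback(cache_lookup, fetch_from_api, query):
    """Try the cache first; on any SQLite error, log it and query the API instead."""
    try:
        cached = cache_lookup(query)
        if cached is not None:
            return cached
    except sqlite3.Error as exc:  # base class of OperationalError, DatabaseError, ...
        logger.warning("Cache lookup failed, falling back to API: %s", exc)
    return fetch_from_api(query)


def broken_cache(query):
    """Stand-in for a lookup against a corrupted database file."""
    raise sqlite3.DatabaseError("file is not a database")


def fake_api(query):
    """Stand-in for the real flight search; returns placeholder data."""
    return [{"route": query, "price": 49.99}]


result = search_with_fallback(broken_cache, fake_api, "BER->BRI")
```

Catching `sqlite3.Error` rather than a bare `Exception` keeps unrelated bugs visible while still covering locked, missing, and corrupted database files.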
### Data Persistence
- Cache survives program restarts
- Located in `data/flight_cache.db`
- Can be backed up, copied, or shared
## Future Enhancements
Potential improvements:
- [ ] Cache invalidation based on flight departure time
- [ ] Compression for large result sets
- [ ] Export cache to CSV for analysis
- [ ] Cache warming (pre-populate common routes)
- [ ] Distributed cache (Redis/Memcached)
- [ ] Cache analytics (hit rate, popular routes)
## Troubleshooting
**Cache not working:**
```bash
# Check if cache module is available
python -c "import cache; print('✓ Cache available')"
# Initialize database manually
python cache_admin.py init
```
**Database locked:**
```bash
# First close all running instances of the tool.
# If the lock persists, delete the database and reinitialize
# (this discards all cached results):
rm data/flight_cache.db
python cache_admin.py init
```
**Disk space issues:**
```bash
# Check database size
python cache_admin.py stats
# Clean aggressively
python cache_admin.py clean --days 1 --confirm
```
## Credits
Caching implementation by Claude Code, integrated with fast-flights v3.0rc1 SOCS cookie bypass.