Add flight comparator web app with full scan pipeline

Full-stack flight price scanner built on fast-flights v3 (SOCS cookie bypass):

Backend (FastAPI + SQLite):
- REST API with rate limiting, Pydantic v2 validation, paginated responses
- Scan pipeline: resolves airports, queries every day in the window, saves
  individual flights + aggregate route stats to SQLite
- Background async scan processor with real-time progress tracking
- Airport search endpoint backed by OpenFlights dataset
- Daily scan window (all dates, not monthly samples)

Frontend (React 19 + TypeScript + Tailwind CSS v4):
- Dashboard with live scan status and recent scans
- Create scan form: country mode or specific airports (searchable dropdown)
- Scan detail page with expandable route rows showing individual flights
  (date, airline, departure, arrival, price) loaded on demand
- AirportSearch component with debounced live search and multi-select

Database:
- scans → routes → flights schema with FK cascade and auto-update triggers
- Migrations for schema evolution (relaxed country constraint)

Tests:
- 74 tests: unit + integration, isolated per-test SQLite DB
- Confirmed flight fixtures in tests/confirmed_flights.json (50 real flights,
  BDS→FMM Ryanair + BDS→DUS Eurowings, scraped Feb 2026)
- Integration tests parametrized from confirmed routes

Docker:
- Multi-stage builds, Compose orchestration, Nginx reverse proxy

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-26 17:11:51 +01:00
parent aea7590874
commit 6421f83ca7
67 changed files with 37173 additions and 0 deletions

flight-comparator/README.md

@@ -0,0 +1,295 @@
# Flight Airport Comparator CLI ✈️
A Python CLI tool that helps you find the best departure airport for your destination by comparing direct flights from all airports in a country.
**✅ NOW WITH WORKING FLIGHT DATA!** Uses fast-flights v3.0rc1 with SOCS cookie integration to successfully bypass Google's consent page.
## What It Does
Answers the question: **"I want to fly to [DESTINATION]. Which airport in [COUNTRY] should I depart from — and when in the next 6 months does the best route open up?"**
### Key Features
- 🌍 **Multi-Airport Comparison**: Automatically scans all airports in a country
- 📅 **Seasonal Scanning**: Discover new routes and price trends across 6 months
- **Direct Flights Only**: Filters out connections automatically
- 🆕 **New Route Detection**: Highlights routes that appear in later months
- 🎨 **Beautiful Tables**: Rich terminal output with color and formatting
- 🚀 **Fast & Concurrent**: Parallel API requests for quick results
- **SOCS Cookie Integration**: Bypasses Google's consent page to retrieve real flight data
- 💾 **Smart Caching**: SQLite cache reduces API calls and prevents rate limiting
## Installation
```bash
# Clone or download this repository
cd flight-comparator
# Install fast-flights v3.0rc1 (REQUIRED for working flight data)
pip install --upgrade git+https://github.com/AWeirdDev/flights.git
# Install other dependencies
pip install -r requirements.txt
# Build airport database (runs automatically on first use)
python airports.py
```
### Requirements
- Python 3.10+
- **fast-flights v3.0rc1** (install from GitHub, not PyPI)
- Dependencies: click, rich, python-dateutil, primp
## Quick Test
Verify it works with real flight data:
```bash
python test_v3_with_cookies.py
```
Expected output:
```
✅ SUCCESS! Found 1 flight option(s):
1. Ryanair
Price: €89
BER → BRI
06:10 - 08:20 (130 min)
```
## Usage
### Basic Examples
**Single date query:**
```bash
python main.py --to JFK --country DE --date 2026-06-15
```
**Seasonal scan (6 months):**
```bash
python main.py --to JFK --country DE
```
**Custom airport list:**
```bash
python main.py --to JFK --from FRA,MUC,BER --date 2026-06-15
```
**Dry run (preview without API calls):**
```bash
python main.py --to JFK --country DE --dry-run
```
### All Options
```
Options:
--to TEXT Destination airport IATA code (e.g., JFK) [required]
--country TEXT Origin country ISO code (e.g., DE, US)
--date TEXT Departure date YYYY-MM-DD. Omit for seasonal scan.
--window INTEGER Months to scan in seasonal mode (default: 6)
--seat [economy|premium|business|first]
Cabin class (default: economy)
--adults INTEGER Number of passengers (default: 1)
--sort [price|duration] Sort order (default: price)
--from TEXT Comma-separated IATA codes (overrides --country)
--top INTEGER Max results per airport (default: 3)
--output [table|json|csv]
Output format (default: table)
--workers INTEGER Concurrency level (default: 5)
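--cache-threshold INTEGER Cache validity in hours (default: 24)
--no-cache Force fresh API queries, ignore cache
--dry-run List airports and dates without API calls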
--help Show this message and exit.
```
### Advanced Examples
**Business class, sorted by duration:**
```bash
python main.py --to SIN --country DE --date 2026-07-20 --seat business --sort duration
```
**Seasonal scan with 12-month window:**
```bash
python main.py --to LAX --country GB --window 12
```
**Output as JSON:**
```bash
python main.py --to CDG --country NL --date 2026-05-10 --output json
```
**Force fresh queries (disable cache):**
```bash
python main.py --to JFK --country DE --no-cache
```
**Custom cache threshold (48 hours):**
```bash
python main.py --to JFK --country DE --cache-threshold 48
```
## How It Works
1. **Airport Resolution**: Loads airports for your country from the OpenFlights dataset
2. **Date Resolution**: Uses the single `--date` if given; otherwise generates one date per month (the 15th) for the seasonal scan
3. **Flight Search**: Queries Google Flights via fast-flights for each airport × date
4. **Filtering**: Keeps only direct flights (0 stops)
5. **Analysis**: Detects new connections in seasonal mode
6. **Formatting**: Presents results in beautiful tables, JSON, or CSV
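
Steps 3 and 4 can be sketched as a concurrent fan-out over every airport × date pair, followed by a stops filter. This is a minimal illustration, not the code in `searcher.py`: the `Flight` dataclass, `fake_search`, and `scan` are hypothetical names, and `fake_search` returns canned data instead of calling fast-flights.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from itertools import product


@dataclass
class Flight:
    origin: str
    destination: str
    date: str
    price: float
    stops: int


def fake_search(origin: str, destination: str, date: str) -> list[Flight]:
    """Hypothetical stand-in for the real flight query (no network access)."""
    if origin == "FRA":
        # One direct flight and one cheaper connection.
        return [
            Flight(origin, destination, date, 450.0, 0),
            Flight(origin, destination, date, 380.0, 1),
        ]
    # Every other origin only offers a connecting flight.
    return [Flight(origin, destination, date, 400.0, 1)]


def scan(origins, destination, dates, search=fake_search, workers=5):
    """Query every origin x date pair concurrently; keep direct flights only."""
    pairs = list(product(origins, dates))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        batches = pool.map(lambda p: search(p[0], destination, p[1]), pairs)
        flights = [flight for batch in batches for flight in batch]
    direct = [flight for flight in flights if flight.stops == 0]
    return sorted(direct, key=lambda flight: flight.price)


results = scan(["FRA", "MUC"], "JFK", ["2026-06-15", "2026-07-15"])
for flight in results:
    print(f"{flight.origin} -> {flight.destination} on {flight.date}: EUR {flight.price:.0f}")
```

Note that the cheaper connecting flight from FRA is dropped on purpose: the filter runs before the price sort, so only nonstop options are ranked.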
## Seasonal Scan Mode
When you omit `--date`, the tool automatically:
- Queries one date per month (default: 15th) across the next 6 months
- Detects routes that appear in later months but not earlier ones
- Tags new connections with ✨ NEW indicator
- Helps you discover seasonal schedule changes
This is especially useful for:
- Finding when summer routes start
- Discovering new airline schedules
- Comparing price trends over time
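
As a rough sketch of the new-route detection described above, assume the scan results have been reduced to a mapping from month (`"YYYY-MM"`) to the set of routes found in that month. The function name and data shape are hypothetical and may differ from what `date_resolver.py` does.

```python
def detect_new_connections(routes_by_month: dict[str, set[str]]) -> dict[str, str]:
    """Return {route: first month seen} for routes absent from all earlier months.

    Routes already present in the earliest scanned month are not counted as new.
    """
    months = sorted(routes_by_month)  # "YYYY-MM" strings sort chronologically
    if not months:
        return {}
    seen = set(routes_by_month[months[0]])
    new_routes: dict[str, str] = {}
    for month in months[1:]:
        for route in sorted(routes_by_month[month] - seen):
            new_routes[route] = month
        seen |= routes_by_month[month]
    return new_routes


scan_results = {
    "2026-03": {"FRA-JFK"},
    "2026-04": {"FRA-JFK"},
    "2026-05": {"FRA-JFK", "BER-JFK"},
}
new_routes = detect_new_connections(scan_results)
print(new_routes)
```

In this example `BER-JFK` would be the route tagged as new, with May as its first month.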
## Country Codes
Common country codes:
- 🇩🇪 DE (Germany)
- 🇺🇸 US (United States)
- 🇬🇧 GB (United Kingdom)
- 🇫🇷 FR (France)
- 🇪🇸 ES (Spain)
- 🇮🇹 IT (Italy)
- 🇳🇱 NL (Netherlands)
- 🇦🇺 AU (Australia)
- 🇯🇵 JP (Japan)
The full list of supported countries is in `data/airports_by_country.json`.
## Architecture
```
flight-comparator/
├── main.py # CLI entrypoint (Click)
├── date_resolver.py # Date logic & new connection detection
├── airports.py # Airport data management
├── searcher.py # Flight search with concurrency
├── formatter.py # Output formatting (Rich tables, JSON, CSV)
├── data/
│ └── airports_by_country.json # Generated airport database
├── tests/ # Smoke tests for each module
└── requirements.txt
```
## Caching System
The tool uses SQLite to cache flight search results, reducing API calls and preventing rate limiting.
### How It Works
- **Automatic caching**: All search results are saved to `data/flight_cache.db`
- **Cache hits**: If a query was made recently, results are retrieved instantly from cache
- **Default threshold**: 24 hours (configurable with `--cache-threshold`)
- **Cache indicator**: Shows `💾 Cache hit:` when using cached data
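
The threshold check can be illustrated with a small SQLite sketch. The table layout, the cache key format, and the function names are assumptions made for this example; the actual schema of `data/flight_cache.db` may differ.

```python
import json
import sqlite3
import time


def open_cache(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a cache database with a single hypothetical table."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS cache ("
        "key TEXT PRIMARY KEY, payload TEXT NOT NULL, fetched_at REAL NOT NULL)"
    )
    return conn


def put_cached(conn, key, results, now=None):
    """Store results as JSON together with the time they were fetched."""
    now = time.time() if now is None else now
    conn.execute(
        "INSERT OR REPLACE INTO cache (key, payload, fetched_at) VALUES (?, ?, ?)",
        (key, json.dumps(results), now),
    )
    conn.commit()


def get_cached(conn, key, threshold_hours=24, now=None):
    """Return cached results if younger than the threshold, else None."""
    now = time.time() if now is None else now
    row = conn.execute(
        "SELECT payload, fetched_at FROM cache WHERE key = ?", (key,)
    ).fetchone()
    if row is None:
        return None  # never queried before
    payload, fetched_at = row
    if now - fetched_at > threshold_hours * 3600:
        return None  # too old: caller should query again
    return json.loads(payload)


conn = open_cache()
key = "FRA|JFK|2026-06-15|economy|1"  # hypothetical key format
put_cached(conn, key, [{"airline": "Example Air", "price": 450}], now=0)
fresh = get_cached(conn, key, now=3600)       # 1 hour old: cache hit
stale = get_cached(conn, key, now=25 * 3600)  # 25 hours old: cache miss
print(fresh, stale)
```

Stale rows are left in place rather than deleted here, so older results stay available for inspection until they are cleaned up explicitly.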
### Cache Management
**View cache statistics:**
```bash
python cache_admin.py stats
```
**Clean old entries (30+ days):**
```bash
python cache_admin.py clean --days 30
```
**Clear entire cache:**
```bash
python cache_admin.py clear-all
```
### CLI Options
- `--cache-threshold N`: Set cache validity in hours (default: 24)
- `--no-cache`: Force fresh API queries, ignore cache
### Benefits
- **Instant results** for repeated queries (near 0 s from cache vs. 2-3 s per live query)
- 🛡️ **Rate limit protection**: Avoid hitting Google's rate limits
- 💰 **Reduced API load**: Fewer requests = lower risk of being blocked
- 📊 **Historical data**: Cache preserves price history
## Configuration
Key constants in `date_resolver.py`:
```python
SEARCH_WINDOW_MONTHS = 6 # Default seasonal scan window
SAMPLE_DAY_OF_MONTH = 15 # Which day to query each month
```
You can override the window at runtime with `--window N`.
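
To make the two constants concrete, here is a minimal sketch of how monthly sample dates could be derived from them. The function name is hypothetical, and starting with the month after `start` is an assumption; the real logic in `date_resolver.py` may begin with the current month instead.

```python
from datetime import date

SEARCH_WINDOW_MONTHS = 6   # Default seasonal scan window
SAMPLE_DAY_OF_MONTH = 15   # Which day to query each month


def monthly_sample_dates(
    start: date,
    window: int = SEARCH_WINDOW_MONTHS,
    day: int = SAMPLE_DAY_OF_MONTH,
) -> list[date]:
    """Return one sample date per month for `window` months after `start`.

    Assumes `day` exists in every month (true for 15; not for 29-31).
    """
    dates = []
    year, month = start.year, start.month
    for _ in range(window):
        month += 1
        if month > 12:  # roll over into the next year
            month = 1
            year += 1
        dates.append(date(year, month, day))
    return dates


samples = monthly_sample_dates(date(2026, 2, 26))
print([d.isoformat() for d in samples])
```

Passing `window=12` to this function mirrors what `--window 12` does on the command line.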
## Limitations
- ⚠️ Relies on fast-flights scraping Google Flights (subject to rate limits and anti-bot measures)
- ⚠️ EU users may encounter consent flow issues (use fallback mode, which is default)
- ⚠️ Prices are as shown on Google Flights, not final booking prices
- ⚠️ Seasonal scan queries only the 15th of each month as a sample
- ⚠️ Large scans (many airports × months) can take 2-3 minutes
## Performance
Single date scan:
- ~20 airports: < 30s (with --workers 5)
Seasonal scan (6 months):
- ~20 airports: 2-3 minutes
- Total requests: 120 (20 × 6)
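
The request count above is simply airports × dates. The sketch below reproduces that arithmetic and adds an idealized lower bound on wall-clock time, assuming the 2-3 s per live query mentioned in the caching section and perfectly even batches of `--workers` requests. The function name is hypothetical, and real scans will take longer because retries, throttling, and uneven response times are not modeled.

```python
import math


def estimate_scan(airports: int, dates: int, workers: int = 5,
                  seconds_per_query: tuple[float, float] = (2.0, 3.0)):
    """Return (total requests, best-case seconds, worst-case seconds)."""
    requests = airports * dates
    batches = math.ceil(requests / workers)  # rounds of parallel requests
    low, high = seconds_per_query
    return requests, batches * low, batches * high


requests, low, high = estimate_scan(airports=20, dates=6)
print(f"{requests} requests, roughly {low:.0f}-{high:.0f} s at best")
```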
## Testing
Run smoke tests for each module:
```bash
cd tests
python test_date_resolver.py
python test_airports.py
python test_searcher.py
python test_formatter.py
```
## Troubleshooting
**"fast-flights not installed"**
```bash
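# Install v3.0rc1 from GitHub (the PyPI release is not sufficient)
pip install --upgrade git+https://github.com/AWeirdDev/flights.git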
```
**"Country code 'XX' not found"**
- Check the country code is correct (2-letter ISO code)
- Verify it exists in `data/airports_by_country.json`
**Slow performance**
- Reduce `--window` for seasonal scans
- Increase `--workers` (but watch out for rate limiting)
- Use `--from` with specific airports instead of entire country
**No results found**
- Try a different date (some routes are seasonal)
- Check the destination airport code is correct
- Verify there actually are direct flights on that route
## License
This tool is for personal use and research. Respect Google Flights' terms of service and rate limits.
## Credits
- Uses [fast-flights](https://github.com/AWeirdDev/flights) for Google Flights scraping
- Airport data from [OpenFlights](https://openflights.org/)
- Built with [Click](https://click.palletsprojects.com/) and [Rich](https://rich.readthedocs.io/)