domverse 6421f83ca7 Add flight comparator web app with full scan pipeline
Full-stack flight price scanner built on fast-flights v3 (SOCS cookie bypass):

Backend (FastAPI + SQLite):
- REST API with rate limiting, Pydantic v2 validation, paginated responses
- Scan pipeline: resolves airports, queries every day in the window, saves
  individual flights + aggregate route stats to SQLite
- Background async scan processor with real-time progress tracking
- Airport search endpoint backed by OpenFlights dataset
- Daily scan window (all dates, not monthly samples)

Frontend (React 19 + TypeScript + Tailwind CSS v4):
- Dashboard with live scan status and recent scans
- Create scan form: country mode or specific airports (searchable dropdown)
- Scan detail page with expandable route rows showing individual flights
  (date, airline, departure, arrival, price) loaded on demand
- AirportSearch component with debounced live search and multi-select

Database:
- scans → routes → flights schema with FK cascade and auto-update triggers
- Migrations for schema evolution (relaxed country constraint)

Tests:
- 74 tests: unit + integration, isolated per-test SQLite DB
- Confirmed flight fixtures in tests/confirmed_flights.json (50 real flights,
  BDS→FMM Ryanair + BDS→DUS Eurowings, scraped Feb 2026)
- Integration tests parametrized from confirmed routes

Docker:
- Multi-stage builds, Compose orchestration, Nginx reverse proxy

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-26 17:11:51 +01:00

# Flight Airport Comparator CLI ✈️
A Python CLI tool that helps you find the best departure airport for your destination by comparing direct flights from all airports in a country.

**✅ NOW WITH WORKING FLIGHT DATA!** Uses fast-flights v3.0rc1 with SOCS cookie integration to successfully bypass Google's consent page.
## What It Does
Answers the question: **"I want to fly to [DESTINATION]. Which airport in [COUNTRY] should I depart from — and when in the next 6 months does the best route open up?"**
### Key Features
- 🌍 **Multi-Airport Comparison**: Automatically scans all airports in a country
- 📅 **Seasonal Scanning**: Discover new routes and price trends across 6 months
- **Direct Flights Only**: Filters out connections automatically
- 🆕 **New Route Detection**: Highlights routes that appear in later months
- 🎨 **Beautiful Tables**: Rich terminal output with color and formatting
- 🚀 **Fast & Concurrent**: Parallel API requests for quick results
- **SOCS Cookie Integration**: Bypasses Google consent page for real flight data!
- 💾 **Smart Caching**: SQLite cache reduces API calls and prevents rate limiting
## Installation
```bash
# Clone or download this repository
cd flight-comparator
# Install fast-flights v3.0rc1 (REQUIRED for working flight data)
pip install --upgrade git+https://github.com/AWeirdDev/flights.git
# Install other dependencies
pip install -r requirements.txt
# Build airport database (runs automatically on first use)
python airports.py
```
### Requirements
- Python 3.10+
- **fast-flights v3.0rc1** (install from GitHub, not PyPI)
- Dependencies: click, rich, python-dateutil, primp
## Quick Test
Verify it works with real flight data:
```bash
python test_v3_with_cookies.py
```
Expected output:
```
✅ SUCCESS! Found 1 flight option(s):
1. Ryanair
Price: €89
BER → BRI
06:10 - 08:20 (130 min)
```
## Usage
### Basic Examples
**Single date query:**
```bash
python main.py --to JFK --country DE --date 2026-06-15
```
**Seasonal scan (6 months):**
```bash
python main.py --to JFK --country DE
```
**Custom airport list:**
```bash
python main.py --to JFK --from FRA,MUC,BER --date 2026-06-15
```
**Dry run (preview without API calls):**
```bash
python main.py --to JFK --country DE --dry-run
```
### All Options
```
Options:
  --to TEXT                       Destination airport IATA code (e.g., JFK)  [required]
  --country TEXT                  Origin country ISO code (e.g., DE, US)
  --date TEXT                     Departure date YYYY-MM-DD. Omit for seasonal scan.
  --window INTEGER                Months to scan in seasonal mode (default: 6)
  --seat [economy|premium|business|first]
                                  Cabin class (default: economy)
  --adults INTEGER                Number of passengers (default: 1)
  --sort [price|duration]         Sort order (default: price)
  --from TEXT                     Comma-separated IATA codes (overrides --country)
  --top INTEGER                   Max results per airport (default: 3)
  --output [table|json|csv]       Output format (default: table)
  --workers INTEGER               Concurrency level (default: 5)
  --dry-run                       List airports and dates without API calls
  --help                          Show this message and exit.
```
### Advanced Examples
**Business class, sorted by duration:**
```bash
python main.py --to SIN --country DE --date 2026-07-20 --seat business --sort duration
```
**Seasonal scan with 12-month window:**
```bash
python main.py --to LAX --country GB --window 12
```
**Output as JSON:**
```bash
python main.py --to CDG --country NL --date 2026-05-10 --output json
```
**Force fresh queries (disable cache):**
```bash
python main.py --to JFK --country DE --no-cache
```
**Custom cache threshold (48 hours):**
```bash
python main.py --to JFK --country DE --cache-threshold 48
```
## How It Works
1. **Airport Resolution**: Loads airports for your country from the OpenFlights dataset
2. **Date Resolution**: Uses the date you pass with `--date`, or generates one date per month (the 15th) for a seasonal scan
3. **Flight Search**: Queries Google Flights via fast-flights for each airport × date
4. **Filtering**: Keeps only direct flights (0 stops)
5. **Analysis**: Detects new connections in seasonal mode
6. **Formatting**: Presents results in beautiful tables, JSON, or CSV
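
Steps 4 and the `--sort` / `--top` options can be sketched as follows. This is a minimal illustration, not the code in `searcher.py`: the `Flight` record and its field names are hypothetical, and the sample values are made up.

```python
from dataclasses import dataclass


@dataclass
class Flight:
    # Hypothetical record; these field names are assumptions,
    # not the types returned by fast-flights.
    origin: str
    airline: str
    price: float
    duration_min: int
    stops: int


def best_direct_flights(flights, sort_by="price", top=3):
    """Keep direct flights (0 stops), sort them, and return the first `top`."""
    direct = [f for f in flights if f.stops == 0]
    if sort_by == "price":
        direct.sort(key=lambda f: f.price)
    else:
        direct.sort(key=lambda f: f.duration_min)
    return direct[:top]


# Made-up sample data for one origin airport
flights = [
    Flight("FRA", "Airline A", 540.0, 520, 0),
    Flight("FRA", "Airline B", 495.0, 545, 0),
    Flight("FRA", "Airline C", 410.0, 760, 1),  # one stop, filtered out
]

by_price = best_direct_flights(flights)
by_duration = best_direct_flights(flights, sort_by="duration")
print([f.airline for f in by_price])     # ['Airline B', 'Airline A']
print([f.airline for f in by_duration])  # ['Airline A', 'Airline B']
```

The cheapest option overall (Airline C) is dropped because it has a stop, which is why results can differ from what a plain Google Flights search shows first.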
## Seasonal Scan Mode
When you omit `--date`, the tool automatically:
- Queries one date per month (default: 15th) across the next 6 months
- Detects routes that appear in later months but not earlier ones
- Tags new connections with ✨ NEW indicator
- Helps you discover seasonal schedule changes

This is especially useful for:
- Finding when summer routes start
- Discovering new airline schedules
- Comparing price trends over time
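
One way to think about the new-connection detection is as a running set of origins seen so far. The sketch below assumes the scan results have already been reduced to "which origin airports had a direct flight in each month"; the function name and input shape are hypothetical and may differ from what `date_resolver.py` does.

```python
def find_new_routes(origins_by_month):
    """Return {origin: first_month} for origins missing from the first
    scanned month that show up in a later one.

    origins_by_month maps 'YYYY-MM' to the set of origin airports that
    had at least one direct flight in that month (assumed input shape).
    """
    months = sorted(origins_by_month)
    if not months:
        return {}
    seen = set(origins_by_month[months[0]])
    new_routes = {}
    for month in months[1:]:
        for origin in sorted(origins_by_month[month] - seen):
            new_routes[origin] = month
        seen |= origins_by_month[month]
    return new_routes


# Made-up sample: BER only gets a direct flight from May onwards
sample = {
    "2026-03": {"FRA", "MUC"},
    "2026-04": {"FRA", "MUC"},
    "2026-05": {"FRA", "MUC", "BER"},
}
new_routes = find_new_routes(sample)
print(new_routes)  # {'BER': '2026-05'}
```

Because only one date per month is sampled, a route flagged as new may simply not operate on the sampled day in earlier months.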
## Country Codes
Common country codes:
- 🇩🇪 DE (Germany)
- 🇺🇸 US (United States)
- 🇬🇧 GB (United Kingdom)
- 🇫🇷 FR (France)
- 🇪🇸 ES (Spain)
- 🇮🇹 IT (Italy)
- 🇳🇱 NL (Netherlands)
- 🇦🇺 AU (Australia)
- 🇯🇵 JP (Japan)

The full list of supported countries is in `data/airports_by_country.json`.
## Architecture
```
flight-comparator/
├── main.py # CLI entrypoint (Click)
├── date_resolver.py # Date logic & new connection detection
├── airports.py # Airport data management
├── searcher.py # Flight search with concurrency
├── formatter.py # Output formatting (Rich tables, JSON, CSV)
├── data/
│ └── airports_by_country.json # Generated airport database
├── tests/ # Smoke tests for each module
└── requirements.txt
```
## Caching System
The tool uses SQLite to cache flight search results, reducing API calls and preventing rate limiting.
### How It Works
- **Automatic caching**: All search results are saved to `data/flight_cache.db`
- **Cache hits**: If a query was made recently, results are retrieved instantly from cache
- **Default threshold**: 24 hours (configurable with `--cache-threshold`)
- **Cache indicator**: Shows `💾 Cache hit:` when using cached data
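
The threshold check can be sketched with the standard-library `sqlite3` module. The table layout, key format, and function names below are assumptions for illustration only; the actual schema of `data/flight_cache.db` is not documented here.

```python
import json
import sqlite3
import time


def open_cache(path=":memory:"):
    """Open (or create) a cache database. The schema is hypothetical."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS cache ("
        "key TEXT PRIMARY KEY, fetched_at REAL NOT NULL, payload TEXT NOT NULL)"
    )
    return conn


def cache_put(conn, key, flights, now=None):
    """Store a JSON-serializable result under `key` with a timestamp."""
    now = time.time() if now is None else now
    conn.execute(
        "INSERT OR REPLACE INTO cache (key, fetched_at, payload) VALUES (?, ?, ?)",
        (key, now, json.dumps(flights)),
    )
    conn.commit()


def cache_get(conn, key, threshold_hours=24, now=None):
    """Return the cached result, or None if it is missing or too old."""
    now = time.time() if now is None else now
    row = conn.execute(
        "SELECT fetched_at, payload FROM cache WHERE key = ?", (key,)
    ).fetchone()
    if row is None or now - row[0] > threshold_hours * 3600:
        return None
    return json.loads(row[1])


conn = open_cache()
key = "FRA-JFK-2026-06-15-economy"  # assumed key format
cache_put(conn, key, [{"airline": "Airline A", "price": 540}], now=0)
fresh = cache_get(conn, key, now=3600)      # 1 hour old: cache hit
stale = cache_get(conn, key, now=25 * 3600)  # 25 hours old: treated as a miss
```

Stale rows are not deleted on read in this sketch, which is consistent with the cache doubling as price history until it is cleaned explicitly.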
### Cache Management
**View cache statistics:**
```bash
python cache_admin.py stats
```
**Clean old entries (30+ days):**
```bash
python cache_admin.py clean --days 30
```
**Clear entire cache:**
```bash
python cache_admin.py clear-all
```
### CLI Options
- `--cache-threshold N`: Set cache validity in hours (default: 24)
- `--no-cache`: Force fresh API queries, ignore cache
### Benefits
- **Instant results** for repeated queries (0.0s vs 2-3s per query)
- 🛡️ **Rate limit protection**: Avoid hitting Google's API limits
- 💰 **Reduced API load**: Fewer requests = lower risk of being blocked
- 📊 **Historical data**: Cache preserves price history
## Configuration
Key constants in `date_resolver.py`:
```python
SEARCH_WINDOW_MONTHS = 6 # Default seasonal scan window
SAMPLE_DAY_OF_MONTH = 15 # Which day to query each month
```
You can override the window at runtime with `--window N`.
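
As a rough illustration of how these two constants could translate into query dates, the sketch below uses only the standard library. The function name is hypothetical, and the rule for skipping the current month once its sample day has passed is an assumption rather than documented behavior.

```python
from datetime import date

SEARCH_WINDOW_MONTHS = 6   # Default seasonal scan window
SAMPLE_DAY_OF_MONTH = 15   # Which day to query each month


def monthly_sample_dates(start, window=SEARCH_WINDOW_MONTHS, day=SAMPLE_DAY_OF_MONTH):
    """Return one sample date per month for `window` consecutive months.

    Assumption: if the sample day of the starting month is today or
    already past, the series begins with the following month.
    """
    year, month = start.year, start.month
    if start.day >= day:
        month += 1
    dates = []
    for _ in range(window):
        # Normalize month overflow (13 becomes January of the next year)
        year += (month - 1) // 12
        month = (month - 1) % 12 + 1
        dates.append(date(year, month, day))
        month += 1
    return dates


dates = monthly_sample_dates(date(2026, 2, 26))
print(dates[0], dates[-1])  # 2026-03-15 2026-08-15

wrapped = monthly_sample_dates(date(2026, 11, 20), window=3)
print(wrapped)  # December 2026, January 2027, February 2027
```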
## Limitations
- ⚠️ Relies on fast-flights scraping Google Flights (subject to rate limits and anti-bot measures)
- ⚠️ EU users may encounter consent flow issues (use fallback mode, which is default)
- ⚠️ Prices are as shown on Google Flights, not final booking prices
- ⚠️ Seasonal scan queries only the 15th of each month as a sample
- ⚠️ Large scans (many airports × months) can take 2-3 minutes
## Performance
Single date scan:
- ~20 airports: < 30s (with --workers 5)

Seasonal scan (6 months):
- ~20 airports: 2-3 minutes
- Total requests: 120 (20 × 6)
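
The request count is simply airports × dates, and `--workers` bounds how many of those run at once. The sketch below shows that fan-out with a thread pool; whether `searcher.py` uses threads or another concurrency model is an assumption, and `fake_search` is a stand-in for the real network call.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product


def fake_search(origin, destination, day):
    # Stand-in for the real flight query; returns an empty result
    return {"origin": origin, "destination": destination, "date": day, "flights": []}


def scan(origins, destination, dates, workers=5, search=fake_search):
    """Run one search per (origin, date) pair with at most `workers` in flight."""
    tasks = list(product(origins, dates))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves the order of `tasks` in its results
        return list(pool.map(lambda t: search(t[0], destination, t[1]), tasks))


origins = [f"A{i:02d}" for i in range(20)]         # 20 placeholder airport codes
dates = [f"2026-{m:02d}-15" for m in range(3, 9)]  # 6 monthly sample dates
results = scan(origins, "JFK", dates)
print(len(results))  # 120
```

Raising `workers` shortens the wall-clock time but sends more simultaneous requests, which increases the chance of being rate limited.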
## Testing
Run smoke tests for each module:
```bash
cd tests
python test_date_resolver.py
python test_airports.py
python test_searcher.py
python test_formatter.py
```
## Troubleshooting
**"fast-flights not installed"**
```bash
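# Install v3.0rc1 from GitHub (the PyPI release is not the required version)
pip install --upgrade git+https://github.com/AWeirdDev/flights.git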
```
**"Country code 'XX' not found"**
- Check the country code is correct (2-letter ISO code)
- Verify it exists in `data/airports_by_country.json`

**Slow performance**
- Reduce `--window` for seasonal scans
- Increase `--workers` (but watch out for rate limiting)
- Use `--from` with specific airports instead of entire country

**No results found**
- Try a different date (some routes are seasonal)
- Check the destination airport code is correct
- Verify there actually are direct flights on that route
## License
This tool is for personal use and research. Respect Google Flights' terms of service and rate limits.
## Credits
- Uses [fast-flights](https://github.com/AWeirdDev/flights) for Google Flights scraping
- Airport data from [OpenFlights](https://openflights.org/)
- Built with [Click](https://click.palletsprojects.com/) and [Rich](https://rich.readthedocs.io/)