Code: github.com/punitarani/fli
Overview
Fli is a Python library that provides direct access to Google Flights' internal API through reverse engineering techniques. Unlike traditional flight search libraries that rely on web scraping methods, Fli offers structured access to flight data by interfacing directly with Google's API endpoints.
Key Technical Achievements
Direct API Access
Fli identifies and utilizes Google Flights' internal API endpoints:
- Flight Search Endpoint:
https://www.google.com/_/FlightsFrontendUi/data/travel.frontend.flights.FlightsFrontendService/GetShoppingResults
- Date-based Price Search:
https://www.google.com/_/FlightsFrontendUi/data/travel.frontend.flights.FlightsFrontendService/GetCalendarGraph
This direct API approach avoids the brittleness of HTML parsing: responses arrive as structured JSON rather than as markup tied to the page layout, so they are more reliable to parse than scraped pages.
Complex Filter Encoding System
One of the most challenging parts of the reverse engineering process was working out how Google encodes search filters. The library converts user-friendly search parameters into Google's internal API format in three steps: serialize the formatted filters to a JSON string, wrap that string in a list with a leading null, and URL-encode the result.
def encode(self) -> str:
    """URL encode the formatted filters for API request."""
    formatted_filters = self.format()
    # First convert the formatted filters to a JSON string
    formatted_json = json.dumps(formatted_filters, separators=(",", ":"))
    # Then wrap it in a list with null
    wrapped_filters = [None, formatted_json]
    # Finally, encode the whole thing
    return urllib.parse.quote(json.dumps(wrapped_filters, separators=(",", ":")))
The format() method transforms Pydantic models into the deeply nested list structures that Google's API expects, handling scenarios such as:
- Multi-segment flight searches
- Time restrictions and layover preferences
- Airline filtering and seat class selection
- Passenger configurations and pricing limits
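To make the double encoding concrete, here is a minimal standalone sketch of the same steps, plus the reverse direction. The `formatted_filters` value is a made-up placeholder rather than Google's real filter layout, and `encode_filters` and `decode_filters` are hypothetical helper names used only for illustration.

```python
import json
import urllib.parse


def encode_filters(formatted_filters: list) -> str:
    """Serialize the filters, wrap them as [None, "<json>"], then URL-encode."""
    inner = json.dumps(formatted_filters, separators=(",", ":"))
    wrapped = json.dumps([None, inner], separators=(",", ":"))
    return urllib.parse.quote(wrapped)


def decode_filters(encoded: str) -> list:
    """Reverse the steps above: unquote, unwrap, then parse the inner JSON."""
    wrapped = json.loads(urllib.parse.unquote(encoded))
    return json.loads(wrapped[1])


# Placeholder payload (assumption): the real nested layout comes from format().
formatted_filters = [[], [None, None, 2, None, [], 1]]
encoded = encode_filters(formatted_filters)
decoded = decode_filters(encoded)
```

A decoder like this can also be pointed at a request captured in the browser's network tab, which is one way to compare the library's output against what the frontend sends.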
Browser Impersonation
To interact with Google's API, Fli implements browser impersonation using the curl_cffi library:
response = self.client.post(
    url=self.BASE_URL,
    data=f"f.req={encoded_filters}",
    impersonate="chrome",
    allow_redirects=True,
)
This approach mimics legitimate browser requests, including:
- Proper User-Agent headers
- Chrome-like TLS fingerprinting
- Appropriate request formatting with the application/x-www-form-urlencoded content type
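As a rough illustration of the request formatting only (no network call and no TLS impersonation), the sketch below builds the `f.req=` body and parses it back the way a form decoder would. The payload is a placeholder and `build_form_body` is a hypothetical helper, not part of Fli.

```python
import json
import urllib.parse

# Header value named in the list above.
headers = {"Content-Type": "application/x-www-form-urlencoded"}


def build_form_body(encoded_filters: str) -> str:
    """Build the form body in the same shape as the post() call above."""
    return f"f.req={encoded_filters}"


# Placeholder payload (assumption), percent-encoded so it is safe in a form body.
encoded_filters = urllib.parse.quote(json.dumps([None, "[]"], separators=(",", ":")))
body = build_form_body(encoded_filters)

# A form decoder sees a single field named f.req holding the JSON wrapper.
fields = urllib.parse.parse_qs(body)
```

Percent-encoding matters here because the JSON wrapper contains characters that would otherwise be misread as form syntax.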
Rate Limiting and Reliability
The HTTP client wraps each POST request in rate limiting and automatic retries:
@sleep_and_retry
@limits(calls=10, period=1)
@retry(stop=stop_after_attempt(3), wait=wait_exponential(), reraise=True)
def post(self, url: str, **kwargs) -> requests.Response:
    """Make a rate-limited POST request with automatic retries."""
Key reliability features include:
- Rate Limiting: 10 requests per second to avoid overwhelming Google's servers
- Exponential Backoff: Up to 3 attempts per request, with exponentially increasing delays between them
- Session Management: Persistent connections for improved performance
- Error Handling: Comprehensive exception handling and logging
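The decorators above appear to come from the `ratelimit` and `tenacity` packages. To show the underlying idea without those dependencies, here is a standard-library sketch of the same two behaviors. It is a stand-in, not Fli's implementation: `retry_with_backoff`, `min_interval`, and `flaky_request` are hypothetical names, and spacing calls evenly is a simpler policy than a "10 calls per 1-second window" limit.

```python
import time
from functools import wraps


def retry_with_backoff(attempts: int = 3, base_delay: float = 1.0):
    """Retry a function, doubling the delay after each failed attempt."""

    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(attempts):
                try:
                    return func(*args, **kwargs)
                except Exception:
                    if attempt == attempts - 1:
                        raise  # out of attempts: surface the last error
                    time.sleep(base_delay * 2**attempt)

        return wrapper

    return decorator


def min_interval(calls_per_second: float):
    """Space calls at least 1 / calls_per_second seconds apart."""
    interval = 1.0 / calls_per_second

    def decorator(func):
        last_call = [float("-inf")]  # mutable cell shared across calls

        @wraps(func)
        def wrapper(*args, **kwargs):
            wait = interval - (time.monotonic() - last_call[0])
            if wait > 0:
                time.sleep(wait)
            last_call[0] = time.monotonic()
            return func(*args, **kwargs)

        return wrapper

    return decorator


calls = []


@min_interval(calls_per_second=10)
@retry_with_backoff(attempts=3, base_delay=0.01)
def flaky_request() -> str:
    """Hypothetical request that fails twice before succeeding."""
    calls.append(time.monotonic())
    if len(calls) < 3:
        raise ConnectionError("simulated transient failure")
    return "ok"


result = flaky_request()
```

As in the original decorator stack, the rate limiter sits outside the retry logic, so retries of a single call are governed by the backoff delays rather than by the rate limit.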
Technical Architecture
Type-Safe Data Models
Fli uses Pydantic models to provide a clean, type-safe interface while handling the complexity of Google's API internally:
class FlightSearchFilters(BaseModel):
    """Complete set of filters for flight search."""
    trip_type: TripType = TripType.ONE_WAY
    passenger_info: PassengerInfo
    flight_segments: list[FlightSegment]
    stops: MaxStops = MaxStops.ANY
    seat_type: SeatType = SeatType.ECONOMY
    price_limit: PriceLimit | None = None
    airlines: list[Airline] | None = None
Modular Design
The library is organized into clean, focused modules:
- Models: Type-safe data structures for airports, airlines, and search parameters
- Search: Core search functionality with separate classes for flights and date-based searches
- Client: HTTP client with built-in reliability features
Response Parsing
The library handles Google's complex response format, extracting flight data from deeply nested JSON structures:
parsed = json.loads(response.text.lstrip(")]}'"))[0][2]
encoded_filters = json.loads(parsed)
flights_data = [
    item
    for i in [2, 3]
    if isinstance(encoded_filters[i], list)
    for item in encoded_filters[i][0]
]
Reverse Engineering Process
The development of Fli involved extensive analysis of Google Flights' frontend behavior:
- Network Traffic Analysis: Intercepting and analyzing HTTPS requests to identify API endpoints
- Request Structure Decoding: Understanding the complex parameter encoding used by Google's frontend
- Response Format Analysis: Parsing the non-standard JSON responses (prefixed with )]}')
- Authentication Bypass: Discovering that the API endpoints don't require traditional authentication
- Rate Limit Discovery: Testing to find optimal request rates that avoid blocking
Data Parsing and Transformation
The most complex aspect of the reverse engineering process was understanding how Google encodes flight data in their API responses. The raw response contains deeply nested arrays with no field names, requiring careful analysis to map each index to meaningful flight information.
Raw API Response Structure
Google's API returns responses with a security prefix that must be stripped:
# Raw response text starts with ")]}'" security prefix
parsed = json.loads(response.text.lstrip(")]}'"))[0][2]
The actual flight data is buried several layers deep in nested arrays:
# Extract flight data from specific array indices
flights_data = [
    item
    for i in [2, 3]  # Flight data is stored at indices 2 and 3
    if isinstance(encoded_filters[i], list)
    for item in encoded_filters[i][0]
]
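The two snippets above can be exercised together on a synthetic response. This is a sketch under assumptions: the envelope contents are invented placeholders, and `extract_payload` and `collect_flights` are hypothetical helper names. One deliberate difference: `str.lstrip` treats its argument as a set of characters rather than a literal prefix, so the sketch uses `str.removeprefix` (Python 3.9+) to strip exactly the prefix and nothing else.

```python
import json

SECURITY_PREFIX = ")]}'"


def extract_payload(response_text: str) -> list:
    """Strip the prefix, then decode the JSON string nested at [0][2]."""
    body = response_text.removeprefix(SECURITY_PREFIX)
    outer = json.loads(body)  # outer envelope; leading newlines are ignored
    return json.loads(outer[0][2])  # the inner payload is itself a JSON string


def collect_flights(payload: list) -> list:
    """Gather items from indices 2 and 3, skipping slots that are not lists."""
    return [
        item
        for i in (2, 3)
        if isinstance(payload[i], list)
        for item in payload[i][0]
    ]


# Synthetic response (assumption): index 2 holds two fake flights, index 3 is None.
inner = json.dumps([None, None, [["flight-a", "flight-b"]], None])
response_text = SECURITY_PREFIX + "\n\n" + json.dumps([["placeholder", None, inner]])
flights = collect_flights(extract_payload(response_text))
```

The `isinstance` check is what lets a missing result group, here index 3, pass through without raising.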
Data Structure Mapping
Through analysis of multiple API responses, I mapped the array indices to flight properties:
def _parse_flights_data(data: list) -> FlightResult:
    """Transform raw nested arrays into structured flight data."""
    flight = FlightResult(
        price=data[1][0][-1],  # Price buried in nested structure
        duration=data[0][9],  # Total flight duration
        stops=len(data[0][2]) - 1,  # Number of stops = legs - 1
        legs=[
            FlightLeg(
                airline=_parse_airline(fl[22][0]),  # Airline code at index 22
                flight_number=fl[22][1],  # Flight number at index 22
                departure_airport=_parse_airport(fl[3]),  # Departure airport at index 3
                arrival_airport=_parse_airport(fl[6]),  # Arrival airport at index 6
                departure_datetime=_parse_datetime(fl[20], fl[8]),  # Date/time arrays
                arrival_datetime=_parse_datetime(fl[21], fl[10]),  # Date/time arrays
                duration=fl[11],  # Leg duration at index 11
            )
            for fl in data[0][2]  # Each flight leg in the journey
        ],
    )
    return flight
DateTime Parsing
Google stores dates and times as separate integer arrays that must be reconstructed:
def _parse_datetime(date_arr: list[int], time_arr: list[int]) -> datetime:
    """Convert [year, month, day] and [hour, minute] arrays to datetime."""
    return datetime(
        *(x or 0 for x in date_arr),  # [2024, 12, 15] -> year, month, day
        *(x or 0 for x in time_arr)  # [14, 30] -> hour, minute
    )
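The `x or 0` guard suggests that some components can arrive as None. The sketch below repeats the same logic under that assumption, with a hypothetical on-the-hour time whose minute is missing; `parse_datetime` is a local copy for illustration.

```python
from datetime import datetime


def parse_datetime(date_arr: list, time_arr: list) -> datetime:
    """Same logic as _parse_datetime above: None components become 0."""
    return datetime(
        *(x or 0 for x in date_arr),
        *(x or 0 for x in time_arr),
    )


full = parse_datetime([2024, 12, 15], [14, 30])
# Assumed input shape: the minute is sent as None for an on-the-hour time.
on_the_hour = parse_datetime([2024, 12, 15], [14, None])
```

The fallback only helps for hours and minutes. A missing month or day would still raise ValueError, since 0 is not a valid value for either.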
Example Transformation
Here's how a raw API response gets transformed into readable data:
Raw API Data:
# Deeply nested array structure (simplified)
raw_flight = [
    [None, None, [[...flight_legs...]], None, None, None, None, None, None, 485],  # Index 9 = duration
    [[None, None, None, None, None, None, None, None, None, None, None, None, 299.99]]  # Price data
]
Parsed Result:
FlightResult(
    price=299.99,
    duration=485,  # minutes
    stops=0,  # non-stop flight
    legs=[
        FlightLeg(
            airline=Airline.UA,
            flight_number="1234",
            departure_airport=Airport.LAX,
            arrival_airport=Airport.JFK,
            departure_datetime=datetime(2024, 12, 15, 8, 30),
            arrival_datetime=datetime(2024, 12, 15, 16, 45),
            duration=485
        )
    ]
)
This reverse engineering process required analyzing hundreds of API responses to identify consistent patterns and map the array indices to their corresponding flight data fields.
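The index mapping can be tried end to end on synthetic data. The sketch below builds a fake record using the indices listed earlier and reads it back into a plain dict. All values are invented, airports and airlines are kept as plain strings instead of going through `_parse_airport` and `_parse_airline`, and `make_leg` and `summarize` are hypothetical helpers.

```python
def make_leg() -> list:
    """Build a fake leg with values at the indices described above."""
    leg = [None] * 23
    leg[3] = "LAX"  # departure airport
    leg[6] = "JFK"  # arrival airport
    leg[8] = [8, 30]  # departure [hour, minute]
    leg[10] = [16, 45]  # arrival [hour, minute]
    leg[11] = 485  # leg duration in minutes
    leg[20] = [2024, 12, 15]  # departure [year, month, day]
    leg[21] = [2024, 12, 15]  # arrival [year, month, day]
    leg[22] = ["UA", "1234"]  # airline code and flight number
    return leg


def summarize(data: list) -> dict:
    """Read the headline fields from one raw flight record."""
    legs = data[0][2]
    return {
        "price": data[1][0][-1],
        "duration": data[0][9],
        "stops": len(legs) - 1,
        "route": [(fl[3], fl[6]) for fl in legs],
        "flights": [f"{fl[22][0]}{fl[22][1]}" for fl in legs],
    }


raw_flight = [
    [None, None, [make_leg()], None, None, None, None, None, None, 485],
    [[None, 299.99]],
]
summary = summarize(raw_flight)
```

Because every field is addressed by position, a fixture like this doubles as a regression test: if the real response layout shifts by one slot, the mapping fails immediately rather than returning plausible but wrong values.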
Features and Capabilities
Search Functionality
- One-way and round-trip flight searches
- Flexible departure and arrival time constraints
- Multi-airline and cabin class filtering
- Stop preferences (non-stop, one stop, etc.)
- Custom result sorting options
Price Analysis
- Date range price searches to find cheapest travel dates
- Historical price tracking capabilities
- Flexible date options for budget-conscious travelers
Developer Experience
- Clean Python API with type hints
- Comprehensive command-line interface (CLI)
- Detailed documentation and examples
- Robust error handling and logging
Impact and Benefits
Fli provides several advantages over traditional web scraping approaches:
- Performance: Faster than HTML parsing methods
- Reliability: Direct API access reduces parsing failures
- Maintainability: Less dependent on UI changes
- Scalability: Built-in rate limiting and retry logic
- Developer Experience: Clean, typed Python interface
Technical Challenges Overcome
- API Discovery: Identifying the correct endpoints among Google's vast API surface
- Parameter Encoding: Reverse engineering the complex nested data structures
- Authentication: Establishing that the endpoints accept requests without traditional authentication
- Response Parsing: Handling non-standard JSON formats and nested data
- Rate Limiting: Finding the optimal balance between speed and reliability
Future Enhancements
The reverse engineering approach opens possibilities for additional features:
- Real-time price alerts and monitoring
- Historical price trend analysis
- Multi-city trip planning
- Integration with booking platforms
- Advanced filtering and recommendation algorithms
This project demonstrates how reverse engineering techniques can be used to create developer tools and APIs, transforming a web interface into a structured Python library.