Fli - Reverse Engineering Google Flights API

15 December 2024

Code: github.com/punitarani/fli

Overview

Fli is a Python library that provides direct access to Google Flights' internal API through reverse engineering techniques. Unlike traditional flight search libraries that rely on web scraping methods, Fli offers structured access to flight data by interfacing directly with Google's API endpoints.

[Figure: Fli CLI demo]

Key Technical Achievements

🚀 Direct API Access

Fli identifies Google Flights' internal API endpoints and calls them directly.

This direct API approach eliminates the brittleness of HTML parsing and provides structured JSON responses that are much more reliable than traditional scraping methods.

🔧 Complex Filter Encoding System

One of the most challenging aspects of the reverse engineering process was understanding Google's complex filter encoding system. The library implements a sophisticated encoding mechanism that converts user-friendly search parameters into Google's internal API format.

def encode(self) -> str:
    """URL encode the formatted filters for API request."""
    formatted_filters = self.format()
    # First convert the formatted filters to a JSON string
    formatted_json = json.dumps(formatted_filters, separators=(",", ":"))
    # Then wrap it in a list with null
    wrapped_filters = [None, formatted_json]
    # Finally, encode the whole thing
    return urllib.parse.quote(json.dumps(wrapped_filters, separators=(",", ":")))

The format() method transforms the Pydantic models into deeply nested list structures laid out exactly as Google's API expects.
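
To make the double encoding concrete, here is a minimal standalone sketch of the same steps together with their inverse. The function names and the sample payload are placeholders for illustration; the payload is not Google's real filter layout.

```python
import json
import urllib.parse


def encode_filters(formatted_filters: list) -> str:
    """Double-encode a nested list following the steps of the encode() excerpt."""
    # Inner layer: compact JSON string of the nested list
    inner = json.dumps(formatted_filters, separators=(",", ":"))
    # Outer layer: [null, "<inner JSON>"], serialized again and URL-quoted
    wrapped = [None, inner]
    return urllib.parse.quote(json.dumps(wrapped, separators=(",", ":")))


def decode_filters(encoded: str) -> list:
    """Reverse the steps: URL-decode, parse the wrapper, parse the inner JSON."""
    wrapped = json.loads(urllib.parse.unquote(encoded))
    return json.loads(wrapped[1])


# Hypothetical placeholder payload, NOT Google's real filter layout
sample_filters = [None, ["LAX", "JFK"], 1]
encoded = encode_filters(sample_filters)
round_tripped = decode_filters(encoded)
```

Decoding in the other direction is handy when comparing the library's output against a request captured from the browser.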

🎭 Browser Impersonation

To interact with Google's API, Fli implements browser impersonation using the curl_cffi library:

response = self.client.post(
    url=self.BASE_URL,
    data=f"f.req={encoded_filters}",
    impersonate="chrome",
    allow_redirects=True,
)

This approach makes requests resemble those of a real Chrome browser, which curl_cffi achieves by reproducing the browser's TLS and HTTP/2 fingerprints.

⚡ Rate Limiting and Reliability

The library implements robust reliability features to ensure consistent performance:

@sleep_and_retry
@limits(calls=10, period=1)
@retry(stop=stop_after_attempt(3), wait=wait_exponential(), reraise=True)
def post(self, url: str, **kwargs) -> requests.Response:
    """Make a rate-limited POST request with automatic retries."""

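Key reliability features, as the decorators show: requests are capped at 10 calls per second (sleep_and_retry blocks until the window resets instead of raising), and each failed request is attempted up to three times with exponential backoff, re-raising the last error if every attempt fails.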
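
The behavior of those decorators can be approximated with the standard library alone. The sketch below is an illustration, not Fli's implementation: SlidingWindowLimiter and retry_with_backoff are hypothetical names, the sliding window is a simplification chosen for the sketch, and the clock and sleep functions are injectable so the logic can be exercised without real waiting.

```python
import time
from collections import deque
from typing import Any, Callable


class SlidingWindowLimiter:
    """Allow at most `calls` acquisitions within any `period` seconds."""

    def __init__(
        self,
        calls: int,
        period: float,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.calls = calls
        self.period = period
        self.clock = clock
        self.sleep = sleep
        self.stamps = deque()  # timestamps of recent acquisitions

    def acquire(self) -> None:
        now = self.clock()
        # Forget acquisitions that have left the window
        while self.stamps and now - self.stamps[0] >= self.period:
            self.stamps.popleft()
        if len(self.stamps) >= self.calls:
            # Block until the oldest acquisition expires, then try again
            self.sleep(self.period - (now - self.stamps[0]))
            return self.acquire()
        self.stamps.append(now)


def retry_with_backoff(
    func: Callable[[], Any],
    attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Any:
    """Call func up to `attempts` times, doubling the delay after each failure."""
    for attempt in range(attempts):
        try:
            return func()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)
```

A request wrapper would call limiter.acquire() first and then pass the actual HTTP call to retry_with_backoff, which mirrors the decorator order in the excerpt (retries happen inside a single rate-limited call).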

Technical Architecture

Type-Safe Data Models

Fli uses Pydantic models to provide a clean, type-safe interface while handling the complexity of Google's API internally:

class FlightSearchFilters(BaseModel):
    """Complete set of filters for flight search."""
    
    trip_type: TripType = TripType.ONE_WAY
    passenger_info: PassengerInfo
    flight_segments: list[FlightSegment]
    stops: MaxStops = MaxStops.ANY
    seat_type: SeatType = SeatType.ECONOMY
    price_limit: PriceLimit | None = None
    airlines: list[Airline] | None = None

Modular Design

The library is organized into clean, focused modules.

Response Parsing

The library handles Google's complex response format, extracting flight data from deeply nested JSON structures:

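# Strip the ")]}'" prefix and pull the payload string out of the outer JSON envelope
parsed = json.loads(response.text.lstrip(")]}'"))[0][2]
# The payload is itself a JSON string; despite its name, this variable holds the decoded response
encoded_filters = json.loads(parsed)
# Collect flight entries from slots 2 and 3, skipping slots that are not lists
flights_data = [
    item
    for i in [2, 3]
    if isinstance(encoded_filters[i], list)
    for item in encoded_filters[i][0]
]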

Reverse Engineering Process

The development of Fli involved extensive analysis of Google Flights' frontend behavior:

  1. Network Traffic Analysis: Intercepting and analyzing HTTPS requests to identify API endpoints
  2. Request Structure Decoding: Understanding the complex parameter encoding used by Google's frontend
  3. Response Format Analysis: Parsing the non-standard JSON responses (prefixed with )]}')
  4. Authentication Bypass: Discovering that the API endpoints don't require traditional authentication
  5. Rate Limit Discovery: Testing to find optimal request rates that avoid blocking
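
Step 2 can be sketched as the inverse of the encoder: given a form body copied from the browser's network panel, undo the URL encoding and both JSON layers. The function name and the captured body below are made up for illustration; a real body carries a far larger nested list.

```python
import json
from urllib.parse import parse_qs


def decode_request_body(body: str) -> list:
    """Recover the nested filter list from a captured 'f.req=...' form body."""
    fields = parse_qs(body)                    # splits fields and URL-decodes values
    wrapper = json.loads(fields["f.req"][0])   # -> [None, "<inner JSON string>"]
    return json.loads(wrapper[1])              # -> the nested list itself


# Made-up body; decodes to the wrapper [null, "[1,2]"]
captured_body = "f.req=%5Bnull%2C%22%5B1%2C2%5D%22%5D"
decoded_filters = decode_request_body(captured_body)
```

Diffing the decoded lists from two searches that differ in one option (say, economy versus business) is one practical way to find which position encodes that option.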

Data Parsing and Transformation

The most complex aspect of the reverse engineering process was understanding how Google encodes flight data in their API responses. The raw response contains deeply nested arrays with no field names, requiring careful analysis to map each index to meaningful flight information.

Raw API Response Structure

Google's API returns responses with a security prefix that must be stripped:

# Raw response text starts with ")]}'" security prefix
parsed = json.loads(response.text.lstrip(")]}'"))[0][2]
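
One detail worth knowing: str.lstrip treats its argument as a set of characters rather than a literal prefix, so the line above works because the text following the prefix begins with a character outside that set. A stricter variant, sketched here under the assumption of Python 3.9+, removes only the exact prefix. The function name, the "tag" value, and the payload are synthetic stand-ins, not real response data.

```python
import json

XSSI_PREFIX = ")]}'"


def parse_envelope(text: str) -> list:
    """Strip the security prefix and decode the doubly encoded payload."""
    body = text.removeprefix(XSSI_PREFIX)  # exact-prefix removal (Python 3.9+)
    outer = json.loads(body)               # leading whitespace is fine for json.loads
    return json.loads(outer[0][2])         # payload string sits at [0][2], as in the excerpt


# Synthetic response: prefix, newline, then an envelope holding a JSON string
inner_payload = [None, None, [["flight-a"]], [["flight-b"]]]
raw_text = XSSI_PREFIX + "\n" + json.dumps([["tag", None, json.dumps(inner_payload)]])
payload = parse_envelope(raw_text)
```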

The actual flight data is buried several layers deep in nested arrays:

# Decode the payload string, then extract flight data from specific array indices
encoded_filters = json.loads(parsed)
flights_data = [
    item
    for i in [2, 3]  # Flight data is stored at indices 2 and 3
    if isinstance(encoded_filters[i], list)
    for item in encoded_filters[i][0]
]
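
The comprehension above assumes the payload always has at least four slots and that each populated slot has a first element. A slightly more defensive variant might look like the following; extract_flights is a hypothetical helper, and whether real responses ever come back shorter is an assumption, not something the source confirms.

```python
def extract_flights(payload: list, indices: tuple[int, ...] = (2, 3)) -> list:
    """Collect flight entries from the given top-level slots, skipping absent ones."""
    flights = []
    for i in indices:
        if i >= len(payload):
            continue  # payload shorter than expected
        slot = payload[i]
        # A usable slot is a non-empty list whose first element is the list of flights
        if isinstance(slot, list) and slot and isinstance(slot[0], list):
            flights.extend(slot[0])
    return flights


# Synthetic payloads for illustration
both_slots = extract_flights([None, None, [["a"]], [["b", "c"]]])
one_slot = extract_flights([None, None, [["a", "b"]], None])
too_short = extract_flights([None])
```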

Data Structure Mapping

Through analysis of multiple API responses, I mapped the array indices to flight properties:

def _parse_flights_data(data: list) -> FlightResult:
    """Transform raw nested arrays into structured flight data."""
    flight = FlightResult(
        price=data[1][0][-1],           # Price buried in nested structure
        duration=data[0][9],            # Total flight duration
        stops=len(data[0][2]) - 1,      # Number of stops = legs - 1
        legs=[
            FlightLeg(
                airline=_parse_airline(fl[22][0]),           # Airline code at index 22
                flight_number=fl[22][1],                     # Flight number at index 22
                departure_airport=_parse_airport(fl[3]),     # Departure airport at index 3
                arrival_airport=_parse_airport(fl[6]),       # Arrival airport at index 6
                departure_datetime=_parse_datetime(fl[20], fl[8]),  # Date/time arrays
                arrival_datetime=_parse_datetime(fl[21], fl[10]),   # Date/time arrays
                duration=fl[11],                             # Leg duration at index 11
            )
            for fl in data[0][2]  # Each flight leg in the journey
        ],
    )
    return flight
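
Because the fields are addressed purely by position, a missing or shortened array raises IndexError or TypeError deep inside the parser. One way to soften that, sketched here with a hypothetical dig helper that is not part of Fli, is to walk the index path step by step and fall back to a default.

```python
from typing import Any


def dig(data: Any, *path: int, default: Any = None) -> Any:
    """Follow a chain of list indices, returning `default` if any step is missing."""
    current = data
    for index in path:
        if not isinstance(current, list):
            return default  # hit None or a scalar before the path ended
        try:
            current = current[index]
        except IndexError:
            return default  # array shorter than expected
    return default if current is None else current


# Synthetic structure shaped like the index mapping described above
sample = [
    [None, None, [["leg"]], None, None, None, None, None, None, 485],
    [[None, 299.99]],
]
duration = dig(sample, 0, 9)      # same path as data[0][9]
price = dig(sample, 1, 0, -1)     # same path as data[1][0][-1]
missing = dig(sample, 0, 30)      # out of range
```

The trade-off is that silent defaults can hide a format change on Google's side, so a parser might log whenever a default is used.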

DateTime Parsing

Google stores dates and times as separate integer arrays that must be reconstructed:

def _parse_datetime(date_arr: list[int], time_arr: list[int]) -> datetime:
    """Convert [year, month, day] and [hour, minute] arrays to datetime."""
    return datetime(
        *(x or 0 for x in date_arr),  # [2024, 12, 15] -> year, month, day
        *(x or 0 for x in time_arr)   # [14, 30] -> hour, minute
    )
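
A short worked example shows what the `x or 0` fallback does in practice. The function below repeats the construction from the excerpt under a different name so it runs on its own, and the input arrays are made up. Note that the fallback is harmless for time fields but not for date fields, since 0 is not a valid month or day.

```python
from datetime import datetime


def parse_datetime(date_arr: list, time_arr: list) -> datetime:
    """Same construction as the excerpt above: None entries become 0."""
    return datetime(*(x or 0 for x in date_arr), *(x or 0 for x in time_arr))


# A None minute becomes 0, i.e. 14:00
on_the_hour = parse_datetime([2024, 12, 15], [14, None])

# A one-element time array also works, because datetime defaults the minute to 0
hour_only = parse_datetime([2024, 12, 15], [8])

# A None month becomes 0, which datetime rejects
try:
    parse_datetime([2024, None, 15], [8, 30])
    bad_date_rejected = False
except ValueError:
    bad_date_rejected = True
```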

Example Transformation

Here's how a raw API response gets transformed into readable data:

Raw API Data:

# Deeply nested array structure (simplified)
raw_flight = [
    [None, None, [[...flight_legs...]], None, None, None, None, None, None, 485],  # Index 9 = duration
    [[None, None, None, None, None, None, None, None, None, None, None, None, 299.99]]  # Price data
]

Parsed Result:

FlightResult(
    price=299.99,
    duration=485,  # minutes
    stops=0,       # non-stop flight
    legs=[
        FlightLeg(
            airline=Airline.UA,
            flight_number="1234",
            departure_airport=Airport.LAX,
            arrival_airport=Airport.JFK,
            departure_datetime=datetime(2024, 12, 15, 8, 30),
            arrival_datetime=datetime(2024, 12, 15, 16, 45),
            duration=485
        )
    ]
)

This reverse engineering process required analyzing hundreds of API responses to identify consistent patterns and map the array indices to their corresponding flight data fields.
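
For readers who want to run the transformation end to end, here is a self-contained sketch that builds a synthetic leg array using the indices from the mapping above and parses it into plain dictionaries. The helper names are hypothetical, the values are the made-up ones from the example, and airline and airport codes are left as raw strings rather than converted to Fli's enums.

```python
from datetime import datetime


def make_leg() -> list:
    """Build a synthetic 23-slot leg array; unused positions stay None."""
    leg = [None] * 23
    leg[3], leg[6] = "LAX", "JFK"                      # departure / arrival airport
    leg[8], leg[10] = [8, 30], [16, 45]                # departure / arrival time
    leg[11] = 485                                      # leg duration in minutes
    leg[20], leg[21] = [2024, 12, 15], [2024, 12, 15]  # departure / arrival date
    leg[22] = ["UA", "1234"]                           # airline code, flight number
    return leg


def parse_flight(data: list) -> dict:
    """Map positional arrays to named fields, following the index mapping above."""
    legs = data[0][2]
    return {
        "price": data[1][0][-1],
        "duration": data[0][9],
        "stops": len(legs) - 1,
        "legs": [
            {
                "airline": fl[22][0],
                "flight_number": fl[22][1],
                "departure_airport": fl[3],
                "arrival_airport": fl[6],
                "departure": datetime(*fl[20], *fl[8]),
                "arrival": datetime(*fl[21], *fl[10]),
                "duration": fl[11],
            }
            for fl in legs
        ],
    }


raw_flight = [
    [None, None, [make_leg()], None, None, None, None, None, None, 485],
    [[None] * 12 + [299.99]],
]
flight = parse_flight(raw_flight)
```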

Features and Capabilities

Search Functionality

Price Analysis

Developer Experience

Impact and Benefits

Fli provides several advantages over traditional web scraping approaches.

Technical Challenges Overcome

  1. API Discovery: Identifying the correct endpoints among Google's vast API surface
  2. Parameter Encoding: Reverse engineering the complex nested data structures
  3. Authentication: Bypassing or working around Google's security measures
  4. Response Parsing: Handling non-standard JSON formats and nested data
  5. Rate Limiting: Finding the optimal balance between speed and reliability

Future Enhancements

The reverse engineering approach opens possibilities for additional features.


This project demonstrates how reverse engineering techniques can be used to create developer tools and APIs, transforming a web interface into a structured Python library.