
Wrapping ChatGPT Web into a Standard API: A Reverse Engineering Practice

Aaron

Introduction

In early 2023, ChatGPT was taking the world by storm, but using it came with a few clear pain points: the web experience wasn’t flexible enough to integrate with my own tools; the official API charged by token, which added up quickly for heavy users; and the web and API were two completely separate systems — ChatGPT Plus subscribers couldn’t use their GPT-4 quota through the API.

So I had an idea: what if I could reverse engineer the ChatGPT web interface and wrap it into a standard OpenAI API format? That way I could use the unlimited web quota and plug it into my own toolchain.

This post documents the entire process, from packet analysis to a working implementation.

Understanding ChatGPT’s Web Architecture

Before diving in, I used the browser DevTools Network panel to map out the full request chain of the ChatGPT web frontend.

Authentication Chain

ChatGPT uses Auth0 as its OAuth2 authentication provider. The normal web login flow looks like this:

Browser → chat.openai.com/auth/login
        → Redirect to auth0.openai.com (Auth0-hosted login page)
        → User enters email and password
        → Auth0 callback returns access_token
        → Frontend stores token in session

But this flow is designed for browsers — there are multiple 302 redirects, cookie passing, and JavaScript rendering. A CLI tool can’t just follow this flow directly.

Conversation Chain

After logging in, the frontend sends messages to this endpoint:

POST https://chat.openai.com/backend-api/conversation
Authorization: Bearer <access_token>
Content-Type: application/json
Accept: text/event-stream

Request body:

{
  "action": "next",
  "messages": [
    {
      "id": "uuid",
      "role": "user",
      "content": { "content_type": "text", "parts": ["Hello"] }
    }
  ],
  "model": "text-davinci-002-render-sha",
  "parent_message_id": "uuid"
}

The response is SSE (Server-Sent Events) format, streaming token by token:

data: {"message": {"content":{"parts":["He"]}, ...}}
data: {"message": {"content":{"parts":["Hello"]}, ...}}
data: [DONE]

One key difference: the web frontend uses a message tree (each message has a parent_message_id, supporting branching and regeneration), while the OpenAI API uses a linear messages array. This difference needs to be handled during protocol conversion later.

Reverse Engineering the Auth0 Authentication Flow

This was the heart of the project, and the most interesting part.

Shifting Approach: From Web to iOS

I initially tried to simulate the browser login flow directly, but quickly hit a wall: Auth0’s login page has extensive anti-bot measures — JavaScript validation, browser fingerprinting, reCAPTCHA, you name it.

A different approach: mobile authentication flows are usually simpler than web ones. So I packet-analyzed the ChatGPT iOS client and found that it also uses Auth0, but through the OAuth2 + PKCE (Proof Key for Code Exchange) extension, which doesn’t require a browser environment.

PKCE in Brief

PKCE is an OAuth2 security extension, originally designed for mobile and desktop apps that can’t safely store a client_secret. The flow is straightforward:

  1. The client generates a code_verifier (random string)
  2. The client computes code_challenge = SHA256(code_verifier)
  3. The authorization request includes code_challenge
  4. The token-exchange request includes the original code_verifier
  5. The server verifies SHA256(code_verifier) == code_challenge and issues a token

The benefit: even if the authorization code is intercepted, without the code_verifier it can’t be exchanged for a token.
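The verifier/challenge computation takes only a few lines. This is a generic sketch of the PKCE S256 method as specified in RFC 7636, not code taken from the iOS client:

```python
import base64
import hashlib
import secrets

def make_pkce_pair() -> tuple[str, str]:
    """Generate a PKCE code_verifier and its S256 code_challenge."""
    # 32 random bytes -> 43-char base64url string (padding stripped per spec)
    code_verifier = base64.urlsafe_b64encode(secrets.token_bytes(32)).rstrip(b"=").decode()
    digest = hashlib.sha256(code_verifier.encode()).digest()
    code_challenge = base64.urlsafe_b64encode(digest).rstrip(b"=").decode()
    return code_verifier, code_challenge
```

The server repeats the same SHA256-then-base64url computation on the verifier it receives and compares the result against the challenge it saw earlier.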

The Decompiled Authentication Flow

By analyzing the iOS client’s network requests, I broke down the complete authentication flow into seven steps plus a final token exchange:

  1. Get preauth_cookie
  2. Build authorize URL with iOS client parameters
  3. Follow authorize URL, extract state parameter and save cookies
  4. Submit email
  5. Submit password
  6. Handle callback or MFA verification
  7. If MFA is required, submit the code and go back to step 6
  8. Finally, exchange the authorization code for an Access Token

A few noteworthy details:

Why can code_verifier be hardcoded? The iOS client can be decompiled — the code_verifier and code_challenge pair is hardcoded in the client, shared by all iOS users. In this scenario, PKCE protects the transport layer (authorization code leak doesn’t mean token leak), not the client itself.

Where does client_id come from? Also from iOS client decompilation. It’s the iOS application ID that OpenAI registered with Auth0.

Why is redirect_uri set to com.openai.chat://...? That’s an iOS URL Scheme, used by Auth0 to redirect back to the app after authorization. In our implementation, we don’t actually need to redirect — we just extract the code parameter from the response’s Location header.

The Python implementation looks roughly like this:

class Auth0:
    def auth(self, login_local=False) -> str:
        return self.__part_one() if login_local else self.get_access_token_proxy()

    def __part_one(self): ...      # Step 1: get preauth_cookie
    def __part_two(self): ...      # Step 2: build authorize URL
    def __part_three(self): ...    # Step 3: follow authorize, extract state
    def __part_four(self): ...     # Step 4: submit email
    def __part_five(self): ...     # Step 5: submit password
    def __part_six(self): ...      # Step 6: handle callback/MFA
    def __part_seven(self): ...    # Step 7: submit MFA OTP
    def get_access_token(self): ...  # Final: exchange code for token

Implementing SSE Streaming Proxy

With the Access Token in hand, the next step is calling the ChatGPT conversation API.

Request construction is fairly intuitive: each message needs a UUID as its id, parent_message_id points to the previous message to form a conversation chain, and the first message doesn’t include conversation_id (the server creates and returns one). The action can be next (new message), variant (regenerate), or continue (continue output).

The tricky part is handling SSE responses. Python’s Flask is a synchronous framework, but SSE requires async consumption of streaming responses. My solution: async thread + blocking queue + Generator bridge:

import asyncio
import queue
import threading

def _request_sse(self, url, headers, data):
    q, stop_event = queue.Queue(), threading.Event()
    t = threading.Thread(target=asyncio.run,
                         args=(self._do_request_sse(url, headers, data, q, stop_event),))
    t.start()
    # Return the first two queued items eagerly; the rest of the
    # stream is exposed to the caller as a plain generator.
    return q.get(), q.get(), self.__generate_wrap(q, t, stop_event)

Why this detour? Because httpx’s streaming API is async (async with client.stream('POST', url) needs an async context), but the upper layer is synchronous (Flask route handlers, CLI readline loops are all sync), and I didn’t want to rewrite the entire architecture from Flask to aiohttp/uvicorn.

So a thread runs the async event loop, queue.Queue ferries data from the async world to the sync world, and it exposes a standard Generator to the upper layer — completely transparent.

Another detail: threading.Event is used for interruption protection. If the client disconnects and triggers GeneratorExit, the Event is set, and the async thread detects it and closes the httpx connection, preventing thread leaks.
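The bridge pattern in isolation, stripped of the httpx specifics, looks like this. The `produce` callable stands in for the async SSE reader; a sentinel marks end-of-stream (both names are illustrative, not from the original code):

```python
import queue
import threading
from typing import Callable, Iterator

_SENTINEL = object()

def run_stream_bridge(produce: Callable, stop_event: threading.Event) -> Iterator:
    """Bridge a producer running in a worker thread to a sync generator.

    produce(q, stop_event) puts chunks on q and finishes with _SENTINEL;
    in the real service it is the async httpx SSE reader.
    """
    q: queue.Queue = queue.Queue()
    t = threading.Thread(target=produce, args=(q, stop_event))
    t.start()
    try:
        while True:
            item = q.get()
            if item is _SENTINEL:
                break
            yield item
    except GeneratorExit:
        # Client disconnected: signal the worker so it can close its connection.
        stop_event.set()
        raise
    finally:
        t.join()  # never leak the worker thread
```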

Web API to OpenAI API Protocol Conversion

This is the key step of wrapping ChatGPT’s web interface into a standard OpenAI API. The two API formats differ significantly:

| Dimension          | ChatGPT Web API                   | OpenAI Public API        |
|--------------------|-----------------------------------|--------------------------|
| Authentication     | Bearer access_token               | Bearer sk-xxx (API Key)  |
| Request format     | Message tree (parent_message_id)  | messages array           |
| Response format    | SSE + message tree nodes          | SSE + choices array      |
| Session management | Server-side conversation_id       | Stateless                |

Request Conversion

My approach was to maintain a local message tree, converting the OpenAI-format messages array into a tree structure, supporting multi-turn conversations and regeneration:

def talk(self, content, model, message_id, parent_message_id, ...):
    if conversation_id:
        parent = conversation.get_prompt(parent_message_id)
    else:
        parent = conversation.add_prompt(Prompt(parent_message_id))
        parent = conversation.add_prompt(SystemPrompt(self.system_prompt, parent))

    conversation.add_prompt(UserPrompt(message_id, content, parent))
    user_prompt, gpt_prompt, messages = conversation.get_messages(message_id, model)

Response Conversion

The web side returns full text each time (parts[0] gets longer), while the OpenAI API returns incremental text. A delta calculation is needed:

# Web response
{"message": {"content": {"parts": ["full text"]}, "author": {"role": "assistant"}}}

# Converted to OpenAI format
data: {"choices": [{"delta": {"content": "incremental text"}, "finish_reason": null}]}
data: {"choices": [{"delta": {}, "finish_reason": "stop"}]}
data: [DONE]
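Since each web-side event carries the full text so far, the delta is just the newly appended suffix. A minimal sketch of that computation:

```python
def sse_delta(previous: str, current: str) -> str:
    """Compute the incremental text between two cumulative SSE snapshots."""
    if current.startswith(previous):
        return current[len(previous):]
    # Fallback: if the snapshot was rewritten rather than extended, emit it whole.
    return current

# Walking the cumulative snapshots yields OpenAI-style deltas:
snapshots = ["He", "Hello", "Hello, wor", "Hello, world"]
prev, deltas = "", []
for snap in snapshots:
    deltas.append(sse_delta(prev, snap))
    prev = snap
```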

Token Limit Trimming

The OpenAI API has token limits (4096 for gpt-3.5-turbo, 8192 for gpt-4). When conversation history gets too long, local trimming is needed:

def __reduce_messages(self, messages, model, token=None):
    max_tokens = self.FAKE_TOKENS[model] if self.__is_fake_api(token) else self.MAX_TOKENS[model]
    while gpt_num_tokens(messages) > max_tokens - 200:
        if len(messages) < 2:
            raise Exception('prompt too long')
        messages.pop(1)  # Remove from index 1, keeping system prompt and latest turns
    return messages

Trimming strategy: keep messages[0] (system prompt) and the latest few conversation turns, removing the oldest user messages first. The - 200 leaves headroom for the model’s response.
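The same trimming loop works with any token counter. Here is a self-contained version using a crude character-based estimate as a stand-in for gpt_num_tokens (the real code would count with a proper tokenizer such as tiktoken):

```python
def approx_num_tokens(messages: list[dict]) -> int:
    """Rough stand-in for gpt_num_tokens: ~4 characters per token
    plus a small per-message overhead. Illustrative only."""
    return sum(len(m.get("content", "")) // 4 + 4 for m in messages)

def reduce_messages(messages: list[dict], max_tokens: int) -> list[dict]:
    """Drop the oldest non-system messages until the history fits,
    keeping messages[0] (the system prompt) and 200 tokens of headroom."""
    while approx_num_tokens(messages) > max_tokens - 200:
        if len(messages) < 2:
            raise ValueError("prompt too long")
        messages.pop(1)
    return messages
```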

From Technical Validation to Production

Once the API was working, the next challenge was making it available to colleagues and friends.

Batch Registration

After getting the API working, I found in practice that ChatGPT has per-account rate limits — push too many requests and it starts throwing errors. The most straightforward fix: more accounts. So I built a registration bot and used my own domain and email to batch-register 200 ChatGPT accounts. Two of those got Plus subscriptions (only Plus unlocks GPT-4), with the cost split among friends. The rest ran on free GPT-3.5, perfectly fine for daily use.

Token Management and Persistence

Access Tokens are valid for 14 days and need to be refreshed upon expiry. I stored all account tokens in a PostgreSQL database, with a scheduled task that automatically detects expiration and batch-refreshes tokens to keep the pool always available.

Load Balancing

With 200 accounts, using just one would be a waste. I added a simple load balancing layer to the proxy service: on each incoming request, the service round-robins through the database to pick an available token for the ChatGPT API call. This avoids single-account rate limits and distributes request pressure evenly.

The end result: a single standard OpenAI API endpoint exposed externally. Colleagues and friends just point their applications’ API Base URL to my service, completely unaware that 200 accounts are rotating behind the scenes. GPT-4 requests route to the Plus account pool, GPT-3.5 requests to the free account pool.
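The round-robin selection can be as small as this (a sketch with a hypothetical `TokenPool` class; the real service reads the pool from PostgreSQL rather than an in-memory list):

```python
import itertools
import threading

class TokenPool:
    """Round-robin over a pool of account access tokens."""

    def __init__(self, tokens: list[str]):
        self._cycle = itertools.cycle(tokens)
        self._lock = threading.Lock()  # Flask may serve requests concurrently

    def next_token(self) -> str:
        with self._lock:
            return next(self._cycle)
```

Routing then just picks a pool by model: GPT-4 requests draw from a `TokenPool` built over the Plus accounts, GPT-3.5 requests from one built over the free accounts.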
