Turn Google Gemini CLI Into a Standard API Proxy for Any OpenAI Client

July 15 · Published in API Tools

The Gemini CLI-to-API Proxy simplifies access to Google’s Gemini models by wrapping the command-line interface's OAuth flows and internal data formats within a standard REST API. If your application is already built for the OpenAI ecosystem, this proxy allows you to integrate Gemini models without rewriting your entire code stack.

The proxy runs as a self-hosted service, handling OAuth 2.0 authentication with Google Cloud and automatically caching credentials and project IDs. Once configured, it provides a stable interface for both streaming and non-streaming requests. You can secure the proxy using an API key, a Bearer token, or HTTP Basic Auth. By default, the proxy sets Gemini's safety settings to BLOCK_NONE, so the safety thresholds do not block responses.

Key Features

Drop-in OpenAI Replacement
Configure any OpenAI-compatible client to point to this proxy. The /v1/chat/completions endpoint accepts the standard OpenAI chat completions request format, allowing you to swap models while keeping your existing logic intact.

Native Gemini Endpoints
If you prefer the native Gemini API structure, you can access /v1beta/models/{model}:generateContent and its streaming equivalent directly, bypassing the OpenAI translation layer.

Full Streaming Support
The proxy supports streamed responses in both OpenAI and Gemini formats, so text reaches the client incrementally as the model generates it.
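
To illustrate what consuming an OpenAI-format stream involves, here is a minimal sketch that pulls the text out of server-sent event lines. It assumes the proxy follows OpenAI's usual framing (lines of the form "data: {json}", terminated by "data: [DONE]"); the function name iter_content is hypothetical and not part of the project.

```python
import json
from typing import Iterable, Iterator


def iter_content(lines: Iterable[str]) -> Iterator[str]:
    """Yield text fragments from OpenAI-style SSE lines (assumed framing)."""
    for line in lines:
        line = line.strip()
        # Ignore blank keep-alive lines and anything that is not a data event
        if not line.startswith("data:"):
            continue
        payload = line[len("data:"):].strip()
        # OpenAI-style streams signal completion with a literal [DONE]
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            text = choice.get("delta", {}).get("content")
            if text:
                yield text
```

In practice the lines would come from a streaming HTTP response, for example requests' iter_lines() with decode_unicode=True, rather than from a list.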

Multimodal Capabilities
The proxy handles multimodal inputs, allowing you to send images and text prompts simultaneously.
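
As a sketch of what a combined image-and-text request might look like, the helper below builds a chat message in OpenAI's content-parts format with the image embedded as a base64 data URL. Whether the proxy accepts exactly this shape is an assumption based on its OpenAI compatibility; image_message is a hypothetical name.

```python
import base64


def image_message(prompt: str, image_bytes: bytes, mime_type: str = "image/png") -> dict:
    """Build an OpenAI-style user message carrying text plus one inline image."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            # Assumption: the proxy understands OpenAI's image_url part with a data URL
            {
                "type": "image_url",
                "image_url": {"url": f"data:{mime_type};base64,{encoded}"},
            },
        ],
    }
```

The returned dict can be placed in the messages list of a chat completions request.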

Flexible Authentication
You can secure your instance using several methods: Bearer tokens, HTTP Basic Auth, a query parameter (?key=), or the x-goog-api-key header.

Live Search Grounding
By appending -search to a model name (e.g., gemini-2.5-pro-search), you enable the model to use Google Search to ground its answers in real-time information.

Reasoning (Thinking) Control
For the 2.5 series of models, you can append -nothinking to minimize the reasoning budget for faster responses, or -maxthinking to maximize it for complex problems.

Containerized Deployment
The included Dockerfile makes it easy to build and deploy the proxy in any environment, including standard Docker setups or Docker Compose.

Hugging Face Spaces Compatibility
The repository is pre-configured for Hugging Face Spaces. Simply fork the repo, set your environment variables, and deploy to HF infrastructure.

Environment Variables

The proxy is configured via environment variables. The first sets the password that protects the proxy itself and is always required:

  • GEMINI_AUTH_PASSWORD: The secret key your clients must use to access the proxy.

The proxy also needs Google Cloud credentials, supplied through either of the first two variables below. The third sets the Google Cloud project ID:

  • GEMINI_CREDENTIALS: A JSON string containing the complete OAuth credential set.
  • GOOGLE_APPLICATION_CREDENTIALS: A file path pointing to a credentials JSON file.
  • GOOGLE_CLOUD_PROJECT or GEMINI_PROJECT_ID: Your specific Google Cloud project ID.

The credential JSON should follow this format:

{
  "client_id": "your-client-id",
  "client_secret": "your-client-secret",
  "token": "your-access-token",
  "refresh_token": "your-refresh-token",
  "scopes": ["https://www.googleapis.com/auth/cloud-platform"],
  "token_uri": "https://oauth2.googleapis.com/token"
}
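
Before deploying, it can help to check a credential string against the format above. The sketch below reports which of the six fields shown are missing or empty. Which fields the proxy strictly requires is not stated here, so treating all six as expected is an assumption, and missing_credential_fields is a hypothetical helper.

```python
import json

# Fields taken from the credential format shown above
EXPECTED_FIELDS = (
    "client_id",
    "client_secret",
    "token",
    "refresh_token",
    "scopes",
    "token_uri",
)


def missing_credential_fields(raw: str) -> list[str]:
    """Return the expected fields that are absent or empty in a credentials JSON string."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("credentials must be a JSON object")
    return [name for name in EXPECTED_FIELDS if not data.get(name)]
```

A typical call would pass the value you intend to set as GEMINI_CREDENTIALS; an empty list means every field shown above is present.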

API Endpoints Summary

OpenAI-Compatible Endpoints

  • POST /v1/chat/completions: Generate a response (supports streaming).
  • GET /v1/models: Retrieve a list of available models.

Native Gemini Endpoints

  • GET /v1beta/models: List available Gemini models.
  • POST /v1beta/models/{model}:generateContent: Standard generation.
  • POST /v1beta/models/{model}:streamGenerateContent: Streaming generation.

Utility Endpoints

  • GET /health: Used for service health checks and orchestration.
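
A startup script can poll the health endpoint before sending traffic. The following is a minimal sketch using only the standard library. It treats any HTTP 200 answer as healthy, which is an assumption since the response body of /health is not described here; http_ok and wait_until_healthy are hypothetical names.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_ok(url: str, timeout: float = 2.0) -> bool:
    """Return True if a GET request to url answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or a non-2xx status
        return False


def wait_until_healthy(
    url: str,
    attempts: int = 10,
    delay: float = 1.0,
    probe: Callable[[str], bool] = http_ok,
) -> bool:
    """Probe url up to `attempts` times, sleeping `delay` seconds between tries."""
    for attempt in range(attempts):
        if probe(url):
            return True
        if attempt < attempts - 1:
            time.sleep(delay)
    return False
```

For a local container this might be called as wait_until_healthy("http://localhost:8888/health").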

Authentication Methods

Authorize your requests by passing the GEMINI_AUTH_PASSWORD through any of the following:

  • Bearer Token: Authorization: Bearer your-password
  • Basic Auth: Authorization: Basic base64(user:your-password)
  • Query String: ?key=your-password
  • Custom Header: x-goog-api-key: your-password

Deployment Guide

Using Docker

  1. Build the image:
docker build -t geminicli2api .
  2. Run the container (Port 8888):
docker run -p 8888:8888 \
  -e GEMINI_AUTH_PASSWORD=your-secret \
  -e GEMINI_CREDENTIALS='{"client_id":"...","token":"..."}' \
  -e PORT=8888 \
  geminicli2api
  3. Run for Hugging Face Spaces (Port 7860):
docker run -p 7860:7860 \
  -e GEMINI_AUTH_PASSWORD=your-secret \
  -e GEMINI_CREDENTIALS='{"client_id":"...","token":"..."}' \
  -e PORT=7860 \
  geminicli2api

Using Docker Compose

For a standard local setup on port 8888:

docker-compose up -d

For a Hugging Face-specific profile on port 7860:

docker-compose --profile hf up -d geminicli2api-hf

Deploying to Hugging Face Spaces

  1. Fork the project repository.
  2. Create a new "Space" on Hugging Face.
  3. Connect your forked repository.
  4. Add your environment variables (GEMINI_AUTH_PASSWORD and GEMINI_CREDENTIALS) in the Space settings. The Space will build automatically using the provided Dockerfile.

Usage Examples

Python: OpenAI Library Integration

import openai

client = openai.OpenAI(
    base_url="http://localhost:8888/v1",  # Adjust port if using HF (7860)
    api_key="your-password"  # The value of GEMINI_AUTH_PASSWORD
)

response = client.chat.completions.create(
    model="gemini-2.5-pro-maxthinking",
    messages=[
        {"role": "user", "content": "Explain relativity in simple terms."}
    ],
    stream=True
)

for chunk in response:
    # Some chunks may carry no choices; skip them
    if not chunk.choices:
        continue
    delta = chunk.choices[0].delta
    # reasoning_content is not a standard OpenAI field, so read it defensively
    reasoning = getattr(delta, "reasoning_content", None)
    if reasoning:
        print(f"Thinking: {reasoning}")
    # Print the actual content
    if delta.content:
        print(delta.content, end="", flush=True)

Python: Native Gemini API via Requests

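import requests

headers = {
    "Authorization": "Bearer your-password",
    "Content-Type": "application/json"
}

payload = {
    "contents": [
        {
            "role": "user",
            "parts": [{"text": "Explain relativity in simple terms."}]
        }
    ],
    # Thinking settings: token budget and whether to return thought summaries
    "thinkingConfig": {
        "thinkingBudget": 32768,
        "includeThoughts": True
    }
}

resp = requests.post(
    "http://localhost:8888/v1beta/models/gemini-2.5-pro:generateContent",
    headers=headers,
    json=payload,
    timeout=120  # Long prompts with a large thinking budget can take a while
)

# Fail loudly on authentication or upstream errors instead of printing an error body
resp.raise_for_status()
print(resp.json())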

Supported Models

Base Models

  • gemini-2.5-pro
  • gemini-2.5-flash
  • gemini-1.5-pro
  • gemini-1.5-flash
  • gemini-1.0-pro

Automatic Model Suffixes
The proxy dynamically generates variants for the 2.5 series based on the following suffixes:

  • -search: Activates Google Search grounding.
  • -nothinking: Minimizes the reasoning budget for faster responses.
  • -maxthinking: Maximizes the reasoning budget for deep processing.

For instance, using gemini-2.5-flash-maxthinking tells the model to prioritize detailed reasoning before delivering an answer.
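
The naming scheme can be sketched in a few lines. This is an illustration of the suffix rules described above, not the proxy's own code: it assumes a model belongs to the 2.5 series when its name starts with "gemini-2.5" and generates single-suffix variants only, since it is not stated whether suffixes can be combined. model_variants is a hypothetical helper.

```python
SUFFIXES = ("-search", "-nothinking", "-maxthinking")


def model_variants(base_model: str) -> list[str]:
    """Return the base model name plus its suffixed variants (2.5 series only)."""
    # Assumption: the 2.5 series is identified by its name prefix
    if not base_model.startswith("gemini-2.5"):
        return [base_model]
    return [base_model] + [base_model + suffix for suffix in SUFFIXES]
```

For example, model_variants("gemini-2.5-flash") includes gemini-2.5-flash-maxthinking, while an older model such as gemini-1.5-pro is returned unchanged.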