The Gemini CLI-to-API Proxy simplifies access to Google’s Gemini models by wrapping the command-line interface's OAuth flows and internal data formats within a standard REST API. If your application is already built for the OpenAI ecosystem, this proxy allows you to integrate Gemini models without rewriting your entire code stack.
The proxy operates locally, handling OAuth 2.0 authentication with Google Cloud and automatically caching credentials and project IDs. Once configured, it provides a stable interface for both streaming and non-streaming requests. You can secure the proxy using an API key, a Bearer token, or HTTP Basic Auth. To ensure the model remains as responsive as possible, default safety settings are set to BLOCK_NONE.
Drop-in OpenAI Replacement
Configure any OpenAI-compatible client to point to this proxy. The /v1/chat/completions endpoint accepts the standard Chat Completions request format, so you can swap in Gemini models while keeping your existing logic intact.
Native Gemini Endpoints
If you prefer the native Gemini API structure, you can access /v1beta/models/{model}:generateContent and its streaming equivalent directly, bypassing the OpenAI translation layer.
Full Streaming Support
The proxy streams responses in both the OpenAI and Gemini formats, so text reaches the client as it is generated.
Multimodal Capabilities
The proxy handles multimodal inputs, allowing you to send images and text prompts simultaneously.
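As a rough sketch of what a combined image-and-text request might look like through the OpenAI-compatible endpoint, the helper below builds a message in the OpenAI Chat Completions content-part layout, with the image embedded as a base64 data URL. The helper name build_image_message and the PNG default are illustrative assumptions. The article does not say which image encodings the proxy accepts.

```python
import base64


def build_image_message(prompt: str, image_bytes: bytes, mime_type: str = "image/png") -> dict:
    """Build an OpenAI-style user message with one text part and one image part.

    Hypothetical helper: assumes the proxy accepts the usual OpenAI
    content-part layout with a base64 data URL for the image.
    """
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url", "image_url": {"url": f"data:{mime_type};base64,{encoded}"}},
        ],
    }


# Placeholder bytes stand in for a real image file read from disk.
message = build_image_message("Describe this image.", b"placeholder image bytes")
print(message["content"][0]["text"])
```

The resulting dictionary can be placed in the messages list of a chat completion request in the same way as a plain text message.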
Flexible Authentication
You can secure your instance using several methods: Bearer tokens, HTTP Basic Auth, a query parameter (?key=), or the x-goog-api-key header.
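To make the four options concrete, here is a minimal sketch that builds the headers or query parameters for each one. The function name auth_variants and the placeholder username "user" are assumptions made for illustration. The article only shows the password as the meaningful part of the Basic credentials.

```python
import base64


def auth_variants(password: str, username: str = "user") -> dict:
    """Return request keyword arguments for each supported way of sending the password."""
    # HTTP Basic auth is base64("username:password")
    basic = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return {
        "bearer": {"headers": {"Authorization": f"Bearer {password}"}},
        "basic": {"headers": {"Authorization": f"Basic {basic}"}},
        "query": {"params": {"key": password}},
        "goog_header": {"headers": {"x-goog-api-key": password}},
    }


variants = auth_variants("your-password")
for name, kwargs in variants.items():
    print(name, kwargs)
```

Each value is shaped so it could be passed as keyword arguments to an HTTP client such as requests, for example requests.get(url, **variants["bearer"]).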
Live Search Grounding
By appending -search to a model name (e.g., gemini-2.5-pro-search), you enable the model to use Google Search to ground its answers in real-time information.
Reasoning (Thinking) Control
For the 2.5 series of models, you can append -nothinking to bypass extended inference steps or -maxthinking to allow the model more time to process complex problems.
Containerized Deployment
The included Dockerfile makes it easy to build and deploy the proxy in any environment, including standard Docker setups or Docker Compose.
Hugging Face Spaces Compatibility
The repository is pre-configured for Hugging Face Spaces. Simply fork the repo, set your environment variables, and deploy to HF infrastructure.
The proxy is configured via environment variables. Only one is strictly required:
GEMINI_AUTH_PASSWORD: The secret key your clients must use to access the proxy.
You also need to provide Google Cloud credentials using one of the following methods:
GEMINI_CREDENTIALS: A JSON string containing the complete OAuth credential set.
GOOGLE_APPLICATION_CREDENTIALS: A file path pointing to a credentials JSON file.
You can also set the project explicitly:
GOOGLE_CLOUD_PROJECT or GEMINI_PROJECT_ID: Your specific Google Cloud project ID.
The credential JSON should follow this format:
{
  "client_id": "your-client-id",
  "client_secret": "your-client-secret",
  "token": "your-access-token",
  "refresh_token": "your-refresh-token",
  "scopes": ["https://www.googleapis.com/auth/cloud-platform"],
  "token_uri": "https://oauth2.googleapis.com/token"
}
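Before starting the container, it can help to check that a credential string parses and carries the fields shown in the format above. The following is a small sanity check, not part of the proxy. The name missing_credential_keys is hypothetical, and whether the proxy itself requires every one of these keys is not stated in the article.

```python
import json

# Keys taken from the credential format shown above.
EXPECTED_KEYS = {
    "client_id",
    "client_secret",
    "token",
    "refresh_token",
    "scopes",
    "token_uri",
}


def missing_credential_keys(raw: str) -> list:
    """Parse a GEMINI_CREDENTIALS-style JSON string and list any absent keys."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("credentials must be a JSON object")
    return sorted(EXPECTED_KEYS - data.keys())


sample = '{"client_id": "your-client-id", "token": "your-access-token"}'
print(missing_credential_keys(sample))
# prints ['client_secret', 'refresh_token', 'scopes', 'token_uri']
```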
POST /v1/chat/completions: Generate a response (supports streaming).
GET /v1/models: Retrieve a list of available models.
GET /v1beta/models: List available Gemini models.
POST /v1beta/models/{model}:generateContent: Standard generation.
POST /v1beta/models/{model}:streamGenerateContent: Streaming generation.
GET /health: Used for service health checks and orchestration.
Authorize your requests by passing the GEMINI_AUTH_PASSWORD through any of the following:
Authorization: Bearer your-password
Authorization: Basic base64(user:your-password)
?key=your-password
x-goog-api-key: your-password
To build and run the image with Docker:
docker build -t geminicli2api .
docker run -p 8888:8888 \
  -e GEMINI_AUTH_PASSWORD=your-secret \
  -e GEMINI_CREDENTIALS='{"client_id":"...","token":"..."}' \
  -e PORT=8888 \
  geminicli2api
To run on port 7860 instead (the port used by the Hugging Face profile):
docker run -p 7860:7860 \
  -e GEMINI_AUTH_PASSWORD=your-secret \
  -e GEMINI_CREDENTIALS='{"client_id":"...","token":"..."}' \
  -e PORT=7860 \
  geminicli2api
For a standard local setup on port 8888:
docker-compose up -d
For a Hugging Face-specific profile on port 7860:
docker-compose --profile hf up -d geminicli2api-hf
For Hugging Face Spaces, set the environment variables (GEMINI_AUTH_PASSWORD and GEMINI_CREDENTIALS) in the Space settings. The Space will build automatically using the provided Dockerfile.
Using the OpenAI Python client:
import openai

# Point the standard OpenAI client at the local proxy
client = openai.OpenAI(
    base_url="http://localhost:8888/v1",  # Adjust port if using HF (7860)
    api_key="your-password",
)

response = client.chat.completions.create(
    model="gemini-2.5-pro-maxthinking",
    messages=[
        {"role": "user", "content": "Explain relativity in simple terms."}
    ],
    stream=True,
)

for chunk in response:
    if not chunk.choices:
        continue  # skip chunks that carry no choices
    delta = chunk.choices[0].delta
    # reasoning_content is not a standard OpenAI field, so read it defensively
    reasoning = getattr(delta, "reasoning_content", None)
    if reasoning:
        print(f"Thinking: {reasoning}")
    # Print the actual content
    if delta.content:
        print(delta.content, end="")
Using the native Gemini endpoint with requests:
import requests

headers = {
    "Authorization": "Bearer your-password",
    "Content-Type": "application/json"
}

payload = {
    "contents": [
        {
            "role": "user",
            "parts": [{"text": "Explain relativity in simple terms."}]
        }
    ],
    "thinkingConfig": {
        "thinkingBudget": 32768,
        "includeThoughts": True
    }
}

resp = requests.post(
    "http://localhost:8888/v1beta/models/gemini-2.5-pro:generateContent",
    headers=headers,
    json=payload
)
print(resp.json())
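Printing the raw JSON works for a first test, but with includeThoughts enabled you will usually want to separate the reasoning from the answer. The sketch below assumes the proxy passes through the standard Gemini response shape, where each candidate holds content.parts and thought parts are flagged with "thought": true. The function name split_thoughts and the sample data are illustrative, not real proxy output.

```python
def split_thoughts(response: dict) -> tuple:
    """Separate thought text from answer text in a generateContent-style response."""
    thoughts, answer = [], []
    for candidate in response.get("candidates", []):
        parts = candidate.get("content", {}).get("parts", [])
        for part in parts:
            text = part.get("text")
            if text is None:
                continue  # skip non-text parts such as inline data
            (thoughts if part.get("thought") else answer).append(text)
    return "".join(thoughts), "".join(answer)


# Hand-written sample in the assumed response shape.
sample = {
    "candidates": [
        {
            "content": {
                "role": "model",
                "parts": [
                    {"text": "Weighing analogies. ", "thought": True},
                    {"text": "Time and space are linked."},
                ],
            }
        }
    ]
}

thinking, text = split_thoughts(sample)
print(text)
```

In the request example above, the same function could be applied to resp.json(), provided the response follows this shape.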
Base Models
gemini-2.5-pro
gemini-2.5-flash
gemini-1.5-pro
gemini-1.5-flash
gemini-1.0-pro
Automatic Model Suffixes
The proxy dynamically generates variants for the 2.5 series based on the following suffixes:
-search: Activates Google Search grounding.
-nothinking: Minimizes the reasoning budget for faster responses.
-maxthinking: Maximizes the reasoning budget for deep processing.
For instance, using gemini-2.5-flash-maxthinking tells the model to prioritize detailed reasoning before delivering an answer.
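The naming scheme can be sketched in a few lines: enumerate the suffixed names for the 2.5 models, and split a requested name back into its base model and suffix. This illustrates the convention only and is not the proxy's actual code. The names list_variants and split_suffix are hypothetical, and the sketch handles a single suffix because the article does not say whether suffixes can be combined.

```python
BASE_2_5_MODELS = ["gemini-2.5-pro", "gemini-2.5-flash"]
SUFFIXES = ["-search", "-nothinking", "-maxthinking"]


def list_variants(bases=BASE_2_5_MODELS, suffixes=SUFFIXES) -> list:
    """Enumerate every base model combined with every single suffix."""
    return [base + suffix for base in bases for suffix in suffixes]


def split_suffix(model: str) -> tuple:
    """Return (base_model, suffix), with suffix None for an unsuffixed name."""
    for suffix in SUFFIXES:
        if model.endswith(suffix):
            return model[: -len(suffix)], suffix
    return model, None


print(list_variants())
print(split_suffix("gemini-2.5-flash-maxthinking"))
```

A client could use split_suffix to recover the underlying model name when logging or grouping requests by model.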