The Helicone AI Gateway is a high-performance, lightweight AI proxy developed by the Helicone team and released as an open-source project. Its minimal footprint and straightforward configuration allow it to manage heavy production throughput with ease.
A single endpoint provides access to more than 100 different models. Built with Rust for maximum efficiency, the gateway remains responsive even when processing millions of LLM requests. It serves a similar role to NGINX, but is engineered specifically for the modern AI infrastructure stack.
Installation & Configuration
.env file:
OPENAI_API_KEY=your_openai_key
ANTHROPIC_API_KEY=your_anthropic_key
npx @helicone/ai-gateway@latest
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8080/ai",
    api_key="placeholder-api-key",  # The gateway manages the actual keys securely
)

# Use one interface for any LLM provider. The gateway handles the routing.
response = client.chat.completions.create(
    model="anthropic/claude-3-5-sonnet",  # Compatible with 100+ other models
    messages=[{"role": "user", "content": "Hello from Helicone AI Gateway!"}],
)
There is no new SDK to learn and no provider-specific integration to write. The gateway is open source and works with your existing OpenAI client code.
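Because the gateway accepts OpenAI-style requests, the same call can also be made without the SDK. The sketch below uses only the Python standard library. It assumes the gateway serves the standard `/chat/completions` path under the `/ai` base URL from the example above (that is the path the OpenAI SDK appends to `base_url`); `build_chat_request` and `send_chat_request` are hypothetical helper names, not part of the gateway.

```python
import json
import urllib.request

# Base URL taken from the quick-start example; adjust if your gateway differs.
GATEWAY_BASE_URL = "http://localhost:8080/ai"


def build_chat_request(model: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-style chat completion request."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{GATEWAY_BASE_URL}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Placeholder only; the gateway holds the real provider keys.
            "Authorization": "Bearer placeholder-api-key",
        },
        method="POST",
    )


def send_chat_request(request: urllib.request.Request) -> str:
    """Send the request and return the first choice's text.

    Requires a gateway running locally; not called at import time.
    """
    with urllib.request.urlopen(request) as response:
        payload = json.load(response)
    return payload["choices"][0]["message"]["content"]
```

With a gateway running, `send_chat_request(build_chat_request("anthropic/claude-3-5-sonnet", "Hello!"))` would return the model's reply. Switching providers means changing only the `model` string.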
Why Run Helicone AI Gateway?
Unified Interface
Maintain your existing OpenAI syntax while calling Anthropic, Google, AWS Bedrock, or any of the 20+ supported providers. You can switch models without rewriting your integration logic.
Intelligent Provider Selection
Optimize your requests based on speed, cost, or reliability. The gateway supports advanced load-balancing strategies, including latency-aware P2C with PeakEWMA, weighted distribution, and cost-based routing. It also tracks provider health and rate limits in real-time.
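To make the latency-aware strategy more concrete, here is a minimal sketch of power-of-two-choices (P2C) selection over a peak-sensitive EWMA. It illustrates the general technique only and is not the gateway's Rust implementation. The class name, the 10-second decay window, and the load formula (latency estimate multiplied by in-flight requests plus one) are assumptions made for this example.

```python
import math
import random
import time
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class PeakEwmaTarget:
    """Tracks a peak-sensitive latency estimate for one provider (hypothetical)."""

    name: str
    decay_seconds: float = 10.0  # assumed decay window
    estimate: float = 0.0        # latency estimate in seconds
    pending: int = 0             # requests currently in flight
    last_update: float = field(default_factory=time.monotonic)

    def observe(self, latency: float, now: Optional[float] = None) -> None:
        """Fold one observed latency into the estimate."""
        now = time.monotonic() if now is None else now
        elapsed = max(now - self.last_update, 0.0)
        self.last_update = now
        if latency > self.estimate:
            # Peak-sensitive: jump straight to a worse observation.
            self.estimate = latency
        else:
            # Otherwise decay toward the new sample; older data fades with time.
            weight = math.exp(-elapsed / self.decay_seconds)
            self.estimate = self.estimate * weight + latency * (1.0 - weight)

    def load(self) -> float:
        """Estimated cost of sending one more request to this target."""
        return self.estimate * (self.pending + 1)


def pick_p2c(targets: List[PeakEwmaTarget], rng=random) -> PeakEwmaTarget:
    """Sample two distinct targets at random and keep the less loaded one."""
    first, second = rng.sample(targets, 2)
    return first if first.load() <= second.load() else second
```

Comparing only two random candidates keeps selection cheap while still steering traffic away from slow or saturated providers, and the jump-to-peak rule makes the balancer react quickly to latency spikes.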
Cost Guardrails
Prevent budget overruns and usage abuse with robust rate limiting. You can define caps for specific users, teams, or the entire organization based on request volume, token consumption, or total spend.
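One common way to enforce caps like these is a token bucket per API key. The sketch below is an assumption about how a `capacity` and a refill period could interact (continuous refill at `capacity / refill_seconds` tokens per second); the gateway's actual algorithm may differ, and `TokenBucket` and `PerKeyRateLimiter` are hypothetical names.

```python
import time
from typing import Dict, Optional


class TokenBucket:
    """A bucket that starts full and refills continuously up to its capacity."""

    def __init__(self, capacity: int, refill_seconds: float,
                 now: Optional[float] = None) -> None:
        self.capacity = capacity
        self.refill_rate = capacity / refill_seconds  # tokens per second
        self.tokens = float(capacity)
        self.updated_at = time.monotonic() if now is None else now

    def allow(self, cost: float = 1.0, now: Optional[float] = None) -> bool:
        """Spend `cost` tokens if available; return whether the request passes."""
        now = time.monotonic() if now is None else now
        elapsed = max(now - self.updated_at, 0.0)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.updated_at = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


class PerKeyRateLimiter:
    """Keeps one independent bucket per API key."""

    def __init__(self, capacity: int, refill_seconds: float) -> None:
        self.capacity = capacity
        self.refill_seconds = refill_seconds
        self.buckets: Dict[str, TokenBucket] = {}

    def allow(self, api_key: str, cost: float = 1.0,
              now: Optional[float] = None) -> bool:
        bucket = self.buckets.get(api_key)
        if bucket is None:
            bucket = TokenBucket(self.capacity, self.refill_seconds, now=now)
            self.buckets[api_key] = bucket
        return bucket.allow(cost, now=now)
```

Passing a `cost` other than 1 lets the same structure cap token consumption or spend instead of raw request counts, for example by charging each request its token total.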
Performance Gains
Response caching can reduce latency and API costs by up to 95%. The system supports Redis or S3 backends and includes built-in smart invalidation logic.
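As a rough illustration of how response caching with a freshness window can work, the sketch below keys an in-memory cache on a hash of the request body and reads its limits from a Cache-Control style string such as `max-age=3600, max-stale=1800`. Treating `max-stale` as extra time during which an expired entry may still be served is an assumption borrowed from HTTP caching semantics; the gateway's own invalidation logic is not shown here, and `ResponseCache` is a hypothetical name.

```python
import hashlib
import json
from typing import Any, Dict, Optional, Tuple


def parse_directive(directive: str) -> Dict[str, int]:
    """Parse 'max-age=3600, max-stale=1800' into {'max-age': 3600, ...}."""
    settings: Dict[str, int] = {}
    for part in directive.split(","):
        name, _, value = part.strip().partition("=")
        if value.isdigit():
            settings[name] = int(value)
    return settings


def cache_key(request_body: Dict[str, Any]) -> str:
    """Hash a canonical JSON form so key order does not affect the key."""
    canonical = json.dumps(request_body, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class ResponseCache:
    """In-memory cache that serves entries up to max-age + max-stale seconds old."""

    def __init__(self, directive: str) -> None:
        settings = parse_directive(directive)
        self.max_age = settings.get("max-age", 0)
        self.max_stale = settings.get("max-stale", 0)
        self.entries: Dict[str, Tuple[float, Any]] = {}

    def put(self, request_body: Dict[str, Any], response: Any, now: float) -> None:
        self.entries[cache_key(request_body)] = (now, response)

    def get(self, request_body: Dict[str, Any], now: float) -> Optional[Any]:
        key = cache_key(request_body)
        entry = self.entries.get(key)
        if entry is None:
            return None
        stored_at, response = entry
        if now - stored_at <= self.max_age + self.max_stale:
            return response
        # Too old even allowing for staleness: evict and report a miss.
        del self.entries[key]
        return None
```

A cache hit skips the provider call entirely, which is where the latency and cost savings come from. Identical prompts must produce identical keys, hence the canonical JSON step.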
Simplified Observability
The gateway comes pre-configured for Helicone and offers full OpenTelemetry support. This provides immediate access to logs, metrics, and traces for debugging and performance monitoring.
Rapid Deployment
Run the gateway as a single binary or a Docker container on your own infrastructure. You can be live in seconds by following the standard deployment guide.
Production-Ready Throughput
| Metric | Helicone AI Gateway | Typical Setup |
|---|---|---|
| P95 Latency | <10ms | ~60-100ms |
| Memory Usage | ~64MB | ~512MB |
| Requests per Second | ~2,000 | ~500 |
| Binary Size | ~15MB | ~200MB |
| Cold Start Time | ~100ms | ~2s |
Note: These are preliminary figures. Full benchmark methodology and detailed results are available in benchmarks/README.md.
How It Works
┌────────────┐    ┌─────────────┐    ┌─────────────────┐
│ Your App   │───▶│ Helicone AI │───▶│ LLM Providers   │
│            │    │ Gateway     │    │                 │
│ OpenAI SDK │    │             │    │ • OpenAI        │
│ (any lang) │    │ • Load Bal. │    │ • Anthropic     │
│            │    │ • Rate Limit│    │ • AWS Bedrock   │
│            │    │ • Caching   │    │ • Google Vertex │
│            │    │ • Tracing   │    │ • 20+ more      │
└────────────┘    └─────────────┘    └─────────────────┘
                         │
                         ▼
                ┌─────────────────┐
                │ Helicone        │
                │ Observability   │
                │                 │
                │ • Dashboards    │
                │ • Metrics       │
                │ • Monitoring    │
                │ • Debugging     │
                └─────────────────┘
Custom Configuration
.env file:
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...
HELICONE_API_KEY=sk-...
config.yaml. For a complete list of options, refer to the configuration guide and the supported provider list.
helicone: # Define HELICONE_API_KEY in your .env
  observability: true
  authentication: true
cache-store:
  in-memory: {}
global: # Applied across all routers
  cache:
    directive: "max-age=3600, max-stale=1800"
routers:
  your-router-name: # Specific settings for this router
    load-balance:
      chat:
        strategy: latency
        targets:
          - openai
          - anthropic
    rate-limit:
      per-api-key:
        capacity: 1000
        refill-frequency: 1m # Allows 1000 requests per minute
npx @helicone/ai-gateway@latest --config config.yaml
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8080/router/your-router-name",
    api_key="placeholder-api-key",  # The gateway handles the actual provider keys
)

response = client.chat.completions.create(
    model="anthropic/claude-3-5-sonnet",  # Or any other supported model
    messages=[{"role": "user", "content": "Hello from Helicone AI Gateway!"}],
)
Migration Guide
From OpenAI (Python)
from openai import OpenAI

client = OpenAI(
-   api_key=os.getenv("OPENAI_API_KEY")
+   api_key="placeholder-api-key",  # Handled by the gateway
+   base_url="http://localhost:8080/router/your-router-name",
)

# The rest of your code remains exactly the same.
response = client.chat.completions.create(
    model="openai/gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}],
)
From OpenAI (TypeScript)
import { OpenAI } from "openai";

const client = new OpenAI({
-   apiKey: process.env.OPENAI_API_KEY,
+   apiKey: "placeholder-api-key", // Handled by the gateway
+   baseURL: "http://localhost:8080/router/your-router-name",
});

const response = await client.chat.completions.create({
  model: "openai/gpt-4o",
  messages: [{ role: "user", content: "Hello from Helicone AI Gateway!" }],
});