Every LLM call, optimally routed.
Intelligent routing across 114+ models from 8 providers. Zero markup. Built-in PII detection. Cut your AI spend by up to 97%.
14-day free trial for paid plans. No credit card required. Cancel anytime.
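from openai import OpenAI

client = OpenAI(api_key="YOUR_COSTBASE_API_KEY", base_url="https://api.costbase.ai/v1")

response = client.chat.completions.create(
    model="auto",
    messages=[{"role": "user", "content": "Analyze quarterly revenue..."}],
    # The OpenAI Python SDK rejects unknown keyword arguments, so pass extra fields via extra_body.
    extra_body={"optimize": "cost"},
)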
Request Log
| Time | Model | Provider | Tokens | Cost (¢) | Latency | Throughput | Status |
|---|---|---|---|---|---|---|---|
| 2s ago (3:42:18 PM) | gemini-2.5-pro | Google | 1,847 (1,218 in / 629 out) | 0.7813 | 1,247 ms | 1,481 tok/s | OK |
req_7f3a...k2m1
Good
Request Performance
- Latency: 1,247 ms
- Throughput: 1,481 tok/s
Cost Breakdown
- Per 1K tokens: $0.00423
- Tokens per cent: 2,364
Cost at Scale
- 1K reqs: $7.81
- 10K reqs: $78.13
- 100K reqs: $781.30
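The breakdown above follows from the single logged request (1,847 tokens at 0.7813¢). As a rough sketch of that arithmetic, with a hypothetical helper name that is not part of any Costbase API:

```python
def cost_breakdown(total_tokens: int, cost_cents: float) -> dict:
    """Derive per-token and at-scale figures from one request's cost."""
    cost_usd = cost_cents / 100
    return {
        "per_1k_tokens_usd": round(cost_usd / total_tokens * 1000, 5),
        "tokens_per_cent": round(total_tokens / cost_cents),
        # Assumes every request costs the same as the logged one.
        "cost_at_scale_usd": {
            n: round(cost_usd * n, 2) for n in (1_000, 10_000, 100_000)
        },
    }


breakdown = cost_breakdown(total_tokens=1847, cost_cents=0.7813)
print(breakdown)
```

Real traffic mixes request sizes and models, so the at-scale numbers are a linear extrapolation rather than a forecast.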
- Up to 97% cost reduction
- <20ms gateway overhead (p50)
- 8 AI providers supported
- 114+ models with full scoring
USE YOUR EXISTING API KEYS FROM ALL MAJOR PROVIDERS
Everything you need to ship AI at scale
Drop-in replacement for the OpenAI SDK. Change the base URL and instantly get intelligent routing, caching, compliance, and cost visibility.
Intelligent Routing
Multi-factor scoring across 114+ models. Heuristic, embedding-based, and use-case analyses pick the optimal model automatically.
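To make "multi-factor scoring" concrete, here is a minimal sketch of weighted scoring keyed by the `optimize` value. The weights, the normalized scores, and the model names are all invented for illustration; the page does not describe Costbase's actual scoring formula.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    quality: float  # 0..1, higher is better
    cost: float     # 0..1 normalized price, lower is better
    speed: float    # 0..1, higher is better


# Hypothetical weight profiles, one per `optimize` setting.
WEIGHTS = {
    "cost":    {"quality": 0.2, "cost": 0.6, "speed": 0.2},
    "quality": {"quality": 0.7, "cost": 0.1, "speed": 0.2},
    "speed":   {"quality": 0.2, "cost": 0.2, "speed": 0.6},
}


def score(candidate: Candidate, optimize: str) -> float:
    w = WEIGHTS[optimize]
    # Cost is inverted so that cheaper models score higher.
    return (w["quality"] * candidate.quality
            + w["cost"] * (1 - candidate.cost)
            + w["speed"] * candidate.speed)


def pick_model(candidates: list, optimize: str = "cost") -> Candidate:
    return max(candidates, key=lambda c: score(c, optimize))


candidates = [
    Candidate("large-model", quality=0.95, cost=0.90, speed=0.40),
    Candidate("small-model", quality=0.60, cost=0.10, speed=0.90),
]
print(pick_model(candidates, "cost").name)
print(pick_model(candidates, "quality").name)
```

A production router would also need per-request signals (prompt length, task type) to set the quality score, which this sketch takes as given.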
Semantic Caching
ONNX local embeddings match similar queries to cached responses. Cache hits cost $0 with instant response times.
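The idea behind semantic caching can be sketched as a nearest-neighbor lookup over embeddings. This example uses hand-written toy vectors in place of a real ONNX embedding model, and the `SemanticCache` class and its 0.9 threshold are assumptions for illustration only.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class SemanticCache:
    def __init__(self, threshold: float = 0.9):
        self.threshold = threshold
        self.entries = []  # list of (embedding, response) pairs

    def put(self, embedding, response):
        self.entries.append((embedding, response))

    def get(self, embedding):
        # Return the closest cached response if it is similar enough.
        best = max(self.entries, key=lambda e: cosine(embedding, e[0]), default=None)
        if best is not None and cosine(embedding, best[0]) >= self.threshold:
            return best[1]
        return None


cache = SemanticCache(threshold=0.9)
cache.put([0.9, 0.1, 0.0], "cached answer")
hit = cache.get([0.88, 0.12, 0.01])   # near-duplicate query
miss = cache.get([0.0, 0.1, 0.95])    # unrelated query
print(hit, miss)
```

The linear scan is fine for a sketch; a real cache would likely use an approximate nearest-neighbor index.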
Cost Tracking & Attribution
Per-request cost calculation with custom tracking IDs. Attribute LLM costs to tenants, users, or projects for SaaS billing.
PII/PHI Protection
Detect and mask emails, SSNs, credit cards, medical records, and custom patterns before they reach AI models.
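As an illustration of detect-and-mask, the sketch below applies a few regular expressions before a prompt would be forwarded. The patterns and the `mask_pii` helper are simplified assumptions, not Costbase's detectors; real detection typically adds checksum validation and context rules.

```python
import re

# Simplified, illustrative patterns. Custom patterns could be added to this dict.
PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "CREDIT_CARD": re.compile(r"\b\d(?:[ -]?\d){12,15}\b"),
}


def mask_pii(text: str):
    """Replace each match with a [LABEL] placeholder and report what was found."""
    detections = []
    for label, pattern in PATTERNS.items():
        text, count = pattern.subn(f"[{label}]", text)
        detections.extend([label] * count)
    return text, detections


masked, found = mask_pii(
    "Email jane@example.com, SSN 123-45-6789, card 4111 1111 1111 1111."
)
print(masked)
print(found)
```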
Budget Controls
Set spending limits per organization, provider, or project. Hard limits block requests. Soft limits trigger alerts.
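The difference between hard and soft limits can be shown in a few lines. The `Budget` type and the "allow" / "alert" / "block" outcomes are hypothetical names chosen for this sketch.

```python
from dataclasses import dataclass


@dataclass
class Budget:
    soft_limit_usd: float
    hard_limit_usd: float


def check_budget(spent_usd: float, request_cost_usd: float, budget: Budget) -> str:
    """Decide what to do with a request given spend so far."""
    projected = spent_usd + request_cost_usd
    if projected > budget.hard_limit_usd:
        return "block"   # hard limit: reject the request
    if projected > budget.soft_limit_usd:
        return "alert"   # soft limit: allow the request but notify
    return "allow"


budget = Budget(soft_limit_usd=80.0, hard_limit_usd=100.0)
print(check_budget(50.0, 1.0, budget))
print(check_budget(79.5, 1.0, budget))
print(check_budget(99.5, 1.0, budget))
```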
Guardrails Audit Log
Full audit trail of all PII/PHI detections with HMAC-hashed matched text. Meet compliance requirements.
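HMAC-hashing the matched text lets an audit log show that the same value was detected twice without storing the value itself. A minimal sketch using Python's standard library, assuming SHA-256 as the digest (the page only says "HMAC") and a hypothetical entry layout:

```python
import hashlib
import hmac


def hash_match(matched_text: str, secret_key: bytes) -> str:
    """Keyed hash of the matched text; the raw value is never stored."""
    return hmac.new(secret_key, matched_text.encode("utf-8"), hashlib.sha256).hexdigest()


def audit_entry(kind: str, matched_text: str, secret_key: bytes) -> dict:
    # Field names here are illustrative only.
    return {"type": kind, "match_hmac": hash_match(matched_text, secret_key)}


key = b"example-secret-key"  # placeholder; a real key would come from a secret store
entry = audit_entry("EMAIL", "jane@example.com", key)
print(entry)
```

Because the hash is keyed, someone without the secret cannot confirm a guess by hashing candidate values themselves.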
Automatic Failover
Circuit breaker pattern with health metrics. Retries across providers on transient errors in under 100ms.
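For readers unfamiliar with the circuit breaker pattern, here is a compact sketch of failover across providers. The class, the thresholds, and the `TransientError` type are assumptions for illustration and say nothing about how Costbase implements it.

```python
import time


class TransientError(Exception):
    """Stand-in for a timeout or 5xx from a provider."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_after_s=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_after_s = reset_after_s
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def available(self) -> bool:
        if self.opened_at is None:
            return True
        # Half-open: allow a trial request once the cool-down has elapsed.
        return self.clock() - self.opened_at >= self.reset_after_s

    def record_success(self):
        self.failures = 0
        self.opened_at = None

    def record_failure(self):
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self.opened_at = self.clock()


def call_with_failover(providers: dict, breakers: dict, request):
    """Try providers in order, skipping any whose circuit is open."""
    for name, call in providers.items():
        breaker = breakers[name]
        if not breaker.available():
            continue
        try:
            result = call(request)
        except TransientError:
            breaker.record_failure()
            continue
        breaker.record_success()
        return name, result
    raise RuntimeError("all providers unavailable")
```

Once a provider's circuit opens, later requests skip it immediately instead of waiting on another failure, which is what keeps failover fast.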
SLO Monitoring
Track uptime, P50/P95/P99 latency, and error rates against configurable targets. Get violation alerts.
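Checking latency percentiles and error rates against targets could look like the sketch below. It uses the nearest-rank percentile definition, and the target keys are hypothetical names chosen for this example.

```python
import math


def percentile(values, p):
    """Nearest-rank percentile (p in 0..100) of a non-empty list."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def slo_violations(latencies_ms, errors, targets):
    """Return the observed metrics that exceed their configured targets."""
    observed = {
        "p50_ms": percentile(latencies_ms, 50),
        "p95_ms": percentile(latencies_ms, 95),
        "p99_ms": percentile(latencies_ms, 99),
        "error_rate": errors / len(latencies_ms),
    }
    return {k: v for k, v in observed.items() if k in targets and v > targets[k]}


latencies = list(range(1, 101))  # synthetic sample: 1 ms .. 100 ms
targets = {"p95_ms": 90, "p99_ms": 200, "error_rate": 0.01}
print(slo_violations(latencies, errors=2, targets=targets))
```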
TOON Compression
JSON-to-binary token optimization reduces payload size. Per-project toggle with compression metrics.
BYOK (Zero Markup)
Bring your own API keys with AES-256-GCM encryption. Pay providers directly. Your rate limits and enterprise discounts stay intact.
Webhooks & Alerts
Send real-time alerts to Slack, Discord, or custom HTTP endpoints. Filter by event type and severity.
Team & RBAC
Multi-organization support with Owner, Admin, Member, and Viewer roles. Invite by email or shareable link.
Get started in 2 minutes
No SDK changes. No code refactoring. Just point your existing OpenAI client to Costbase.
Create API Key
Sign up and generate your Costbase API key from the dashboard.
Change Base URL
Update your OpenAI SDK base URL to point to the Costbase gateway.
Watch Savings Grow
Intelligent routing kicks in immediately. Track savings in real time.
import OpenAI from 'openai';

// Just change the baseURL - intelligence takes over
const client = new OpenAI({
  apiKey: 'YOUR_COSTBASE_API_KEY',
  baseURL: 'https://api.costbase.ai/v1',
});

// Set model: "auto" for intelligent routing
const response = await client.chat.completions.create({
  model: 'auto', // We pick the optimal model
  messages: [{ role: 'user', content: 'Explain quantum computing' }],
  optimize: 'cost', // Cost, quality, or speed
});

// Response includes cost transparency
console.log(response.costbase);
// → { model_used: "claude-3-5-haiku", saved_vs_baseline: "$0.003" }
Calculate Your Savings
See how much you could save with intelligent routing and semantic caching
Routing Optimization
40% average savings
-$2,000/month
Semantic Caching
20% average savings
-$1,000/month
Annual savings: $36,000
That's a 60% reduction in your LLM costs
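The figures above are consistent with a baseline spend of $5,000/month, which is inferred from $2,000 being 40% and is not stated on the page. A small sketch of the arithmetic, with a hypothetical function name and both savings rates applied to the same baseline:

```python
def estimate_savings(monthly_spend_usd: float,
                     routing_rate: float = 0.40,
                     caching_rate: float = 0.20) -> dict:
    """Estimate savings, treating the two rates as additive on the baseline."""
    routing = monthly_spend_usd * routing_rate
    caching = monthly_spend_usd * caching_rate
    monthly = routing + caching
    return {
        "routing_monthly_usd": round(routing, 2),
        "caching_monthly_usd": round(caching, 2),
        "annual_usd": round(monthly * 12, 2),
        "reduction_pct": round(monthly / monthly_spend_usd * 100),
    }


estimate = estimate_savings(5_000)  # inferred baseline, see note above
print(estimate)
```

The 40% and 20% rates are the averages quoted above; actual savings depend on workload and cache hit rate.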
Stop Overpaying for LLM APIs
Most teams overpay because they use expensive models for every query, can't cache effectively, and have no visibility into spend.
Costbase fixes all of this. Choose your deployment.
Managed Cloud
Fastest way to start
- 14-day free trial on any plan
- No infrastructure to manage
- Automatic updates & scaling
- Enterprise-grade security
Self-Hosted
One-click Terraform deploy
- Production-ready Terraform configs
- Complete data sovereignty
- VPC & on-premise options
- Dedicated support & SLA