Simple per-request pricing. Use YOUR API keys from OpenAI, Anthropic, Google, and more. We add intelligent routing and semantic caching - you keep full control and savings.
14-day free trial. Plans from $29/mo for 50K requests. Cancel anytime.
30-80%
Typical Cost Reduction
<25ms
Gateway Overhead (p50)
8
AI Providers Supported
95+
Models With Full Scoring
Real-world latency from k6 load tests (20 concurrent users, 9-minute duration)
Routing + caching + analytics overhead
Production-grade SLO monitoring
* Benchmarked with exact timing instrumentation. Latency varies by provider and model selection.
USE YOUR EXISTING API KEYS FROM ALL MAJOR PROVIDERS
Unlike API resellers who add markup fees, Costbase uses YOUR API keys directly. We add intelligence - you keep control.
How it works:
Configure your provider API keys in Costbase
Point your OpenAI SDK to Costbase gateway
We route intelligently using YOUR keys
Built for production. Optimized for savings.
Auto-failover when providers go down. Zero dropped requests.
Real-time provider health. Smart recovery patterns.
Latest OpenAI reasoning models with proper capability scoring.
Model costs auto-update. No manual config needed.
Every request analyzed. Every dollar optimized. Here's the proof.
Your App
OpenAI SDK
Processing in <25ms
Guardrails
PII/PHI check
Budget
Limit check
Cache
Semantic lookup
Smart Route
Best model
Load Balance
Distribute
Failover
<100ms
AI Providers
Your keys
Response
Optimized
30-80%
Cost Saved
<25ms
Overhead
99.95%
Uptime
8
Providers
Simple Query
"What's the capital of France?"
94% saved
Same quality, fraction of the cost
Complex Analysis
"Analyze Q3 financial trends..."
Savings compound
Cache hit rate improves over time
Right model. Right cost. Every time.
Analyzes query complexity and routes to the cheapest model that meets quality threshold
Uses embeddings for task-aware routing: coding to GPT-4o, creative writing to Claude, translation to Gemini
Optimizes for latency and quality based on real-time provider benchmarks
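The cost strategy above can be sketched as a simple filter-then-minimize over a model catalog. Model names, prices, and quality scores here are illustrative stand-ins, not Costbase's actual routing tables:

```typescript
// Cost-first routing sketch: pick the cheapest model whose quality
// score clears the threshold required for this request.
interface ModelProfile {
  name: string;
  costPer1kTokens: number; // USD, illustrative
  qualityScore: number;    // 0-100, from capability benchmarks
}

const CATALOG: ModelProfile[] = [
  { name: 'gpt-4o',      costPer1kTokens: 0.0050, qualityScore: 92 },
  { name: 'gpt-4o-mini', costPer1kTokens: 0.0006, qualityScore: 78 },
  { name: 'haiku',       costPer1kTokens: 0.0008, qualityScore: 74 },
];

function routeByCost(requiredQuality: number, catalog = CATALOG): ModelProfile {
  const eligible = catalog.filter(m => m.qualityScore >= requiredQuality);
  if (eligible.length === 0) {
    // Nothing meets the bar: fall back to the highest-quality model.
    return catalog.reduce((a, b) => (a.qualityScore >= b.qualityScore ? a : b));
  }
  // Cheapest model that still meets the quality threshold.
  return eligible.reduce((a, b) => (a.costPer1kTokens <= b.costPer1kTokens ? a : b));
}
```

A simple query with a low quality requirement routes to the cheapest eligible model; a demanding one falls through to the flagship.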
Drop-in replacement for OpenAI SDK. Just change the base URL and let intelligence take over.
Three routing strategies (cost, semantic, performance) analyze every request and pick the optimal model automatically.
AI-powered caching understands query intent. "What is ML?" and "Explain machine learning" return cached results.
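The semantic lookup works by comparing the embedding of an incoming prompt against cached entries. A minimal sketch, using toy 3-dimensional vectors in place of real embeddings and an assumed similarity threshold:

```typescript
// Semantic cache sketch: return a cached response when the query
// embedding is close enough (cosine similarity) to a stored entry.
interface CacheEntry { vector: number[]; response: string; }

function cosine(a: number[], b: number[]): number {
  const dot = a.reduce((s, x, i) => s + x * b[i], 0);
  const norm = (v: number[]) => Math.sqrt(v.reduce((s, x) => s + x * x, 0));
  return dot / (norm(a) * norm(b));
}

function lookup(query: number[], cache: CacheEntry[], threshold = 0.95): string | null {
  let best: { score: number; entry: CacheEntry } | null = null;
  for (const entry of cache) {
    const score = cosine(query, entry.vector);
    if (score >= threshold && (!best || score > best.score)) best = { score, entry };
  }
  return best ? best.entry.response : null;
}
```

This is why rephrasings like "What is ML?" and "Explain machine learning" can hit the same cache entry: their embeddings land close together even though the strings differ.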
Show exactly how much you saved vs direct API usage. Breakdown by cache savings vs routing savings.
Seamless fallback between providers. Never miss a request due to rate limits or outages.
Set limits per API key. Get alerts before overages. Prove ROI to finance with predictive forecasting.
Continuous benchmarking on YOUR data. Personalized routing based on your actual usage patterns.
Automatically detect and mask sensitive data in real time. HIPAA compliance made simple. No competitors offer this.
Social Security Numbers
Detect & mask SSN patterns
Credit Card Numbers
PCI-DSS compliant masking
Medical Record Numbers
HIPAA PHI protection
Email & Phone Numbers
PII redaction
Custom Patterns
Enterprise-only
// User prompt
"Process payment for patient John Smith,
SSN: 123-45-6789
, card
4532-1234-5678-9010
"
// Sent to LLM (masked)
"Process payment for patient John Smith, SSN: ***-**-****, card ****-****-****-****"
2
Patterns
<5ms
Scan Time
Masked
Action
Available on
Growth+ Tier
HIPAA-compliant PHI detection
PCI-DSS credit card masking
Client data protection
FedRAMP-ready compliance
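The masking step in the prompt example above can be sketched with two regex passes. Costbase's detector covers more PII/PHI types than this; the patterns below only illustrate the SSN and card cases shown:

```typescript
// Minimal masking sketch for the two patterns in the example:
// SSNs (###-##-####) and hyphenated 16-digit card numbers.
function maskPII(text: string): string {
  return text
    .replace(/\b\d{3}-\d{2}-\d{4}\b/g, '***-**-****')                 // SSN
    .replace(/\b\d{4}-\d{4}-\d{4}-\d{4}\b/g, '****-****-****-****');  // card
}
```

The masked string, not the original, is what gets forwarded to the LLM provider.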
Built-in redundancy, automatic failover, and real-time monitoring keep your AI services running, even when providers go down.
If OpenAI goes down, we route to Anthropic or Google instantly. Configure up to 3 provider fallbacks per request.
Track uptime, latency (p50/p95/p99), and error rates. Get instant alerts when SLOs are violated.
Automatically stops routing to failing providers and retries with exponential backoff. No manual intervention needed.
OpenAI degradation detected at 2:47 PM. Routed to Anthropic in 94ms. Zero customer impact.
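The circuit-breaker behavior described above follows a standard pattern: after a run of consecutive failures a provider is marked "open" and skipped, then probed again after a cooldown. A minimal sketch with illustrative thresholds (not Costbase's actual defaults):

```typescript
// Circuit-breaker sketch: trip after maxFailures consecutive errors,
// allow a probe request again once cooldownMs has elapsed.
class CircuitBreaker {
  private failures = 0;
  private openedAt: number | null = null;

  constructor(
    private readonly maxFailures = 3,
    private readonly cooldownMs = 30_000,
  ) {}

  canRequest(now = Date.now()): boolean {
    if (this.openedAt === null) return true;
    // Half-open: permit a probe once the cooldown has passed.
    return now - this.openedAt >= this.cooldownMs;
  }

  recordSuccess(): void {
    this.failures = 0;
    this.openedAt = null;
  }

  recordFailure(now = Date.now()): void {
    this.failures += 1;
    if (this.failures >= this.maxFailures) this.openedAt = now;
  }
}
```

The gateway keeps one breaker per provider, so a tripped OpenAI breaker diverts traffic to the next fallback while healthy providers are unaffected.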
Real-time monitoring, alerting, and governance for mission-critical AI workloads
Track uptime, latency (p50/p95/p99), and error rates with real-time SLO violation detection.
Default: 99.9% uptime, p95 < 2s, errors < 0.1%
Set daily/monthly budgets per provider with predictive spend forecasting and overage alerts.
Per-provider and organization-wide budgets
Send real-time alerts to Slack, Discord, or custom endpoints for budget, rate-limit, and SLO violations.
HMAC signatures for security validation
Per-key request limits with sliding window enforcement
Role-based access control for organizations
Complete audit trail of all gateway requests
Cost breakdown by provider, model, and API key
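Verifying the HMAC signature on incoming webhooks can be sketched as below. The hex encoding and HMAC-SHA256 scheme are assumptions for illustration; check the Costbase webhook docs for the exact header name and format:

```typescript
// Webhook signature sketch: recompute the HMAC over the raw body and
// compare it to the received signature in constant time.
import { createHmac, timingSafeEqual } from 'node:crypto';

function signPayload(secret: string, body: string): string {
  return createHmac('sha256', secret).update(body).digest('hex');
}

function verifySignature(secret: string, body: string, signature: string): boolean {
  const expected = Buffer.from(signPayload(secret, body), 'hex');
  const received = Buffer.from(signature, 'hex');
  // timingSafeEqual prevents timing attacks on the comparison;
  // it requires equal-length buffers, so check length first.
  return expected.length === received.length && timingSafeEqual(expected, received);
}
```

Any tampering with the body after signing causes verification to fail, so your endpoint can trust alerts that pass this check.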
Production-grade analytics dashboard with real-time monitoring and detailed breakdowns
/portal/usage
Total Requests
127.4K
Cache Hit Rate
23.8%
✓ Cost breakdown by provider, model, and API key
/portal/activity
2.1K tokens • $0.052
1.8K tokens • $0.054
843 tokens • $0.001
✓ Complete audit trail with request/response inspection
/portal/budgets
Daily Burn Rate
$41.57
Projected Spend
$1,746
✓ Predictive forecasting with alerts at 50%, 80%, 100%
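A burn-rate forecast like the one above can be sketched as a linear extrapolation of month-to-date spend, with alerts firing as the projection crosses each budget threshold. Function names and sample numbers are illustrative, not Costbase's actual forecasting model:

```typescript
// Forecast sketch: extrapolate the daily burn rate across the
// full billing period.
function projectedSpend(spentSoFar: number, daysElapsed: number, daysInPeriod: number): number {
  const dailyBurnRate = spentSoFar / daysElapsed;
  return dailyBurnRate * daysInPeriod;
}

// Return which alert thresholds (50%, 80%, 100%) the projection has crossed.
function budgetAlerts(projected: number, budget: number): number[] {
  return [0.5, 0.8, 1.0].filter(t => projected >= budget * t);
}
```

Ten days into a 30-day period with $415.70 spent, the projection is about $1,247, which would already trip the 50% and 80% alerts on a $1,500 budget.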
/portal/slo
Uptime
99.97%
Target: 99.9%
✓ MEETS SLA
P95 Latency
487ms
Target: <2000ms
Error Rate
0.03%
Target: <0.1%
✓ Real-time violation alerts with historical trends
Plus: Rate Limiting, Team Management, Webhooks, API Keys, Provider Configuration, and more
Start Free
No SDK changes. No code refactoring. Just point your existing OpenAI client to Costbase.
Sign up and generate your Costbase API key from the dashboard.
Update your OpenAI SDK base URL to point to Costbase gateway.
Intelligence kicks in immediately. Track savings in real-time.
import OpenAI from 'openai';
// Just change the baseURL - intelligence takes over
const client = new OpenAI({
apiKey: 'YOUR_COSTBASE_API_KEY',
baseURL: 'https://api.costbase.ai/v1',
});
// Use exactly as before - we optimize automatically
const response = await client.chat.completions.create({
model: 'gpt-4o', // We route to optimal model
messages: [{ role: 'user', content: 'Hello!' }],
});
See how much you could save with intelligent routing and semantic caching
Routing Optimization
40% average savings
-$2,000/month
Semantic Caching
20% average savings
-$1,000/month
Recommended Plan
Enterprise
Unlimited requests/month
$499
/month
5x ROI
Save $30,012 per year
That's a 60% reduction in your LLM costs
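The calculator's arithmetic, assuming the $5,000/month baseline spend implied by the 40% ($2,000) and 20% ($1,000) line items:

```typescript
// Savings math sketch: gross monthly savings at the combined rate,
// net of the plan price, annualized, with ROI vs. the plan cost.
function annualSavings(monthlySpend: number, savingsRate: number, planPrice: number) {
  const grossMonthly = monthlySpend * savingsRate; // $5,000 * 0.60 = $3,000
  const netMonthly = grossMonthly - planPrice;     // $3,000 - $499 = $2,501
  const netAnnual = netMonthly * 12;               // $30,012
  const roi = netAnnual / (planPrice * 12);        // ~5x
  return { netAnnual, roi };
}
```

Net annual savings of $30,012 against $5,988/year in plan fees works out to roughly a 5x return.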
Most teams overpay 30-80% because they use expensive models for every query, can't cache effectively, and have no visibility into spend.
Costbase fixes all of this. Choose your deployment.
Fastest way to start
One-click Terraform deploy
Starter: 50K req/mo for $29 • Pro: 150K req/mo for $49 • Growth: 1M req/mo for $249