Production-Ready — BYOK. Zero markup. Full control.

Every LLM call, optimally routed.

Intelligent routing across 114+ models from 8 providers. Zero markup. Built-in PII detection. Cut your AI spend by up to 97%.

14-day free trial for paid plans. No credit card required. Cancel anytime.

Intelligent Routing
main.py
from openai import OpenAI

client = OpenAI(api_key="YOUR_COSTBASE_API_KEY",
                base_url="https://api.costbase.ai/v1")

response = client.chat.completions.create(
    model="auto",
    messages=[{"role": "user", "content": "Analyze quarterly revenue..."}],
    optimize="cost",
)

Analysis: complexity scored, task classified as Financial Analysis, optimizing for cost.
Candidates evaluated: Gemini 2.5 Pro, GPT-4o, Claude Sonnet 4.
Request Log

  Time:        3:42:18 PM (2s ago)
  Model:       gemini-2.5-pro (Google AI)
  Tokens:      1,847 (1,218 in / 629 out)
  Cost:        0.7813¢
  Latency:     1,247ms
  Throughput:  1,481 tok/s
  Status:      OK
  Routing:     Claude Sonnet 4 ($0.013) → Gemini 2.5 Pro ($0.008), 40% saved

Request Details: req_7f3a...k2m1

Request Performance: Good
Latency: 1,247ms
Throughput: 1,481 tok/s
Cost Intelligence

  Total cost: $0.007813 (0.7813¢)

  Cost Breakdown
    Input (1,218 tokens):  $0.001523
    Output (629 tokens):   $0.006290
    Per 1K tokens:         $0.00423
    Tokens per cent:       2,364

  Cost at Scale
    1K reqs:   $7.81
    10K reqs:  $78.13
    100K reqs: $781.30

  Provider: Google AI
  Model: gemini-2.5-pro
  Cache: Miss
  Failover: First attempt
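The figures in this panel are simple per-token arithmetic. A sketch of the reconstruction, assuming rates of $1.25 per million input tokens and $10.00 per million output tokens (the page does not state the actual rates):

```python
# Reconstructing the dashboard's per-request cost. The rates below are
# assumptions for illustration - the page does not list them.
input_rate = 1.25 / 1_000_000    # USD per input token (assumed)
output_rate = 10.00 / 1_000_000  # USD per output token (assumed)

tokens_in, tokens_out = 1_218, 629
cost = tokens_in * input_rate + tokens_out * output_rate   # ~$0.007813
per_1k = cost / (tokens_in + tokens_out) * 1_000           # ~$0.00423
tokens_per_cent = (tokens_in + tokens_out) / (cost * 100)  # ~2,364

print(cost, per_1k, tokens_per_cent)
```

Under these assumed rates the computed values line up with the input, output, per-1K, and tokens-per-cent figures shown above.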

Cost Reduction: up to 97%
Gateway Overhead (p50): <20ms
AI Providers Supported: 8
Models With Full Scoring: 114+

USE YOUR EXISTING API KEYS FROM ALL MAJOR PROVIDERS

OpenAI
Anthropic
Google
Groq
xAI
DeepSeek
Mistral
Cohere

Everything you need to ship AI at scale

Drop-in replacement for OpenAI SDK. Change the base URL and get intelligent routing, caching, compliance, and cost visibility—instantly.

Intelligent Routing

Multi-factor scoring across 80+ models. Heuristic, embedding-based, and use-case analysis picks the optimal model automatically.

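The exact scoring pipeline isn't documented on this page; as a rough sketch of how multi-factor scoring can pick a model, where the factor values and weights are invented for illustration:

```python
# Illustrative multi-factor model scoring. Factor values and weights
# are invented for this sketch - they are not Costbase internals.
def score(model: dict, optimize: str = "cost") -> float:
    # Each factor is normalized to [0, 1]; higher is better
    # ("price" means cheaper-is-higher).
    weights = {
        "cost":    {"price": 0.6, "quality": 0.2, "speed": 0.2},
        "quality": {"price": 0.1, "quality": 0.7, "speed": 0.2},
        "speed":   {"price": 0.2, "quality": 0.2, "speed": 0.6},
    }[optimize]
    return sum(weights[k] * model[k] for k in weights)

candidates = [
    {"name": "gemini-2.5-pro",  "price": 0.9, "quality": 0.80, "speed": 0.8},
    {"name": "gpt-4o",          "price": 0.6, "quality": 0.90, "speed": 0.7},
    {"name": "claude-sonnet-4", "price": 0.5, "quality": 0.95, "speed": 0.6},
]
best = max(candidates, key=lambda m: score(m, "cost"))
print(best["name"])  # -> gemini-2.5-pro
```

Switching `optimize` to `"quality"` shifts the weights and a different candidate wins, which is the behavior the `optimize` parameter in the demo request exposes.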

Semantic Caching

ONNX local embeddings match similar queries to cached responses. Cache hits cost $0 with instant response times.

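The idea can be sketched as an embedding-similarity lookup; the toy vectors and 0.95 threshold below are illustrative stand-ins for the ONNX embedding model the page mentions:

```python
import math

# Sketch of semantic cache lookup: embed the query, return a cached
# response when cosine similarity clears a threshold.
def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

class SemanticCache:
    def __init__(self, threshold: float = 0.95):
        self.threshold = threshold
        self.entries: list[tuple[list[float], str]] = []

    def get(self, embedding):
        for emb, response in self.entries:
            if cosine(embedding, emb) >= self.threshold:
                return response  # hit: $0 cost, no provider call
        return None

    def put(self, embedding, response):
        self.entries.append((embedding, response))

cache = SemanticCache()
cache.put([0.1, 0.9, 0.2], "Quarterly revenue grew 12%...")
print(cache.get([0.11, 0.89, 0.21]))  # near-duplicate query -> cache hit
```

A sufficiently different query falls below the threshold and misses, so only genuinely similar prompts share a response.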

Cost Tracking & Attribution

Per-request cost calculation with custom tracking IDs. Attribute LLM costs to tenants, users, or projects for SaaS billing.

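A sketch of what attribution enables downstream: rolling per-request costs (tagged with whatever tracking ID you attach) into a per-tenant bill. The field names here are illustrative, not the Costbase response schema:

```python
from collections import defaultdict

# Illustrative roll-up of gateway-reported per-request costs into a
# per-tenant bill for SaaS billing. Field names are invented.
requests = [
    {"track_id": "tenant_acme",   "cost_usd": 0.007813},
    {"track_id": "tenant_acme",   "cost_usd": 0.002100},
    {"track_id": "tenant_globex", "cost_usd": 0.013000},
]

bill = defaultdict(float)
for req in requests:
    bill[req["track_id"]] += req["cost_usd"]

print(dict(bill))
```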

PII/PHI Protection

Detect and mask emails, SSNs, credit cards, medical records, and custom patterns before they reach AI models.

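As a sketch of the masking step, here are two regex-based detectors (emails and SSNs); the real guardrail covers more classes plus custom patterns:

```python
import re

# Sketch of pre-flight PII masking: replace matches with a label
# before the prompt ever reaches a provider. Two example patterns only.
PATTERNS = {
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def mask(text: str) -> str:
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text

print(mask("Contact jane@example.com, SSN 123-45-6789"))
# -> Contact [EMAIL], SSN [SSN]
```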

Budget Controls

Set spending limits per organization, provider, or project. Hard limits block requests. Soft limits trigger alerts.

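The hard/soft distinction reduces to a simple check at request time; a minimal sketch (the mode names and return values are invented, not the Costbase config schema):

```python
# Hard limits block the request outright; soft limits let it through
# but trigger an alert. Values here are illustrative.
def check_budget(spent_usd: float, limit_usd: float, mode: str) -> str:
    if spent_usd >= limit_usd:
        return "block" if mode == "hard" else "alert"
    return "allow"

assert check_budget(105.0, 100.0, "hard") == "block"
assert check_budget(105.0, 100.0, "soft") == "alert"
assert check_budget(50.0, 100.0, "hard") == "allow"
```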

Guardrails Audit Log

Full audit trail of all PII/PHI detections with HMAC-hashed matched text. Meet compliance requirements.

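HMAC-hashing the matched text means the audit log can prove what matched without storing the raw value. A sketch with Python's standard library (the key and record fields are illustrative):

```python
import hashlib
import hmac

# Audit-safe logging sketch: store an HMAC of the matched PII rather
# than the plaintext. Key management here is illustrative only.
AUDIT_KEY = b"rotate-me-regularly"

def audit_record(pattern_label: str, matched_text: str) -> dict:
    digest = hmac.new(AUDIT_KEY, matched_text.encode(), hashlib.sha256)
    return {"pattern": pattern_label, "match_hmac": digest.hexdigest()}

record = audit_record("EMAIL", "jane@example.com")
print(record["pattern"], record["match_hmac"][:16])
```

Anyone holding the key can later verify whether a given value was the one detected, while the log itself leaks nothing.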

Automatic Failover

Circuit breaker pattern with health metrics. Retries across providers on transient errors in under 100ms.

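The circuit-breaker pattern named above can be sketched as follows; the thresholds and provider interface are illustrative, not Costbase internals:

```python
import time

# Minimal circuit breaker: after max_failures consecutive errors a
# provider is skipped for cooldown seconds, then retried (half-open).
class CircuitBreaker:
    def __init__(self, max_failures: int = 3, cooldown: float = 30.0):
        self.max_failures = max_failures
        self.cooldown = cooldown
        self.failures = 0
        self.opened_at = None

    def available(self) -> bool:
        if self.opened_at is None:
            return True
        if time.monotonic() - self.opened_at >= self.cooldown:
            self.opened_at, self.failures = None, 0  # half-open: retry
            return True
        return False

    def record_failure(self):
        self.failures += 1
        if self.failures >= self.max_failures:
            self.opened_at = time.monotonic()

def call_with_failover(providers: dict, request):
    # providers: name -> (breaker, send-callable); tried in order
    for name, (breaker, send) in providers.items():
        if not breaker.available():
            continue
        try:
            return name, send(request)
        except ConnectionError:
            breaker.record_failure()  # transient error: try next provider
    raise RuntimeError("all providers unavailable")
```

If the first provider raises a transient error, the call falls through to the next healthy one and the failure is counted toward opening that provider's breaker.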

SLO Monitoring

Track uptime, P50/P95/P99 latency, and error rates against configurable targets. Get violation alerts.

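Percentile tracking against a target is straightforward to sketch; this uses the nearest-rank method with an example 1-second p95 target:

```python
import math

# Sketch of p50/p95 latency tracking against an example SLO target.
def percentile(samples: list[float], p: float) -> float:
    ordered = sorted(samples)
    rank = math.ceil(p / 100 * len(ordered))  # nearest-rank method
    return ordered[rank - 1]

latencies_ms = [120, 95, 110, 1400, 130, 105, 98, 102, 115, 125]
p50 = percentile(latencies_ms, 50)  # 110
p95 = percentile(latencies_ms, 95)  # 1400 - the one slow outlier
print(f"p50={p50}ms p95={p95}ms violation={p95 > 1000}")
```

One outlier is enough to breach a p95 target even when p50 looks healthy, which is why the page tracks multiple percentiles.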

TOON Compression

Token-oriented compression of JSON payloads reduces prompt size and token count. Per-project toggle with compression metrics.

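TOON's wire format isn't specified on this page; as a sketch of the general idea behind token-oriented compression (stripping repeated keys from uniform JSON arrays), not the actual TOON encoding:

```python
import json

# Illustrative only - NOT the real TOON format. A uniform array of
# objects is flattened to one header row plus value rows, so repeated
# keys are encoded once instead of once per object.
def compress(rows: list[dict]) -> str:
    keys = list(rows[0])
    lines = [",".join(keys)]
    lines += [",".join(str(r[k]) for k in keys) for r in rows]
    return "\n".join(lines)

rows = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}, {"id": 3, "name": "c"}]
compact = compress(rows)
print(len(compact), "vs", len(json.dumps(rows)))  # fewer characters to tokenize
```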

BYOK (Zero Markup)

Bring your own API keys with AES-256-GCM encryption. Pay providers directly. Your rate limits and enterprise discounts stay intact.


Webhooks & Alerts

Send real-time alerts to Slack, Discord, or custom HTTP endpoints. Filter by event type and severity.


Team & RBAC

Multi-organization support with Owner, Admin, Member, and Viewer roles. Invite by email or shareable link.


Get started in 2 minutes

No SDK changes. No code refactoring. Just point your existing OpenAI client to Costbase.

Create API Key

Sign up and generate your Costbase API key from the dashboard.

Change Base URL

Update your OpenAI SDK base URL to point to Costbase gateway.

Watch Savings Grow

Intelligence kicks in immediately. Track savings in real-time.

your-app.ts
import OpenAI from 'openai';

// Just change the baseURL - intelligence takes over
const client = new OpenAI({
  apiKey: 'YOUR_COSTBASE_API_KEY',
  baseURL: 'https://api.costbase.ai/v1',
});

// Set model: "auto" for intelligent routing
const response = await client.chat.completions.create({
  model: 'auto', // We pick the optimal model
  messages: [{ role: 'user', content: 'Explain quantum computing' }],
  optimize: 'cost', // 'cost', 'quality', or 'speed'
});

// Response includes cost transparency
console.log(response.costbase);
// → { model_used: "claude-3-5-haiku", saved_vs_baseline: "$0.003" }

Calculate Your Savings

See how much you could save with intelligent routing and semantic caching

Enter your monthly LLM spend (example: $5,000/month)

Routing Optimization: 40% average savings (-$2,000/month)
Semantic Caching: 20% average savings (-$1,000/month)

Estimated Monthly Savings: $3,000
Annual savings: $36,000
That's a 60% reduction in your LLM costs
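The figures above follow from simple arithmetic at a $5,000/month baseline, the spend implied by the -$2,000 and -$1,000 lines:

```python
# The savings calculator's arithmetic at an implied $5,000/month spend.
monthly_spend = 5_000
routing_savings = monthly_spend * 0.40  # 40% average from routing
caching_savings = monthly_spend * 0.20  # 20% average from caching
monthly_total = routing_savings + caching_savings

print(f"monthly: ${monthly_total:,.0f}  annual: ${monthly_total * 12:,.0f}  "
      f"reduction: {monthly_total / monthly_spend:.0%}")
# -> monthly: $3,000  annual: $36,000  reduction: 60%
```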

Stop Overpaying for LLM APIs

Most teams overpay because they use expensive models for every query, can't cache effectively, and have no visibility into spend.

Costbase fixes all of this. Choose your deployment.

Managed Cloud

Fastest way to start

  • 14-day free trial on any plan
  • No infrastructure to manage
  • Automatic updates & scaling
  • Enterprise-grade security
Start Free Trial

Self-Hosted

One-click Terraform deploy

AWS • GCP • Azure
  • Production-ready Terraform configs
  • Complete data sovereignty
  • VPC & on-premise options
  • Dedicated support & SLA
Contact Sales

Frequently Asked Questions