AI-powered caching that understands meaning, not just strings. Save 15-30% on LLM costs with near-instant ~5ms responses.
~5ms
Response Time
vs 500-2000ms from LLMs
15-30%
Cost Reduction
From cache hits alone
85%
Similarity Threshold
Configurable per project
$0.00
Cache Hit Cost
Zero LLM API charges
Unlike simple string matching, semantic caching understands that different phrasings can mean the same thing.
Exact string match only
"What is machine learning?"
✓ Cache hit
"Explain ML to me"
✗ Cache miss → calls LLM
"What's ML?"
✗ Cache miss → calls LLM
3 LLM calls, 1 cache hit
Understands meaning
"What is machine learning?"
✓ Cache hit (stored)
"Explain ML to me"
✓ Cache hit (92% similar)
"What's ML?"
✓ Cache hit (89% similar)
1 LLM call, 3 cache hits = 66% saved
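To make the comparison concrete, here is a minimal sketch of semantic matching using the open-source sentence-transformers package and the all-MiniLM-L6-v2 model. Both are illustrative assumptions; Costbase's internal embedding model and the exact similarity scores shown above may differ.

```python
# Why paraphrases hit a semantic cache but miss an exact-match one.
# Assumes the sentence-transformers package (an illustrative choice).
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

cached_prompt = "What is machine learning?"
new_prompts = ["Explain ML to me", "What's ML?"]

cached_vec = model.encode(cached_prompt, convert_to_tensor=True)

for prompt in new_prompts:
    # Exact-match caching: a plain string comparison always misses paraphrases.
    exact_hit = prompt == cached_prompt

    # Semantic caching: compare meaning via embedding similarity.
    score = util.cos_sim(model.encode(prompt, convert_to_tensor=True), cached_vec).item()
    semantic_hit = score >= 0.85  # the default 85% threshold

    print(f"{prompt!r}: exact_hit={exact_hit}, similarity={score:.2f}, semantic_hit={semantic_hit}")
```

The exact scores depend on the embedding model, but the pattern holds: string comparison misses every rewording, while embedding similarity recognizes them as the same question.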
Convert the incoming prompt to a high-dimensional vector using an embedding model
Find cached responses with similarity above your threshold (default 85%)
Return the cached response in ~5ms, or call the LLM and cache the result for future queries
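In code, the three steps look roughly like the sketch below. The embed() and call_llm() functions are stand-in placeholders (stubbed here so the example runs), not Costbase's actual embedding or provider integration.

```python
# Illustrative sketch of the three steps above; not Costbase's implementation.
import numpy as np

SIMILARITY_THRESHOLD = 0.85                # configurable per project
cache: list[tuple[np.ndarray, str]] = []   # (prompt embedding, cached response)

def embed(prompt: str) -> np.ndarray:
    # Placeholder embedding: deterministic per prompt so repeats match exactly.
    # A real deployment would use an embedding model here.
    rng = np.random.default_rng(abs(hash(prompt)) % (2**32))
    return rng.standard_normal(384)

def call_llm(prompt: str) -> str:
    # Placeholder for a real LLM provider call (the 500-2000ms, billed path).
    return f"<LLM answer for: {prompt}>"

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def answer(prompt: str) -> str:
    # Step 1: convert the incoming prompt to a vector.
    query_vec = embed(prompt)

    # Step 2: find the closest cached response above the similarity threshold.
    best_score, best_response = 0.0, None
    for vec, response in cache:
        score = cosine_similarity(query_vec, vec)
        if score >= SIMILARITY_THRESHOLD and score > best_score:
            best_score, best_response = score, response

    # Step 3: serve from cache (~5ms, $0), or call the LLM and cache the result.
    if best_response is not None:
        return best_response
    response = call_llm(prompt)
    cache.append((query_vec, response))
    return response
```

With real embeddings, a later call with the same or a closely paraphrased prompt returns the cached response without touching the LLM.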
Cache hits cost $0. No tokens, no API calls, no charges. Savings compound as your cache grows.
~5ms response time vs 500-2000ms from LLM providers. Your users notice the difference.
No configuration needed. Caching is enabled by default and learns from your traffic patterns.
Cache is isolated per organization. Your prompts and responses never leak to other customers.
Control cache expiration: use a short TTL for dynamic content and a long TTL for stable responses (see the sketch below).
See cache hit rates, savings, and performance metrics in your dashboard.
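As a rough, generic illustration of TTL-based expiration (not Costbase's actual configuration surface), each cached entry can carry a timestamp and be ignored once it is older than its TTL:

```python
# Generic TTL sketch: expired entries fall through to the LLM on the next request.
import time

SHORT_TTL = 60           # seconds: dynamic content (prices, live data)
LONG_TTL = 24 * 60 * 60  # seconds: stable responses (definitions, policies)

# key -> (cached response, stored_at timestamp, ttl in seconds)
cache: dict[str, tuple[str, float, int]] = {}

def put(key: str, response: str, ttl: int = LONG_TTL) -> None:
    cache[key] = (response, time.time(), ttl)

def get(key: str) -> str | None:
    entry = cache.get(key)
    if entry is None:
        return None
    response, stored_at, ttl = entry
    if time.time() - stored_at > ttl:
        del cache[key]   # expired: the next request goes to the LLM and re-caches
        return None
    return response
```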
Your cache hit rate improves as more queries flow through Costbase. Most customers see 20%+ cache hit rates within the first month.
Semantic caching is enabled by default on all plans. Start saving from your first request.