Reduces the number of tokens in your requests by compressing JSON payloads. Fewer tokens means lower cost.
TOON stands for Token Optimization & Output Normalization. It turns verbose JSON into a compact binary format before it reaches the LLM.
Your app sends a request with a normal JSON payload. Nothing changes on your side.
The gateway converts your JSON into a compact binary format, stripping out redundant structure.
The compressed payload uses fewer tokens when sent to the LLM. You pay less for the same work.
Same data, fewer tokens. Here's what a typical JSON payload looks like before and after TOON compression.
Verbose, lots of structural tokens
{
  "messages": [
    {
      "role": "system",
      "content": "You are a helpful assistant."
    },
    {
      "role": "user",
      "content": "Summarize this document..."
    }
  ],
  "temperature": 0.7,
  "max_tokens": 1024
}
Compact binary, same meaning
<TOON:compressed> sys:"You are a helpful assistant." usr:"Summarize this document..." t:0.7 max:1024 </TOON>
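To make the before and after concrete, here is a minimal Python sketch that reproduces the compact form shown above from the same JSON payload. It is a toy stand-in, not the gateway's actual encoder: the function name `toy_compact` and the short tags in `ROLE_TAGS` and `PARAM_TAGS` are assumptions chosen only to match the illustration.

```python
import json

# Hypothetical short tags, chosen only to match the illustration above.
ROLE_TAGS = {"system": "sys", "user": "usr", "assistant": "ast"}
PARAM_TAGS = {"temperature": "t", "max_tokens": "max"}


def toy_compact(payload: dict) -> str:
    """Flatten a chat-style payload into the compact form shown above.

    Illustrative only: the real TOON encoding is not specified here.
    """
    parts = []
    for message in payload.get("messages", []):
        # Fall back to the full role name if no short tag is defined.
        tag = ROLE_TAGS.get(message["role"], message["role"])
        # json.dumps keeps the content quoted and escaped.
        parts.append(f'{tag}:{json.dumps(message["content"])}')
    for key, tag in PARAM_TAGS.items():
        if key in payload:
            parts.append(f"{tag}:{payload[key]}")
    return "<TOON:compressed> " + " ".join(parts) + " </TOON>"


payload = {
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Summarize this document..."},
    ],
    "temperature": 0.7,
    "max_tokens": 1024,
}

compact = toy_compact(payload)
print(compact)
# Character counts are only a rough proxy; real token counts depend on the tokenizer.
print(len(json.dumps(payload, indent=2)), "->", len(compact), "characters")
```

The sketch drops the braces, brackets, and repeated key names, which is where most of the structural overhead in the verbose version comes from.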
At scale, even small reductions save real money. A 20-30% cut across millions of requests adds up fast.
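As a rough worked example of how that adds up, the Python sketch below estimates monthly savings for a made-up workload. The request volume, tokens per request, and price per million tokens are all hypothetical, and it assumes the reduction applies to every input token, so treat the result as an upper-bound illustration rather than a quote.

```python
def monthly_savings(requests: int, tokens_per_request: int,
                    price_per_million_tokens: float, reduction: float) -> float:
    """Estimated monthly saving, in the same currency as the price."""
    baseline_tokens = requests * tokens_per_request
    saved_tokens = baseline_tokens * reduction
    return saved_tokens / 1_000_000 * price_per_million_tokens


# Hypothetical workload: 5M requests per month, 800 input tokens each,
# billed at $3.00 per million input tokens.
for reduction in (0.20, 0.30):
    saving = monthly_savings(5_000_000, 800, 3.00, reduction)
    print(f"{reduction:.0%} reduction -> ${saving:,.2f} saved per month")
```

With these assumed numbers the baseline input bill is $12,000 a month, so a 20-30% cut is worth roughly $2,400 to $3,600.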
Turn it on for projects where it helps. Leave it off where you need raw JSON. You decide, per project.
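Conceptually, a per-project toggle is just a flag checked before the payload is forwarded. The Python sketch below shows one way that decision could look. `ProjectSettings`, `toon_enabled`, `prepare_payload`, and the stand-in compressor are hypothetical names for illustration, not the gateway's real configuration or API.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class ProjectSettings:
    name: str
    toon_enabled: bool = False  # hypothetical flag; off unless the project opts in


def prepare_payload(settings: ProjectSettings, payload: dict,
                    compress: Callable[[dict], Any]) -> Any:
    """Return what would be forwarded: compressed only for opted-in projects."""
    if settings.toon_enabled:
        return compress(payload)
    return payload  # raw JSON passes through untouched


def stand_in_compress(payload: dict) -> str:
    # Placeholder for the real compression step.
    return f"<compressed {len(payload)} keys>"


projects = [
    ProjectSettings("support-bot", toon_enabled=True),
    ProjectSettings("raw-json-export"),
]
payload = {"messages": [], "temperature": 0.7}

for project in projects:
    print(project.name, "->", prepare_payload(project, payload, stand_in_compress))
```

Projects that need raw JSON keep the default and their payloads are forwarded unchanged.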
Your app doesn't change. Compression happens at the gateway level. Your code sends normal JSON, same as always.
See compression ratio and token savings per project in your dashboard. No guessing, just real numbers.
~20-30% Token Reduction: on typical JSON payloads
Per-Project Toggle: enable where it helps
Zero Code Changes: works at the gateway level
Built-In Dashboard Metrics: compression ratio and savings
Toggle it on for any project. Your app sends the same requests, and the gateway handles the rest.