Prompt caching ROI calculator
Prompt caching bills your stable prefix — system prompt + tool schemas re-read on every turn — at the cache-read rate (≈0.1x input on Claude). For long multi-turn agents that's a big discount. Toggle caching on and off to see your monthly savings.
Switching from Claude Opus 4.8 to DeepSeek-V4 Pro (reasoner) (DeepSeek, same frontier tier) cuts monthly spend by $6.6K — 92% — for this workload.
Where each task's cost goes
| Model | $/task | $/successful | Monthly | vs current |
|---|---|---|---|---|
CHEAPEST OpenAI · fast | $0.0508 | $0.0591 | $76.18 | cheapest |
| Groq · fast | $0.054 | $0.0627 | $80.94 | 1.1× |
| Gemini · fast | $0.0928 | $0.1079 | $139 | 1.8× |
| OpenRouter · mid | $0.1114 | $0.1296 | $167 | 2.2× |
| DeepSeek · mid | $0.1213 | $0.141 | $182 | 2.4× |
| OpenRouter · mid | $0.2264 | $0.2632 | $340 | 4.5× |
| OpenAI · mid | $0.2539 | $0.2953 | $381 | 5.0× |
| Gemini · mid | $0.3069 | $0.3569 | $460 | 6.0× |
| Together · mid | $0.3622 | $0.4211 | $543 | 7.1× |
| DeepSeek · frontier | $0.3756 | $0.4368 | $563 | 7.4× |
| Groq · mid | $0.6334 | $0.7365 | $950 | 12.5× |
| Anthropic · mid | $0.95 | $1.10 | $1.4K | 18.7× |
| Together · mid | $1.11 | $1.29 | $1.7K | 21.8× |
| OpenAI · frontier | $1.27 | $1.48 | $1.9K | 25.0× |
| Gemini · frontier | $1.27 | $1.48 | $1.9K | 25.0× |
| OpenAI · frontier | $1.92 | $2.24 | $2.9K | 37.9× |
| Anthropic · frontier | $2.85 | $3.31 | $4.3K | 56.1× |
CURRENT Anthropic · frontier | $4.75 | $5.52 | $7.1K | 93.5× |
Per-1M-token list prices; figures are estimates and exclude embeddings, fine-tuning, image/audio, and infrastructure. Rows marked verifystill need a final check against the provider's live pricing. LLM prices change often.
Frequently asked questions
How much does prompt caching save?+
It scales with how large and how stable your prefix is and how many turns re-read it. For a long coding agent with a multi-thousand-token system prompt and tool schemas, cache reads at ~0.1x input can cut total input cost meaningfully. Toggle caching in the calculator to see the exact delta for your sizes.
What gets cached, and what doesn't?+
Only a byte-stable prefix caches — typically your system prompt and tool definitions, rendered in a fixed order. Anything that changes per request (timestamps, IDs, the user's latest message, retrieved context) sits after the cache breakpoint and bills at full price. Keep volatile content last.
Why is my cache hit rate zero?+
Almost always a silent invalidator in the prefix: a current timestamp or UUID in the system prompt, non-deterministic JSON key ordering, or a tool list that varies per request. Any byte change in the prefix invalidates everything after it. Freeze the prefix and the hits return.
Do all providers price caching the same?+
No. Claude cache reads are ~0.1x input; OpenAI and Gemini discount cached input at their own rates; Groq and most open-model hosts have no prompt cache at all (so the toggle does nothing for those models). The comparison table reflects each model's cache-read price.