Agent Cost Compare
Prompt caching · cache-read ROI

Prompt caching ROI calculator

Prompt caching bills your stable prefix — system prompt + tool schemas re-read on every turn — at the cache-read rate (≈0.1x input on Claude). For long multi-turn agents that's a big discount. Toggle caching on and off to see your monthly savings.

Cost / task
$4.75
Cost / successful task
$5.52
86% success
Monthly spend
$7.1K
1,500 tasks
Monthly waste
$1.8K
25% of spend
Cheaper routing suggestion

Switching from Claude Opus 4.8 to DeepSeek-V4 Pro (reasoner) (DeepSeek, same frontier tier) cuts monthly spend by $6.6K 92% — for this workload.

Where each task's cost goes

Productive work: 89%Retries / loops / hallucinations: 11%
Input tokens / task: 932K
Output tokens / task: 20K
Model$/task$/successfulMonthlyvs current
CHEAPEST
OpenAI · fast
$0.0508$0.0591$76.18
cheapest
Groq · fast
$0.054$0.0627$80.94
1.1×
Gemini · fast
$0.0928$0.1079$139
1.8×
OpenRouter · mid
$0.1114$0.1296$167
2.2×
DeepSeek · mid
$0.1213$0.141$182
2.4×
OpenRouter · mid
$0.2264$0.2632$340
4.5×
OpenAI · mid
$0.2539$0.2953$381
5.0×
Gemini · mid
$0.3069$0.3569$460
6.0×
Together · mid
$0.3622$0.4211$543
7.1×
DeepSeek · frontier
$0.3756$0.4368$563
7.4×
Groq · mid
$0.6334$0.7365$950
12.5×
Anthropic · mid
$0.95$1.10$1.4K
18.7×
Together · mid
$1.11$1.29$1.7K
21.8×
OpenAI · frontier
$1.27$1.48$1.9K
25.0×
Gemini · frontier
$1.27$1.48$1.9K
25.0×
OpenAI · frontier
$1.92$2.24$2.9K
37.9×
Anthropic · frontier
$2.85$3.31$4.3K
56.1×
CURRENT
Anthropic · frontier
$4.75$5.52$7.1K
93.5×

Per-1M-token list prices; figures are estimates and exclude embeddings, fine-tuning, image/audio, and infrastructure. Rows marked verifystill need a final check against the provider's live pricing. LLM prices change often.

Frequently asked questions

How much does prompt caching save?+

It scales with how large and how stable your prefix is and how many turns re-read it. For a long coding agent with a multi-thousand-token system prompt and tool schemas, cache reads at ~0.1x input can cut total input cost meaningfully. Toggle caching in the calculator to see the exact delta for your sizes.

What gets cached, and what doesn't?+

Only a byte-stable prefix caches — typically your system prompt and tool definitions, rendered in a fixed order. Anything that changes per request (timestamps, IDs, the user's latest message, retrieved context) sits after the cache breakpoint and bills at full price. Keep volatile content last.

Why is my cache hit rate zero?+

Almost always a silent invalidator in the prefix: a current timestamp or UUID in the system prompt, non-deterministic JSON key ordering, or a tool list that varies per request. Any byte change in the prefix invalidates everything after it. Freeze the prefix and the hits return.

Do all providers price caching the same?+

No. Claude cache reads are ~0.1x input; OpenAI and Gemini discount cached input at their own rates; Groq and most open-model hosts have no prompt cache at all (so the toggle does nothing for those models). The comparison table reflects each model's cache-read price.

More agent cost calculators