Prompt caching · cache-read ROI

Prompt caching ROI calculator

Prompt caching bills your stable prefix — system prompt + tool schemas re-read on every turn — at the cache-read rate (≈0.1x input on Claude). For long multi-turn agents that's a big discount. Toggle caching on and off to see your monthly savings.

Workflow type

Model

frontier tier · 1M context · cache read $0.5/1M

Prompt caching on

Volume

Tasks / month

Steps per task

Model calls

Tool calls

Agent handoffs

Context (tokens)

System prompt

Tool schemas

Retrieved context

Output / call

Tool result tokens

Failure & waste

Retry rate %

Timeout loops / task

Hallucinated tool %

Task failure %

Cost / task

$4.75

Cost / successful task

$5.52

86% success

Monthly spend

$7.1K

1,500 tasks

Monthly waste

$1.8K

25% of spend

Cheaper routing suggestion

Switching from Claude Opus 4.8 to DeepSeek-V4 Pro (reasoner) (DeepSeek, same frontier tier) cuts monthly spend by $6.6K — 92% — for this workload.

Where each task's cost goes

Productive work: 89%Retries / loops / hallucinations: 11%

Input tokens / task: 932K

Output tokens / task: 20K

Model	$/task	$/successful	Monthly	vs current
CHEAPEST OpenAI · fast	$0.0508	$0.0591	$76.18	cheapest
Groq · fast	$0.054	$0.0627	$80.94	1.1×
Gemini · fast	$0.0928	$0.1079	$139	1.8×
OpenRouter · mid	$0.1114	$0.1296	$167	2.2×
DeepSeek · mid	$0.1213	$0.141	$182	2.4×
OpenRouter · mid	$0.2264	$0.2632	$340	4.5×
OpenAI · mid	$0.2539	$0.2953	$381	5.0×
Gemini · mid	$0.3069	$0.3569	$460	6.0×
Together · mid	$0.3622	$0.4211	$543	7.1×
DeepSeek · frontier	$0.3756	$0.4368	$563	7.4×
Groq · mid	$0.6334	$0.7365	$950	12.5×
Anthropic · mid	$0.95	$1.10	$1.4K	18.7×
Together · mid	$1.11	$1.29	$1.7K	21.8×
OpenAI · frontier	$1.27	$1.48	$1.9K	25.0×
Gemini · frontier	$1.27	$1.48	$1.9K	25.0×
OpenAI · frontier	$1.92	$2.24	$2.9K	37.9×
Anthropic · frontier	$2.85	$3.31	$4.3K	56.1×
CURRENT Anthropic · frontier	$4.75	$5.52	$7.1K	93.5×

Per-1M-token list prices; figures are estimates and exclude embeddings, fine-tuning, image/audio, and infrastructure. Rows marked verifystill need a final check against the provider's live pricing. LLM prices change often.

Frequently asked questions

How much does prompt caching save?+

It scales with how large and how stable your prefix is and how many turns re-read it. For a long coding agent with a multi-thousand-token system prompt and tool schemas, cache reads at ~0.1x input can cut total input cost meaningfully. Toggle caching in the calculator to see the exact delta for your sizes.

What gets cached, and what doesn't?+

Only a byte-stable prefix caches — typically your system prompt and tool definitions, rendered in a fixed order. Anything that changes per request (timestamps, IDs, the user's latest message, retrieved context) sits after the cache breakpoint and bills at full price. Keep volatile content last.

Why is my cache hit rate zero?+

Almost always a silent invalidator in the prefix: a current timestamp or UUID in the system prompt, non-deterministic JSON key ordering, or a tool list that varies per request. Any byte change in the prefix invalidates everything after it. Freeze the prefix and the hits return.

Do all providers price caching the same?+

No. Claude cache reads are ~0.1x input; OpenAI and Gemini discount cached input at their own rates; Groq and most open-model hosts have no prompt cache at all (so the toggle does nothing for those models). The comparison table reflects each model's cache-read price.

More agent cost calculators

AI Agent Cost Calculator

The hub — every workflow type and model in one place.

Claude Code cost calculator

Estimate Claude Code spend per task and per month. Model context growth, tool calls, retries and prompt caching across Claude Opus 4.8, Sonnet 4.6 and Haiku 4.5.

Codex cost calculator

Estimate OpenAI Codex and GPT-5 agent costs per task and per month. Model context growth, tool calls, retries and caching across GPT-5, GPT-5 mini and rivals.

LangGraph cost calculator

Estimate LangGraph agent costs. Model graph loops, tool nodes, retries, handoffs and context growth across OpenAI, Anthropic, Gemini, DeepSeek, Groq and more.

CrewAI cost calculator

Estimate CrewAI multi-agent costs. Model crews, task delegation, tool calls, retries and context handoffs across OpenAI, Anthropic, Gemini, DeepSeek and more.

AI agent retry & waste cost calculator

Estimate how much retries, timeout loops, hallucinated tool calls and failed tasks add to your AI agent bill — the hidden waste in cost per successful task.