
Claude’s Prompt Cache: You’re Still Sending All the Tokens

  If you’ve looked at your Claude API usage and wondered why input tokens seem suspiciously low, you’re not imagining things. The label is misleading. Here’s what’s actually happening.

  —                                                                           

  The Three Token Buckets                                                       

  Every API call to Claude splits your tokens into three categories:

  ┌─────────────┬────────────────────────────────────────┬────────────────┐
  │  Category   │                Meaning                 │ Price (approx) │
  ├─────────────┼────────────────────────────────────────┼────────────────┤
  │ input       │ Tokens NOT in cache                    │ ~$3/M          │
  ├─────────────┼────────────────────────────────────────┼────────────────┤
  │ cache_write │ Tokens being cached for the first time │ ~$3.75/M       │
  ├─────────────┼────────────────────────────────────────┼────────────────┤
  │ cache_read  │ Tokens served from cache               │ ~$0.30/M       │
  └─────────────┴────────────────────────────────────────┴────────────────┘

  Your total data sent = all three combined. (The rates above are Sonnet-tier list prices; cheaper models like Haiku keep the same multipliers: writes at 1.25x the input rate, reads at 0.1x.) The input label alone doesn’t tell you how many tokens you actually sent.
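
  If you want the real number programmatically, here’s a minimal sketch using the Anthropic Python SDK. The usage field names are the API’s own; the model string and prompt are placeholders:

    import anthropic

    client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

    message = client.messages.create(
        model="claude-haiku-4-5",
        max_tokens=256,
        messages=[{"role": "user", "content": "Summarise the build steps."}],
    )

    u = message.usage
    # input_tokens is only the UNCACHED portion; cache fields may be 0/None
    # on responses that touched no cache.
    total_sent = (
        u.input_tokens
        + (u.cache_creation_input_tokens or 0)
        + (u.cache_read_input_tokens or 0)
    )
    print(f"input={u.input_tokens} write={u.cache_creation_input_tokens} "
          f"read={u.cache_read_input_tokens} -> actually sent: {total_sent}")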

  —                                                                         

  How the Cache Works

  Every turn, your client sends the full context to Anthropic’s servers: conversation history, system prompt, file contents, everything. Anthropic’s server then checks:

  ▎ “Have I seen this exact prefix before, recently (~5 min TTL)?”

  – Cache hit → tokens billed as cache_read (10x cheaper)
  – Cache miss → tokens billed as input or cache_write

  The saving is not in bandwidth. All the data travels over the wire every time. The saving is in Anthropic’s GPU compute: they don’t re-process tokens they’ve already processed recently, and they pass that saving on.
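
  Opting a stable prefix into the cache is explicit: you place a cache_control breakpoint at the end of the content you want reused. A minimal sketch with the Python SDK (the system prompt here is a stand-in):

    import anthropic

    client = anthropic.Anthropic()

    LONG_SYSTEM_PROMPT = "You are a code-search agent. ..."  # imagine ~90k tokens

    message = client.messages.create(
        model="claude-haiku-4-5",
        max_tokens=256,
        system=[
            {
                "type": "text",
                "text": LONG_SYSTEM_PROMPT,
                # Everything up to this breakpoint gets cached (~5 min TTL).
                # First turn: billed as cache_write. Identical resends within
                # the TTL: billed as cache_read. Prefixes below the model's
                # minimum cacheable length are not cached at all.
                "cache_control": {"type": "ephemeral"},
            }
        ],
        messages=[{"role": "user", "content": "Where is the retry logic?"}],
    )
    print(message.usage)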

  —

  A Real Example

  Here’s actual usage from a Claude Code session that did a deep codebase search across Java files:

  claude-haiku-4-5:

    input:       583 tokens
    cache_write: 90.7k tokens
    cache_read:  1.0M tokens
    cost:        $0.24

  Real tokens sent: ~1.09M. Not 583.

  If those 1.09M tokens had all been billed as regular input at $0.80/M (the Haiku input rate), the cost would’ve been ~$0.87. With caching: $0.24. ~3.5x cheaper.
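
  To see roughly where those numbers come from, here’s the arithmetic as a sketch. The 1.25x write premium and 0.1x read discount are the standard cache multipliers; output tokens are deliberately left out, which likely explains the gap to the session’s real $0.24:

    def est_cost(new_input, cache_write, cache_read,
                 input_rate=0.80, write_mult=1.25, read_mult=0.10):
        # Rates in $/million input tokens; output tokens are not modelled.
        per = input_rate / 1_000_000
        return (new_input * per
                + cache_write * per * write_mult
                + cache_read * per * read_mult)

    with_cache = est_cost(583, 90_700, 1_000_000)
    no_cache = est_cost(583 + 90_700 + 1_000_000, 0, 0)
    print(f"with cache: ~${with_cache:.2f}   without: ~${no_cache:.2f}")
    # -> with cache: ~$0.17   without: ~$0.87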

  —                                                                         

  What Breaks the Cache

  The cache is keyed on an exact prefix match. Anything that changes the prefix = cache miss:

  – You edit a file that’s in the context → miss
  – New conversation (TTL expired) → miss; everything becomes cache_write again
  – Different system prompt → miss

  Cache misses aren’t catastrophic. They just cost regular input rates for that turn, and they re-prime the cache for subsequent turns.
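
  A conceptual model of the keying (not Anthropic’s implementation, just an illustration of why the edit’s position matters): imagine one cache key per prefix length. An edit near the start invalidates every key after it; an edit further in leaves the earlier ones intact:

    import hashlib

    def prefix_keys(context: str, chunk: int = 256):
        # One key per prefix length; a crude stand-in for token-level keying.
        for end in range(chunk, len(context) + 1, chunk):
            yield hashlib.sha256(context[:end].encode()).hexdigest()[:12]

    base = "SYSTEM PROMPT " + "file contents " * 200    # long, stable context
    early = "SYSTEM prompt " + "file contents " * 200   # edit near the start
    middle = base[:1400] + base[1400:].upper()          # edit halfway in

    def matching(a: str, b: str) -> int:
        return sum(x == y for x, y in zip(prefix_keys(a), prefix_keys(b)))

    n = len(list(prefix_keys(base)))
    print(f"edit at the start: {matching(base, early)}/{n} prefixes survive")   # 0/10
    print(f"edit in the middle: {matching(base, middle)}/{n} prefixes survive") # 5/10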

  —             

  A natural follow-up: the first time a file is read, its content is brand new. Where should those tokens land?

  – Content is NEW → it should be billed as input OR cache_write
  – NOT cache_read (there’s nothing in the cache to read from yet)

  So first reads should show up as either input or cache_write, never cache_read.

  The answer: cache_write IS the first read                                     

  When content is new AND gets cached, Anthropic bills it as cache_write, NOT as input.

  First time seen  +  marked for caching  →  cache_write
  First time seen  +  NOT marked          →  input
  Seen before      +  in cache            →  cache_read

  So the 90.7k cache_write = the file contents being tokenized and cached for the first time. The 583 input = the tiny bits of uncached content (like the agent prompt itself, a few instructions).
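
  The same rule as a tiny function (the function and its names are mine; the mapping is the one above):

    def billing_bucket(seen_recently: bool, marked_for_cache: bool) -> str:
        # Which bucket a span of prompt tokens is billed under.
        if seen_recently:
            return "cache_read"   # prefix hit within the TTL
        return "cache_write" if marked_for_cache else "input"

    assert billing_bucket(False, True) == "cache_write"   # first read of a file
    assert billing_bucket(False, False) == "input"        # small uncached scraps
    assert billing_bucket(True, True) == "cache_read"     # resent stable context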

  So the full picture

  583 input         → small new content, NOT cached
  90.7k cache_write → files read for the first time, now cached
  1.0M cache_read   → system prompt + conversation history, resent every turn

  The Practical Takeaway

  When reading your usage stats, ignore input in isolation. The number that tells you actual data volume is:

  total_tokens = input + cache_write + cache_read
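
  Across a whole session, a small accumulator makes that number (and the cache hit fraction) easy to track. A sketch, assuming the usage attributes of the Messages API response (the class itself is mine):

    from dataclasses import dataclass

    @dataclass
    class SessionTotals:
        new_input: int = 0
        cache_write: int = 0
        cache_read: int = 0

        def add(self, usage) -> None:
            # usage is a Messages API response's .usage object.
            self.new_input += usage.input_tokens
            self.cache_write += usage.cache_creation_input_tokens or 0
            self.cache_read += usage.cache_read_input_tokens or 0

        @property
        def total_sent(self) -> int:
            return self.new_input + self.cache_write + self.cache_read

        @property
        def cached_fraction(self) -> float:
            return self.cache_read / self.total_sent if self.total_sent else 0.0

    # With the session above: 1.0M / (583 + 90.7k + 1.0M) ≈ 92% served from cache.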

  And if cache_read dominates, that’s a good thing. It means your context is stable across turns and you’re getting the bulk discount on repeated content.

  Cache doesn’t reduce what you send. It reduces what you pay per token for sending it.
