What is context caching?
Context caching stores previously sent prompt tokens on the provider side so they do not need to be reprocessed on subsequent calls. When the same system prompt, few-shot examples, or document context appears in multiple requests, the cached prefix is served from the provider's cache and billed at a reduced rate instead of being processed from scratch.
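As a concrete illustration, here is a sketch of a request body in the style of Anthropic's Messages API, where a stable system prompt is opted into caching with the documented "cache_control" marker. The model name and prompt are placeholders; no network call is made.

```python
# Sketch: building a request whose long, stable system prompt is marked
# cacheable. Everything up to the block carrying "cache_control" forms
# the cacheable prefix; later requests with an identical prefix hit the cache.

LONG_SYSTEM_PROMPT = "You are a support assistant for Acme Inc. " * 200  # stable prefix

def build_request(user_message: str) -> dict:
    """Build a request body with a cacheable system prompt (illustrative only)."""
    return {
        "model": "claude-sonnet-4-5",  # placeholder model name
        "max_tokens": 1024,
        "system": [
            {
                "type": "text",
                "text": LONG_SYSTEM_PROMPT,
                # Opt-in marker for prompt caching on this prefix.
                "cache_control": {"type": "ephemeral"},
            }
        ],
        # Only this part varies between calls, so it is not cached.
        "messages": [{"role": "user", "content": user_message}],
    }

request = build_request("Where is my order?")
```

The key design point is that the cacheable prefix must be byte-identical across calls, so anything that varies per request (user messages, timestamps) belongs after the cache marker, not before it.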
Anthropic's prompt caching bills cache reads at 10% of the base input-token rate, with a 25% premium on the initial cache write; OpenAI applies a similar automatic discount to cached prompt prefixes. The savings compound when your application sends the same context prefix thousands of times per day.
Why it matters
If your application uses a long system prompt or a large document as context, and you make repeated calls, context caching can reduce input costs by 50–90%. The tradeoff is a small premium on the initial cache write, which the discounted reads recover within the first few cache hits. For applications with stable, repeated context (the typical pattern), this is a net saving almost immediately.