Pricing
Prompt caching uses a tiered pricing structure based on cache duration and usage type:| Model | Base Input Tokens | 5m Cache Writes | 1h Cache Writes | Cache Hits & Refreshes | Output Tokens |
|---|---|---|---|---|---|
| Claude Opus 4.5 | $5 / MTok | $6.25 / MTok | $10 / MTok | $0.50 / MTok | $25 / MTok |
| Claude Opus 4.1 | $15 / MTok | $18.75 / MTok | $30 / MTok | $1.50 / MTok | $75 / MTok |
| Claude Opus 4 | $15 / MTok | $18.75 / MTok | $30 / MTok | $1.50 / MTok | $75 / MTok |
| Claude Sonnet 4.5 | $3 / MTok | $3.75 / MTok | $6 / MTok | $0.30 / MTok | $15 / MTok |
| Claude Sonnet 4 | $3 / MTok | $3.75 / MTok | $6 / MTok | $0.30 / MTok | $15 / MTok |
| Claude Sonnet 3.7 | $3 / MTok | $3.75 / MTok | $6 / MTok | $0.30 / MTok | $15 / MTok |
| Claude Haiku 4.5 | $1 / MTok | $1.25 / MTok | $2 / MTok | $0.10 / MTok | $5 / MTok |
| Claude Haiku 3.5 | $0.80 / MTok | $1 / MTok | $1.6 / MTok | $0.08 / MTok | $4 / MTok |
| Claude Opus 3 | $15 / MTok | $18.75 / MTok | $30 / MTok | $1.50 / MTok | $75 / MTok |
| Claude Haiku 3 | $0.25 / MTok | $0.30 / MTok | $0.50 / MTok | $0.03 / MTok | $1.25 / MTok |
- 5-minute cache writes: 1.25x the base input tokens price
- 1-hour cache writes: 2x the base input tokens price
- Cache reads: 0.1x the base input tokens price
Cache writes occur when content is first cached. Subsequent requests using the cached content are charged at the cache read rate (10% of base input price). The cache lifetime is refreshed each time the cached content is used.
Limitations
- Maximum 4 cache breakpoints per request
- Caches expire after 5 minutes by default
cache_controlcan only be inserted into text content blocks
Examples
System Message Caching
User Message Caching
Cache Control Options
Cache type. Currently only
"ephemeral" is supported.Time-to-live. Optional, defaults to 5 minutes. Format:
"5m" or "1h".Some models like
claude-3-7-sonnet-20250219 do not support TTL in system messages. TTL is automatically stripped for these models.Using TTL
Specify cache duration with thettl parameter:
Usage Tracking
The API response includes cache usage in theusage object:
cache_read_input_tokens will be non-zero instead of cache_creation_input_tokens.
Best Practices
- Cache large, static content that doesn’t change frequently
- Monitor
cache_read_input_tokensto verify cache hits - Respect the 4 breakpoint limit - prioritize largest, most reused content
- Remember cache expiration - caches expire after 5 minutes (or your TTL)
