API Format
Both formats support the same caching functionality. The `prompt_caching` helper in the OpenAI format is automatically converted to `cache_control` markers internally. The examples below are organized by format for clarity.

Pricing

Prompt caching uses a tiered pricing structure based on cache duration and usage type:

| Model | Base Input Tokens | 5m Cache Writes | 1h Cache Writes | Cache Hits & Refreshes | Output Tokens |
|---|---|---|---|---|---|
| Claude Opus 4.5 | $5 / MTok | $6.25 / MTok | $10 / MTok | $0.50 / MTok | $25 / MTok |
| Claude Opus 4.1 | $15 / MTok | $18.75 / MTok | $30 / MTok | $1.50 / MTok | $75 / MTok |
| Claude Opus 4 | $15 / MTok | $18.75 / MTok | $30 / MTok | $1.50 / MTok | $75 / MTok |
| Claude Sonnet 4.5 | $3 / MTok | $3.75 / MTok | $6 / MTok | $0.30 / MTok | $15 / MTok |
| Claude Sonnet 4 | $3 / MTok | $3.75 / MTok | $6 / MTok | $0.30 / MTok | $15 / MTok |
| Claude Sonnet 3.7 | $3 / MTok | $3.75 / MTok | $6 / MTok | $0.30 / MTok | $15 / MTok |
| Claude Haiku 4.5 | $1 / MTok | $1.25 / MTok | $2 / MTok | $0.10 / MTok | $5 / MTok |
| Claude Haiku 3.5 | $0.80 / MTok | $1 / MTok | $1.60 / MTok | $0.08 / MTok | $4 / MTok |
| Claude Haiku 3 | $0.25 / MTok | $0.30 / MTok | $0.50 / MTok | $0.03 / MTok | $1.25 / MTok |
- 5-minute cache writes: 1.25x the base input tokens price
- 1-hour cache writes: 2x the base input tokens price
- Cache reads: 0.1x the base input tokens price
Cache writes occur when content is first cached. Subsequent requests using the cached content are charged at the cache read rate (10% of base input price). The cache lifetime is refreshed each time the cached content is used.
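The multipliers above can be sketched with a quick cost calculation. This uses the Claude Sonnet 4.5 row from the table; the token counts are illustrative:

```python
# Prompt-caching cost multipliers, applied to the base input price.
BASE_INPUT = 3.00              # $/MTok, Claude Sonnet 4.5 base input rate
WRITE_5M = BASE_INPUT * 1.25   # 5-minute cache write: $3.75 / MTok
WRITE_1H = BASE_INPUT * 2.0    # 1-hour cache write:   $6.00 / MTok
READ = BASE_INPUT * 0.1        # cache hit / refresh:  $0.30 / MTok

def cost_usd(tokens: int, rate_per_mtok: float) -> float:
    """Cost in dollars for `tokens` tokens billed at `rate_per_mtok` $/MTok."""
    return tokens / 1_000_000 * rate_per_mtok

# First request writes a 100k-token prefix to the 5-minute cache;
# ten later requests within the TTL read it back at the hit rate.
first = cost_usd(100_000, WRITE_5M)            # $0.375
later = 10 * cost_usd(100_000, READ)           # $0.30
uncached = 11 * cost_usd(100_000, BASE_INPUT)  # $3.30 without caching
```

With eleven requests, the cached run costs about $0.68 versus $3.30 uncached, which is where the 10%-of-base read rate pays off.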
Limitations
- Maximum 4 cache breakpoints per request
- Caches expire after 5 minutes by default
- `cache_control` can only be inserted into text content blocks
Examples
System Message Caching
Anthropic Format (/v1/messages)
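A minimal sketch of a `/v1/messages` request body with a cached system prompt. The model name and prompt text are placeholders; the `cache_control` marker on the last system block caches everything up to and including that block:

```python
# Request body for POST /v1/messages with a cached system prompt.
request_body = {
    "model": "claude-sonnet-4-5",  # illustrative model name
    "max_tokens": 1024,
    "system": [
        {"type": "text", "text": "You are a helpful assistant."},
        {
            "type": "text",
            "text": "<large static reference document goes here>",
            "cache_control": {"type": "ephemeral"},
        },
    ],
    "messages": [
        {"role": "user", "content": "Summarize the document."}
    ],
}
```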
OpenAI Format (/v1/chat/completions)
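A sketch of the same idea in the OpenAI format using the `prompt_caching` helper. The exact field names inside `prompt_caching` (here `enabled` and `ttl`) are assumptions based on the parameter descriptions under Cache Control Options, not a confirmed schema:

```python
# Request body for POST /v1/chat/completions with the prompt_caching
# helper. Field names "enabled" and "ttl" are illustrative assumptions.
request_body = {
    "model": "claude-sonnet-4-5",  # illustrative model name
    "messages": [
        {"role": "system", "content": "<large static system prompt>"},
        {"role": "user", "content": "Summarize the document."},
    ],
    "prompt_caching": {
        "enabled": True,
        "ttl": "5m",  # optional; "5m" is the default
    },
}
```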
User Message Caching
Anthropic Format (/v1/messages)
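A sketch of caching a large user turn in the Anthropic format. The `cache_control` marker goes on the large text block, so that block is cached while the trailing question is not (placeholder text throughout):

```python
# POST /v1/messages: cache a large user content block by marking it
# with cache_control; later blocks stay out of the cached prefix.
request_body = {
    "model": "claude-sonnet-4-5",  # illustrative model name
    "max_tokens": 1024,
    "messages": [
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "<large document pasted by the user>",
                    "cache_control": {"type": "ephemeral"},
                },
                {"type": "text", "text": "What are the key points?"},
            ],
        }
    ],
}
```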
OpenAI Format (/v1/chat/completions)
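In the OpenAI format, `cache_control` markers can also be placed directly in message content blocks, mirroring the Anthropic format. A sketch with placeholder text:

```python
# POST /v1/chat/completions: cache_control placed directly inside a
# user message's content blocks (same shape as the Anthropic format).
request_body = {
    "model": "claude-sonnet-4-5",  # illustrative model name
    "messages": [
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "<large document pasted by the user>",
                    "cache_control": {"type": "ephemeral"},
                },
                {"type": "text", "text": "What are the key points?"},
            ],
        }
    ],
}
```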
Cache Control Options
Anthropic Format (/v1/messages)
The `cache_control` object accepts the following fields:

- `type`: Cache type. Currently only "ephemeral" is supported.
- `ttl`: Time-to-live. Optional, defaults to 5 minutes. Format: "5m" or "1h".

Some models, such as claude-3-7-sonnet-20250219, do not support TTL in system messages. TTL is automatically stripped for these models.

OpenAI Format (/v1/chat/completions)

The `prompt_caching` helper accepts:

- Enable flag: Set to `true` to enable caching.
- `ttl`: Time-to-live. Optional, defaults to "5m". Format: "5m" or "1h".
- Last-message index: Zero-based index of the last message to cache. All messages up to and including this index will be cached.
The `prompt_caching` helper automatically converts to `cache_control` markers in message content. You can also use `cache_control` markers directly in message content blocks (same as the Anthropic format).

Using TTL

Specify the cache duration with the `ttl` parameter:
Anthropic Format (/v1/messages)
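A sketch of requesting a 1-hour cache instead of the default 5 minutes, via the `ttl` field on `cache_control` (model name and text are placeholders):

```python
# POST /v1/messages: extend the cache lifetime to 1 hour with ttl.
request_body = {
    "model": "claude-sonnet-4-5",  # illustrative model name
    "max_tokens": 1024,
    "system": [
        {
            "type": "text",
            "text": "<large static system prompt>",
            "cache_control": {"type": "ephemeral", "ttl": "1h"},
        }
    ],
    "messages": [{"role": "user", "content": "Hello"}],
}
```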
OpenAI Format (/v1/chat/completions)
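The same 1-hour TTL sketched with the `prompt_caching` helper; as above, the `enabled` field name is an assumption based on the parameter descriptions in this doc:

```python
# POST /v1/chat/completions: 1-hour TTL via the prompt_caching helper.
# Field name "enabled" is an illustrative assumption.
request_body = {
    "model": "claude-sonnet-4-5",  # illustrative model name
    "messages": [
        {"role": "system", "content": "<large static system prompt>"},
        {"role": "user", "content": "Hello"},
    ],
    "prompt_caching": {
        "enabled": True,
        "ttl": "1h",
    },
}
```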
Usage Tracking
The API response includes cache usage in the `usage` object. Both endpoints return cache usage information:
Anthropic Format (/v1/messages)
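A sketch of the `usage` object across two requests; the field names match those described in this doc, while the token counts are illustrative:

```python
# usage object from the first /v1/messages request: the prefix is
# written to the cache, so cache_creation_input_tokens is non-zero.
first_usage = {
    "input_tokens": 25,
    "cache_creation_input_tokens": 2048,
    "cache_read_input_tokens": 0,
    "output_tokens": 310,
}

# usage object from a later request within the TTL: the same prefix
# is read back, so cache_read_input_tokens is non-zero instead.
cached_usage = {
    "input_tokens": 25,
    "cache_creation_input_tokens": 0,
    "cache_read_input_tokens": 2048,
    "output_tokens": 295,
}
```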
OpenAI Format (/v1/chat/completions)
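A sketch of the OpenAI-format `usage` object, per the note in this doc that the token fields are renamed while the cache-related fields are identical (counts illustrative):

```python
# usage object from a /v1/chat/completions response on a cache write.
usage = {
    "prompt_tokens": 25,        # OpenAI-format name for input_tokens
    "completion_tokens": 310,   # OpenAI-format name for output_tokens
    "total_tokens": 335,
    "cache_creation_input_tokens": 2048,  # identical to Anthropic format
    "cache_read_input_tokens": 0,
}
```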
On cache hits, `cache_read_input_tokens` will be non-zero instead of `cache_creation_input_tokens`.
The OpenAI format uses `prompt_tokens` and `completion_tokens` instead of `input_tokens` and `output_tokens`, but the cache-related fields are identical across both formats.

Best Practices
- Cache large, static content that doesn’t change frequently
- Monitor `cache_read_input_tokens` to verify cache hits
- Respect the 4-breakpoint limit: prioritize the largest, most reused content
- Remember cache expiration - caches expire after 5 minutes (or your TTL)
