Prompt Caching

Cache long system prompts and reusable context across requests. Cut costs and latency on conversational, RAG, and agent workloads.

AnyRouter supports prompt caching on every upstream that exposes it (Anthropic, OpenAI, DeepSeek, Gemini). Cache a long system prompt once and reuse it across thousands of requests — cached tokens are billed at a fraction of the normal input price.

When to use it

Prompt caching pays off any time the same prefix is sent many times. Concrete wins:

  • System prompts — a multi-thousand-token persona or tool spec repeated on every turn.
  • RAG — retrieved documents that are the same for a batch of questions.
  • Agent scaffolds — long tool catalogs shared across every step of a chain.
  • Few-shot examples — static demonstrations pinned to the front of the prompt.

Tip: as a rule of thumb, if a chunk is repeated across two or more requests within five minutes, cache it. The break-even vs. uncached input is typically 1.5–2 uses.
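Using the cache multipliers quoted in the provider sections below (writes at 125% of the normal input price, reads at 10%), the break-even arithmetic can be sketched as:

```typescript
// Cost of sending the same prefix n times, in units of the normal
// per-token input price. Multipliers follow Anthropic's published
// pricing: cache writes cost 1.25x, cache reads 0.1x.
const WRITE = 1.25;
const READ = 0.1;

const uncachedCost = (n: number, tokens: number) => n * tokens;
const cachedCost = (n: number, tokens: number) =>
  WRITE * tokens + (n - 1) * READ * tokens;

// With a 5,000-token prefix, caching already wins on the second use:
console.log(uncachedCost(2, 5000)); // 10000
console.log(cachedCost(2, 5000));   // 6750 (6250 write + 500 read)
```

A single use always loses (you pay the write premium and never read it back), which is why the rule of thumb starts at two requests.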

Anthropic caching

Anthropic supports breakpoint-style caching. Mark blocks with cache_control:

```typescript
const response = await client.messages.create({
  model: "anthropic/claude-sonnet-4.6",
  max_tokens: 1024,
  system: [
    {
      type: "text",
      text: LONG_SYSTEM_PROMPT,
      cache_control: { type: "ephemeral" },
    },
  ],
  messages: [{ role: "user", content: "What's the policy on refunds?" }],
});
```

Cache reads are billed at 10% of the normal input price; cache writes at 125% (a 25% premium for creating the cache entry). The cache TTL is 5 minutes, refreshed each time the entry is read.
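Anthropic also allows more than one breakpoint per request (up to four), so a tool catalog and a system prompt can be cached as separate prefix segments. A sketch of a request-body builder, assuming the same Messages API shape as the example above; `TOOL_SPECS` and `LONG_SYSTEM_PROMPT` are placeholders:

```typescript
// Placeholder inputs for illustration only.
const TOOL_SPECS = [
  { name: "get_refund_policy", description: "Look up the refund policy", input_schema: { type: "object" } },
];
const LONG_SYSTEM_PROMPT = "You are a support agent...";

function buildRequest(userText: string) {
  return {
    model: "anthropic/claude-sonnet-4.6",
    max_tokens: 1024,
    // A breakpoint on the last tool caches the entire tool catalog.
    tools: TOOL_SPECS.map((t, i) =>
      i === TOOL_SPECS.length - 1 ? { ...t, cache_control: { type: "ephemeral" } } : t
    ),
    // A second breakpoint caches the system prompt independently.
    system: [
      { type: "text", text: LONG_SYSTEM_PROMPT, cache_control: { type: "ephemeral" } },
    ],
    messages: [{ role: "user", content: userText }],
  };
}
```

Splitting breakpoints this way means a change to the system prompt invalidates only the system-prompt segment, not the (typically larger) tool catalog before it.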

OpenAI automatic caching

OpenAI caches prompts ≥1024 tokens automatically — no request changes needed. The first request warms the cache; subsequent requests within the cache window pay reduced input pricing on cached tokens.
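Cached tokens are reported in the response's usage field. A small helper, assuming the OpenAI-style usage shape shown in the Pricing section, computes what fraction of the prompt was served from cache:

```typescript
// Assumed shape of the OpenAI-style usage object.
interface Usage {
  prompt_tokens: number;
  prompt_tokens_details?: { cached_tokens?: number };
}

// Fraction of prompt tokens that hit the cache (0 when nothing cached).
function cacheHitRate(usage: Usage): number {
  const cached = usage.prompt_tokens_details?.cached_tokens ?? 0;
  return usage.prompt_tokens > 0 ? cached / usage.prompt_tokens : 0;
}
```

Logging this per request is an easy way to verify that your prompt prefix is actually stable; a low hit rate usually means something near the front of the prompt (a timestamp, a user ID) is changing on every call.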

Pricing

Every model's detail page shows both standard and cache-read pricing. Cached tokens are reflected in the usage field of the response:

```json
{
  "usage": {
    "prompt_tokens": 5840,
    "completion_tokens": 120,
    "total_tokens": 5960,
    "prompt_tokens_details": {
      "cached_tokens": 5800
    }
  }
}
```

AnyRouter passes cached tokens through at the provider's cache-read rate; there is no additional AnyRouter markup on cache reads.
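Putting the usage block and the cache-read rate together, the effective input cost can be sketched as follows. The rates here are illustrative placeholders, not real AnyRouter prices; check the model's detail page for actual numbers:

```typescript
// Illustrative rates only (assumed): $3.00 per million uncached input
// tokens, cache reads at 10% of that.
const INPUT_PER_MTOK = 3.0;
const CACHE_READ_PER_MTOK = 0.3;

// Effective input cost in USD: uncached tokens at the standard rate,
// cached tokens at the cache-read rate.
function inputCostUSD(promptTokens: number, cachedTokens: number): number {
  const uncached = promptTokens - cachedTokens;
  return (uncached * INPUT_PER_MTOK + cachedTokens * CACHE_READ_PER_MTOK) / 1_000_000;
}

// Usage block from above: 5,840 prompt tokens, 5,800 of them cached.
console.log(inputCostUSD(5840, 5800)); // ~0.00186
```

Without caching the same request would cost 5,840 × $3.00 / 1M ≈ $0.0175, so the cache cuts input spend by roughly 9× in this example.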