# Prompt Caching

> Cache long system prompts and reusable context across requests. Cut costs and latency on conversational, RAG, and agent workloads.

AnyRouter supports **prompt caching** on every upstream that exposes it (Anthropic, OpenAI, DeepSeek, Gemini). Cache a long system prompt once and reuse it across thousands of requests — cached tokens are billed at a fraction of the normal input price.

## When to use it

Prompt caching pays off any time the same prefix is sent many times. Concrete wins:

- **System prompts** — a multi-thousand-token persona or tool spec repeated on every turn.
- **RAG** — retrieved documents that are the same for a batch of questions.
- **Agent scaffolds** — long tool catalogs shared across every step of a chain.
- **Few-shot examples** — static demonstrations pinned to the front of the prompt.

:::tip
Rule of thumb: if a chunk is repeated across **two or more requests within five minutes**, cache it. The break-even vs. uncached input is typically 1.5–2 uses.
:::
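To make the trade-off concrete, here is a small sketch of the cumulative input cost of a repeated prefix, cached vs. uncached. The multipliers are the Anthropic-style rates quoted below (writes at 125%, reads at 10% of input price); other providers use different rates, and the token count and price are illustrative, not AnyRouter-specific.

```typescript
// Sketch: cumulative input cost of a repeated prefix, cached vs. uncached.
// Multipliers assume Anthropic-style pricing (write = 1.25x, read = 0.1x the
// input price); treat the numbers as illustrative.
function prefixCost(tokens: number, uses: number, pricePerTok: number, cached: boolean): number {
  if (!cached) return tokens * uses * pricePerTok;
  const write = tokens * 1.25 * pricePerTok;             // first request warms the cache
  const reads = tokens * 0.1 * pricePerTok * (uses - 1); // later requests read from it
  return write + reads;
}

// A 5,000-token prefix at $3 per million input tokens (illustrative price):
const price = 3 / 1_000_000;
for (const uses of [1, 2, 10]) {
  const cached = prefixCost(5000, uses, price, true);
  const plain = prefixCost(5000, uses, price, false);
  console.log(`${uses} use(s): cached $${cached.toFixed(4)} vs uncached $${plain.toFixed(4)}`);
}
```

At one use the cache write costs more than skipping the cache; by the second use the cached path is already cheaper, which is where the 1.5–2× break-even rule of thumb comes from.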

## Anthropic caching

Anthropic supports breakpoint-style caching. Mark blocks with `cache_control`:

```typescript
import Anthropic from "@anthropic-ai/sdk";

// Point the SDK at AnyRouter; the base URL and key come from your
// AnyRouter account (env var names here are placeholders).
const client = new Anthropic({
  baseURL: process.env.ANYROUTER_BASE_URL,
  apiKey: process.env.ANYROUTER_API_KEY,
});

const response = await client.messages.create({
  model: "anthropic/claude-sonnet-4.6",
  max_tokens: 1024,
  system: [
    {
      type: "text",
      text: LONG_SYSTEM_PROMPT,
      // Everything up to and including this block is cached.
      cache_control: { type: "ephemeral" },
    },
  ],
  messages: [{ role: "user", content: "What's the policy on refunds?" }],
});
```

Cache reads are billed at 10% of the input price; cache writes at 125%. Cached entries have a 5-minute TTL, refreshed each time the cached prefix is read.
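Anthropic also allows more than one breakpoint per request (up to four), so a long tool catalog and the system prompt can be cached as nested prefixes. A sketch of building such a request body — the tool and prompt contents are placeholders, and the helper function is hypothetical, not part of any SDK:

```typescript
// Sketch: one request body with two cache breakpoints (Anthropic allows up
// to four). Tool/prompt contents are placeholders; the cache_control shape
// is Anthropic's documented format.
type Tool = { name: string; [k: string]: unknown };

function withCacheBreakpoints(tools: Tool[], systemText: string, messages: unknown[]) {
  return {
    model: "anthropic/claude-sonnet-4.6",
    max_tokens: 1024,
    // Breakpoint 1: marking the last tool caches the whole tool catalog.
    tools: tools.map((t, i) =>
      i === tools.length - 1 ? { ...t, cache_control: { type: "ephemeral" } } : t
    ),
    // Breakpoint 2: tools + system prompt. Editing the system prompt
    // invalidates this entry, but the tools-only prefix still hits.
    system: [{ type: "text", text: systemText, cache_control: { type: "ephemeral" } }],
    // The growing conversation is the uncached tail.
    messages,
  };
}
```

Splitting breakpoints this way means a system-prompt tweak only re-writes the system-prompt segment of the cache, not the (often much larger) tool catalog in front of it.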

## OpenAI automatic caching

OpenAI caches prompts ≥1024 tokens automatically — no request changes needed. The first request warms the cache; subsequent requests within the cache window pay reduced input pricing on cached tokens.
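Because the match is on the exact prompt prefix, ordering matters: keep static content (system prompt, few-shot examples) first and per-request content last, since any change early in the prompt invalidates everything after it. A sketch of that ordering — the names and contents here are placeholders:

```typescript
// Sketch: keep the static prefix byte-identical across requests so
// automatic prefix matching can hit the cache. All names are placeholders.
const STATIC_SYSTEM = "You are a support agent. [long persona + tool spec]";
const FEW_SHOT = [
  { role: "user", content: "Example question" },
  { role: "assistant", content: "Example answer" },
];

function buildMessages(userQuestion: string) {
  return [
    { role: "system", content: STATIC_SYSTEM }, // stable prefix: cacheable
    ...FEW_SHOT,                                // still stable: cacheable
    { role: "user", content: userQuestion },    // dynamic tail: billed in full
  ];
}
```

The same static-first rule helps on every provider: even with explicit breakpoints, a timestamp or request ID near the top of the prompt defeats caching entirely.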

## Pricing

Every model's [detail page](/models) shows both standard and cache-read pricing. Cached tokens are reflected in the `usage` field of the response:

```json
{
  "usage": {
    "prompt_tokens": 5840,
    "completion_tokens": 120,
    "total_tokens": 5960,
    "prompt_tokens_details": {
      "cached_tokens": 5800
    }
  }
}
```
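Monitoring `cached_tokens` is the easiest way to verify caching is actually working. A minimal sketch of deriving a cache hit rate from the usage shape above (the `Usage` interface is written out here for illustration, not imported from an SDK):

```typescript
// Sketch: derive the cache hit rate from a chat-completions-style usage
// object like the one shown above.
interface Usage {
  prompt_tokens: number;
  completion_tokens: number;
  total_tokens: number;
  prompt_tokens_details?: { cached_tokens?: number };
}

function cacheHitRate(usage: Usage): number {
  const cached = usage.prompt_tokens_details?.cached_tokens ?? 0;
  return usage.prompt_tokens > 0 ? cached / usage.prompt_tokens : 0;
}

const usage: Usage = {
  prompt_tokens: 5840,
  completion_tokens: 120,
  total_tokens: 5960,
  prompt_tokens_details: { cached_tokens: 5800 },
};
console.log(cacheHitRate(usage)); // ≈ 0.993 — nearly the whole prompt was cached
```

A hit rate near zero on a workload you expected to cache usually means the prefix is changing between requests or the requests are arriving outside the cache TTL.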

AnyRouter bills cached tokens at the provider's cache-read rate, passed through with no additional AnyRouter markup.
