Prompt Caching
To save on inference costs, you can leverage prompt caching on supported providers and models. When a provider supports it, LangDB makes a best-effort attempt to route subsequent requests to the same provider so they can take advantage of the warm cache.
Most providers automatically enable prompt caching for large prompts, but some, like Anthropic, require you to enable it on a per-message basis.
How Caching Works
Automatic Caching
Providers like OpenAI, Grok, DeepSeek, and (soon) Google Gemini enable caching by default once your prompt exceeds a certain length (e.g. 1024 tokens).
- Activation: No change needed. Any prompt over the length threshold is written to cache.
- Best Practice: Put your static content (system prompts, RAG context, long instructions) first in the message so it can be reused.
- Pricing:
  - Cache Write: Billed at the standard per-token rate (no surcharge on most providers).
  - Cache Read: Deeply discounted versus fresh inference (typically 25–50% of the normal rate).
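As a sketch of the best practice above, the request below keeps the long static content first so every call shares the same cacheable prefix; the model name and prompt text are illustrative assumptions, not prescriptions.

```python
# Hypothetical sketch: structure requests so the static prefix is reused.
STATIC_SYSTEM_PROMPT = "You are a support agent for Acme Corp. " * 100  # long, reusable prefix

def build_request(user_question: str) -> dict:
    """Place static content first so every request shares the same
    cacheable prefix; only the trailing user message varies per call."""
    return {
        "model": "openai/gpt-4o-mini",  # assumed model name for illustration
        "messages": [
            {"role": "system", "content": STATIC_SYSTEM_PROMPT},  # cached prefix
            {"role": "user", "content": user_question},           # varies per call
        ],
    }

payload = build_request("How do I reset my password?")
```

Because only the final message changes between calls, every request past the first reads the long system prompt from cache instead of paying full price for it.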

Manual Caching
Anthropic’s Claude family requires you to mark which parts of the message are cacheable by adding a `cache_control` object. You can also set a TTL to control how long the block stays in cache.
- Activation: You must wrap static blocks in a `content` array and give each one a `cache_control` entry.
- TTL: Use `{"ttl": "5m"}` or `{"ttl": "1h"}` to control expiration (default 5 minutes).
- Best For: Huge documents, long backstories, or repeated system instructions.
- Pricing:
  - Cache Write: 1.25× the normal per-token rate.
  - Cache Read: 0.1× (10%) of the normal per-token rate.
- Limitations: Ephemeral (expires after the TTL), limited number of cacheable blocks.
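A small helper (hypothetical, not part of any SDK) makes it harder to forget the `cache_control` entry when assembling Anthropic-style content arrays:

```python
def cacheable_block(text: str, ttl: str = "5m") -> dict:
    """Return an Anthropic-style text block marked as cacheable.

    ttl must be "5m" (the default) or "1h", the two supported TTLs.
    """
    if ttl not in ("5m", "1h"):
        raise ValueError("ttl must be '5m' or '1h'")
    return {
        "type": "text",
        "text": text,
        "cache_control": {"type": "ephemeral", "ttl": ttl},
    }

system_content = [
    {"type": "text", "text": "You analyze legal documents."},  # short, left uncached
    cacheable_block("HUGE DOCUMENT TEXT...", ttl="1h"),        # large block, cached for 1h
]
```

Keeping small, frequently changing text out of the cached block avoids wasting one of the limited cacheable slots on content that rarely repeats.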

On the first (cold-cache) request, usage will report close to 100% cache writes and a correspondingly small cost increase (~25% on the cached tokens); subsequent requests read from the cache at 10% of the normal rate.
Caching Example (Anthropic)
Here is an example of caching a large document. This can be done in either the system or user message.
```json
{
  "model": "anthropic/claude-3.5-sonnet",
  "messages": [
    {
      "role": "system",
      "content": [
        {
          "type": "text",
          "text": "You are a helpful assistant that analyzes legal documents. The following is a terms of service document:"
        },
        {
          "type": "text",
          "text": "HUGE DOCUMENT TEXT...",
          "cache_control": {
            "type": "ephemeral",
            "ttl": "1h"
          }
        }
      ]
    },
    {
      "role": "user",
      "content": [
        {
          "type": "text",
          "text": "Summarize the key points about data privacy."
        }
      ]
    }
  ]
}
```
Provider Support Matrix
| Provider | Auto-cache? | Manual flag? | TTL | Write cost | Read cost |
|---|---|---|---|---|---|
| OpenAI | ✅ | ❌ | N/A | standard | 0.25× or 0.5× |
| Grok | ✅ | ❌ | N/A | standard | 0.25× |
| DeepSeek | ✅ | ❌ | N/A | standard | 0.25× |
| Anthropic Claude | ❌ | `cache_control` + TTL | 5m / 1h | 1.25× | 0.1× |
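Using Anthropic’s 1.25× write and 0.1× read multipliers from the table, a quick back-of-envelope calculation (with a hypothetical $3-per-million-token rate) shows that caching pays for itself from the second request onward:

```python
CACHE_WRITE_MULT = 1.25  # Anthropic cache write: 1.25x the normal rate
CACHE_READ_MULT = 0.1    # Anthropic cache read: 0.1x the normal rate

def cached_cost(n_requests: int, cached_tokens: int, token_rate: float) -> float:
    """Cost of the cached prefix: one write, then reads on every later request."""
    write = CACHE_WRITE_MULT * cached_tokens * token_rate
    reads = CACHE_READ_MULT * cached_tokens * token_rate * (n_requests - 1)
    return write + reads

def uncached_cost(n_requests: int, tokens: int, token_rate: float) -> float:
    """Cost of re-sending the same prefix at the normal rate every time."""
    return n_requests * tokens * token_rate

rate = 3 / 1_000_000  # hypothetical $3 per million input tokens
# 10 requests over a 10,000-token prefix: ~$0.0645 cached vs $0.30 uncached
```

A single request costs more with caching (1.25× vs 1×), so only mark blocks you genuinely expect to reuse within the TTL.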
For the most up-to-date information on a specific model or provider's caching policy, pricing, and limitations, refer to the model page on LangDB.