Many LLM workflows ask the same question twice. Eval suites re-run cases. Users repeat queries. Sub-agents recompute identical sub-prompts. Caching: don't pay for the same call twice.
import hashlib

cache = {}

def cached_call(prompt):
    # Fixed-size key for an arbitrarily long prompt.
    key = hashlib.sha256(prompt.encode()).hexdigest()
    if key in cache:
        print("cache hit")
        return cache[key]
    # Agent and model come from the surrounding tutorial code (a pydantic-ai-style
    # agent); any function that maps a prompt to a string result works the same way.
    result = Agent(model).run_sync(prompt).output
    cache[key] = result
    print("cache miss")
    return result

Why hash the prompt instead of using it as the key directly?
A 2KB prompt as a dict key works in Python, but a hash gives you a fixed-size string that survives serialisation (Redis, a Sheet, JSON). Same lookup, smaller storage. For an in-memory cache only, a plain cache[prompt] = result is fine too.
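As a minimal sketch of that serialisation point (cache.json and the example prompt are just illustrations), the hex digest round-trips through JSON or a key-value store without escaping or size surprises:

import hashlib, json

def key_for(prompt):
    # 64-char hex digest: safe as a JSON object key, a Redis key, or a Sheet cell.
    return hashlib.sha256(prompt.encode()).hexdigest()

cache = {key_for("Summarise this report."): "One-line summary..."}

# Persist and reload: the fixed-size keys survive the round trip unchanged.
with open("cache.json", "w") as f:
    json.dump(cache, f)

with open("cache.json") as f:
    cache = json.load(f)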
Does the same prompt always give the same answer?
With temperature > 0, the model is stochastic — same prompt, different answers. Caching captures one of those answers and reuses it. For deterministic flows (eval suites, factual recall) that's fine. For creative writing it freezes the output, which may or may not be what you want.
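If you only want that freezing behaviour where it helps, one option is a bypass flag. This is a sketch, not a library feature; use_cache is a hypothetical parameter:

import hashlib

def cached_call(prompt, call_fn, cache, use_cache=True):
    # Creative flows can pass use_cache=False to get a fresh sample every time;
    # deterministic flows (evals, factual recall) keep the default and reuse answers.
    if not use_cache:
        return call_fn(prompt)
    key = hashlib.sha256(prompt.encode()).hexdigest()
    if key in cache:
        return cache[key]
    result = call_fn(prompt)
    cache[key] = result
    return result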
import hashlib

class PromptCache:
    def __init__(self):
        self.store = {}   # key -> cached result
        self.hits = 0
        self.misses = 0

    def _key(self, prompt):
        return hashlib.sha256(prompt.encode()).hexdigest()

    def get_or_call(self, prompt, call_fn):
        key = self._key(prompt)
        if key in self.store:
            self.hits += 1
            return self.store[key]
        result = call_fn(prompt)
        self.store[key] = result
        self.misses += 1
        return result

Each cache hit avoids an API call entirely: its token cost and its latency.
Re-run an eval suite with no prompt changes and every case is a hit: the whole suite runs in milliseconds and costs nothing.
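A quick way to watch that happen, with a stubbed fake_model standing in for the real call (both names are illustrative):

cache = PromptCache()

def fake_model(prompt):
    # Stand-in for the real model call; imagine this costing tokens and seconds.
    return f"answer to: {prompt}"

cases = ["Is 17 prime?", "Capital of France?", "Is 17 prime?"]

for _ in range(2):                      # two "suite runs"
    for prompt in cases:
        cache.get_or_call(prompt, fake_model)

print(cache.hits, cache.misses)         # 4 hits, 2 misses: only the first run pays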
The key must encode everything that would change the answer: the prompt text, the model name and version, the system prompt, and the sampling settings (temperature, max tokens). Miss any one and you'll serve a stale or wrong cached answer.
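One way to build such a key is to hash a stable serialisation of all those parts rather than the prompt alone; the field names below are illustrative, not tied to any SDK:

import hashlib, json

def cache_key(prompt, model_name, system_prompt="", temperature=0.0):
    # json.dumps with sort_keys gives a stable, unambiguous serialisation of every
    # field that could change the answer; hash the whole thing, not just the prompt.
    payload = json.dumps({
        "prompt": prompt,
        "model": model_name,
        "system": system_prompt,
        "temperature": temperature,
    }, sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()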
In the platform sandbox, the cache lives in a Python dict — wiped between runs. Production caches are persistent (Redis, SQLite, a Sheet). The interface is the same get_or_call. Only the storage backend changes.
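As a sketch of the persistent version, here is the same get_or_call interface backed by SQLite instead of a dict (the table name and cache.db path are assumptions):

import hashlib, sqlite3

class SqliteCache:
    def __init__(self, path="cache.db"):
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS cache (key TEXT PRIMARY KEY, result TEXT)"
        )

    def get_or_call(self, prompt, call_fn):
        key = hashlib.sha256(prompt.encode()).hexdigest()
        row = self.conn.execute(
            "SELECT result FROM cache WHERE key = ?", (key,)
        ).fetchone()
        if row:
            return row[0]
        result = call_fn(prompt)
        self.conn.execute("INSERT INTO cache (key, result) VALUES (?, ?)", (key, result))
        self.conn.commit()
        return result

Callers never notice the swap: they still pass a prompt and a call_fn and get the result back.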