Your context has grown to a thousand lines and is about to bust the model's token budget. What is the cheapest way to shrink it without losing the important facts?
Drop older lines? Or ask the model to summarize the whole history into a paragraph?
The summary is the lossy compression every stateful agent eventually needs. A summarizer agent reads the long context and returns a short blurb that preserves intent. The smallest version:
from pydantic_ai import Agent  # `model` is whatever model object you configured earlier

agent = Agent(model, system_prompt="Summarize the facts below into a short, factual paragraph.")
result = agent.run_sync(long_context)
print(result.output)

So compression is one extra agent call — cheap to run once, expensive to skip when the context explodes?
Exactly that tradeoff. One call costs tokens; not compressing costs a budget overrun or a model cutoff. Wrap it and reuse:
def compress_memory(context: str) -> str:
    agent = Agent(
        model,
        system_prompt="Summarize the facts below into a short, factual paragraph.",
    )
    result = agent.run_sync(context)
    return result.output

When do I actually trigger it? Every call? Only above a certain length?
Only above a threshold — 2000 characters is a reasonable start, or 500 tokens if you measure those. Compressing short contexts wastes an agent call and throws away detail that could have stayed verbatim. Measure first, compress when it matters.
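That threshold check is easy to wrap around the helper. A minimal sketch, with the summarizer passed in as a plain callable so short contexts never touch the model (`maybe_compress` and its parameter names are illustrative, not from any library):

```python
THRESHOLD_CHARS = 2000  # rough character stand-in for a real token count

def maybe_compress(context: str, summarize, threshold: int = THRESHOLD_CHARS) -> str:
    """Compress only when the context is big enough to be worth an agent call."""
    if len(context) <= threshold:
        return context  # short contexts stay verbatim: cheaper and more faithful
    return summarize(context)
```

Injecting `summarize` as a parameter also makes the gate testable with a stub instead of a live model call.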
And the compressed string drops back into the same system_prompt slot I used yesterday — memory shrinks, downstream code does not change?
Drops straight back in. Compression is a size transform, not a shape transform. Same string type in, smaller string out, every downstream pattern keeps working.
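To make the "same shape" point concrete: whether the memory is raw or compressed, the downstream prompt-building code is identical. A sketch, where `build_system_prompt` is a hypothetical helper standing in for whatever assembles your system_prompt:

```python
def build_system_prompt(memory: str) -> str:
    # Downstream code only cares that memory is a string.
    return f"Known facts:\n{memory}\n\nAnswer using these facts."

raw = "User likes hiking. User lives in Oslo. " * 50     # long verbatim context
compressed = "The user likes hiking and lives in Oslo."  # post-summary blurb

# Either branch yields the same prompt shape; only the size differs.
prompt = build_system_prompt(compressed)
```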
TL;DR: one agent call with a summarizer system_prompt turns a giant context into a short blurb.
| Context size | Action |
|---|---|
| Small (< 2 KB) | keep verbatim |
| Around the threshold (~2 KB) | measure; compress if it keeps growing |
| Large (well over 2 KB) | always compress |
Compression is cheap token management — a single call trades detail for budget.