Every Week 2 function summarized or classified. Today the agent actually answers a question — and it answers using snippets you retrieved, not its training-data priors. What is the smallest pipeline that does this?
One search() call to pull relevant snippets, one agent call to read them and answer the question. The snippets go inline in the prompt as context?
Exactly. This is the canonical RAG shape — retrieve, then read and answer. The minimal version:
```python
results = search(question, count=5)
context = " ".join(r["snippet"] for r in results)
prompt = f"Context: {context} Question: {question} Answer from context."
result = Agent(model).run_sync(prompt)
print(result.output)
```

Notice the prompt structure: context first, then the question, then a directive to answer from context. The order matters — the agent reads the context before seeing what's being asked of it.
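The snippet above assumes the `search()` helper and the `Agent` class from earlier lessons. To make the data flow runnable on its own, here's a sketch with both stubbed out (the stub names and behaviors are assumptions, not the course's real implementations):

```python
# Stubs standing in for the course's search() helper and LLM agent
# (assumptions: search() returns dicts with a "snippet" key, and
# Agent.run_sync() returns an object with an .output attribute).
def search(query: str, count: int = 5) -> list[dict]:
    return [{"snippet": f"Fact {i} related to {query!r}."} for i in range(count)]

class FakeResult:
    def __init__(self, output: str):
        self.output = output

class Agent:
    def __init__(self, model: str):
        self.model = model

    def run_sync(self, prompt: str) -> FakeResult:
        # A real agent would call the model here; the stub just reports
        # how many retrieved snippets the prompt carries.
        return FakeResult(f"(answer grounded in {prompt.count('Fact')} snippets)")

question = "What is RAG?"
results = search(question, count=5)
context = " ".join(r["snippet"] for r in results)
prompt = f"Context: {context} Question: {question} Answer from context."
print(Agent("stub-model").run_sync(prompt).output)
```

Swapping the stubs for the real `search()` and `Agent` recovers the pipeline above unchanged; only the two dependencies differ.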
So if the snippets don't contain the answer, the agent should say so instead of making something up? That's a real behavior change?
It's the behavior you want. "Answer from context" tells the model to stay grounded. With a well-scoped query, the retrieved snippets carry the answer; without it, the agent admits ignorance rather than confabulating. The full function:
```python
def retrieve_and_answer(question: str) -> str:
    results = search(question, count=5)
    context = " ".join(r["snippet"] for r in results)
    prompt = f"Context: {context} Question: {question} Answer from context."
    return Agent(model).run_sync(prompt).output
```

Why count=5? Does more context always mean better answers?
Up to a point. More snippets give the agent more signal, but also more noise and longer prompts. Five is a reasonable default — three is often enough for well-known questions, ten starts bloating prompts. For production RAG systems you tune this based on the corpus; for a generalist web RAG, five hits the balance.
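One side of that tradeoff is easy to see without calling a model: prompt size grows with count. A small sketch with a stubbed `search()` (an assumption standing in for the real retriever) that just measures the prompts it builds:

```python
# Stub retriever (assumption: real search() returns dicts with a "snippet" key).
def search(query: str, count: int = 5) -> list[dict]:
    return [{"snippet": f"Snippet {i} about {query}."} for i in range(count)]

def build_prompt(question: str, count: int) -> str:
    results = search(question, count=count)
    context = " ".join(r["snippet"] for r in results)
    return f"Context: {context} Question: {question} Answer from context."

for count in (3, 5, 10):
    # Prompt length grows roughly linearly with count; each extra snippet
    # adds signal but also tokens the model must wade through.
    print(count, len(build_prompt("What is RAG?", count)))
```

Real snippets are far longer than these stubs, so the growth is steeper in practice — which is why tuning count against your corpus matters.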
So this is a research assistant in three lines. I type a question, and it retrieves and answers — no pretraining cutoff, no stale knowledge.
Exactly. The live web is the context window. This is the pattern behind every RAG product you've ever used — minus caching, reranking, and chunking, which Week 3 already taught.
TL;DR: retrieve snippets → concatenate into context → agent answers from context.
count=5 gives good signal-to-noise.

| Piece | Purpose |
|---|---|
| Context | retrieved snippets |
| Question | what's being asked |
| Directive | stay grounded |
Swap the retriever or the model and the shape stays — RAG is this pattern, scaled.