RAG — Retrieval-Augmented Generation. Three steps: retrieve the chunks that match the query, stuff them into the prompt, generate an answer from that context.
The model becomes a reading comprehension engine over your data, not a fact source.
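The snippets below lean on the store, top_k, and model built earlier in the week. If you're running them standalone, here's a toy stand-in; the shapes (chunk ids mapping to {"text": ...} dicts, top_k returning (chunk_id, score) pairs, a model id string) are assumptions inferred from how the snippets use them.

# Toy stand-ins for the store, top_k, and model from earlier lessons.
# Assumed shapes: store maps chunk id -> {"text": ...};
# top_k returns (chunk_id, score) pairs, best first.
store = {
    "c1": {"text": "Our refund window is 30 days from delivery."},
    "c2": {"text": "Support is available Monday to Friday, 9am to 5pm UTC."},
    "c3": {"text": "Premium plans include priority support."},
}
model = "openai:gpt-4o-mini"  # placeholder model id, swap in your own

def top_k(query, store, k=2):
    """Rank chunks by word overlap with the query (toy scorer, not the real one)."""
    q = set(query.lower().split())
    scored = [(cid, len(q & set(c["text"].lower().split()))) for cid, c in store.items()]
    return sorted(scored, key=lambda s: s[1], reverse=True)[:k]

query = "How long is the refund window?"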
from pydantic_ai import Agent  # assuming pydantic-ai's Agent; any client with the same shape works

# Retrieve the two best-matching chunks, then stuff them into the prompt
contexts = [store[cid]["text"] for cid, _ in top_k(query, store, k=2)]
context_block = "\n".join(contexts)
prompt = f"""Use the context below to answer the question. If the answer isn't in the context, say so.
Context:
{context_block}
Question: {query}
Answer:"""
result = Agent(model).run_sync(prompt)

And the LLM only knows what's in the context block?
Right — that's the whole point. You're not asking "what does the model know?" You're asking "what does this paragraph say?" The model becomes grounded.
What if my retrieval misses?
Then the answer is wrong, and the LLM may hallucinate to fill the gap. Tomorrow's lesson — citations — makes the failure visible. The lesson after — failure modes — categorises what goes wrong.
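One cheap guard, not from the lesson but easy to bolt on: look at the top retrieval score before generating, and refuse outright when nothing matched. The score scale and threshold below are assumptions about whatever your top_k returns.

# Sketch: refuse instead of generating when retrieval comes back empty-handed.
# Assumes top_k returns (chunk_id, score) pairs, higher = better;
# MIN_SCORE is a made-up threshold you would tune per corpus.
MIN_SCORE = 1

def guarded_answer(query, store, k=2):
    top = top_k(query, store, k)
    if not top or top[0][1] < MIN_SCORE:
        return "I don't know."  # fail loudly rather than hallucinate
    return rag_answer(query, store, k)  # rag_answer is defined just below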
A single function that ties the week together:
def rag_answer(query, store, k=2):
    # 1. Retrieve
    top = top_k(query, store, k)
    contexts = [store[cid]["text"] for cid, _ in top]
    context_block = "\n".join(contexts)
    # 2. Stuff
    prompt = f'''Use the context to answer the question. If unsure, say "I don't know".
Context:
{context_block}
Question: {query}
Answer:'''
    # 3. Generate
    return Agent(model).run_sync(prompt).output

Without RAG: model answers from training data. Stale, possibly wrong, no source.
With RAG: model answers from your data. Fresh, traceable, the source is in the prompt.
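A quick end-to-end run against the toy store above (the expected answer is an assumption; actual output depends on your model):

answer = rag_answer("How long is the refund window?", store, k=2)
print(answer)  # should come back grounded in chunk c1's 30-day policy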
The LLM call is the dominant cost. Smaller k = fewer prompt tokens = cheaper.
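To make that concrete, here's a crude way to watch context size grow with k, using whitespace splitting as a stand-in for real tokenisation:

# Crude illustration: context tokens scale with k (whitespace split, not a real tokeniser).
for k in (1, 2, 3):
    contexts = [store[cid]["text"] for cid, _ in top_k(query, store, k)]
    print(f"k={k}: ~{len(' '.join(contexts).split())} context tokens")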
For today's lesson the corpus is small enough that retrieval is rarely wrong. The point is the pipeline — retrieve → stuff → answer — wired end to end.