Yesterday's brief returned prose; the day before, a structured triple. What does the caller get when you also want a confidence label — a high/medium/low rating on how reliable the answer is?
A Pydantic model with two fields — answer as a string and confidence as a Literal? The agent fills in both at once.
Exactly. One call, two fields — one free-form, one constrained. The class:
```python
from typing import Literal

from pydantic import BaseModel


class Answer(BaseModel):
    answer: str
    confidence: Literal["high", "medium", "low"]
```

Pydantic validates both fields — answer can be any string, confidence must be one of the three labels. Neither drifts.
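A quick check of that behavior, runnable on its own (assumes pydantic v2; the example answer text and the invalid label are illustrative):

```python
from typing import Literal

from pydantic import BaseModel, ValidationError


class Answer(BaseModel):
    answer: str
    confidence: Literal["high", "medium", "low"]


# Valid: any string for answer, one of the three labels for confidence.
ok = Answer(answer="Paris is the capital of France.", confidence="high")
print(ok.model_dump())

# Invalid: a label outside the Literal is rejected, not coerced.
try:
    Answer(answer="Maybe?", confidence="certain")
    err_type = None
except ValidationError as exc:
    err_type = exc.errors()[0]["type"]
print(err_type)  # pydantic v2 reports this as a 'literal_error'
```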
So the same model combines two constraints — one free, one closed. The agent has to satisfy both to return a valid Answer?
Exactly. Pydantic enforces the full class schema — if the model writes confidence: "medium-high" the validation fails and Pydantic retries. You always get a parseable Answer instance, never a stray string slipping through. The full function:
```python
def confidence_rated_answer(question: str) -> dict:
    # search() and model are assumed set up earlier in the track (not shown here).
    results = search(question, count=5)
    context = " ".join(r["snippet"] for r in results)
    prompt = f"Context: {context}\nQuestion: {question}\nAnswer and rate confidence."
    agent = Agent(
        model,
        output_type=Answer,  # current pydantic-ai name; result_type is the deprecated spelling
        system_prompt="Rate high/medium/low based on how well the context covers the question.",
    )
    return agent.run_sync(prompt).output.model_dump()
```

Why ask the agent to self-rate? Doesn't it always think it's right?
Modern agents are better at calibration than you'd expect — when the prompt explicitly asks for context-coverage assessment, they often admit when the retrieved snippets don't actually answer the question. That self-assessment isn't perfect, but it's signal your downstream code can branch on: accept high, verify medium, retry or reject low.
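That accept/verify/retry branch can be sketched in a few lines (the `handle` name and the returned action strings are illustrative, not from the track):

```python
def handle(result: dict) -> str:
    """Decide what downstream code does with a confidence-rated answer dict."""
    conf = result["confidence"]
    if conf == "high":
        return "accept"   # use the answer directly
    if conf == "medium":
        return "verify"   # e.g. re-check against a second retrieval pass
    return "retry"        # low: re-query or escalate to a human


print(handle({"answer": "…", "confidence": "medium"}))  # verify
```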
So the caller gets back not just an answer, but a machine-readable reliability marker — they can decide what to trust automatically.
Exactly. Answer plus confidence is the minimal version of calibrated RAG. Tomorrow's capstone wraps this into a full research assistant with sources, keywords, and confidence all in one call.
TL;DR: Pydantic combines a free-form answer: str with a closed confidence: Literal[...] in the same model.
`.model_dump()` — returns a plain dict for downstream code.

| Label | Meaning |
|---|---|
| high | context clearly covers the question |
| medium | partial coverage |
| low | gaps or speculation |
Downstream code branches on confidence — accept, verify, or retry.