You have a draft and you want to know how good it is before shipping. How do you get a machine-readable score back from an agent?
A prompt like "rate this 1 to 10"? Then parse the number out of the text? That feels fragile.
Fragile is the right word. A Pydantic model with score: int and feedback: str forces the shape — no parsing, just typed fields. The critic:
```python
from pydantic import BaseModel
from pydantic_ai import Agent

class Score(BaseModel):
    score: int
    feedback: str

critic = Agent(model, result_type=Score, system_prompt="Score the response 1-10 with one sentence of feedback.")
result = critic.run_sync(draft)
print(result.output.score, result.output.feedback)
```
And the critic reads the draft as the user message, returns a Score with both fields, and my code branches on result.output.score?
That is exactly the loop. Score becomes a signal — keep if it is high, refine if it is low. The full evaluator:
```python
def evaluate_response(draft: str) -> dict:
    critic = Agent(model, result_type=Score, system_prompt="Score the response 1-10 with one sentence of feedback.")
    result = critic.run_sync(draft)
    return result.output.model_dump()
```
Why return a plain dict instead of the Score instance directly?
Because downstream code — the refiner, a logger, a dashboard — expects JSON-serializable values. .model_dump() turns the typed Score into {"score": 8, "feedback": "..."}, portable everywhere. Typed inside, plain outside.
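A minimal stand-in makes the "typed inside, plain outside" point concrete. This sketch uses the standard library's `dataclasses` in place of Pydantic (so `asdict` plays the role of `.model_dump()`); the `Score` values are illustrative:

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class Score:
    score: int
    feedback: str

typed = Score(score=8, feedback="Clear, but the intro runs long.")

# The typed object itself is not JSON-serializable...
try:
    json.dumps(typed)
except TypeError:
    pass  # json.dumps rejects arbitrary objects

# ...but its plain-dict dump is portable everywhere:
plain = asdict(typed)  # analogous to Pydantic's .model_dump()
print(json.dumps(plain))
```

Loggers, dashboards, and the refiner all consume the plain dict; only the critic's internals touch the typed model.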
So with a typed score I can drive logic — if score < 7 refine else ship — without regex or string parsing?
Typed scores unlock automation. One integer field is the difference between a demo and a refinement pipeline. Tomorrow the loop writes itself around this critic.
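That loop can be sketched today with a stubbed critic standing in for the real `Agent`. The `refine` step, the scoring rule, and the threshold of 7 are all illustrative assumptions, not part of the lesson's API:

```python
def evaluate_response(draft: str) -> dict:
    # Stub critic: a real version would call critic.run_sync(draft)
    # and return result.output.model_dump().
    score = 9 if "concise" in draft else 5
    feedback = "Ship it." if score >= 7 else "Tighten the opening."
    return {"score": score, "feedback": feedback}

def refine(draft: str, feedback: str) -> str:
    # Placeholder refiner: a real one would prompt another agent
    # with the draft plus the critic's feedback.
    return draft + " (revised: concise)"

def refine_until_good(draft: str, threshold: int = 7, max_rounds: int = 3) -> str:
    for _ in range(max_rounds):
        result = evaluate_response(draft)
        if result["score"] >= threshold:
            break  # good enough: ship
        draft = refine(draft, result["feedback"])
    return draft

print(refine_until_good("A rambling first draft"))
```

The branch condition is a plain integer comparison on `result["score"]` — no regex, no string parsing — which is exactly what the typed critic makes possible.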
TL;DR: result_type=Score turns prose judgment into a machine-actionable {score, feedback} dict.
- `class Score(BaseModel)` — `score: int` + `feedback: str`
- `system_prompt` — "rate 1-10 with one sentence of feedback"
- `.model_dump()` — plain dict for downstream loops

| Output | Actionable |
|---|---|
| Prose "this is decent" | Needs parsing |
| `{"score": 8, "feedback": "..."}` | Machine-driven branching |
A typed score is the hinge of every feedback loop — numeric signal in, branch logic out.