A useful trick: ask the model to critique its own work, then ask for a revision. Three calls, surprisingly often sharper output than one call.
from pydantic_ai import Agent

model = "openai:gpt-4o"  # assumption: any model identifier pydantic-ai supports works here

topic = "why bicycles are good for cities"

# v1 — initial draft
v1 = Agent(model).run_sync(f"Write a 2-sentence pitch: {topic}").output

# Critique
critique = Agent(model).run_sync(
    f"Critique this 2-sentence pitch in one sentence — what's the weakest part?\n\nPitch: {v1}"
).output

# v2 — revised
v2 = Agent(model).run_sync(
    f"Rewrite this pitch to address the critique. Keep it to 2 sentences.\n\nPitch: {v1}\n\nCritique: {critique}"
).output

print("V1:", v1)
print("CRITIQUE:", critique)
print("V2:", v2)
Three LLM calls, three roles — writer, critic, reviser. Each is a fresh call without prior history.
Right. Same model, three different jobs. The fresh-call discipline is on purpose — the critic isn't biased by having authored v1, it just sees the pitch as text. Often surprisingly honest about weaknesses.
Does it actually improve things?
Frequently, yes. Especially for: writing tasks, reasoning chains, code review. Less so for: simple recall, single-token classification (no reasoning to critique). The pattern stacks with everything else — you can self-critique inside a chain, inside an agent loop, or as a once-over before final output.
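The stacking idea above can be sketched as a loop: run the critique-revise pair as many cycles as the item is worth. This is a minimal sketch with a stand-in llm() function (a hypothetical placeholder, not part of pydantic-ai); in real use you would swap in something like Agent(model).run_sync(prompt).output.

```python
def llm(prompt: str) -> str:
    # stand-in for a real model call; echoes part of the prompt for illustration
    return f"response to: {prompt[:30]}"

def self_critique(draft_prompt: str, cycles: int = 1) -> str:
    draft = llm(draft_prompt)  # writer
    for _ in range(cycles):
        # critic: fresh call, sees only the draft as text
        critique = llm(f"What's the weakest part of this?\n\n{draft}")
        # reviser: fresh call, sees draft + critique
        draft = llm(f"Revise to fix the critique.\n\nDraft: {draft}\n\nCritique: {critique}")
    return draft

final = self_critique("Write a 2-sentence pitch: why bicycles are good for cities", cycles=2)
```

Each iteration is two extra calls, so reserve higher cycle counts for output that justifies the cost.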
v1 ← LLM(write the thing)
critique ← LLM("what's weak about this?")
v2 ← LLM("revise to fix the critique")
Three calls, each with a different job.
Same model. Different prompts. Each call is independent — the critic doesn't know the writer was the same model. The honesty comes from that separation.
Three LLM calls per critique cycle. If you run the cycle twice (v1 → critique → v2 → critique → v3), it's five calls. Plan accordingly. For batch jobs, reserve self-critique for the highest-value items, not every row.
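The call-count arithmetic is worth pinning down: one call for the initial draft, then two calls (critique + revise) per cycle. A one-line helper makes the budget explicit:

```python
def total_calls(cycles: int) -> int:
    # 1 initial draft + 2 calls (critique, revise) per cycle
    return 1 + 2 * cycles

print(total_calls(1))  # 3: v1 → critique → v2
print(total_calls(2))  # 5: v1 → critique → v2 → critique → v3
```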
Keep this lesson's pattern simple: one writer, one critic, one reviser. Three prints — V1, CRITIQUE, V2.