Sometimes one prompt isn't enough. Two prompts in sequence — where the output of the first becomes part of the input to the second — is called a chained prompt.
```python
from pydantic_ai import Agent
import json

sentence = "Marie Curie won the Nobel Prize in Physics in 1903."

# Step 1 — extract entities as JSON
step1 = Agent(model).run_sync(
    f'Extract every named entity from this sentence as a JSON list of strings. Return only the JSON.\n\nSentence: {sentence}'
)
entities = json.loads(step1.output.strip())

# Step 2 — for each entity, generate a question about it
questions_prompt = f'For each entity in this list, write one short question about it. Return as a JSON list of strings.\n\nEntities: {json.dumps(entities)}'
step2 = Agent(model).run_sync(questions_prompt)
questions = json.loads(step2.output.strip())

for q in questions:
    print(q)
```

So step 1's output is just a Python value I pass into step 2's prompt.
Right. The LLM doesn't know there's a chain — each call is independent. The chain lives in your code. The output of step 1 is parsed, validated, then reformatted into step 2's prompt. Each step is a regular run_sync.
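That "parsed, validated" hop is worth making explicit. A minimal sketch of a validation helper — `parse_json_list` is a hypothetical name, not part of pydantic_ai — which also strips the markdown fences models sometimes wrap JSON in:

```python
import json

def parse_json_list(raw: str) -> list[str]:
    """Parse a model reply that should be a JSON list of strings.

    Hypothetical helper: strips an optional ```json fence, parses,
    and fails loudly if the shape is wrong.
    """
    text = raw.strip()
    if text.startswith("```"):
        # Drop the opening fence (with optional language tag) and the closing fence.
        text = text.split("\n", 1)[1].rsplit("```", 1)[0]
    data = json.loads(text)
    if not isinstance(data, list) or not all(isinstance(x, str) for x in data):
        raise ValueError(f"expected a JSON list of strings, got: {data!r}")
    return data

print(parse_json_list('```json\n["Marie Curie", "Nobel Prize in Physics"]\n```'))
```

Failing fast here matters: if step 1's output is malformed, you want the error at the parse step, not a confusing prompt fed into step 2.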
Why not just do it in one prompt?
Sometimes you can. But chains help when: the steps need different prompts (extraction is one job, question-generation is another); or you want to inspect/log intermediates; or you might branch ("if no entities found, skip step 2"). Smaller prompts also tend to be more reliable than one giant compound prompt.
```
user input
  ↓
LLM step 1 (e.g., extract)
  ↓
parse + validate
  ↓
LLM step 2 (e.g., transform extracted)
  ↓
parse + validate
  ↓
final output
```
The chain lives in your Python code. Each LLM call is independent — no shared state on the model side.
| Reason | Example |
|---|---|
| Different jobs need different prompts | Extract entities (factual) → generate questions (creative) |
| You want to inspect intermediates | Log what was extracted before generating |
| Branching on intermediate output | If no entities, skip step 2 |
| Reliability — smaller prompts are easier | One 5-line prompt beats one 30-line prompt |
| Cost shaping — skip step 2 entirely if step 1 says "no" | Saves quota |
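The branching and cost-shaping rows can be sketched directly. Here `extract_step` and `question_step` are hypothetical stand-ins for the two LLM calls (callables returning raw JSON strings), so the control flow runs without a model:

```python
import json

def run_chain(extract_step, question_step, sentence: str) -> list[str]:
    """Two-step chain with a branch: if step 1 finds nothing, skip step 2.

    `extract_step` and `question_step` are stand-ins for LLM calls —
    in real code each would be an Agent(...).run_sync(...) call.
    """
    entities = json.loads(extract_step(sentence))
    if not entities:   # branch on the intermediate result
        return []      # step 2 never runs — no quota spent
    return json.loads(question_step(entities))

# Stub "LLM" steps for illustration
questions = run_chain(
    lambda s: '["Marie Curie"]',
    lambda ents: json.dumps([f"Who is {e}?" for e in ents]),
    "Marie Curie won the Nobel Prize in Physics in 1903.",
)
```

The branch lives entirely in Python — the model never sees it, which is exactly the point of keeping the chain in your code.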
If the task is small, single-step, and you don't care about intermediates, just write one prompt. "Summarise this and translate the summary to French" can be one call. Chains are for when the structure matters.
```python
# Step 1
out1 = Agent(model).run_sync(prompt_1).output
parsed1 = parse(out1)  # validate + structure

# Step 2 — formatted with step 1's parsed output
prompt_2 = make_prompt_2(parsed1)
out2 = Agent(model).run_sync(prompt_2).output
parsed2 = parse(out2)
use(parsed2)
```

Each step is one LLM call (or more if it uses tools). A 2-step chain costs ~2 quota slots. A 5-step chain costs ~5. Plan accordingly.
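The skeleton runs end to end once the model call is stubbed out. In this sketch, `fake_run` is a hypothetical stand-in for `Agent(model).run_sync(...).output`, and `parse` / `make_prompt_2` are illustrative helpers:

```python
import json

def fake_run(prompt: str) -> str:
    # Pretend-LLM: answers the extraction prompt or the question prompt.
    if prompt.startswith("Extract"):
        return '["Marie Curie"]'
    return json.dumps(["Who was Marie Curie?"])

def parse(raw: str) -> list[str]:
    data = json.loads(raw.strip())
    if not isinstance(data, list):
        raise ValueError("expected a JSON list")
    return data

def make_prompt_2(entities: list[str]) -> str:
    return f"Write one question per entity: {json.dumps(entities)}"

# The chain skeleton, step by step
out1 = fake_run("Extract named entities from: Marie Curie won the 1903 Nobel Prize.")
parsed1 = parse(out1)
prompt_2 = make_prompt_2(parsed1)
out2 = fake_run(prompt_2)
parsed2 = parse(out2)
print(parsed2)
```

Swapping `fake_run` for a real model call changes nothing about the chain's structure — which is a useful property for testing chain logic cheaply.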
A tiny 2-step chain: extract entities from a sentence, then generate one question per entity. Limit the loop to at most 3 entities to keep cost bounded.