Yesterday: "think step by step". Today: also show the model what step-by-step looks like for your task. Two solved examples teach the format better than any instruction.
```python
from pydantic_ai import Agent

prompt = '''Classify each statement as fact or opinion. Show your reasoning briefly, then give the label.

Statement: "The Pacific Ocean is the largest ocean."
Reasoning: This is a verifiable geographic measurement.
Label: fact

Statement: "Chocolate is the best ice cream flavour."
Reasoning: This is a personal preference, not measurable.
Label: opinion

Statement: "Water boils at 100 degrees Celsius at sea level."
Reasoning:'''

result = Agent(model).run_sync(prompt)  # model: whichever model name you're using
print(result.output)
```

The two examples teach the format (Reasoning, then Label) and the kind of reasoning. The model continues the pattern.
Right. Few-shot prompts give the model concrete templates. With CoT alone, the model picks its own format for the reasoning trace; with few-shot CoT, you pick the format and the model copies it. Easier to parse afterwards.
How many examples?
2-3 is the sweet spot. One example often isn't enough to disambiguate the pattern; five or more starts hurting (longer prompts, marginal returns). The classic chain-of-thought paper uses 8 exemplars, but 2-3 is what most production teams ship with.
Combine two patterns:
```
[task description]

[example 1 input]
Reasoning: [example 1 reasoning]
Label: [example 1 answer]

[example 2 input]
Reasoning: [example 2 reasoning]
Label: [example 2 answer]

[real input]
Reasoning:
```
Note how the prompt ends mid-pattern — "Reasoning:" with nothing after it. The model is forced to continue the structure, producing reasoning then label.
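The structure above is mechanical enough to generate. Here's a sketch of a helper that assembles it from a list of solved examples (`build_prompt` and the example data are illustrative, not from the original):

```python
def build_prompt(task: str, examples: list[tuple[str, str, str]], query: str) -> str:
    """Assemble a few-shot CoT prompt: task description, solved examples,
    then the real input ending mid-pattern at 'Reasoning:'."""
    parts = [task]
    for statement, reasoning, label in examples:
        parts.append(f'Statement: "{statement}"\nReasoning: {reasoning}\nLabel: {label}')
    # The real input stops at "Reasoning:" so the model continues the pattern.
    parts.append(f'Statement: "{query}"\nReasoning:')
    return "\n\n".join(parts)
```

Keeping the examples as data makes it easy to experiment with how many you include.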
| Element | What the model learns |
|---|---|
| The example input format | What kind of input to expect |
| The reasoning style | How long, how structured, what to focus on |
| The label format | One word? A phrase? A choice from a closed set? |
| The full sequence | Reasoning before label, not after |
If your examples have one-sentence reasoning, the model produces one-sentence reasoning. If your labels are lowercase, the model produces lowercase. The discipline of the examples becomes the discipline of the output.
After `Reasoning: ... Label: fact` comes back, extract the label:

```python
import re

match = re.search(r"Label:\s*(\w+)", result.output, re.IGNORECASE)
label = match.group(1).lower() if match else None
```

The few-shot format makes this regex parse possible: the model committed to your format because the examples did.