shortest_response from yesterday finds the most concise output for one input. You're running your batch summary pipeline on two hundred abstracts and need to audit whether the word counts are consistent — some might be suspiciously short (model returned a partial answer) or suspiciously long (model ignored the word budget). How do you measure all of them at once?
Apply word_count_of_output from Day 4 inside a batch comprehension — same pattern as batch_classify, but returning len(output.split()) instead of a classification string. One list of integers, one per abstract.
Exactly. Wrap the Day 4 pattern in a comprehension:
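def batch_word_counts(prompts):
    # One word count per prompt; Agent and model are assumed to come from the course setup
    agent = Agent(model, system_prompt="Summarize in 2 sentences.")
    return [len(agent.run_sync(p).output.split()) for p in prompts]
What would a suspicious word count look like? How do I know if a summary is too short?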
For a two-sentence summary, expect 20–60 words. A count under 10 suggests the model returned a fragment or just "yes" — the system prompt may not have been followed. A count over 80 suggests the model produced a paragraph. Flag both extremes for manual review. The counts tell you where to look; the contents tell you what happened.
So the word count is a quality control metric, not just a length check. I'd run [c for c in counts if c < 10 or c > 80] to get the counts that need inspection.
And pair with enumerate to get the abstract index alongside the count. Quality audit in three lines.
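A minimal sketch of that pairing, assuming the 10 and 80 thresholds mentioned above. The helper name flag_outliers and the sample counts are hypothetical, made up for illustration rather than taken from the lesson's code:

```python
def flag_outliers(counts, low=10, high=80):
    """Return (index, count) pairs whose count falls outside [low, high]."""
    return [(i, c) for i, c in enumerate(counts) if c < low or c > high]


# Made-up counts standing in for the result of batch_word_counts(prompts)
counts = [34, 7, 41, 95, 28]
print(flag_outliers(counts))  # [(1, 7), (3, 95)]
```

Each index points back to the abstract at the same position in prompts, so you know which summaries to open and read.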
Two hundred summaries, two hundred counts, outliers flagged automatically. That's the same quality control I'd run on RA transcripts — audit, flag, review, re-code.
A relative check for unusually short summaries is three lines:
counts = batch_word_counts(prompts)
mean = sum(counts) / len(counts)
outliers = [p for p, c in zip(prompts, counts) if c < mean * 0.4]
Batch word counts flag that something is wrong; reading the output tells you what.
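Since reading the output is what explains a flag, it can help to keep each summary next to its count. The following is only a sketch under assumptions: audit_batch and fake_summarize are hypothetical names, and the stub stands in for a real model call so the example runs without an API key.

```python
from typing import Callable, List, Tuple


def audit_batch(
    prompts: List[str],
    summarize: Callable[[str], str],
    low_ratio: float = 0.4,
) -> List[Tuple[str, str, int]]:
    """Return (prompt, output, count) for outputs far shorter than the batch mean."""
    outputs = [summarize(p) for p in prompts]
    counts = [len(o.split()) for o in outputs]
    if not counts:  # avoid dividing by zero on an empty batch
        return []
    mean = sum(counts) / len(counts)
    return [
        (p, o, c)
        for p, o, c in zip(prompts, outputs, counts)
        if c < mean * low_ratio
    ]


def fake_summarize(prompt: str) -> str:
    # Hypothetical stand-in for the model: returns a fragment for one input
    if "broken" in prompt:
        return "yes"
    return "This abstract studies a topic. It reports a clear result."


prompts = ["abstract one", "broken abstract", "abstract three"]
for prompt, output, count in audit_batch(prompts, fake_summarize):
    print(prompt, count, repr(output))
```

In the real pipeline, summarize could be something like lambda p: agent.run_sync(p).output, reusing the agent from batch_word_counts.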
def batch_word_counts(prompts):
    agent = Agent(model, system_prompt="Summarize in 2 sentences.")
    return [len(agent.run_sync(p).output.split()) for p in prompts]
For a two-sentence summary system prompt, expect 20–60 words per summary. Run batch_word_counts on your full batch, then flag any count < 10 or count > 80 for manual review.