How much does zuzu.codes cost?

The starter track is free: read every lesson and practice at no cost. Full access to every track (current and future) is $14.99/month. Cancel anytime.

How long does each track take?

Each track is designed as a 30-day challenge — one lesson per day, about 15 minutes each. Go at your own pace, but the structure is built around daily consistency.

What's the lesson format?

Each lesson is a student-teacher dialogue with code examples, followed by a hands-on code challenge in an in-browser editor. You read, you understand, then you write real code.

Do I need prior coding experience?

Our beginner track starts from absolute zero — no prior experience needed. Advanced tracks build on earlier ones, and the platform tells you exactly where to start.

How is zuzu.codes different from freeCodeCamp or Codecademy?

zuzu.codes uses a structured 30-day track format with dialogue-based teaching, an in-browser code editor, and gamification (XP, streaks, progress tracking). The format builds genuine understanding through daily practice.

Audit AI Summary Verbosity Across a Batch of Abstracts · AI for Researchers

Day 21 · ~12 min

shortest_response from yesterday finds the most concise output for one input. You're running your batch summary pipeline on two hundred abstracts and need to audit whether the word counts are consistent — some might be suspiciously short (model returned a partial answer) or suspiciously long (model ignored the word budget). How do you measure all of them at once?

Apply word_count_of_output from Day 4 inside a batch comprehension — same pattern as batch_classify, but returning len(output.split()) instead of a classification string. One list of integers, one per abstract.

Exactly. Wrap the Day 4 pattern in a comprehension:

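```python
from pydantic_ai import Agent  # assumption: the course's Agent is pydantic-ai's

def batch_word_counts(prompts: list[str]) -> list[int]:
    agent = Agent(model, system_prompt="Summarize in 2 sentences.")  # `model` assumed already defined
    return [len(agent.run_sync(p).output.split()) for p in prompts]
```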
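To watch the comprehension work without an API key, here is a minimal sketch that swaps the real agent for a hypothetical stand-in, `StubAgent`, whose `run_sync(prompt)` returns an object with an `.output` string. That mimics the shape the lesson's code relies on; the stub, its canned replies, and the helper name `batch_word_counts_with` are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class StubResult:
    # Mimics the one attribute the lesson's code reads: .output
    output: str


class StubAgent:
    """Hypothetical stand-in for a real agent: returns canned summaries keyed by prompt."""

    def __init__(self, canned: dict[str, str]) -> None:
        self.canned = canned

    def run_sync(self, prompt: str) -> StubResult:
        # A real agent would call the model here; the stub just looks up a reply.
        return StubResult(self.canned.get(prompt, ""))


def batch_word_counts_with(agent: StubAgent, prompts: list[str]) -> list[int]:
    # Same comprehension as the lesson: one whitespace-split word count per prompt.
    return [len(agent.run_sync(p).output.split()) for p in prompts]


canned = {
    "abstract A": "The study surveys 40 labs. Results show mixed adoption of open data.",
    "abstract B": "Yes.",
}
agent = StubAgent(canned)
print(batch_word_counts_with(agent, ["abstract A", "abstract B"]))  # → [12, 1]
```

`str.split()` with no arguments splits on runs of whitespace, so punctuation stays attached ("labs." counts as one word) and an empty output counts as 0.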

What would a suspicious word count look like? How do I know if a summary is too short?

For a two-sentence summary, expect 20–60 words. A count under 10 suggests the model returned a fragment or just "yes" — the system prompt may not have been followed. A count over 80 suggests the model produced a paragraph. Flag both extremes for manual review. The counts tell you where to look; the contents tell you what happened.
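Those thresholds translate directly into a labeling function. A minimal sketch: `audit_label` and its label strings are hypothetical, and because the dialogue does not say how to treat counts of 10 to 19 or 61 to 80, the sketch calls them "borderline" as an assumption.

```python
def audit_label(count: int) -> str:
    """Label one summary's word count using the two-sentence thresholds."""
    if count < 10:
        return "review: fragment"  # likely a partial answer or non-answer
    if count > 80:
        return "review: overlong"  # model probably ignored the 2-sentence format
    if 20 <= count <= 60:
        return "expected"
    # Assumption: 10-19 and 61-80 are not classified in the lesson; treated here as borderline.
    return "borderline"


for c in [4, 15, 35, 70, 120]:
    print(c, audit_label(c))
```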

So the word count is a quality control metric, not just a length check. I'd run [c for c in counts if c < 10 or c > 80] to get the counts that need inspection.

And pair with enumerate to get the abstract index alongside the count. Quality audit in three lines.
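Pairing the filter with `enumerate` keeps each flagged count tied to its abstract's position in the batch. A minimal sketch; `flag_outliers` is a hypothetical helper name and the counts are illustrative values, not real pipeline output.

```python
def flag_outliers(counts: list[int], low: int = 10, high: int = 80) -> list[tuple[int, int]]:
    """Return (abstract_index, word_count) pairs that fall outside [low, high]."""
    # enumerate supplies the position, so each flag points back to a specific abstract
    return [(i, c) for i, c in enumerate(counts) if c < low or c > high]


counts = [34, 7, 41, 95, 28]  # illustrative values
print(flag_outliers(counts))  # → [(1, 7), (3, 95)]
```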

Two hundred summaries, two hundred counts, outliers flagged automatically. That's the same quality control I'd run on RA transcripts — audit, flag, review, re-code.

The outlier detection is three lines:

```python
counts = batch_word_counts(prompts)
mean = sum(counts) / len(counts)
outliers = [p for p, c in zip(prompts, counts) if c < mean * 0.4]
```

Batch word counts flag that something is wrong; reading the output tells you what.
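The mean-relative rule flags any summary shorter than 40% of the batch average. A worked example with made-up counts: for [30, 35, 8, 40, 32] the mean is 145 / 5 = 29, so the cutoff is 29 × 0.4 = 11.6 and only the 8-word summary is flagged. The sketch below returns indices instead of prompts and adds an optional upper factor; `relative_outliers` and `high_factor` are hypothetical extensions, since the lesson's rule checks only the short side.

```python
from typing import Optional


def relative_outliers(
    counts: list[int],
    low_factor: float = 0.4,
    high_factor: Optional[float] = None,
) -> list[int]:
    """Indices whose count is below mean * low_factor (or above mean * high_factor, if given)."""
    if not counts:
        return []  # avoid dividing by zero on an empty batch
    mean = sum(counts) / len(counts)
    flagged = []
    for i, c in enumerate(counts):
        too_short = c < mean * low_factor
        # high_factor is a hypothetical extension; the lesson's rule checks only the short side
        too_long = high_factor is not None and c > mean * high_factor
        if too_short or too_long:
            flagged.append(i)
    return flagged


counts = [30, 35, 8, 40, 32]  # illustrative values
print(relative_outliers(counts))  # → [2]
```

One design caveat: the mean is pulled upward by a single very long summary, which raises the cutoff for everything else, so the fixed thresholds (under 10, over 80) are the steadier first pass.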

Batch Word Counts

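```python
from pydantic_ai import Agent  # assumption: the course's Agent is pydantic-ai's

def batch_word_counts(prompts: list[str]) -> list[int]:
    agent = Agent(model, system_prompt="Summarize in 2 sentences.")  # `model` assumed already defined
    return [len(agent.run_sync(p).output.split()) for p in prompts]
```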

Quality audit thresholds

For a two-sentence summary system prompt:

Count < 10: likely fragment or non-answer — review
Count 20–60: expected range
Count > 80: model exceeded format — review

Quality audit workflow

Run batch_word_counts on your full batch
Flag indices where count < 10 or count > 80
Print the actual outputs at those indices
Revise the system prompt if failures cluster (e.g. all short counts share a topic type)
Re-run the batch with the revised prompt
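Because `batch_word_counts` returns only integers, the "print the actual outputs" step needs the summary text as well. A minimal sketch, assuming you keep the list of output strings (for example by collecting `agent.run_sync(p).output` rather than its length); `audit_batch` and the optional per-abstract `topics` labels are hypothetical, not part of the lesson's code. The prompt revision and re-run stay manual.

```python
from collections import Counter
from typing import Optional


def audit_batch(
    outputs: list[str],
    topics: Optional[list[str]] = None,
    low: int = 10,
    high: int = 80,
) -> list[int]:
    """Count, flag, and print suspicious summaries; returns the flagged indices."""
    counts = [len(text.split()) for text in outputs]  # word count per summary
    flagged = [i for i, c in enumerate(counts) if c < low or c > high]  # outside the thresholds
    for i in flagged:  # read the actual text behind each flag
        print(f"[{i}] {counts[i]} words: {outputs[i]!r}")
    if topics is not None:
        # Check whether failures cluster on one caller-supplied topic label
        print(Counter(topics[i] for i in flagged).most_common())
    return flagged


summaries = [
    "We model river sediment transport across three basins. Predictions match field data within 10%.",
    "Yes.",
    "Not applicable.",
]
topics = ["hydrology", "survey", "survey"]  # hypothetical labels
flagged = audit_batch(summaries, topics)
print(flagged)  # → [1, 2]
```

If every flagged index shares one topic label, that is the clustering signal that suggests revising the system prompt before re-running.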
