Yesterday: a suite that scores. Today: use the suite to drive prompt improvement. The discipline: change one thing, measure, keep what helped.
```python
from pydantic_ai import Agent

# Model name is an assumption for illustration; use whichever model your project runs on.
model = "openai:gpt-4o-mini"
agent = Agent(model)

# Labeled suite: (input sentence, expected one-word answer).
cases = [
    ("the cat sleeps", "animal"),
    ("the engine roared", "machine"),
    ("the puppy wagged its tail", "animal"),
]

def score_prompt(prompt_template: str) -> int:
    """Run every case through the model and count exact-match passes."""
    passed = 0
    for sentence, expected in cases:
        result = agent.run_sync(prompt_template.format(sentence=sentence))
        out = result.output.strip().strip(".").lower()
        if out == expected:
            passed += 1
    return passed

before_template = "What is this about? {sentence}"
after_template = (
    "Classify the subject of this sentence as exactly one word: "
    '"animal" or "machine". Reply with only the single word.\n\n'
    "Sentence: {sentence}"
)

before = score_prompt(before_template)
after = score_prompt(after_template)
print(f"BEFORE: {before} / {len(cases)} passed")
print(f"AFTER:  {after} / {len(cases)} passed")
```

The vague "What is this about?" prompt invites free-form answers, so they rarely match "animal" or "machine" exactly. The tightened prompt forces the closed-set output, and the exact-match check passes.
Right. The eval-driven loop: 1) write prompt, 2) run suite, 3) inspect failures, 4) tweak one thing, 5) re-run. Repeat until pass rate is acceptable. The suite is the ground truth — it tells you whether your tweak helped.
What if I tweak two things at once and the score improves?
You can't attribute the win. One change at a time is the discipline. If the score got better, was it the constraint to a closed set, or the explicit format instruction, or both? The suite can't tell you. So you only change one variable per iteration.
1. Write prompt v1.
2. Run the eval suite; score it (e.g., 2/5).
3. Inspect failures: what's going wrong?
4. Change ONE thing (sharper instruction, an example, the output format).
5. Run the eval suite again; score it (e.g., 4/5).
6. Keep the change if the score improved, revert if it got worse.
7. Loop back to step 2.
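A minimal sketch of that keep-or-revert loop, assuming the `score_prompt`, `cases`, `before_template`, and `after_template` definitions from the example above. The candidate list and its change label are illustrative, not part of the lesson's suite; the point is that each candidate differs from the current best by exactly one change.

```python
# Hypothetical harness for the keep-or-revert loop (assumes the definitions above).
best_template = before_template
best_score = score_prompt(best_template)
print(f"v1 baseline: {best_score}/{len(cases)}")

# Each candidate applies ONE change to the previous best (labels are illustrative).
candidates = [
    ("constrain to the closed set animal/machine", after_template),
]

for change, template in candidates:
    score = score_prompt(template)
    if score > best_score:
        best_template, best_score = template, score      # keep the change
        print(f"KEEP   '{change}': {score}/{len(cases)}")
    else:
        print(f"REVERT '{change}': {score}/{len(cases)}")  # stick with the previous best
```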
The eval suite is the judge. Your prompt iteration is the experiment. Without the judge, prompt iteration is vibes; with it, the iteration is empirical.
| Anti-pattern | Better |
|---|---|
| Rewrite the whole prompt | Add ONE constraint, re-test |
| Add three examples and a system prompt | Add one example, re-test |
| Tweak temperature AND prompt | Tweak one, then the other |
Isolate the variable. Otherwise you can't attribute the improvement, and you'll later re-add harmful changes thinking they helped.
After a run, look at the FAIL cases; they show what's going wrong and point to the one change worth trying next.
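One way to do that inspection, assuming the `agent`, `cases`, and `before_template` names from the example above; `report_failures` is a hypothetical helper, not part of the lesson's code.

```python
def report_failures(prompt_template: str) -> None:
    """Print every case the prompt fails, with expected vs. actual output."""
    for sentence, expected in cases:
        raw = agent.run_sync(prompt_template.format(sentence=sentence)).output
        got = raw.strip().strip(".").lower()
        if got != expected:
            print(f"FAIL: {sentence!r}\n  expected: {expected}\n  got:      {raw!r}")

report_failures(before_template)  # e.g., shows the free-form answers the vague prompt produces
```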
When do you stop iterating? When the marginal improvement isn't worth the time. If iteration 5 took you from 8/10 to 9/10 in two hours, that's good. If iteration 6 takes another two hours to maybe reach 9.1/10, ship it.
Three cases. Two prompt versions. The vague version probably scores 0-1; the tight version probably scores 3. Verification asserts after >= before and after >= 2.
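A sketch of those checks, assuming `before`, `after`, and `cases` hold the values from the run above:

```python
# Verification: the tightened prompt should never score worse than the vague one,
# and should pass at least 2 of the 3 cases (thresholds from the text above).
assert after >= before, f"regression: after={after} < before={before}"
assert after >= 2, f"tight prompt passed only {after}/{len(cases)} cases"
print("verification passed")
```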