L13 was one output checked against multiple criteria. Today: multiple outputs checked against expected answers. Five inputs, five expected labels — score how many your prompt got right.
```python
from pydantic_ai import Agent

# `model` is provided by the lesson runtime (a model identifier string).
agent = Agent(model)

cases = [
    ("This is amazing!", "positive"),
    ("Worst experience ever.", "negative"),
    ("It was fine I guess.", "neutral"),
    ("Loved every minute.", "positive"),
    ("Total waste of time.", "negative"),
]

def classify(text):
    # Normalize the reply: strip whitespace, a trailing period, and case.
    return agent.run_sync(
        f'Classify the sentiment of this text as exactly one word: "positive", "negative", or "neutral". Reply with only the single word.\n\nText: {text}'
    ).output.strip().strip(".").lower()

pass_count = 0
for text, expected in cases:
    got = classify(text)
    ok = got == expected
    pass_count += ok  # True counts as 1
    print(f"  {'PASS' if ok else 'FAIL'} — got={got!r} expected={expected!r} — {text}")

print(f"\n{pass_count} / {len(cases)} passed")
assert pass_count >= 3, f"only {pass_count}/5 passed"
```

Each case has a known expected answer. The eval is `got == expected`, summed.
Right. The suite is your safety net when you change a prompt — re-run it, see how many cases still pass, decide whether the change is an improvement or a regression.
Why ≥ 3 instead of all 5?
LLMs are non-deterministic. A pass-rate threshold (3/5 = 60%) makes the verification stable across runs. Real eval suites set thresholds per cost/severity — for safety-critical classifiers you want 99%; for opinion mining 80% might be fine. Today's threshold is loose because we want the lesson to pass; production would tighten.
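To put numbers on that: with five independent cases, the binomial distribution tells you how often a suite clears each bar. A standard-library sketch (the 90% per-case accuracy is an assumed figure for illustration, not a measurement):

```python
from math import comb

def p_at_least(k, n, p):
    """P(at least k of n independent cases pass, each with probability p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# Assume a prompt that gets each case right 90% of the time (illustrative).
for k in range(1, 6):
    print(f"P(>= {k}/5 pass) = {p_at_least(k, 5, 0.9):.3f}")
# Requiring 5/5 passes only ~59% of runs; requiring >= 3/5 passes ~99%,
# which is why the loose bar keeps verification stable.
```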
```python
cases = [(input_1, expected_1), ..., (input_n, expected_n)]

pass_count = 0
for inp, exp in cases:
    got = run(inp)
    if matches(got, exp):
        pass_count += 1

pass_rate = pass_count / len(cases)
assert pass_rate >= THRESHOLD
```

A list of (input, expected) pairs + a runner + a comparator. The score is the fraction passing.
| Piece | Role |
|---|---|
| `cases` | The dataset — fixed. Becomes your regression suite. |
| `run(input)` | Calls the LLM (or runs the agent) under your current prompt. |
| `matches(got, expected)` | The comparator. Equality, regex, fuzzy match, range. |
| `THRESHOLD` | Acceptable pass rate. Higher = stricter. |
LLMs are sampled. Even a perfect prompt fails occasionally. Aim for a threshold that absorbs routine sampling noise but still catches real regressions.
For lessons we set the bar low so the verification is stable. For production, calibrate over many runs to find your steady-state pass rate, then alarm when it drops.
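One way to do that calibration, as a sketch (`run_suite` is a stand-in for whatever executes your eval once and returns a pass rate; the 20 runs and 2σ margin are illustrative choices, not a prescription):

```python
import statistics

def calibrate_threshold(run_suite, n_runs=20, n_sigmas=2.0):
    """Run the suite repeatedly, then place the alarm threshold a couple of
    standard deviations below the observed steady-state pass rate."""
    rates = [run_suite() for _ in range(n_runs)]  # each run returns pass_count / len(cases)
    return max(0.0, statistics.mean(rates) - n_sigmas * statistics.stdev(rates))

# e.g. THRESHOLD = calibrate_threshold(my_eval)  # `my_eval` is hypothetical
```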
| Comparator | When to use |
|---|---|
| `got == expected` | Exact label, classifier with a closed set |
| `expected.lower() in got.lower()` | Loose substring, allows wrapping prose |
| Regex | Format checks (phone, date) |
| Numeric tolerance | Math problems where rounding matters |
| Set equality | Order-independent lists |
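A minimal sketch of the last three comparators in plain Python (the helper names are ours, not part of any framework):

```python
import math
import re

def matches_regex(got, pattern):
    # Format check, e.g. an ISO date anywhere in the reply.
    return re.search(pattern, got) is not None

def matches_numeric(got, expected, tol=1e-2):
    # Numeric tolerance: parse the first number in the reply, compare within tol.
    m = re.search(r"-?\d+(?:\.\d+)?", got)
    return m is not None and math.isclose(float(m.group()), expected, abs_tol=tol)

def matches_set(got, expected_items):
    # Order-independent list: split on commas, compare as sets.
    return {item.strip().lower() for item in got.split(",")} == set(expected_items)

print(matches_regex("Due on 2024-05-01.", r"\d{4}-\d{2}-\d{2}"))  # True
print(matches_numeric("The answer is 3.14159", 3.14))             # True
print(matches_set("red, green, blue", {"red", "green", "blue"}))  # True
```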
Why not pytest / unittest? The runtime doesn't ship those frameworks, so we write the loop in plain Python. The structure is identical to what `pytest.mark.parametrize` produces — a fixture list + assertion — without the framework overhead.
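For comparison, if your environment does ship pytest, the same suite is a few lines (a sketch reusing the `cases` list and `classify` from above):

```python
import pytest

@pytest.mark.parametrize("text,expected", cases)
def test_sentiment(text, expected):
    assert classify(text) == expected
```

Note the threshold logic doesn't carry over for free: each parametrized case becomes its own pass/fail test, so a 3-of-5 bar needs extra bookkeeping on top.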
Five sentiment classification cases. Verification asserts at least 3 pass. Realistically the model will pass 4 or 5; the loose bar absorbs sampling noise.