Pass/fail is binary. A rubric assigns weights — some criteria matter more than others — and produces a numeric score.
```python
from pydantic_ai import Agent

# `model` comes from earlier setup; any pydantic_ai model identifier works here.
prompt = 'Write a one-sentence definition of "recursion" for a programming beginner. Mention the word "itself".'
output = Agent(model).run_sync(prompt).output.strip()

rubric = [
    # (name, weight, predicate)
    ("length_ok", 0.2, 8 <= len(output.split()) <= 30),
    ("ends_period", 0.1, output.endswith(".")),
    ("mentions_itself", 0.4, "itself" in output.lower()),
    ("single_sentence", 0.3, output.count(".") <= 1 and output.count("!") == 0 and output.count("?") == 0),
]

total_weight = sum(w for _, w, _ in rubric)
score = sum(w * int(p) for _, w, p in rubric) / total_weight

for name, weight, passed in rubric:
    mark = "PASS" if passed else "FAIL"
    print(f"  {mark} (weight {weight:.1f}) — {name}")
print(f"\nWeighted score: {score:.2f}")
```

Each criterion has a weight. The score is a weighted average of pass/fail. Higher-weight criteria contribute more.
Right. "Mentions itself" is the most important property here (weight 0.4); length and punctuation matter less. The total weight sums to 1.0, so the score is in [0, 1]. Easier to reason about than "3 out of 5" when criteria differ in importance.
When does this beat the simple checklist from L13?
When the criteria aren't equal. For "is the output safe?" you might have one critical criterion (no PII) at weight 0.9 and three nice-to-haves at weight 0.033 each. A flat checklist doesn't capture that priority. A weighted rubric does — and the score signals "close to acceptable" vs "way off" instead of just binary fail.
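A sketch of that safety rubric, assuming a hypothetical `output` string and a crude regex stand-in for PII detection (a real detector would be far more thorough):

```python
import re

output = "Contact our support team and we will help with your account."  # hypothetical

def looks_free_of_pii(text: str) -> bool:
    # Crude stand-in: flag email addresses or long digit runs (phone/ID-like).
    return not re.search(r"[\w.+-]+@[\w-]+\.\w+|\d{7,}", text)

safety_rubric = [
    ("no_pii",         0.9,   looks_free_of_pii(output)),
    ("polite_tone",    0.033, any(w in output.lower() for w in ("please", "thanks", "happy to"))),
    ("under_50_words", 0.033, len(output.split()) <= 50),
    ("ends_period",    0.033, output.endswith(".")),
]

score = sum(w * int(p) for _, w, p in safety_rubric) / sum(w for _, w, _ in safety_rubric)
print(f"{score:.2f}")  # ~0.97 here; a PII leak alone would cap it near 0.10
```

The general shape is the same whatever the domain: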
```python
rubric = [
    (name_1, weight_1, predicate_1),
    (name_2, weight_2, predicate_2),
    ...
]
score = sum(w * passed for _, w, passed in rubric) / sum(w for _, w, _ in rubric)
```

A list of (name, weight, bool) tuples. The weighted average is your score in [0, 1].
| Use case | Why weights help |
|---|---|
| Safety-critical — must-haves vs nice-to-haves | Heavy weight on safety criteria; minor weight on style |
| Iteration — track which criteria are improving | Weighted score captures whether high-priority items are getting better |
| Comparison across prompt versions | Numeric score is comparable; pass/fail is too coarse |
| Threshold setting — "acceptable" depends on which criteria pass | A 0.9 score with safety criteria passing is very different from 0.9 with safety failing |
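That last row is worth making concrete: when some criteria are must-pass, gate on them explicitly instead of trusting the aggregate score alone. A sketch, assuming the same (name, weight, passed) tuples and a hypothetical set of critical criterion names:

```python
CRITICAL = {"no_pii"}  # hypothetical must-pass criteria

def acceptable(rubric, threshold=0.8):
    score = sum(w * int(p) for _, w, p in rubric) / sum(w for _, w, _ in rubric)
    critical_ok = all(p for name, _, p in rubric if name in CRITICAL)
    # A high score only counts if every critical criterion also passed.
    return critical_ok and score >= threshold
```

With that gate, a 0.9 that rides on style criteria while PII leaks through doesn't sneak past the threshold.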
Rubrics are not LLM judges. We're not asking the LLM "score this output 1-10" — that's drift-prone and expensive. We're using deterministic Python predicates with weights to shape the aggregate score. Reproducible. Free.
One LLM-generated definition of "recursion". Four criteria with weights summing to 1.0. Compute the weighted score. Bind it to score. Verification asserts score >= 0.5 (the prompt asks for the keyword, so it should pass mentions_itself reliably).