Last code lesson. Compose six primitives on a tiny generic problem so the orchestration is what shines, not the domain.
```python
from pydantic_ai import Agent
import re

# === The setup ===
VALUES = {"a": 5, "b": 7, "c": 11}

# Any model string you configured in earlier lessons works here;
# "openai:gpt-4o" is just a stand-in.
agent = Agent("openai:gpt-4o")

@agent.tool_plain
def lookup(key: str) -> int:
    """Look up the integer value associated with a single-letter key."""
    return VALUES[key]

@agent.tool_plain
def add(x: int, y: int) -> int:
    """Return the sum of two integers."""
    return x + y

# === Eval suite ===
cases = [
    ("What is a + b?", 12),
    ("What is c + a?", 16),
    ("What is b + c?", 18),
]

def parse_int(s):
    """Extract the last integer from the agent's free-text answer."""
    digits = re.findall(r"-?\d+", s)
    return int(digits[-1]) if digits else None

# === Run + score ===
rubric_per_case = []
for prompt, expected in cases:
    out = agent.run_sync(prompt).output
    got = parse_int(out)
    rubric_per_case.append({
        "correct": got == expected,
        "is_integer": got is not None,
        "non_empty": bool(out.strip()),
    })

# Weighted rubric: correctness weighs most
WEIGHTS = {"correct": 0.7, "is_integer": 0.2, "non_empty": 0.1}

scores = []
for item in rubric_per_case:
    s = sum(WEIGHTS[k] * int(v) for k, v in item.items())
    scores.append(s)

final = sum(scores) / len(scores)

for i, (item, s) in enumerate(zip(rubric_per_case, scores), 1):
    print(f"  case {i}: {item}, score={s:.2f}")
print(f"\nFinal weighted score: {final:.2f}")
```

Tools (lookup, add), an agent loop (the 3-case eval), output validation (regex parse), a rubric (weighted scoring), a threshold (the final score check). Each piece is doing one job.
Right. Six primitives, three tiny generic cases. The orchestration — calling the agent on each case, scoring each output across a rubric, averaging — is the lesson. The math itself is incidental.
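One way to make that orchestration explicit is to pull it into a small runner. The `run_eval` helper below is a sketch, not part of the lesson's script; it just re-expresses the run/parse/score/average loop with the tools, cases, and rubric passed in:

```python
# Sketch: the same orchestration, parameterised so cases and rubric
# can be swapped without touching the loop itself.
def run_eval(agent, cases, rubric, weights):
    scores = []
    for prompt, expected in cases:
        out = agent.run_sync(prompt).output
        checks = rubric(out, expected)  # dict of criterion -> bool
        scores.append(sum(weights[k] * int(v) for k, v in checks.items()))
    return sum(scores) / len(scores)

def toy_rubric(out, expected):
    got = parse_int(out)
    return {
        "correct": got == expected,
        "is_integer": got is not None,
        "non_empty": bool(out.strip()),
    }

final = run_eval(agent, cases, toy_rubric, WEIGHTS)
```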
Why not test on something real?
Because the pattern is the point. Once you've internalised the orchestration on toy math, you can plug in any tool (Composio actions from L27), any rubric (domain-specific predicates), any cases (your real eval set) and the structure is the same. Synthesis lessons stay tiny on purpose.
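For instance, a domain swap only touches the data. The ticket-triage cases and criteria below are invented for illustration and are not from the course; the loop sketched above is reused unchanged:

```python
# Hypothetical domain swap: different cases, different rubric, same structure.
support_cases = [
    ("Classify this ticket: 'My card was charged twice.'", "billing"),
    ("Classify this ticket: 'The app crashes on login.'", "bug"),
]

def support_rubric(out: str, expected: str) -> dict:
    return {
        "correct": expected in out.lower(),  # expected label appears in the answer
        "terse": len(out.split()) <= 3,      # a classification, not an essay
        "non_empty": bool(out.strip()),
    }

SUPPORT_WEIGHTS = {"correct": 0.7, "terse": 0.2, "non_empty": 0.1}
final = run_eval(agent, support_cases, support_rubric, SUPPORT_WEIGHTS)
```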
| Primitive | From | Role here |
|---|---|---|
| Multi-tool agent | L18 | lookup + add registered, model picks per case |
| Tool calling | L4 | Each tool is a typed Python function with docstring |
| Output validation | L11/L24 | parse_int regex-extracts the integer from agent text |
| Eval suite | L19 | 3 (input, expected) cases |
| Scoring rubric | L25 | 3-criterion weighted score per case |
| Threshold check | L19/L21 | Final mean score >= 0.7 |
Six primitives. Twenty-five lines (excluding the toy data). Generic math. No domain smuggling.
(Swap lookup/add with two Composio tools — the rest of the script stays the same.) Synthesis isn't "use everything". It's "use what fits". A real production agent would pick a different subset for its problem.
You have the AI Patterns kit. To apply it, swap in your own tools, your own cases, and your own rubric.
The exercise of building a real eval suite for your real problem is the move from "finished an LLM track" to "can ship LLM features". AI Advanced (deferred) adds embeddings, RAG, model routing, caching — refinements on top of this kit, not replacements.
The code above. Run it. Verification asserts the final mean weighted score >= 0.7 — meaning at least 70% of the criteria, weighted by importance, passed across the 3 cases.
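If you want the script to enforce that gate itself rather than rely on the external check, a minimal sketch:

```python
# Threshold check: fail loudly if the mean weighted score drops below 0.7.
THRESHOLD = 0.7
assert final >= THRESHOLD, f"eval failed: {final:.2f} < {THRESHOLD}"
```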