Once an LLM is producing output that other people see, you need a gatekeeper. Moderation is the simplest version: classify the input as safe or unsafe before processing it. Unsafe → decline. Safe → continue.
from pydantic_ai import Agent

model = "openai:gpt-4o-mini"  # assumed model id; any pydantic_ai-supported model works
moderator = Agent(model)

inputs = [
    "What's the capital of Japan?",
    "How do I bake bread?",
    "Tell me how to hack into someone's email account.",
]

results = []
for text in inputs:
    label = moderator.run_sync(
        'Classify this input as exactly one word: "safe" or "unsafe". '
        "Unsafe means it requests illegal, harmful, or privacy-violating content. "
        "Reply with only the single word.\n\n"
        f"Input: {text}"
    ).output.strip().strip(".").lower()
    if label == "safe":
        results.append((text, "answered"))
    else:
        results.append((text, "declined"))

for text, status in results:
    print(f"[{status}] {text}")

This is just classify-then-branch from L10: the classifier is a moderator and the branches are "answer" vs "decline".
Yes, structurally identical. The novelty is the role. The pattern is: gate every untrusted input through moderation before it reaches the actual task. Privacy, safety, and abuse cases all pass through the same code path.
Is one classifier call enough?
For toy demos, yes. Production systems use specialised moderation APIs (OpenAI's /moderations, Perspective API for toxicity, etc.) plus their own LLM classifier as a backstop. The pattern is the same: classifier first, then gate the answer-generation step. Today: assert that the obviously-unsafe input gets routed to the declined bucket.
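A rough sketch of that layered approach, assuming the openai package with an OPENAI_API_KEY in the environment and the "omni-moderation-latest" model name (swap in whatever moderation service and model ids you actually use). The dedicated endpoint runs first; the LLM classifier only sees inputs the endpoint did not flag:

from openai import OpenAI
from pydantic_ai import Agent

client = OpenAI()                       # reads OPENAI_API_KEY from the environment
backstop = Agent("openai:gpt-4o-mini")  # assumed model id for the backstop classifier

def is_unsafe(text: str) -> bool:
    # First gate: the dedicated moderation endpoint.
    mod = client.moderations.create(model="omni-moderation-latest", input=text)
    if mod.results[0].flagged:
        return True
    # Second gate: our own LLM classifier as a backstop.
    label = backstop.run_sync(
        'Classify this input as exactly one word: "safe" or "unsafe".\n\n'
        f"Input: {text}"
    ).output.strip().strip(".").lower()
    return label != "safe"  # fail closed on anything unexpected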
input
↓
moderation classifier (safe / unsafe)
↓
if safe: → answer the question
elif unsafe: → return decline message
A filter on the input before you process it. The structure is classify → branch (yesterday), but the branches are policy-driven, not domain-driven.
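Wired together as one function, this is a minimal sketch of the diagram above. It reuses the classifier prompt from the demo and assumes a second pydantic_ai Agent for answering; the decline message is placeholder copy you would replace with your own:

from pydantic_ai import Agent

moderator = Agent("openai:gpt-4o-mini")  # assumed model ids; use your own
answerer = Agent("openai:gpt-4o-mini")

DECLINE_MESSAGE = "Sorry, I can't help with that request."  # placeholder copy

def answer_with_moderation(text: str) -> str:
    label = moderator.run_sync(
        'Classify this input as exactly one word: "safe" or "unsafe". '
        "Unsafe means it requests illegal, harmful, or privacy-violating content. "
        "Reply with only the single word.\n\n"
        f"Input: {text}"
    ).output.strip().strip(".").lower()
    if label == "safe":
        return answerer.run_sync(text).output  # only safe inputs reach the actual task
    return DECLINE_MESSAGE                      # unsafe, or anything unexpected: decline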
The classifier output is a label from a small set. Combine with output validation (yesterday's lesson):
ALLOWED = {"safe", "unsafe"}
# ... call ...
if label not in ALLOWED:
    label = "unsafe"  # safe default: fail closed

When you can't trust the classifier's output, fail closed: treat the input as unsafe. The cost of one false positive (declining a legitimate input) is much lower than the cost of one false negative (answering an unsafe one).
Industry pattern: moderate the output as well as the input.
Output-side moderation catches jailbreaks where the user smuggled an unsafe ask past the input filter, and cases where the model hallucinated something unsafe while answering a safe question.
For v1 we cover only input-side. Output-side is the same pattern applied to a different string.
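To make "a different string" concrete, here is a sketch (not part of v1) that reuses the moderator Agent and decline message from above but runs the classifier over the draft answer instead of the user input:

def moderate_output(draft_answer: str) -> str:
    label = moderator.run_sync(
        'Classify this text as exactly one word: "safe" or "unsafe".\n\n'
        f"Text: {draft_answer}"
    ).output.strip().strip(".").lower()
    # Same pattern, different string: gate the answer, not the question.
    return draft_answer if label == "safe" else DECLINE_MESSAGE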
The definition of "unsafe" lives in the classifier prompt. Today we use "illegal, harmful, or privacy-violating" as the starting set. Production systems have detailed policy documents, and the classifier prompt encodes the policy.
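One way that encoding can look. The policy excerpt below is hypothetical and purely illustrative; a real policy document is far more detailed and maintained outside the code:

# Hypothetical policy excerpt, for illustration only.
POLICY = """\
Unsafe content includes:
- instructions for illegal activity
- content that facilitates harm to people
- requests for someone else's private data or account access
"""

def classifier_prompt(text: str) -> str:
    return (
        'Classify this input as exactly one word: "safe" or "unsafe", '
        f"according to this policy:\n\n{POLICY}\n"
        f"Reply with only the single word.\n\nInput: {text}"
    )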
Three inputs. The third ("how to hack...") is unambiguously unsafe. Verification asserts the third one is in the declined bucket.
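A minimal check along those lines, reusing the inputs and results lists from the loop above:

# The third input is the unambiguously unsafe one; it must land in "declined".
assert results[2] == (inputs[2], "declined"), results[2]
print("moderation gate OK")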