Sampling means the same prompt can return different answers (week 1, day 2). For most tasks one call is fine. But for high-stakes classifications, you can call N times and vote.
The pattern: call N times, count occurrences, return the most common:
from collections import Counter

from pydantic_ai import Agent

agent = Agent("openai:gpt-4o-mini")  # any model your provider supports

def classify_majority(text, n=5):
    prompt = f'Classify: "{text}". Reply with one word: positive or negative.'
    answers = []
    for _ in range(n):
        out = agent.run_sync(prompt).output.strip().strip(".").lower()
        if out in {"positive", "negative"}:  # discard malformed replies
            answers.append(out)
    counter = Counter(answers)
    most_common = counter.most_common(1)[0][0]
    return most_common, dict(counter)

label, votes = classify_majority("It's fine, I guess.", n=5)
print(f"final: {label} votes: {votes}")

5 calls = 5 quota slots. Worth it?
Only when the cost of being wrong is high. If you're routing 1,000 customer support tickets and a wrong label costs you one misrouted ticket, a single call each is probably fine. If you're diagnosing a medical symptom from notes, even 5× the cost is cheap insurance against a bad label. Self-consistency is a judgment call about cost versus reliability.
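A practical refinement the lesson's code doesn't include (this is a sketch, and the 0.8 threshold is a made-up cutoff you'd tune per task): the vote split is a free confidence signal, so you can escalate low-agreement items to a human instead of trusting a narrow majority.

```python
from collections import Counter

def vote_with_confidence(answers, threshold=0.8):
    """Majority label plus agreement ratio; flag weak splits for review.

    `threshold` is a hypothetical cutoff -- tune it for your task.
    """
    counter = Counter(answers)
    label, count = counter.most_common(1)[0]
    agreement = count / len(answers)
    needs_review = agreement < threshold
    return label, agreement, needs_review

print(vote_with_confidence(["positive"] * 4 + ["negative"]))      # strong majority
print(vote_with_confidence(["positive"] * 3 + ["negative"] * 2))  # split vote, flagged
```

This keeps the 5× spend only for the items where the model actually disagrees with itself.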
Does temperature affect this?
Yes. A slightly higher temperature (e.g., 0.7) gives more sampling variation, so disagreement actually surfaces. With temperature 0 (near-deterministic), all 5 calls typically return the same answer and self-consistency is wasted. The pattern shines when there's genuine uncertainty.
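To see why temperature changes the vote spread, here's a model-free sketch: temperature divides the logits before sampling, so higher values flatten the output distribution and let the minority label actually get sampled. The logits below are made-up numbers, not from any real model.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities; higher temperature -> flatter distribution."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0]  # made-up scores for "positive" vs "negative"
print(softmax_with_temperature(logits, 0.1))  # near-certain: voting adds nothing
print(softmax_with_temperature(logits, 0.7))  # real spread: majority vote helps
```

At temperature 0.1 the top label gets essentially all the probability mass, so 5 samples agree and the vote is wasted; at 0.7 the runner-up keeps real mass, which is exactly the uncertainty self-consistency averages out.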
The idea: when one LLM call has uncertainty, N calls and a majority vote often beat one call.
from collections import Counter

answers = [call(prompt) for _ in range(N)]
final = Counter(answers).most_common(1)[0][0]

Counter(...) counts occurrences. .most_common(1) returns the single most-frequent element with its count. Ties break by insertion order.
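The tie-break behavior is easy to verify directly: Counter tracks keys in first-seen order and most_common sorts stably, so an exact tie goes to whichever answer appeared first.

```python
from collections import Counter

votes = ["negative", "positive", "positive", "negative"]  # 2-2 tie
print(Counter(votes).most_common(1))  # [('negative', 2)] -- "negative" was seen first
```

If a coin-flip tie-break bothers you, use an odd N so a two-way tie can't happen.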
Budget for self-consistency:
total cost ≈ N × (one-call cost)
With N=5, you're spending 5× per item. For a 1,000-item batch that's 5,000 calls. Compare that against the cost of a wrong answer: reach for self-consistency only when the cost of being wrong is much higher than the cost of 5×.
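The budget arithmetic is simple enough to put in a helper. The per-call price below is a placeholder, not a real rate; plug in your provider's pricing.

```python
def self_consistency_cost(items, n=5, per_call=0.002):
    """Total calls and dollar cost for an N-vote batch.

    `per_call` is a placeholder rate -- substitute your provider's real price.
    """
    calls = items * n
    return calls, calls * per_call

calls, dollars = self_consistency_cost(1000, n=5)
print(f"{calls} calls, ${dollars:.2f}")  # the 1000-item, N=5 batch from above
```

Running the numbers before the batch, rather than after the invoice, is the whole point.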
classify_majority is a helper that wraps the vote. Every call site benefits.