Prompts evolve. v1 worked. v2 fixed a bug. v3 added a constraint. When the eval suite (week 2) shows a regression, the question is which version produced the regression. Without versioning, you don't know.
from pydantic_ai import Agent  # Agent(model).run_sync(...) as used throughout; model is any model identifier, e.g. "openai:gpt-4o"

model = "openai:gpt-4o"  # example model identifier

PROMPTS = {
    "classify_v1": "Classify: {input}",
    "classify_v2": "Classify {input} as positive or negative. Reply with one word.",
    "classify_v3": "Classify {input} as exactly one of: positive, negative, neutral. Reply with one word, lowercased.",
}
ACTIVE_VERSION = "classify_v3"

def classify(text):
    template = PROMPTS[ACTIVE_VERSION]
    prompt = template.format(input=text)
    answer = Agent(model).run_sync(prompt).output.strip().lower()
    return {"version": ACTIVE_VERSION, "answer": answer}

Every answer is tagged with the prompt version that produced it. When eval pass rate drops, you grep the logs by version.
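One way those logs might look: append each tagged result as a JSON line, then filter by the version field. A minimal sketch; the results.jsonl path and the helper names are illustrative, not part of the lesson's code:

import json

LOG_PATH = "results.jsonl"  # hypothetical log file

def log_result(result):
    # Append the tagged result (version + answer) as one JSON line.
    with open(LOG_PATH, "a") as f:
        f.write(json.dumps(result) + "\n")

def results_for_version(version):
    # Pull back only the results produced by one prompt version.
    with open(LOG_PATH) as f:
        records = [json.loads(line) for line in f]
    return [r for r in records if r["version"] == version]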
Why a dict instead of separate functions?
A dict is a registry. You can iterate it for A/B testing (tomorrow's lesson), look prompts up by version string at runtime, and ship a new version by adding a key. Keeping prompts in code as strings under version keys is the simplest possible prompt-management system.
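As a preview of why iteration matters, here is a minimal sketch that runs the same input through every registered version, reusing the classify-style call from above; the run_all_versions name is illustrative:

def run_all_versions(text):
    # Render and run every prompt version against the same input,
    # tagging each answer with the version that produced it.
    results = {}
    for version, template in PROMPTS.items():
        prompt = template.format(input=text)
        results[version] = Agent(model).run_sync(prompt).output.strip().lower()
    return results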
Production tools for this?
PromptLayer, Helicone, LangSmith, OpenAI's prompt-templates feature. All are built on the same idea: a prompt has a version, an answer is tagged with the version that produced it, and the eval suite groups results by version. You can ship a Promptfoo-grade workflow in 20 lines of Python.
class PromptRegistry:
    def __init__(self):
        self.prompts = {}   # "{name}_v{version}" -> template
        self.active = None  # key of the currently selected prompt

    def register(self, name, version, template):
        self.prompts[f"{name}_v{version}"] = template

    def use(self, name, version):
        self.active = f"{name}_v{version}"

    def render(self, **kwargs):
        # Returns the filled-in prompt plus the version key that produced it.
        return self.prompts[self.active].format(**kwargs), self.active

| Capability | Without versioning | With versioning |
|---|---|---|
| Reproduce a regression | "It worked yesterday" | Pin to v2, replay |
| A/B test prompts | Manual swap, hope | Run both, compare per-version pass rate (tomorrow) |
| Roll back | Edit code, ship, hope | registry.use("classify", 2) |
| Audit | What was the prompt 3 weeks ago? | Lookup classify_v2 |
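Putting the registry to work, a minimal sketch; the registered templates are the same strings as in the PROMPTS dict above, and the example input is illustrative:

registry = PromptRegistry()
registry.register("classify", 2, "Classify {input} as positive or negative. Reply with one word.")
registry.register("classify", 3, "Classify {input} as exactly one of: positive, negative, neutral. Reply with one word, lowercased.")

registry.use("classify", 3)
prompt, version = registry.render(input="the product arrived broken")
# prompt -> the rendered v3 template, version -> "classify_v3"

registry.use("classify", 2)  # roll back: one line, no prompt text edited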
Naming convention: {purpose}_v{N}, e.g. classify_v3, summarise_v2. Purpose first (so related versions cluster), version suffix (so sort order is intuitive).
Semantic versioning (v1.2.3) is overkill — prompts rarely have backward-compat concerns. Integer monotonic versions (v1, v2, v3) keep the system simple.
Where the registry lives:
Start in code. Migrate when team size or change frequency demands it.
When prompt v3 changes the output format (added a confidence field), downstream code must know which schema to expect:
result = {"version": "classify_v3", "answer": "positive", "confidence": 0.9}
result["version"]  # downstream uses this to dispatch parsing logic

The same version string drives prompt selection AND output handling.
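What that dispatch might look like, as a minimal sketch; it assumes, per the lesson, that only v3 results carry a confidence field, and the parse_result name is illustrative:

def parse_result(result):
    # v3 added a confidence field; earlier versions only return an answer.
    if result["version"] == "classify_v3":
        return result["answer"], result["confidence"]
    return result["answer"], None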