Embedded chunks need to live somewhere. The simplest store is a Python dict — {chunk_id: vector} — backed by a JSON file for persistence. Production switches to a vector database (pgvector, Pinecone, Weaviate); the interface is the same.
Round-trip pattern:
import json

store = {}
for i, chunk in enumerate(chunks):  # chunks and embed() come from earlier in the lesson
    store[f"chunk-{i}"] = embed(chunk)

with open("vectors.json", "w") as f:
    json.dump(store, f)

# Later, in another process
with open("vectors.json") as f:
    reloaded = json.load(f)

assert reloaded == store  # exact equality round-trips through JSON

That's it? No special database?
For a 5-chunk corpus, a dict is faster than a database call. The threshold to switch is roughly 10,000 vectors — past that, linear scan over the dict gets slow. Below that, JSON-on-disk is fine.
And the chunk text itself?
Store it alongside. Common pattern: {chunk_id: {"text": ..., "vector": [...]}}. When you retrieve top-k by cosine, you also need the text to feed back into the LLM.
import json

def build_store(chunks, embed_fn):
    # One entry per chunk: the text plus its embedding, keyed by a stable id.
    return {f"chunk-{i}": {"text": c, "vector": embed_fn(c)} for i, c in enumerate(chunks)}

def save_store(store, path):
    with open(path, "w") as f:
        json.dump(store, f)

def load_store(path):
    with open(path) as f:
        return json.load(f)

Three functions, three responsibilities. Build, persist, restore.
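There's a fourth responsibility the snippets above leave implicit: search. Here is a minimal sketch of the top-k-by-cosine lookup, assuming plain-list vectors; cosine and top_k are illustrative helper names, not part of the lesson's code:

import math

def cosine(a, b):
    # Cosine similarity between two plain-list vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def top_k(store, query_vector, k=3):
    # Linear scan: score every chunk against the query, sort, keep the best k.
    scored = [
        (cosine(entry["vector"], query_vector), chunk_id, entry["text"])
        for chunk_id, entry in store.items()
    ]
    scored.sort(reverse=True)
    return scored[:k]

This is the linear scan from earlier: every query scores every chunk, which is exactly what stops being fine past roughly 10,000 vectors.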
For querying-at-scale you'd graduate to SQLite-with-vectors or a vector DB. We stay with JSON because the concept is what matters — store the (id, text, vector) trio so semantic search has something to search over.
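If you're curious what that graduation might look like, here is a rough sketch of the naive SQLite variant, where the vector is simply JSON text in a column (a real deployment would use a vector extension; the table layout and save_store_sqlite are illustrative, not from the lesson):

import json
import sqlite3

def save_store_sqlite(store, path):
    # One row per chunk; the vector rides along as JSON-encoded text.
    con = sqlite3.connect(path)
    con.execute(
        "CREATE TABLE IF NOT EXISTS chunks (id TEXT PRIMARY KEY, text TEXT, vector TEXT)"
    )
    con.executemany(
        "INSERT OR REPLACE INTO chunks VALUES (?, ?, ?)",
        [(cid, e["text"], json.dumps(e["vector"])) for cid, e in store.items()],
    )
    con.commit()
    con.close()

Back to the JSON store: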
store = build_store(chunks, embed)
save_store(store, "vectors.json")

reloaded = load_store("vectors.json")
assert reloaded == store

If this assertion fails, your serialisation is silently corrupting data. Catch it once and you'll trust the rest of the pipeline. (Common cause: storing numpy arrays — json.dump chokes. Convert to plain lists first.)
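On that numpy point, a minimal sketch of the fix, assuming an embedding client that hands back numpy arrays (the values here are stand-ins):

import json
import numpy as np

vec = np.array([0.1, 0.2, 0.3])  # stand-in for an embedding returned as an ndarray

# json.dump() raises TypeError on an ndarray; .tolist() converts it
# to plain Python floats, which serialise (and round-trip) cleanly.
entry = {"text": "example chunk", "vector": vec.tolist()}
print(json.dumps(entry))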
Reminder from earlier weeks: the sandbox filesystem is wiped between runs. JSON-to-disk works within a single run; cross-run persistence needs a real database (or Sheets, or Notion). For lessons we keep persistence in-process.