Embedding a 10,000-word document into a single vector loses almost everything: all the meaning gets squashed into one direction. Chunking splits the text first, embeds each chunk separately, and lets you retrieve just the part that's relevant.
Two common strategies:
```python
# 1. Sentence-based — preserves natural boundaries
import re
sentences = re.split(r'(?<=[.!?])\s+', text)

# 2. Fixed-window — predictable size, may split mid-sentence
def window_chunk(text, size=200, overlap=50):
    chunks = []
    i = 0
    while i < len(text):
        chunks.append(text[i:i+size])
        i += size - overlap
    return chunks
```

When do I want which?
Sentence-based for prose where sentence boundaries are meaningful. Fixed-window with overlap for unstructured logs / transcripts where you can't trust the punctuation. Most production RAG pipelines use fixed-window with ~10–20% overlap so a sentence cut in half still appears whole in one chunk.
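A quick sketch makes the contrast concrete: the same sentence regex that handles prose cleanly collapses to a single useless "sentence" on punctuation-free log text, while fixed windows stay predictable either way (the log line below is invented for illustration):

```python
import re

def window_chunk(text, size=200, overlap=50):
    chunks = []
    i = 0
    while i < len(text):
        chunks.append(text[i:i+size])
        i += size - overlap
    return chunks

prose = "First sentence. Second sentence! Third?"
log = "2024-01-01T00:00:01 GET /api/items 200 12ms " * 20  # no [.!?] anywhere

# Sentence split works on prose, degenerates on logs
print(len(re.split(r'(?<=[.!?])\s+', prose)))  # 3
print(len(re.split(r'(?<=[.!?])\s+', log)))    # 1 — the whole log is one "sentence"

# Fixed windows don't care about punctuation
print(len(window_chunk(log, size=200, overlap=50)))  # 6
```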
And the chunk size?
Tradeoff. Smaller chunks = more precise retrieval, but might lack surrounding context. Larger chunks = more context, but more tokens shipped to the model and noisier matches. 200–500 chars is a common starting point. Tune against your eval suite (week 2).
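The token-cost side of the tradeoff is easy to estimate. Since the window advances by `size - overlap` per chunk, the chunk count is roughly `ceil(len(text) / stride)`. A back-of-envelope helper (`num_chunks` is our own illustration, not a library function; it assumes ~6 chars per word):

```python
import math

def num_chunks(n_chars, size, overlap):
    """Chunk count for the fixed-window splitter: one chunk per stride."""
    stride = size - overlap
    return math.ceil(n_chars / stride)

doc = 60_000  # ~10,000 words at ~6 chars/word
for size in (200, 500, 800):
    print(size, num_chunks(doc, size, size // 5))  # ~20% overlap
```

This prints 375, 150, and 94 chunks respectively: quadrupling the chunk size cuts the number of vectors to store and search by ~4x, at the price of coarser matches.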
Production RAG ingests source text by chunking it into 200–800 char pieces, embedding each, and storing the (chunk, vector) pair.
```python
import re
sentences = re.split(r'(?<=[.!?])\s+', text)
sentences = [s.strip() for s in sentences if s.strip()]
```

Boundaries align with meaning. Each chunk = one sentence. Predictable for prose, fragile for messy text.
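"Fragile for messy text" shows up even in clean prose: the regex treats every period followed by whitespace as a boundary, so abbreviations split wrongly. A quick check (wrapping the two lines above in a helper):

```python
import re

def sentence_chunks(text):
    sentences = re.split(r'(?<=[.!?])\s+', text)
    return [s.strip() for s in sentences if s.strip()]

print(sentence_chunks("It works. Mostly!"))
# ['It works.', 'Mostly!']

# Fragile case: the period after "Dr" is treated as a sentence end
print(sentence_chunks("Dr. Smith arrived."))
# ['Dr.', 'Smith arrived.']
```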
```python
def window_chunk(text, size=200, overlap=50):
    chunks = []
    i = 0
    while i < len(text):
        chunks.append(text[i:i+size])
        i += size - overlap
    return chunks
```

Every chunk is exactly `size` chars (the last one may be shorter). Overlap means a sentence cut at position 200 reappears whole in the next chunk, which starts at position 150. The default in production pipelines.
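That overlap guarantee can be verified directly. Here a sentence straddles the 200-char boundary of the first chunk, so it appears truncated there but whole in the next chunk (filler characters stand in for surrounding text):

```python
def window_chunk(text, size=200, overlap=50):
    chunks = []
    i = 0
    while i < len(text):
        chunks.append(text[i:i+size])
        i += size - overlap
    return chunks

text = "x" * 190 + "short sentence here. " + "y" * 200
chunks = window_chunk(text)

# The sentence is cut by the 200-char boundary of chunk 0...
assert "short sentence here." not in chunks[0]
# ...but appears whole in chunk 1, which starts at position 150
assert "short sentence here." in chunks[1]
# Every chunk is exactly `size` chars except possibly the last
assert len(chunks[0]) == 200 and len(chunks[-1]) < 200
```

Note the guarantee only holds for sentences shorter than the overlap; a 60-char sentence split by a 50-char overlap can still be truncated in both chunks.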
| Strategy | Precision | Context | Best for |
|---|---|---|---|
| Sentence | High | Low | Prose with clean punctuation |
| Fixed-window | Medium | Medium | Logs, transcripts, mixed text |
| Recursive (paragraph → sentence → token) | High | High | Structured docs, advanced |
We stay with sentence + fixed-window in this lesson. Recursive chunkers (LangChain's RecursiveCharacterTextSplitter) are the production-grade extension — same idea, more configurable.
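To make the recursive idea concrete without pulling in LangChain, here is a minimal sketch (our own toy, not LangChain's API): try the coarsest boundary first (paragraphs), fall back to sentences, and window-chunk anything that still won't fit.

```python
import re

def window_chunk(text, size=200, overlap=50):
    chunks = []
    i = 0
    while i < len(text):
        chunks.append(text[i:i+size])
        i += size - overlap
    return chunks

def recursive_chunk(text, max_len=200):
    """Sketch: split on the coarsest boundary that yields small-enough pieces."""
    if len(text) <= max_len:
        return [text]
    for pattern in ("\n\n", r"(?<=[.!?])\s+"):  # paragraphs first, then sentences
        parts = re.split(pattern, text)
        if len(parts) > 1:
            out = []
            for p in parts:
                out.extend(recursive_chunk(p, max_len))
            return out
    # No natural boundary left — fall back to fixed windows with ~20% overlap
    return window_chunk(text, size=max_len, overlap=max_len // 5)
```

The production versions add configurable separator lists and token-based length functions, but the control flow is the same descent from structure to brute force.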