How much does zuzu.codes cost?

The starter track is free — read all lessons and practice for free. Full access to every track (current and future) is $14.99/month. Cancel anytime.

How long does each track take?

Each track is designed as a 30-day challenge — one lesson per day, about 15 minutes each. Go at your own pace, but the structure is built around daily consistency.

What's the lesson format?

Each lesson is a student-teacher dialogue with code examples, followed by a hands-on code challenge in an in-browser editor. You read, you understand, then you write real code.

Do I need prior coding experience?

Our beginner track starts from absolute zero — no prior experience needed. Advanced tracks build on earlier ones, and the platform tells you exactly where to start.

How is zuzu.codes different from freeCodeCamp or Codecademy?

zuzu.codes uses a structured 30-day track format with dialogue-based teaching, an in-browser code editor, and gamification (XP, streaks, progress tracking). The format builds genuine understanding through daily practice.

Extract Survey Themes from Free Text with Regex — Python For Students


Day 25 · ~12 min

Your thesis has an open-text question asking respondents to label their main concern. The responses arrive as messy strings like '{"theme": "workload", "detail": "too many exams"}'. Your advisor wants a list of just the theme labels. How do you extract them?

top_groups_by_score from yesterday handles structured fields. But free text is different — the themes are buried inside strings, not in named columns.

re.findall is the extractor. Give it a pattern, give it the text, get back a list of every match. r'"theme":\s*"([^"]+)"' captures the value between the quotes after "theme": — the ([^"]+) group means one or more characters that are not a quote:

python

import re
text = '{"theme": "workload", "detail": "exams"} {"theme": "commute"}'
themes = re.findall(r'"theme":\s*"([^"]+)"', text)
print(themes)  # ['workload', 'commute']
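To see why the pattern uses [^"]+ rather than a plain .+, compare the two on a single response. This is a small illustration with a made-up sample string, not part of the lesson's own code.

```python
import re

text = '{"theme": "workload", "detail": "exams"}'

# [^"]+ cannot cross a quote, so the capture stops at the end of the value.
negated = re.findall(r'"theme":\s*"([^"]+)"', text)

# .+ is greedy: it runs to the end of the text, then backs off only as far as
# the LAST quote, swallowing the other fields along the way.
greedy = re.findall(r'"theme":\s*"(.+)"', text)

print(negated)  # ['workload']
print(greedy)   # ['workload", "detail": "exams']
```

A non-greedy .+? would also stop at the first closing quote here; the negated class simply states the intent ("anything but a quote") more directly.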

([^"]+) — the [^...] means "not these characters"? And the parentheses capture the match?

Exactly right. [^"]+ is a negated character class: it matches one or more characters that are not ". Wrapping it in () makes a capture group, so re.findall returns just the captured text from every match in the string, as a list:

python

import re

def extract_free_text_themes(raw_text: str) -> list[str]:
    """Extract theme labels from JSON-like survey free-text responses."""
    themes = re.findall(r'"theme":\s*"([^"]+)"', raw_text)
    # clean_response_text is the Day 4 helper; it must already be defined
    cleaned = [clean_response_text(t) for t in themes]
    print(f"Found {len(cleaned)} themes")
    return cleaned

I'm using clean_response_text from Day 4 in the comprehension — normalising every extracted theme before returning the list.
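The Day 4 cleaner isn't reproduced in this lesson, so the sketch below uses a hypothetical stand-in (collapse whitespace, then lowercase) to show why normalising matters: without it, "Workload " and "workload" would be counted as two different themes. Swap in your real clean_response_text.

```python
import re

def clean_response_text(text: str) -> str:
    """HYPOTHETICAL stand-in for the Day 4 cleaner: collapse whitespace, lowercase."""
    return " ".join(text.split()).lower()

raw = '{"theme": "Workload "} {"theme": "workload"} {"theme": "COMMUTE"}'

# The capture keeps whatever the respondent typed, including case and spaces.
captured = re.findall(r'"theme":\s*"([^"]+)"', raw)
# Normalising makes equivalent labels compare equal.
cleaned = [clean_response_text(t) for t in captured]

print(captured)  # ['Workload ', 'workload', 'COMMUTE']
print(cleaned)   # ['workload', 'workload', 'commute']
```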

Three weeks of functions. The Day 4 cleaner is still pulling its weight on Day 25.

I can feed the full open-text column from Qualtrics into this and get back a clean list of theme labels ready for frequency analysis.
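One way to go from a whole column to frequencies is sketched below, assuming the column has already been loaded as a plain list of strings (the responses here are invented). Extracting per response keeps a malformed answer from bleeding into its neighbours, and collections.Counter does the tallying.

```python
import re
from collections import Counter

PATTERN = r'"theme":\s*"([^"]+)"'

# Hypothetical open-text column: one response string per respondent
column = [
    '{"theme": "workload", "detail": "too many exams"}',
    '{"theme": "commute", "detail": "two buses"}',
    '{"theme": "workload"}',
    'no structured answer here',
]

# Run findall on each response separately and flatten the results
themes = [t for response in column for t in re.findall(PATTERN, response)]
counts = Counter(themes)

print(counts.most_common())  # [('workload', 2), ('commute', 1)]
```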

Regex patterns are precise but brittle: if the theme field is formatted differently (single quotes, say, or a space before the colon), this pattern produces zero matches. Always test your pattern against at least three real samples from your actual data before relying on it in the pipeline.
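A quick check along those lines might look like this. The three samples are made up to show formatting variations; replace them with rows copied from your own export.

```python
import re

PATTERN = r'"theme":\s*"([^"]+)"'

# Made-up samples: the same answer formatted three different ways
samples = [
    '{"theme": "workload"}',
    "{'theme': 'workload'}",     # single quotes
    '{"theme" : "workload"}',    # space before the colon
]

results = {sample: re.findall(PATTERN, sample) for sample in samples}
for sample, found in results.items():
    # Flag any sample the pattern silently misses
    print(f"{'OK' if found else 'NO MATCH':8} {sample}  ->  {found}")
```

Only the first sample matches; the other two return an empty list without raising any error, which is exactly the silent failure worth catching early.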

re.findall and Capture Groups

python

import re
text = '{"theme": "workload"} {"theme": "commute"}'
matches = re.findall(r'"theme":\s*"([^"]+)"', text)  # ['workload', 'commute']

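findall returns a list of all non-overlapping matches, scanned left to right. With no group, each item is the full matched text; with one capture group (...), each item is just the captured text; with several groups, each item is a tuple of the captured groups.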
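The three cases side by side, on an invented two-response string:

```python
import re

text = '{"theme": "workload"} {"theme": "commute"}'

no_group = re.findall(r'"theme":\s*"[^"]+"', text)      # whole match
one_group = re.findall(r'"theme":\s*"([^"]+)"', text)   # captured value only
two_groups = re.findall(r'"(\w+)":\s*"([^"]+)"', text)  # (key, value) tuples

print(no_group)    # ['"theme": "workload"', '"theme": "commute"']
print(one_group)   # ['workload', 'commute']
print(two_groups)  # [('theme', 'workload'), ('theme', 'commute')]
```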

Pattern anatomy

Token	Meaning
`"theme":`	Literal text
`\s*`	Zero or more whitespace characters
`"`	Literal quote that opens the value
`([^"]+)`	One or more non-quote characters (captured)
`"`	Literal quote that closes the value
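The same anatomy can live inside the pattern itself. This sketch uses Python's re.VERBOSE flag, which ignores whitespace in the pattern and treats # as the start of a comment, so each token can carry its own note. THEME_PATTERN is just an illustrative name.

```python
import re

THEME_PATTERN = re.compile(
    r'''
    "theme":     # literal key text
    \s*          # zero or more whitespace characters
    "            # opening quote of the value
    ([^"]+)      # one or more non-quote characters (captured)
    "            # closing quote of the value
    ''',
    re.VERBOSE,
)

text = '{"theme": "workload", "detail": "exams"} {"theme":"commute"}'
themes = THEME_PATTERN.findall(text)
print(themes)  # ['workload', 'commute']
```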

When to use regex

Use regex for structured field extraction (key: value patterns), phone and email patterns, and code parsing. For simple splits and replacements, string methods such as str.split and str.replace are usually faster and clearer.
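As a rough illustration with an invented multi-choice answer: when the separator is a fixed comma, string methods handle the job, and re.split only adds value once the separator itself varies.

```python
import re

answer = "workload, commute,sleep"

# A plain split leaves a stray space where the separator was ", "
plain = answer.split(",")
# String methods alone can still tidy that up
tidied = [part.strip() for part in answer.split(",")]
# re.split absorbs the optional space as part of the separator
with_regex = re.split(r",\s*", answer)

print(plain)       # ['workload', ' commute', 'sleep']
print(tidied)      # ['workload', 'commute', 'sleep']
print(with_regex)  # ['workload', 'commute', 'sleep']
```

Both tidy versions give the same list; for a case this simple, the str.split version is arguably easier to read at a glance.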

Problem

You have an open-text survey question where respondents label their main concern as a JSON-like string, and you need to extract just the theme values for frequency analysis. Write `extract_free_text_themes(raw_text)` so that it uses `re.findall` to extract every theme label from strings like `'{"theme": "workload"}'` and returns them as a cleaned list.

