Yesterday you wrote five separate re.search() calls to extract five fields from a log line. Each one is independent and correct. Today we make it a single sweep with named groups.
Named groups — so instead of .group(1) for the timestamp and .group(2) for the level, I can do .group("timestamp") and .group("level")? The parentheses get a label?
Exactly. The syntax is (?P<name>pattern):
import re

LOG_PATTERN = re.compile(
    r'(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})'
    r' (?P<level>DEBUG|INFO|WARN|WARNING|ERROR|CRITICAL)'
    r' \[(?P<service>[\w-]+)\]'
    r' (?P<ip>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})'
    r' - (?P<message>.+)$'
)
log_line = "2026-04-07 09:14:33 ERROR [auth] 192.168.1.42 - Token expired"
match = LOG_PATTERN.search(log_line)
if match:
    print(match.group("timestamp"))  # 2026-04-07 09:14:33
    print(match.group("level"))      # ERROR
    print(match.groupdict())         # all five fields as a dict
.groupdict() gives me a Python dict directly? With the group names as keys? That's the return value I was building manually — now the match object builds it for me.
One sweep through the string, five labeled captures, one dict out. The pattern is split across several adjacent string literals for readability — Python concatenates adjacent string literals automatically. And re.compile() pre-parses the pattern so it doesn't re-parse on every log line.
For substitution — re.sub(). When I want to redact IP addresses from logs before sharing them with an external team, I replace every IP with a placeholder?
Exactly. re.sub() replaces every match:
import re
def redact_ips(log_text: str) -> str:
    ip_pattern = r'\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b'
    return re.sub(ip_pattern, "[REDACTED]", log_text)
log = "Connection from 192.168.1.42 failed, retry from 10.0.0.1"
print(redact_ips(log))
# Connection from [REDACTED] failed, retry from [REDACTED]
I've been doing this with a loop and str.replace(), but that required knowing all the IP addresses in advance. re.sub() finds and replaces any IP matching the pattern — I don't know the IPs ahead of time and I don't need to.
And the replacement can be a function, not just a string. You can transform each match individually — anonymize to a counter, truncate to the first two octets, anything. The replacement function receives the match object:
import re
counter = {}
def anonymize_ip(match):
    ip = match.group(0)
    if ip not in counter:
        counter[ip] = f"IP_{len(counter) + 1}"
    return counter[ip]
log = "192.168.1.42 -> 10.0.0.1, then 192.168.1.42 again"
print(re.sub(r'\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b', anonymize_ip, log))
# IP_1 -> IP_2, then IP_1 again
The same IP gets the same label throughout the log. Consistent anonymization without knowing the IPs up front. That's actually useful for sharing logs with vendors — they see patterns in the IP behavior without seeing the real addresses.
You just described a real security workflow. The regex finds the shape. The replacement function handles the logic. Today's problem puts both together: extract fields with named groups, apply substitutions to redact sensitive data, return the structured result.
One question — what if the log line doesn't fully match the combined pattern? Yesterday with five separate searches, each field degraded independently. With one big pattern, if one piece is wrong the whole match fails.
That's the tradeoff. Single-pattern is faster and gives you .groupdict() directly. Multi-search is more resilient to partial matches. For a controlled log format where you know the structure, single-pattern wins. For genuinely variable-format logs, per-field searches are safer. Choose based on how much you trust the format.
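The resilient multi-search approach can be sketched like this — a hypothetical fallback parser (the field names and patterns here are illustrative, not from the lesson's exercise) where each field degrades independently instead of the whole match failing:

```python
import re

# Per-field patterns: a malformed timestamp doesn't cost us the IP or level.
FIELD_PATTERNS = {
    "timestamp": re.compile(r'\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}'),
    "level": re.compile(r'\b(?:DEBUG|INFO|WARN|WARNING|ERROR|CRITICAL)\b'),
    "ip": re.compile(r'\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b'),
}

def parse_resilient(line: str) -> dict:
    # Each field is searched independently; missing fields become None.
    result = {}
    for name, pat in FIELD_PATTERNS.items():
        m = pat.search(line)
        result[name] = m.group(0) if m else None
    return result

broken = "??? ERROR [auth] 192.168.1.42 - Token expired"  # garbled timestamp
print(parse_resilient(broken))
# {'timestamp': None, 'level': 'ERROR', 'ip': '192.168.1.42'}
```

Note the cost: three passes over the string instead of one, and no guarantee the fields came from the same positions in the line.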
Named groups are the feature that transforms regex from a search tool into a data extraction tool. The distinction matters: finding a timestamp in a log line is a search problem. Extracting the timestamp, level, service, IP, and message simultaneously and returning them as a labelled dict is a data extraction problem. Named groups solve the latter cleanly.
The (?P<name>...) syntax labels a capturing group with a name. On a successful match, match.group("name") retrieves the captured text, and match.groupdict() returns all named groups as a single dict — exactly the shape you'd want to pass to the rest of your processing pipeline. Named groups are also self-documenting: (?P<timestamp>...) is readable in a way that (...) with a comment is not.
Sometimes you need grouping for alternation or repetition without capturing: (?:ERROR|WARNING) groups the alternation without creating a group number. This avoids polluting .group() indices with structural groups you don't need. Named groups capture; non-capturing groups (?:...) group without capturing.
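The effect on group numbering is easy to see side by side — a minimal sketch comparing a capturing and a non-capturing alternation:

```python
import re

# With a capturing group, the alternation occupies group 1.
capturing = re.search(r'(ERROR|WARNING): (\d+)', "ERROR: 42")
print(capturing.groups())      # ('ERROR', '42') — two numbered groups

# With (?:...), the alternation still groups but creates no number.
non_capturing = re.search(r'(?:ERROR|WARNING): (\d+)', "ERROR: 42")
print(non_capturing.groups())  # ('42',) — only the digits are captured
print(non_capturing.group(1))  # '42' — the code is group 1, not group 2
```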
re.sub(pattern, repl, string) replaces every match of pattern in string with repl. When repl is a string, it supports backreferences: \1 inserts the first group. When repl is a callable, it receives the match object for each replacement and returns the replacement string. This makes re.sub() capable of context-aware replacement — anonymizing IPs consistently, normalizing timestamp formats, redacting patterns based on surrounding context.
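String replacements also work with named groups via \g<name>. A small sketch (the date-normalization scenario is illustrative, not from the lesson's exercise) reordering US-style dates to ISO format:

```python
import re

text = "Deployed on 04/07/2026, rolled back on 04/09/2026"

# \g<name> in the replacement string inserts the named group's capture.
iso = re.sub(
    r'(?P<month>\d{2})/(?P<day>\d{2})/(?P<year>\d{4})',
    r'\g<year>-\g<month>-\g<day>',
    text,
)
print(iso)
# Deployed on 2026-04-07, rolled back on 2026-04-09
```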
Quantifiers are greedy by default: .* matches as many characters as possible. Add ? to make them non-greedy: .*? matches as few characters as possible. For log messages that end with a pattern — r'ERROR.*error_code: (\d+)' — greedy matching can overshoot into the next log line if you're processing multi-line text. Non-greedy is the safer default for log parsing.
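The overshoot is easiest to see with two entries in one string. A minimal sketch — note it uses re.DOTALL so that . crosses the newline, which is what makes the greedy version run past the first entry:

```python
import re

text = ("ERROR something failed error_code: 17\n"
        "ERROR another failure error_code: 99")

# Greedy: .* consumes as much as possible, then backtracks to the
# LAST occurrence of "error_code: " — the second entry's code.
greedy = re.search(r'ERROR.*error_code: (\d+)', text, re.DOTALL)
print(greedy.group(1))  # 99 — overshot into the second entry

# Non-greedy: .*? consumes as little as possible, stopping at the
# FIRST occurrence of "error_code: ".
lazy = re.search(r'ERROR.*?error_code: (\d+)', text, re.DOTALL)
print(lazy.group(1))    # 17 — the first entry's code
```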
re.compile(pattern) returns a compiled pattern object with .search(), .findall(), .sub(), and .match() methods. Compilation parses the pattern string once. The module-level functions (re.search(), re.sub()) do cache recently compiled patterns internally, but every call still pays a cache lookup; a pre-compiled pattern skips that overhead entirely. For a loop over 100,000 log lines the difference is measurable, typically on the order of tens of percent for simple patterns. Compile patterns that are used more than once.
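You can measure the difference yourself — a rough benchmark sketch using the log line from earlier (exact numbers depend on your machine and Python version, so treat the output as indicative only):

```python
import re
import timeit

pattern = r'(?P<ip>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})'
compiled = re.compile(pattern)
line = "2026-04-07 09:14:33 ERROR [auth] 192.168.1.42 - Token expired"

# Module-level re.search: pattern-cache lookup on every call.
uncompiled_t = timeit.timeit(lambda: re.search(pattern, line), number=100_000)
# Pre-compiled pattern: no lookup, straight to matching.
compiled_t = timeit.timeit(lambda: compiled.search(line), number=100_000)

print(f"module-level: {uncompiled_t:.3f}s  compiled: {compiled_t:.3f}s")
```

Both forms return identical matches; the only difference is where the parsing and cache-lookup cost is paid.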