You've learned re.search(), re.findall(), named groups, and re.sub(). Today we apply them to the exact shapes you'll encounter in real log data: timestamps, IPs, email addresses, and log line formats.
I've already got the IP pattern memorized — r'\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b'. And I wrote the timestamp pattern yesterday: r'\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}'. But the ops team's older logs have timestamps in ISO 8601 format with a T and Z: 2026-04-07T09:14:33Z. Different format.
The pattern adjustment is small — T instead of space, optional Z at the end:
import re
# ISO 8601 timestamp: 2026-04-07T09:14:33Z
ts_iso = re.compile(r'\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z?')
# Legacy timestamp: 2026-04-07 09:14:33
ts_legacy = re.compile(r'\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}')
# Match either format
ts_either = re.compile(r'\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}:\d{2}Z?')

The character class [T ] matches either a T or a space. One pattern, two formats.
[T ] is a two-character class that matches either character. So I can write one compiled pattern that handles both the old and new timestamp format. The ops team has mixed logs from the migration period — this is exactly the problem I have right now.
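A quick sanity check of the combined pattern (the log lines here are made up, but they cover both separator styles):

```python
import re

# One pattern for both separator styles; Z? makes the trailing Z optional
ts_either = re.compile(r'\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}:\d{2}Z?')

mixed = [
    "2026-04-07T09:14:33Z ERROR auth: Token expired",  # ISO 8601
    "2026-04-07 09:14:33 ERROR auth: Token expired",   # legacy
]

for line in mixed:
    print(ts_either.search(line).group())
# 2026-04-07T09:14:33Z
# 2026-04-07 09:14:33
```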
Email addresses come up less often in server logs but frequently in application logs — user actions, auth events, notification sends. A practical email pattern isn't trying to cover every RFC 5322 edge case:
import re
# Practical: catches 99%+ of real email addresses in logs
email_pattern = re.compile(r'[\w.+-]+@[\w-]+\.[\w.]+')
log_line = "Auth failed for user maya.patel+ops@company.internal from 10.0.0.1"
emails = email_pattern.findall(log_line)
print(emails)  # ['maya.patel+ops@company.internal']

[\w.+-]+ — word characters, dots, plus signs, and hyphens before the @. [\w-]+ for the domain. \.[\w.]+ for the TLD, which can itself contain dots for things like .co.uk or .internal. That covers everything in the ops team's logs.
And when a single log line contains multiple extractable fields, named groups with .groupdict() give you structured output directly:
import re
ERROR_EVENTS = re.compile(
    r'(?P<timestamp>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z?)'
    r'.*?(?P<level>ERROR|CRITICAL)'
    r'.*?(?P<service>[\w-]+):'
)
logs = [
"2026-04-07T09:14:33Z ERROR auth: Token expired",
"2026-04-07T09:15:01Z INFO api: Request received",
"2026-04-07T09:16:22Z CRITICAL db: Connection pool exhausted",
]
for line in logs:
    m = ERROR_EVENTS.search(line)
    if m and m.group("level") in ("ERROR", "CRITICAL"):
        print(m.groupdict())

The .*? between fields is non-greedy — it skips the minimum number of characters needed to reach the next part of the pattern. So the pattern doesn't accidentally consume the service name while looking for the level.
Exactly. You're reading regex patterns and explaining non-greedy behavior. Three days ago you called it keyboard noise.
I still think the syntax is a crime. But I understand the crime now, and I can commit it when I need to. That's the important part.
One practical pattern you'll need for the analyzer — extracting HTTP status codes from access logs:
import re
# Apache/nginx access log: "GET /api/users HTTP/1.1" 200 342
access_pattern = re.compile(
    r'"(?P<method>GET|POST|PUT|DELETE|PATCH) (?P<path>/[^"]*) HTTP/\d+\.\d+"'
    r' (?P<status>\d{3})'
    r' (?P<size>\d+|-)'
)

[^"]* — any character that's not a double-quote, zero or more times. So it captures everything between the opening quote and the HTTP version, which is the full path including query strings. Smart. The | in the size field matches either digits or - for missing content-length.
Correct. The character class negation [^...] is one of the most useful tools for parsing delimited text — instead of saying what you want, you say what you don't want. It's cleaner than .+? followed by a lookahead in most cases.
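As a sketch, here is that access pattern run against a sample nginx-style line; the line and its fields are illustrative, not taken from real logs:

```python
import re

# Same access-log pattern as above
access_pattern = re.compile(
    r'"(?P<method>GET|POST|PUT|DELETE|PATCH) (?P<path>/[^"]*) HTTP/\d+\.\d+"'
    r' (?P<status>\d{3})'
    r' (?P<size>\d+|-)'
)

# Illustrative access-log line, including a query string in the path
line = '10.0.0.1 - - [07/Apr/2026:09:14:33 +0000] "GET /api/users?page=2 HTTP/1.1" 200 342'

m = access_pattern.search(line)
print(m.groupdict())
# {'method': 'GET', 'path': '/api/users?page=2', 'status': '200', 'size': '342'}
```

Note that the query string survives: [^"]* stops only at the closing quote, and backtracking hands the " HTTP/1.1" suffix to the literal part of the pattern.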
I've got the eight metacharacters, named groups, substitution, non-greedy, and character class negation. I think I actually understand regex now. I don't love it. But I understand it.
Understanding is enough. Love is optional. Tomorrow we take a break from regex and look at the string module — template strings, text constants, and formatting utilities. Less powerful than regex, much more readable when you don't need pattern matching.
Real log files contain a consistent set of structured patterns: timestamps in one or two formats, IPv4 addresses, HTTP methods and status codes, user identifiers, and error codes. Learning to recognize and express these shapes in regex syntax — rather than trying to invent regex from first principles — covers 90% of log analysis work.
ISO 8601 timestamps (2026-04-07T09:14:33Z) and space-separated legacy timestamps (2026-04-07 09:14:33) differ only in the separator character. The character class [T ] handles both in a single pattern. For timestamps with fractional seconds (09:14:33.456), add (?:\.\d+)? — a non-capturing group made optional by the trailing ?. For timezone offsets (+05:30), add (?:[+-]\d{2}:\d{2}|Z)?.
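Putting those optional pieces together, a minimal sketch of the extended pattern (the sample timestamps are illustrative):

```python
import re

# Base timestamp plus optional fractional seconds and optional offset or Z
ts_full = re.compile(
    r'\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}:\d{2}'
    r'(?:\.\d+)?'              # optional fractional seconds
    r'(?:[+-]\d{2}:\d{2}|Z)?'  # optional timezone offset or Z
)

samples = [
    "2026-04-07T09:14:33Z",
    "2026-04-07T09:14:33.456+05:30",
    "2026-04-07 09:14:33",
]
for s in samples:
    print(ts_full.fullmatch(s) is not None)
# True
# True
# True
```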
\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b matches any four-octet dotted-decimal address. The word boundaries prevent matching octets that are part of longer numbers. This pattern does not validate that each octet is 0-255 — 999.999.999.999 matches. For log analysis, where invalid IPs are the minority and you'd rather flag them than silently drop them, this is the right tradeoff. Stricter validation belongs in a separate step after extraction.
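The extract-then-validate split can be sketched like this; the sample text and the helper name is_valid_ipv4 are illustrative:

```python
import re

# Loose extraction: any four dotted groups of 1-3 digits
ipv4 = re.compile(r'\b\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\b')

def is_valid_ipv4(addr):
    # Second-pass validation: every octet must be in 0-255
    return all(0 <= int(octet) <= 255 for octet in addr.split("."))

text = "peers: 10.0.0.1, 999.999.999.999, 192.168.1.254"
candidates = ipv4.findall(text)
print(candidates)
# ['10.0.0.1', '999.999.999.999', '192.168.1.254']
print([ip for ip in candidates if is_valid_ipv4(ip)])
# ['10.0.0.1', '192.168.1.254']
```

The invalid address is still extracted, so it can be counted or flagged before the validation step drops it.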
[abc] matches any character in the set. [a-z] matches any lowercase letter. [^abc] matches any character NOT in the set. Character class negation is cleaner than non-greedy dot matching for parsing delimited content: [^"]* (everything except a double-quote) is more precise than .*?" (anything up to the next double-quote). In access log parsing, [^"]* for the request field is both faster and more correct.
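A small illustration of a negated class pulling every quoted field from a line (the log line is made up):

```python
import re

# Everything between a pair of double-quotes, however long
quoted = re.compile(r'"([^"]*)"')

line = '10.0.0.1 "GET /health HTTP/1.1" 200 2 "curl/8.5.0"'
print(quoted.findall(line))
# ['GET /health HTTP/1.1', 'curl/8.5.0']
```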
The RFC 5322 specification for valid email addresses is notoriously complex — the full grammar covers quoted strings, comments, and internationalized addresses. In practice, [\w.+-]+@[\w-]+\.[\w.]+ matches the overwhelming majority of email addresses that actually appear in server logs. The philosophy: write the simplest pattern that matches your actual data, not the most theoretically complete pattern for all possible inputs.
When a pattern spans multiple fields separated by variable-length content, .*? (non-greedy dot-star) matches as little as possible before the next anchor. .* (greedy) matches as much as possible, potentially skipping over intermediate fields. For log lines where field order is fixed but field content length varies, .*? between anchors extracts exactly the fields you want without consuming the separators.
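A minimal demonstration of the difference, using a made-up line that contains two quoted fields:

```python
import re

line = '"GET /a HTTP/1.1" 200 5 "GET /b HTTP/1.1" 404 0'

# Greedy: .* runs to the LAST quote, swallowing both fields and the bytes between
print(re.search(r'"(.*)"', line).group(1))
# GET /a HTTP/1.1" 200 5 "GET /b HTTP/1.1

# Non-greedy: .*? stops at the FIRST closing quote
print(re.search(r'"(.*?)"', line).group(1))
# GET /a HTTP/1.1
```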