You have a directory of log files. Every line looks something like this: [2026-03-31T14:22:05Z] ERROR auth 192.168.1.42 — Token expired for user maya.patel. No JSON, no CSV — just text. How would you pull out the timestamp, the log level, and the IP address from that line using what you knew before this track?
I would split on spaces and take index zero for the timestamp, index one for the level, index two for the service. Then look for the IP by splitting again on dots and checking each chunk with isdigit(). I've actually done exactly this. It broke the moment a service name had a space in it, and also when an error message happened to start with a number.
You just described every log-parsing script written before 1987. Split on whitespace, hope for the best, maintain forever. The problem is not that you wrote bad code — split is the only tool you had for describing a pattern to Python. What you needed was a way to say "find me a sequence of four digit-groups separated by dots" rather than "split on dots and check each piece." That language exists. It is called regular expressions, and the re module is how Python speaks it.
I have heard of regex. I always assumed it was for people who write compilers.
Regex is for anyone who works with text that has structure. Log files are exactly that — structured enough to have recognizable patterns, unstructured enough that split breaks on them. By Day 12 you will be writing patterns that extract timestamps, IP addresses, and error codes from a raw log line in a single call. You will also cover string, textwrap, and difflib — quieter modules, but the ones that turn raw output into something a teammate will actually read. The capstone for this week is a function that pulls structured fields out of free-text log lines and redacts anything sensitive before it goes into a report.
The sensitive data thing is real. Our logs sometimes include session tokens in the error messages. I have been manually scanning for them before I share a log file with anyone.
re.sub() and named groups will fix that. Two lines. Next week you will have a function that scans any log line, redacts any field matching a pattern, and returns the sanitized version. You will never hand-scan a log file again.
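A minimal sketch of that redaction step. The token format here (a "token=" key followed by hex characters) is an assumption for illustration; your logs will need their own pattern:

```python
import re

# Hypothetical token format: "token=" followed by hex digits.
# The pattern and the [REDACTED] marker are illustrative choices.
TOKEN_RE = re.compile(r"(?P<key>token=)[0-9a-fA-F]+")

def redact(line: str) -> str:
    # Keep the key so the log stays readable; blank out only the value.
    return TOKEN_RE.sub(r"\g<key>[REDACTED]", line)

print(redact("ERROR auth Token expired, token=deadbeef1234"))
# ERROR auth Token expired, token=[REDACTED]
```

The named group lets the replacement string refer back to the matched key with \g&lt;key&gt;, so the same two lines work no matter where the token appears in the line.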
Regular expressions are a formal language for describing patterns in text. The syntax looks cryptic (\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}) but it encodes a precise description: one to three digits, a literal dot, repeated four times. That pattern matches anything shaped like an IPv4 address, including invalid ones such as 999.999.999.999, so treat it as shape-matching rather than validation. Think of regex as a metal detector: it finds exactly the shape you describe.
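That pattern in action on the sample line from the top of this lesson. Note the last call: the pattern is a shape check, not a range check, so out-of-range octets still match:

```python
import re

# One to three digits, a literal (escaped) dot, four times over.
IP_RE = re.compile(r"\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}")

line = "[2026-03-31T14:22:05Z] ERROR auth 192.168.1.42 Token expired"
print(IP_RE.search(line).group())  # 192.168.1.42

# The shape alone is not validation: this also matches.
print(IP_RE.search("999.999.999.999").group())  # 999.999.999.999
```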
The re module implements Perl-style regular expressions with Python-specific additions; its syntax is closer to PCRE than to the POSIX standard. The most commonly used functions are re.search() (find the first match anywhere in a string), re.findall() (find all non-overlapping matches), and re.sub() (replace matches with a new string). Groups, the parenthesized sub-patterns, let you extract specific pieces from a match rather than the whole thing. Named groups ((?P&lt;name&gt;...)) go further, making patterns self-documenting.
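Putting those pieces together on the sample line, a sketch of the extraction step. The field layout (bracketed timestamp, level, service, IP) is assumed from that one example line and would need adjusting for other formats:

```python
import re

# Named groups make each field self-documenting in the pattern.
LINE_RE = re.compile(
    r"\[(?P<ts>[^\]]+)\]\s+"          # [timestamp] in brackets
    r"(?P<level>[A-Z]+)\s+"           # log level, e.g. ERROR
    r"(?P<service>\S+)\s+"            # service name (no spaces)
    r"(?P<ip>\d{1,3}(?:\.\d{1,3}){3})"  # IPv4-shaped address
)

line = "[2026-03-31T14:22:05Z] ERROR auth 192.168.1.42 Token expired for user maya.patel."
m = LINE_RE.search(line)
print(m.group("ts"))     # 2026-03-31T14:22:05Z
print(m.group("level"))  # ERROR
print(m.group("ip"))     # 192.168.1.42
```

One call, three fields, and no index arithmetic that breaks when a message starts with a number.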
The string module provides constants (string.ascii_lowercase, string.digits) and string.Template for simple variable substitution in text. textwrap handles paragraph formatting — wrapping long lines, indenting blocks, shortening text with ellipsis. These are the typesetting tools that make terminal output readable instead of a wall of characters.
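A small sketch of both modules together. The template fields and the 60-column width are arbitrary choices for the example:

```python
import string
import textwrap

# string.Template: simple $-based substitution, no f-string machinery.
tmpl = string.Template("[$level] $service: $message")
report = tmpl.substitute(
    level="ERROR", service="auth",
    message="Token expired for user maya.patel, retry refused. " * 3,
)

# Wrap the long line at 60 columns; indent continuation lines.
wrapped = textwrap.fill(report, width=60, subsequent_indent="    ")
print(wrapped)

# Or truncate with an ellipsis for a one-line summary.
print(textwrap.shorten(report, width=40, placeholder=" ..."))
```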
difflib compares sequences and text, producing unified diffs in the same format as git diff. It is the right tool when you need to show what changed between two versions of a log, a config file, or any text document. Week 2 covers the full text processing toolkit — not just the headline regex module, but the surrounding tools that turn raw text into polished, auditable output.
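A quick sketch of that diff output using difflib.unified_diff on two made-up config versions:

```python
import difflib

old = ["timeout = 30", "retries = 3", "debug = false"]
new = ["timeout = 60", "retries = 3", "debug = true"]

# lineterm="" because our lines have no trailing newlines.
diff = difflib.unified_diff(
    old, new, fromfile="config.old", tofile="config.new", lineterm=""
)
result = "\n".join(diff)
print(result)
```

The output uses the same ---/+++/@@ markers as git diff, so it drops straight into a report or a code review comment.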