The ops lead sent a question this morning: "What changed between last Tuesday's error log and today's?" How would you answer that right now?
Open both files, read them into lists, loop through side by side... but the line counts might be different if errors were added or removed. I'd need to track insertions and deletions separately. That's not a trivial algorithm.
It's not, and Python ships one. difflib is the standard library's sequence comparison module. Think of it as a proofreader comparing two drafts — it highlights what changed, what moved, what's similar. Here's the simplest form:
import difflib
old_errors = ["ERROR auth: Token expired", "ERROR db: Timeout"]
new_errors = ["ERROR auth: Token expired", "WARNING api: Slow response", "ERROR db: Timeout"]
diff = list(difflib.unified_diff(old_errors, new_errors,
                                 lineterm="",
                                 fromfile="tuesday.log",
                                 tofile="today.log"))
for line in diff:
    print(line)
unified_diff is the same format as git diff? With the ---, +++, @@ headers and +/- prefixes? I've been staring at git diffs for two years. I didn't know Python could generate them.
The unified diff format was designed for humans to read quickly — + lines are additions, - lines are deletions, context lines have no prefix. difflib.unified_diff() generates it for any two sequences of strings. The fromfile and tofile arguments label the headers. lineterm="" tells unified_diff not to append newline characters to the header and hunk lines it generates — the right choice when your input lines, like these, don't already end in newlines, so nothing doubles up when you print.
What if I don't want the diff format — I just want to know which lines are new, which were removed, and which stayed the same? The ops lead wants counts, not a visual diff.
difflib.SequenceMatcher is the underlying engine. It gives you the raw comparison operations:
import difflib
old = ["ERROR auth: Token expired", "ERROR db: Timeout", "INFO api: Started"]
new = ["ERROR auth: Token expired", "WARNING api: Slow", "ERROR db: Timeout"]
sm = difflib.SequenceMatcher(None, old, new)
for tag, i1, i2, j1, j2 in sm.get_opcodes():
    print(tag, old[i1:i2], "->", new[j1:j2])
The get_opcodes() method returns a list of tuples where tag is one of 'equal', 'replace', 'delete', or 'insert'. You walk through the operations to count additions, deletions, and unchanged lines.
So for the ops lead's question: I run SequenceMatcher on the two lists, count 'insert' operations for new error types, 'delete' for resolved ones, 'equal' for persistent issues. That's the summary he actually wants — not a visual diff.
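That counting pass might be sketched like this (the label names "unchanged"/"added"/"removed" are made up for the summary; the span widths, not just the number of opcodes, give the line totals):

```python
import difflib
from collections import Counter

old = ["ERROR auth: Token expired", "ERROR db: Timeout", "INFO api: Started"]
new = ["ERROR auth: Token expired", "WARNING api: Slow", "ERROR db: Timeout"]

counts = Counter()
for tag, i1, i2, j1, j2 in difflib.SequenceMatcher(None, old, new).get_opcodes():
    if tag == "equal":
        counts["unchanged"] += i2 - i1
    elif tag == "insert":
        counts["added"] += j2 - j1
    elif tag == "delete":
        counts["removed"] += i2 - i1
    else:  # 'replace' is a removal and an addition at once
        counts["removed"] += i2 - i1
        counts["added"] += j2 - j1

print(dict(counts))
```

Note that a 'replace' span can cover different numbers of lines on each side, which is why the two slices are counted separately.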
Exactly. And difflib works on any sequences — not just lists of strings. You can compare character-by-character within a line:
import difflib
old_line = "ERROR [auth] 192.168.1.42: Token expired for maya.patel"
new_line = "ERROR [auth] 10.0.0.1: Token expired for ali.hassan"
sm = difflib.SequenceMatcher(None, old_line, new_line)
print(f"Similarity: {sm.ratio():.0%}")  # how similar the two strings are
sm.ratio() gives a similarity score between 0 and 1? So two log lines from the same error type but different users would have high similarity — same pattern, different details.
Correct. 1.0 is identical. 0.0 is completely different. For grouping similar error messages — "these twenty errors are all the same pattern, just different users" — difflib.get_close_matches() does fuzzy matching:
import difflib
error_types = ["Token expired", "Token invalid", "Token missing", "Connection refused"]
query = "Toekn expired" # typo
matches = difflib.get_close_matches(query, error_types, n=1, cutoff=0.6)
print(matches)  # ['Token expired']
get_close_matches even handles typos. That's useful for normalizing the ops team's hand-written error categories — they're inconsistent about capitalization and spelling and I have to group them manually. This could automate that.
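A normalization pass along those lines might look like this (a sketch — the canonical list and messy labels are invented; lowercasing both sides first is one way to keep capitalization from dragging the score down):

```python
import difflib

canonical = ["Token expired", "Connection refused", "Disk full"]
messy = ["token Expired", "conection refused", "DISK FULL", "Segfault"]

lowered = [c.lower() for c in canonical]
normalized = {}
for label in messy:
    # compare case-insensitively so capitalization doesn't hurt the score
    matches = difflib.get_close_matches(label.lower(), lowered, n=1, cutoff=0.6)
    if matches:
        # map back from the lowercased match to the canonical spelling
        normalized[label] = canonical[lowered.index(matches[0])]
    else:
        normalized[label] = None  # nothing close enough; needs manual review

print(normalized)
```

Labels with no close match fall out as None rather than being forced into the wrong bucket — usually the behavior you want for hand-written categories.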
It handles the fuzzy matching you'd otherwise do with regex or Levenshtein distance. Week 2 is done after today — you know re, string, and difflib. All three deal with text, but at different levels: regex finds patterns, string handles formatting, difflib compares sequences. Together they cover everything the ops team's raw text data throws at you.
The diff problem — given two sequences, describe how one becomes the other using the minimum number of insertions and deletions — is a classical computer science problem. Python's difflib module implements the Ratcliff/Obershelp algorithm, which emphasizes common subsequences rather than strict edit distance. The result is human-readable diffs that prioritize blocks of unchanged content.
difflib.unified_diff(a, b, fromfile, tofile, n=3) generates the standard unified diff format used by git diff, diff -u, and virtually every version control system. The n parameter controls context lines — the unchanged lines shown around each change block. n=0 shows only changed lines, no context. n=3 is the convention. The output is a generator of strings; convert to a list or join with newlines for display.
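The effect of n is easy to see by diffing the same change twice (a sketch with synthetic lines; with one changed line, n=3 surrounds it with three context lines on each side, while n=0 emits only the headers, the hunk marker, and the +/- pair):

```python
import difflib

old = [f"line {i}" for i in range(10)]
new = old.copy()
new[5] = "line five CHANGED"

full = list(difflib.unified_diff(old, new, lineterm=""))       # default n=3
tight = list(difflib.unified_diff(old, new, n=0, lineterm=""))  # changes only

print(len(full), len(tight))
for line in tight:
    print(line)
```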
difflib.SequenceMatcher(isjunk, a, b) is the underlying comparison engine. The isjunk parameter is a function that returns True for elements that should be ignored when finding matches (whitespace, blank lines); None means nothing is treated as junk. Key methods: .ratio() returns a float from 0.0 to 1.0 representing similarity. .get_matching_blocks() returns a list of (i, j, size) triples describing matching subsequences. .get_opcodes() returns the edit sequence as a list of (tag, i1, i2, j1, j2) tuples.
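A junk filter in action, treating spaces as ignorable when anchoring matches (this mirrors the example in the standard-library docs):

```python
import difflib

a = "private Thread currentThread"
b = "private volatile Thread currentThread"

# spaces are junk: runs of whitespace won't be used to anchor matching blocks
sm = difflib.SequenceMatcher(lambda ch: ch == " ", a, b)

for block in sm.get_matching_blocks():
    print(block)  # Match(a=i, b=j, size=n) triples
```

One detail worth knowing: the returned list always ends with a zero-length sentinel, Match(a=len(a), b=len(b), size=0), so loops over it terminate cleanly.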
The four opcode tags represent the complete vocabulary of sequence transformation: 'equal' means a[i1:i2] == b[j1:j2]. 'replace' means a[i1:i2] was replaced by b[j1:j2]. 'delete' means a[i1:i2] was removed. 'insert' means b[j1:j2] was added. Walking the opcode list lets you count changes, filter by type, or reconstruct either sequence from the other.
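The reconstruction claim is easy to demonstrate: rebuilding b from a takes the a-side slice for 'equal', the b-side slice for 'replace' and 'insert', and nothing for 'delete' (a sketch reusing the log lists from earlier):

```python
import difflib

old = ["ERROR auth: Token expired", "ERROR db: Timeout", "INFO api: Started"]
new = ["ERROR auth: Token expired", "WARNING api: Slow", "ERROR db: Timeout"]

rebuilt = []
for tag, i1, i2, j1, j2 in difflib.SequenceMatcher(None, old, new).get_opcodes():
    if tag == "equal":
        rebuilt.extend(old[i1:i2])        # unchanged: either side works
    elif tag in ("replace", "insert"):
        rebuilt.extend(new[j1:j2])        # new content comes from b
    # 'delete' contributes nothing when rebuilding the new sequence

print(rebuilt == new)
```

Swapping which side each tag draws from reconstructs a instead, since the opcodes describe the transformation in both directions.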
difflib.get_close_matches(word, possibilities, n=3, cutoff=0.6) returns the n closest matches from possibilities with similarity above cutoff. This is fuzzy matching without installing a third-party library. It handles typos, abbreviated names, and inconsistent capitalization. The cutoff of 0.6 is the standard threshold; lower values allow more dissimilar matches.
SequenceMatcher is O(n²) in the worst case. For large sequences — comparing two 10,000-line log files — it may be slow. The practical limit is a few hundred lines for interactive use. For larger files, generate diffs with n=0 to limit output (unified_diff() and context_diff() both accept it), or compare summaries (error type counts) rather than raw lines.
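For the large-file case, comparing per-type counts instead of raw lines sidesteps the quadratic matcher entirely. A sketch — the "error type is everything before the first colon" rule is an assumption about this log format, not a general one:

```python
from collections import Counter

def error_counts(lines):
    # assumed format: the error type is everything before the first ':'
    return Counter(line.split(":", 1)[0] for line in lines)

old = ["ERROR auth: Token expired", "ERROR auth: Token expired", "ERROR db: Timeout"]
new = ["ERROR auth: Token expired", "ERROR net: Unreachable"]

before, after = error_counts(old), error_counts(new)
new_types = after - before   # types whose counts grew
resolved = before - after    # types whose counts shrank

print(dict(new_types), dict(resolved))
```

Counter subtraction drops non-positive entries, so each result lists only the types that actually moved in that direction — and it runs in linear time regardless of file size.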