The ops team's log directory now has files from six months — hundreds of files across nested subdirectories. How are you currently finding the right files to analyze?
I've been using os.listdir() and then filtering with if filename.endswith(".log"). For nested directories I wrote a recursive function with os.walk(). It works. It's also thirty lines.
This is where glob earns its spot in your toolkit. One function call finds files matching a pattern across an entire directory tree:
import glob
# All .log files in /var/logs/services/ (one level only)
log_files = glob.glob("/var/logs/services/*.log")
# All .log files anywhere under /var/logs/ (recursive)
all_logs = glob.glob("/var/logs/**/*.log", recursive=True)
# Multiple extensions
json_and_csv = glob.glob("/var/logs/**/*.json", recursive=True)
json_and_csv += glob.glob("/var/logs/**/*.csv", recursive=True)
** with recursive=True is the recursive wildcard — it matches any number of path components. So /var/logs/**/*.log finds every .log file at any depth under /var/logs/. My thirty-line recursive function is one line.
The glob pattern language: * matches any characters except /. ** with recursive=True matches any path including subdirectory separators. ? matches any single character. [abc] matches one of those characters. [0-9] matches a digit. For log files organized by date — /var/logs/2026/04/*.log — you can target a specific month without touching other directories.
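A quick self-contained sketch of these wildcards. The filenames here are invented and created in a throwaway temp directory purely for the demo:

```python
import glob
import os
import tempfile

# Throwaway directory with made-up filenames, so the patterns can be
# demonstrated without touching real logs.
with tempfile.TemporaryDirectory() as root:
    for name in ["auth.log", "api1.log", "api2.log", "db.txt"]:
        open(os.path.join(root, name), "w").close()

    # * matches any run of characters except the path separator
    print(sorted(os.path.basename(p) for p in glob.glob(f"{root}/*.log")))
    # ['api1.log', 'api2.log', 'auth.log']

    # ? matches exactly one character
    print(sorted(os.path.basename(p) for p in glob.glob(f"{root}/api?.log")))
    # ['api1.log', 'api2.log']

    # [0-9] matches a single digit
    print(sorted(os.path.basename(p) for p in glob.glob(f"{root}/api[0-9].log")))
    # ['api1.log', 'api2.log']
```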
pathlib.Path.glob() is also an option, right? I remember seeing it mentioned. Is there a reason to use one over the other?
pathlib.Path.glob() returns Path objects directly — you get the pathlib attributes immediately without wrapping. glob.glob() returns strings. For new code where you're using pathlib throughout, Path.glob() is the natural choice:
from pathlib import Path
log_dir = Path("/var/logs/services")
# All .log files in this directory
log_files = list(log_dir.glob("*.log"))
# All .log files anywhere under this directory
all_logs = list(log_dir.rglob("*.log")) # rglob is recursive glob
for p in all_logs:
    print(p.stem, p.parent)  # pathlib attributes immediately available
rglob("*.log") — "recursive glob" — finds all .log files at any depth. And each result is a Path object, so .stem, .parent, .suffix all work directly. That's the pathlib integration from Week 1 closing the loop.
And fnmatch is the companion module for checking whether a filename matches a pattern, without searching the filesystem:
import fnmatch
filename = "auth-2026-04-07.log.gz"
print(fnmatch.fnmatch(filename, "*.log")) # False — doesn't end with .log
print(fnmatch.fnmatch(filename, "*.log.gz")) # True
print(fnmatch.fnmatch(filename, "auth-*")) # True
print(fnmatch.fnmatch(filename, "*-2026-*")) # True
# Filter a list
files = ["auth.log", "api.json", "db.log.gz", "cron.log"]
logs = fnmatch.filter(files, "*.log")
print(logs)  # ['auth.log', 'cron.log']
fnmatch.fnmatch() tests a single filename against a pattern. fnmatch.filter() filters a list. This is what I want when the ops team sends me a list of filenames and I need to categorize them by type without touching the filesystem.
The patterns follow os.path.normcase(): matching is case-insensitive on Windows and case-sensitive on POSIX systems, including macOS, even though the default macOS filesystem is itself case-insensitive. For predictable case-insensitive matching everywhere, lowercase both sides and use fnmatch.fnmatchcase(filename.lower(), pattern.lower()).
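A minimal sketch of that trick. The matches_ci helper is just an illustration, not a stdlib function:

```python
import fnmatch

def matches_ci(filename, pattern):
    # fnmatch.fnmatchcase() never consults the OS, so lowercasing both
    # sides gives the same case-insensitive result on every platform.
    return fnmatch.fnmatchcase(filename.lower(), pattern.lower())

print(matches_ci("AUTH.LOG", "*.log"))           # True
print(matches_ci("auth.log", "*.LOG"))           # True
print(fnmatch.fnmatchcase("AUTH.LOG", "*.log"))  # False: exact case only
```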
glob finds files in the actual filesystem. fnmatch checks patterns against strings. Two different problems, two different tools. And pathlib.rglob() is the third way that also gives me Path objects. I have three hammers and I'll pick the right one for the nail.
Tomorrow is the capstone. You've built functions for every stage of the log analyzer — parsing JSON logs, filtering by time window, counting with Counter, sorting by frequency. Tomorrow you wire them together with argparse and logging into a complete CLI tool. One week from now, the ops team runs it from cron with their own parameters. That's the goal.
File discovery by pattern is one of the most common automation tasks in operations work. Python provides two tools: glob for filesystem traversal with pattern matching, and fnmatch for testing individual filenames against patterns. The distinction is between finding (glob) and checking (fnmatch).
The glob module uses shell-style wildcards: * matches any characters except path separators, ? matches any single character, [seq] matches any character in the sequence. These are not regular expressions — there is no +, no \d, no |. The patterns are simpler and more readable for file paths.
glob.glob(pattern, recursive=False) returns a list of strings. With recursive=True, the ** pattern matches zero or more directory levels: "/var/logs/**/*.log" matches /var/logs/auth.log, /var/logs/services/auth.log, and /var/logs/services/auth/2026/auth.log.
Path.glob(pattern) returns a generator of Path objects. Path.rglob(pattern) is equivalent to Path.glob("**/" + pattern). For code that uses pathlib throughout, Path.glob() is the natural choice — no wrapping, Path attributes immediately available. For code that builds glob patterns as strings (from config files, CLI arguments), glob.glob() is more direct.
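A small sketch of that rglob/glob equivalence, using an invented layout in a temp directory:

```python
from pathlib import Path
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "svc").mkdir()
    (root / "svc" / "auth.log").touch()
    (root / "top.log").touch()

    # rglob(p) is shorthand for glob("**/" + p); both are lazy generators
    by_rglob = sorted(str(p) for p in root.rglob("*.log"))
    by_glob = sorted(str(p) for p in root.glob("**/*.log"))
    print(by_rglob == by_glob)  # True
    print(len(by_rglob))        # 2 (top.log and svc/auth.log)
```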
Performance: both use the same underlying directory-scanning calls. For large directory trees, the lazy forms are more memory-efficient: Path.glob() returns a generator, and glob.iglob() is the iterator counterpart of glob.glob(), which materializes the full result list before returning.
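A quick sketch of streaming results from the string-based API with glob.iglob() (temp files invented for the demo):

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as root:
    for i in range(3):
        open(os.path.join(root, f"svc{i}.log"), "w").close()

    # iglob yields one path at a time instead of building a full list
    count = sum(1 for _ in glob.iglob(f"{root}/*.log"))
    print(count)  # 3
```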
fnmatch.fnmatch(filename, pattern) tests whether a single filename matches a pattern without any filesystem access. fnmatch.filter(names, pattern) applies a pattern to a list of names and returns the matches. These are the right tools for pattern-based filtering when you already have a list of filenames — from a database, a monitoring API, or a previous glob call.
fnmatch.translate(pattern) converts a glob pattern to a regular expression string, useful when you need to embed glob semantics in a larger regex pipeline.
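For example (the translated regex source string itself varies across Python versions, so only the match results are shown):

```python
import fnmatch
import re

# Compile the glob pattern as a regular expression
regex = re.compile(fnmatch.translate("auth-*.log"))

print(bool(regex.match("auth-2026-04-07.log")))  # True
print(bool(regex.match("api-2026-04-07.log")))   # False
```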
Date-organized log directories (/var/logs/2026/04/07/*.log) can be targeted precisely with glob patterns. glob.glob("/var/logs/2026/04/**/*.log", recursive=True) fetches all April 2026 logs. glob.glob("/var/logs/2026/04/0[1-9]/*.log") fetches only the first nine days. This avoids listing the entire directory tree and filtering in Python — the OS does the filtering at the directory level, which is faster for deep trees.
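A self-contained sketch of date-targeted globbing; the 2026/04 layout mirrors the example above but is built in a temp directory for the demo:

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as root:
    # Build a date-organized tree: <root>/2026/04/<day>/auth.log
    for day in ["01", "07", "10", "23"]:
        day_dir = os.path.join(root, "2026", "04", day)
        os.makedirs(day_dir)
        open(os.path.join(day_dir, "auth.log"), "w").close()

    # Only the first nine days of April
    early = glob.glob(f"{root}/2026/04/0[1-9]/*.log")
    print(len(early))  # 2 (days 01 and 07)

    # Everything under April, any depth
    all_april = glob.glob(f"{root}/2026/04/**/*.log", recursive=True)
    print(len(all_april))  # 4
```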