The ops team's log directory is filling up. They want files over 100 MB archived before the disk fills up. Walk me through how you'd currently copy a file.
Open the source in 'rb' mode, open the destination in 'wb' mode, read chunks, write chunks, close both, then os.remove() the source. I've done it before. It's about twenty lines.
You've done it before and you'll never need to do it again. Here's what Python ships with:
import shutil
# Copy a file — preserves content, not metadata
shutil.copy("auth.log", "/archive/2026-03/auth.log")
# Copy a file AND its metadata (timestamps, permissions)
shutil.copy2("auth.log", "/archive/2026-03/auth.log")
# Move a file — works across filesystems
shutil.move("auth.log", "/archive/2026-03/auth.log")
# Delete an entire directory tree
shutil.rmtree("/archive/2026-02/")

shutil.move() works across filesystems? I've been burned by os.rename() throwing a cross-device error when source and destination are on different drives. I thought file moving was always that annoying.
os.rename() is a single syscall — it only updates the directory entry and cannot cross filesystem boundaries. shutil.move() tries os.rename() first, and if that fails it falls back to copy-then-delete automatically. You get the right behavior without thinking about it.
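That fallback can be sketched in a few lines. Here, move_portable is a hypothetical, simplified stand-in for shutil.move() (the real function also handles directories, symlinks, and existing destinations):

```python
import os
import shutil
import tempfile

# Simplified sketch of what shutil.move() does internally:
# try the cheap rename first; fall back to copy-then-delete if it fails.
def move_portable(src, dst):
    try:
        os.rename(src, dst)        # single syscall; same filesystem only
    except OSError:
        shutil.copy2(src, dst)     # copy data + metadata across the boundary
        os.remove(src)             # then delete the original

# Demo in a scratch directory (same filesystem, so rename succeeds here)
tmp = tempfile.mkdtemp()
src = os.path.join(tmp, "auth.log")
dst = os.path.join(tmp, "archived.log")
with open(src, "w") as f:
    f.write("log line\n")
move_portable(src, dst)
print(os.path.exists(src), os.path.exists(dst))  # False True
```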
So shutil is the layer that does the right thing by default. os is the direct line to the OS — fast but strict about edge cases. shutil is the wrapper that handles them.
Exact description. The analogy I use: shutil is a moving company. You tell them what to move. They handle the logistics. And for disk space:
import shutil, os
usage = shutil.disk_usage("/var/logs/services/")
print(f"{usage.free / (1024**2):.1f} MB free")
# Per-file size
size = os.path.getsize("/var/logs/services/auth.log")
print(f"{size / (1024**2):.1f} MB")

So my rotation function needs: get the size of each log file, decide which ones exceed the max threshold, then plan what moves where. But today's problem is just the planning step — figuring out what would happen without touching the filesystem. No actual shutil.move() calls yet.
Right. And that separation is good design — plan, then execute. Let me show you the full archive pattern so you can see where this is heading:
import shutil, os

def archive_large_logs(log_dir, archive_dir, max_size_mb):
    max_bytes = max_size_mb * 1024 * 1024
    for filename in os.listdir(log_dir):
        filepath = os.path.join(log_dir, filename)
        if os.path.getsize(filepath) > max_bytes:
            shutil.move(filepath, os.path.join(archive_dir, filename))
            print(f"Archived {filename}")

That is so much cleaner than what I had in my head. I was about to write eighty lines with chunk-reading loops. This is twelve. Sam, where has this been?
In the standard library. Since Python 2.4. Now — before the problem: shutil.make_archive() compresses a directory tree into a zip with one call. The ops lead also wants older logs in cold storage:
shutil.make_archive(
base_name="/archive/logs-march-2026", # output path without extension
format="zip",
root_dir="/var/logs/",
base_dir="old/"
)

Today's problem: files above max_size_mb go to to_archive, files below stay in to_keep. Track total size, archive size, archive count. Five keys, pure planning logic, no filesystem I/O.
You've built this kind of function before — in Track 2, the inventory system computed cost summaries before writing the report. Same structure: iterate a list of dicts, accumulate into buckets, return the summary. The shutil methods are for executing the plan. Today you build the plan.
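A minimal sketch of that planning step, purely in memory. The field names ("name", "size_mb") and the five summary keys below are illustrative assumptions, not the exercise's exact spec:

```python
# Pure planning logic: no filesystem I/O, just a summary dict.
# Field names and key names here are assumptions for illustration.
def plan_rotation(files, max_size_mb):
    plan = {
        "to_archive": [],
        "to_keep": [],
        "total_size_mb": 0,
        "archive_size_mb": 0,
        "archive_count": 0,
    }
    for f in files:
        plan["total_size_mb"] += f["size_mb"]
        if f["size_mb"] > max_size_mb:
            plan["to_archive"].append(f["name"])
            plan["archive_size_mb"] += f["size_mb"]
            plan["archive_count"] += 1
        else:
            plan["to_keep"].append(f["name"])
    return plan

logs = [
    {"name": "auth.log", "size_mb": 150},
    {"name": "cron.log", "size_mb": 12},
    {"name": "app.log", "size_mb": 240},
]
print(plan_rotation(logs, 100))
```

The execution step (the actual shutil.move() calls) would then consume this plan.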
What's Week 2 about?
Text patterns. The ops team's older logs — before they switched to structured JSON — are raw strings. Timestamp, level, message, all jammed together with no consistent delimiter. To extract anything useful, you need to describe the shape of the pattern you're looking for. We're going to talk about regular expressions.
I've seen regex before. In a Stack Overflow answer. It looked like someone sneezed on the keyboard.
That is the universal first reaction. By the end of Week 2, you'll be writing those Stack Overflow answers. See you Monday.
shutil (shell utilities) exists because copying, moving, and archiving files correctly involves more than a single system call. Edge cases — filesystem boundaries, permission bits, metadata preservation, atomic moves — accumulate quickly. shutil handles them so your code doesn't have to.
shutil.copyfile(src, dst) copies only the file's data — no metadata, no permissions. shutil.copy(src, dst) copies data and permission bits. shutil.copy2(src, dst) copies data, permissions, and timestamps. For log archiving where audit trails matter, copy2 preserves the original modification time. For derivative files, where the time of copying is the correct timestamp, copy or copyfile is appropriate.
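A quick way to see the difference, using a backdated scratch file (the filenames here are illustrative):

```python
import os
import shutil
import tempfile

tmp = tempfile.mkdtemp()
src = os.path.join(tmp, "auth.log")
with open(src, "w") as f:
    f.write("entry\n")
os.utime(src, (1_000_000_000, 1_000_000_000))  # backdate the source's mtime

plain = os.path.join(tmp, "plain.log")
faithful = os.path.join(tmp, "faithful.log")
shutil.copyfile(src, plain)   # data only: the copy's mtime is "now"
shutil.copy2(src, faithful)   # data + metadata: mtime matches the source

print(os.path.getmtime(faithful) == os.path.getmtime(src))  # True
print(os.path.getmtime(plain) == os.path.getmtime(src))     # False
```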
os.rename() is a direct wrapper around the OS rename() syscall, which requires that source and destination be on the same filesystem. On Linux and macOS, moving a file from /var/log/ (on the root filesystem) to /mnt/archive/ (a mounted NAS) fails with OSError: [Errno 18] Invalid cross-device link. shutil.move() handles this transparently: it tries os.rename() first, catches the cross-device error, and falls back to shutil.copy2() followed by os.remove(). The result is correct behavior on every filesystem combination.
shutil.copytree(src, dst) recursively copies an entire directory tree. shutil.rmtree(path) recursively deletes one. These are the operations you'd do in Finder or Explorer but from code — important for log rotation, deployment cleanup, and test fixture setup. copytree accepts an ignore parameter (use shutil.ignore_patterns("*.pyc", "__pycache__")) to skip certain files.
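A small sketch of the ignore mechanism, using throwaway paths in a temp directory:

```python
import os
import shutil
import tempfile

tmp = tempfile.mkdtemp()
src = os.path.join(tmp, "project")
os.makedirs(os.path.join(src, "__pycache__"))   # junk directory to skip
open(os.path.join(src, "app.py"), "w").close()  # real file to keep
open(os.path.join(src, "app.pyc"), "w").close() # junk file to skip

dst = os.path.join(tmp, "deployed")
shutil.copytree(src, dst,
                ignore=shutil.ignore_patterns("*.pyc", "__pycache__"))
print(sorted(os.listdir(dst)))  # ['app.py']
```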
shutil.make_archive() creates zip, tar, gztar, bztar, or xztar archives from a directory tree in one call. shutil.unpack_archive() extracts them. Together they handle the cold storage workflow — compress and ship — without requiring the zipfile or tarfile modules for the common case.
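A round trip, compress then unpack, sketched with throwaway paths (the directory names are illustrative):

```python
import os
import shutil
import tempfile

tmp = tempfile.mkdtemp()
logs = os.path.join(tmp, "old")
os.makedirs(logs)
with open(os.path.join(logs, "march.log"), "w") as f:
    f.write("archived entry\n")

# base_name has no extension; make_archive appends ".zip" and
# returns the full path of the archive it created.
archive_path = shutil.make_archive(
    os.path.join(tmp, "logs-march"), "zip",
    root_dir=tmp, base_dir="old",
)

out = os.path.join(tmp, "restored")
shutil.unpack_archive(archive_path, out)
print(os.path.exists(os.path.join(out, "old", "march.log")))  # True
```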
shutil.disk_usage(path) returns a named tuple with total, used, and free in bytes for the filesystem containing path. This is the correct check before archiving: confirm there is space at the destination before moving large files. Divide by 1024**2 for megabytes, 1024**3 for gigabytes.
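That pre-flight check might look like the sketch below; has_room is a hypothetical helper name, not part of shutil:

```python
import shutil

# Confirm the destination filesystem has space before moving large files.
def has_room(dest_dir, needed_bytes):
    return shutil.disk_usage(dest_dir).free > needed_bytes

# True unless the disk holding the current directory is completely full
print(has_room(".", 1))
```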