You fixed the crash in the enrichment service last night. Forty minutes start to finish.
The traceback showed a KeyError. I stared at the exception line for ten minutes before I remembered: the bottom is the symptom. I read the frames top to bottom instead, and the cause was two frames up.
That is the shift. The bug is the same. The time to find it is not. Walk me through the reading protocol on a concrete example:
Traceback (most recent call last):
  File "pipeline.py", line 47, in run_pipeline
    results = process_batch(records)
  File "pipeline.py", line 31, in process_batch
    return [format_record(r) for r in records]
  File "pipeline.py", line 18, in format_record
    return {"id": record["record-id"], "value": record["amount"]}
KeyError: 'record-id'

Frame 1: run_pipeline called process_batch. Frame 2: process_batch called format_record for each record. Frame 3: format_record tried to access record["record-id"] and that key did not exist. Exception: the specific missing key.
And where is the bug?
Not in format_record. The function expects record-id to exist. The bug is in whatever created the record — different key name. The exception shows where the assumption broke. The frames above show where the wrong assumption was made.
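A sketch of that mismatch. build_record is a hypothetical producer invented for illustration; the point is that the producer and the consumer disagree on the key name:

```python
def build_record(row):
    # Hypothetical producer two frames upstream: it chose an underscore.
    return {"record_id": row[0], "amount": row[1]}

def format_record(record):
    # The consumer from the traceback: it assumes a hyphen.
    return {"id": record["record-id"], "value": record["amount"]}

record = build_record((7, 19.5))
print("record-id" in record)  # False: the consumer's assumption fails here
print("record_id" in record)  # True: the fix belongs where the record was created
```

The crash happens in format_record, but nothing in format_record is wrong; the fix is in build_record or in the contract between the two.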
That is the rule. Now — chained exceptions. Two blocks connected by "During handling of the above exception":
Traceback (most recent call last):
  File "pipeline.py", line 22, in load_config
    data = json.loads(raw)
json.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "pipeline.py", line 25, in load_config
    raise ConfigError(f"Failed to load from {path}")
ConfigError: Failed to load from /etc/app/config.json

I have always read only the second block. The ConfigError is what crashed the program. The JSONDecodeError looks like noise.
The JSONDecodeError is the root cause. The config file was empty. Without it, your operations team sees "config failed to load," restarts the service, and it fails again. raise X from Y preserves both:
try:
    data = json.loads(raw)
except json.JSONDecodeError as e:
    raise ConfigError(f"Failed to load from {path}") from e

I have been using bare raise NewError(message) everywhere. Every crash report in our error tracker is missing the root cause because I stripped it away.
Mechanical fix: add from e to every re-raise where the original error carries diagnostic value. Use from None only when the internal error is an implementation detail callers should not see — not to clean up a messy traceback. Messy tracebacks carry information.
Reading protocol, complete: exception type identifies the category. Frames top to bottom are a timeline — entry point at top, crash at bottom. Find the frame where bad data was created, not just consumed. In a chained traceback, read the first block as cause and the second as effect.
The traceback module makes this programmatic:
import traceback

try:
    process_batch(records)
except Exception as e:
    frames = traceback.extract_tb(e.__traceback__)
    for frame in frames:
        print(f"{frame.filename}:{frame.lineno} in {frame.name}")

extract_tb returns FrameSummary objects — filename, line number, function name, source text — structured for querying, not just printing.
Frames as objects with attributes means I can write the reading protocol as code. Find the outermost frame as origin, count depth, detect chains. A traceback parser.
That is today's challenge. parse_traceback takes a traceback string and returns structured data: exception type, message, frames in order, depth, origin as filename:lineno, and whether the exception is chained. Tomorrow you replace the print statements with logging that carries the same precision — module path, severity, and context.
The week arc: pdb for real-time state, traceback reading for the timeline, logging for ambient context, and then inspect and dis to see what Python actually compiled into bytecode.
Each tool works at a more fundamental layer: real-time, retrospective, ambient, foundational. The same question asked at four different distances from the running code.
I want to build the parser correctly. from e on every re-raise. Read both blocks of chained tracebacks. And write a function that does the reading protocol automatically instead of by eye.
Build it. When the next incident happens, run the parser first. You will have the briefing before you open the terminal — origin, depth, chain direction, exception type. The four things that tell you what kind of problem it is before you start reading line by line.
How Python captures tracebacks. When an exception is raised, CPython creates a traceback object (PyTracebackObject) that chains together frame objects in the order they were active when the exception propagated. Each entry records the frame and the line number at the time. value.__traceback__ holds the chain; sys.last_traceback stores the traceback of the most recent unhandled exception — which is what pdb.pm() uses for post-mortem debugging.
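That chain can be walked by hand via tb_next, which is essentially what extract_tb does underneath. inner and outer here are throwaway functions for the demo:

```python
def inner():
    raise ValueError("boom")

def outer():
    inner()

def frame_names():
    """Collect function names from entry point down to the crash site."""
    try:
        outer()
    except ValueError as e:
        tb = e.__traceback__
        names = []
        while tb is not None:  # tb_next links toward the crash site
            names.append(tb.tb_frame.f_code.co_name)
            tb = tb.tb_next
        return names

print(frame_names())  # ['frame_names', 'outer', 'inner']
```

The first entry is the frame containing the try, the last is the frame that raised: the same top-to-bottom order the printed traceback shows.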
traceback.extract_tb and FrameSummary. traceback.extract_tb(tb) walks the traceback chain and returns a StackSummary — a list of FrameSummary objects. Each has .filename, .lineno, .name, and .line (source text, loaded on demand from the file via linecache). Because the source is looked up from the filesystem, .line can be None for frames from dynamically generated code where no source file exists — the other attributes are still populated.
Exception chaining semantics. raise X from Y sets X.__cause__ = Y and X.__suppress_context__ = True, producing the "direct cause" message. Plain raise X inside an except block sets X.__context__ = Y without __suppress_context__, producing the "During handling" message. raise X from None sets __suppress_context__ = True with __cause__ = None, hiding both messages. traceback.format_exc() follows these attributes to decide how many blocks to print.
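All three raise forms can be checked against those attributes directly. ConfigError stands in for any wrapper exception:

```python
class ConfigError(Exception):
    pass

def attrs(raiser):
    """Report (__cause__ set?, __context__ set?, __suppress_context__) for a raising callable."""
    try:
        raiser()
    except ConfigError as e:
        return (e.__cause__ is not None, e.__context__ is not None, e.__suppress_context__)

def explicit_cause():
    try:
        {}["k"]
    except KeyError as e:
        raise ConfigError("wrapped") from e    # "direct cause" message

def implicit_context():
    try:
        {}["k"]
    except KeyError:
        raise ConfigError("wrapped")           # "During handling" message

def suppressed():
    try:
        {}["k"]
    except KeyError:
        raise ConfigError("wrapped") from None # neither message

print(attrs(explicit_cause))    # (True, True, True)
print(attrs(implicit_context))  # (False, True, False)
print(attrs(suppressed))        # (False, True, True)
```

Note that __context__ is recorded in all three cases, even under from None; suppression only changes what gets printed, not what is stored.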
The reading protocol as code. The five-step protocol maps directly to the traceback API: type(e).__name__ is the exception type; str(e) is the message; extract_tb(e.__traceback__) is the frame list, with the first frame as entry point and the last as crash site; e.__cause__ is not None or e.__context__ is not None indicates a chain. A parser that surfaces these five fields gives a structured briefing before any human reads the raw traceback.
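That mapping packages naturally into one function. briefing is a hypothetical name; the fields mirror the protocol, worked from a live exception object rather than a traceback string:

```python
import traceback

def briefing(e):
    """The reading protocol applied to a caught exception object."""
    frames = traceback.extract_tb(e.__traceback__)
    return {
        "type": type(e).__name__,
        "message": str(e),
        "depth": len(frames),
        "origin": f"{frames[0].filename}:{frames[0].lineno}",       # entry point
        "crash_site": f"{frames[-1].filename}:{frames[-1].lineno}", # where it broke
        "chained": e.__cause__ is not None or e.__context__ is not None,
    }

try:
    {"id": 1}["record-id"]
except KeyError as e:
    print(briefing(e))
```

This is the object-based twin of the string parser: same briefing, no regexes, usable inside an except block before anything is logged or printed.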