A five-step pipeline crashes at step 3. You need to know which steps finished so the next run can skip them. What data structure records that?
A list of [name, success_flag] pairs? One entry per step with whether it worked?
Exactly. Lists of [str, bool] pairs serialize cleanly to JSON — critical because the checkpoint lives in a file between runs. Each step runs, its outcome gets recorded, the list grows:
checkpoints = []
for step in steps:
    ok = run_step(step)
    checkpoints.append([step, ok])

So the whole pipeline boils down to: for each step, run it, record the result. The checkpoint list is the trace.
Right. For this lesson the step runner is simulated — every step succeeds — so we can focus on the data shape. The function builds the checkpoint list and returns it:
def checkpoint_steps(steps: list) -> list:
    checkpoints = []
    for step in steps:
        checkpoints.append([step, True])
        print(f"Checkpointed: {step}")
    print(f"All {len(checkpoints)} steps recorded")
    return checkpoints

Why lists [name, True] instead of tuples (name, True)? Tuples feel more natural for fixed pairs.
JSON has arrays but no tuples. Writing the checkpoint to disk and reading it back turns tuples into lists anyway. Using lists from the start keeps the in-memory and on-disk shapes identical — json.dumps(checkpoints) round-trips exactly.
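The round-trip claim is easy to verify directly. A minimal sketch (the step names "extract" and "transform" are illustrative, not from the lesson):

```python
import json

# Tuple-based pairs change shape after a JSON round-trip:
# json.dumps writes tuples as arrays, and json.loads reads arrays as lists.
tuple_checkpoints = [("extract", True), ("transform", True)]
restored = json.loads(json.dumps(tuple_checkpoints))
print(restored)  # tuples have silently become lists

# List-based pairs survive the round-trip with an identical shape.
list_checkpoints = [["extract", True], ["transform", True]]
assert json.loads(json.dumps(list_checkpoints)) == list_checkpoints
```

The equality check at the end fails for the tuple version, which is exactly why starting with lists keeps the in-memory and on-disk shapes identical.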
So tomorrow I can reload this list after a crash and skip every step whose flag is True?
Exactly what Day 26 builds. Checkpoints are the memory your pipeline uses to survive restarts.
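As a preview of that resume logic, a minimal sketch (this is an assumption about how Day 26 might wire it up, not the lesson's implementation; success is still simulated, and the file path is arbitrary):

```python
import json

def resume(steps: list, checkpoint_path: str) -> list:
    """Skip steps already checkpointed as successful in a previous run.

    Assumes checkpoint_path holds a JSON list of [name, flag] pairs
    written by an earlier run of this same function.
    """
    try:
        with open(checkpoint_path) as f:
            done = {name for name, ok in json.load(f) if ok}
    except FileNotFoundError:
        done = set()  # first run: no checkpoint file yet, nothing to skip

    checkpoints = [[name, True] for name in sorted(done)]
    for step in steps:
        if step in done:
            print(f"Skipping (already done): {step}")
            continue
        checkpoints.append([step, True])  # simulated success, as in the lesson
        print(f"Checkpointed: {step}")

    with open(checkpoint_path, "w") as f:
        json.dump(checkpoints, f)  # persist for the next restart
    return checkpoints
```

Because the checkpoint file is plain JSON, the crash-recovery story is just "read the list back, skip what's flagged True, append as you go."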
TL;DR: A list of [step_name, success_flag] pairs — JSON-safe, trivially serializable, and resume-ready the moment you read it back.
| Shape | Survives JSON round-trip? |
|---|---|
| ("step", True) | becomes ["step", true] |
| ["step", True] | stays identical |