Yesterday: predicting idempotency. Today: making a non-idempotent loop idempotent. The pattern: track which keys you've already processed, and skip the ones you've seen.
```python
seen = set()
items = ["a", "b", "c"]
processed = []

for item in items:
    if item in seen:
        continue
    seen.add(item)
    processed.append(item)

# now imagine a "re-run" with the same set
for item in items:
    if item in seen:
        continue
    seen.add(item)
    processed.append(item)

print(processed)  # ['a', 'b', 'c']: the second pass added nothing
```
Run the loop twice with the same `seen` set. Second pass: zero new work.
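The "second pass does nothing" property is easier to see if each pass reports only its own work. Here is a minimal sketch that wraps the loop in a function and returns what that pass actually did. The name `dedup_pass` is hypothetical and exists only for this illustration.

```python
def dedup_pass(items, seen):
    """Process each unseen item once; return only what this pass did."""
    done = []
    for item in items:
        if item in seen:
            continue  # handled on an earlier pass (or earlier in this one)
        seen.add(item)
        done.append(item)  # stand-in for the real side effect
    return done


seen = set()
items = ["a", "b", "c"]
first = dedup_pass(items, seen)
second = dedup_pass(items, seen)  # same set, same input
print(first)   # ['a', 'b', 'c']
print(second)  # []
```

Passing `seen` in from outside is what lets the two calls share memory of what was already done.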
But once the script ends, seen vanishes — so re-runs across script invocations would still duplicate?
Right. In-memory dedup only works within one run. Tomorrow's lesson promotes the set to a real state store (a Google Sheet), so the seen keys survive after the script ends. Today: the in-memory shape, on a tiny generic input.
Why not just process each item once?
Real lists have duplicates — same email arriving via two threads, same row appearing in two API pages, retry firing twice on the same item after a network blip. The dedup is what guarantees the side effect happens once per logical key, regardless of how many times the loop sees it.
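Here is a sketch of the "same row in two API pages" case. It assumes each record is a dict with a server-assigned `"id"` field; the page contents are made up for illustration. Concatenating the pages yields a duplicate, yet the dedup loop still fires the side effect once per id.

```python
# Two hypothetical API pages whose boundaries overlap: id 3 appears in both.
page_1 = [{"id": 1, "name": "Ada"}, {"id": 2, "name": "Grace"}, {"id": 3, "name": "Linus"}]
page_2 = [{"id": 3, "name": "Linus"}, {"id": 4, "name": "Barbara"}]

seen = set()
sent = []  # stands in for a non-idempotent side effect, e.g. sending an email

for record in page_1 + page_2:
    key = record["id"]
    if key in seen:
        continue  # the second copy of id 3 is skipped here
    seen.add(key)
    sent.append(record["name"])

print(sent)  # ['Ada', 'Grace', 'Linus', 'Barbara']
```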
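```python
def key_of(item):
    return item  # placeholder: derive a stable key from the item

def side_effect(item):
    print("processing", item)  # placeholder: the real write (send, append, ...)

items = ["a", "b", "a"]  # placeholder input with one duplicate
seen = set()
for item in items:
    key = key_of(item)
    if key in seen:
        continue  # already processed: skip
    seen.add(key)
    side_effect(item)  # do the work
```
Three steps: skip if the key is already in `seen`, add the key, do the work. Once your eye gets used to them, you'll write them automatically in any loop that performs writes.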
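If you write this loop often, you can wrap it once. This is a minimal sketch that assumes you pass in your own key function and side effect. `process_once` is a hypothetical helper name, not a library function.

```python
from typing import Callable, Hashable, Iterable, Set, TypeVar

T = TypeVar("T")


def process_once(
    items: Iterable[T],
    key_of: Callable[[T], Hashable],
    side_effect: Callable[[T], None],
    seen: Set[Hashable],
) -> int:
    """Run side_effect once per unseen key; return how many items were processed."""
    count = 0
    for item in items:
        key = key_of(item)
        if key in seen:
            continue  # already processed: skip
        seen.add(key)
        side_effect(item)
        count += 1
    return count


# Usage: emails keyed by message id; the second copy of "m1" is skipped.
log = []
emails = [{"id": "m1", "subject": "Hi"}, {"id": "m2", "subject": "Re: Hi"}, {"id": "m1", "subject": "Hi"}]
seen = set()
n = process_once(emails, lambda m: m["id"], lambda m: log.append(m["subject"]), seen)
print(n, log)  # 2 ['Hi', 'Re: Hi']
```

The helper takes `seen` as a parameter instead of creating it internally. That way the caller decides how long the set lives, which is the hook a version that persists across runs would need.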
The key must be stable — same logical thing → same key, every time.
| Source | Good key |
|---|---|
| Gmail message | msg["id"] (server-assigned, stable forever) |
| Calendar event | evt["id"] |
| Sheet row | the value in a unique column |
| Composed payload | hash of the relevant fields |
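Don't use timestamps as keys: two distinct items with the same timestamp collide, and the second one gets skipped. Don't use mutable fields (subject, status) either: they change, so the same item can come back under a new key and get processed again.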
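This sketch covers the "hash of the relevant fields" row using only the standard library. It serializes just the fields that define the item's identity, with sorted keys, so the same logical payload always produces the same digest. Which fields count as "relevant" is your call; `to`, `template`, and `order_id` below are illustrative assumptions.

```python
import hashlib
import json


def payload_key(payload: dict, fields: tuple) -> str:
    """Stable key: SHA-256 of the chosen fields, serialized deterministically."""
    relevant = {f: payload[f] for f in fields}
    blob = json.dumps(relevant, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(blob.encode("utf-8")).hexdigest()


FIELDS = ("to", "template", "order_id")  # identity fields; mutable "status" is excluded

a = {"to": "x@example.com", "template": "welcome", "order_id": 1042, "status": "draft"}
b = {"status": "sent", "order_id": 1042, "template": "welcome", "to": "x@example.com"}

# Same identity fields, different field order and status: same key.
print(payload_key(a, FIELDS) == payload_key(b, FIELDS))  # True
```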
set() operations
A few primitives cover most usage:
```python
seen = set()    # empty set
seen.add("a")   # add: adding a duplicate is a silent no-op
"a" in seen     # membership: fast (average O(1))
len(seen)       # current size
```
Within one run: today's pattern. `seen = set()` at the top of the script.
Across runs: tomorrow's lesson. `seen = read_state_from_sheet()` at the top, `write_state_to_sheet(seen)` at the end. The set keeps the same shape; only its lifecycle changes.
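You can sketch the lifecycle change without any Sheets API. In this minimal sketch a local JSON file stands in for the state store. `read_state` and `write_state` are hypothetical stand-ins for the `read_state_from_sheet` / `write_state_to_sheet` calls above, and the file name is arbitrary. Only the load and the save change; the loop in the middle is the same.

```python
import json
import os
import tempfile

STATE_PATH = os.path.join(tempfile.gettempdir(), "dedup_state_demo.json")  # stand-in store


def read_state(path: str) -> set:
    """Load previously seen keys; a missing file means nothing seen yet."""
    if not os.path.exists(path):
        return set()
    with open(path, encoding="utf-8") as f:
        return set(json.load(f))


def write_state(path: str, seen: set) -> None:
    """Persist seen keys (JSON has no set type, so store a sorted list)."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(sorted(seen), f)


def run(items: list, path: str) -> list:
    seen = read_state(path)  # top of the script
    done = []
    for item in items:
        if item in seen:
            continue
        seen.add(item)
        done.append(item)  # stand-in for the real side effect
    write_state(path, seen)  # end of the script
    return done


if os.path.exists(STATE_PATH):
    os.remove(STATE_PATH)  # start the demo from a clean slate
print(run(["a", "b", "c"], STATE_PATH))       # ['a', 'b', 'c']
print(run(["a", "b", "c", "d"], STATE_PATH))  # ['d']
```

Each call to `run` reloads the set from disk, so the two calls behave like two separate script invocations. The second one only does the new work.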
A chain that uses dedup is idempotent at the loop level. The individual side effects (send_email, append_row) might still be non-idempotent at the API level, but the script is idempotent because it never calls them for a key it has already seen. That holds within one run with today's in-memory set, and across runs once the set is persisted.
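Here is a small simulation of that distinction. It assumes a fake `append_row` that, like a real append, adds a row every time it is called. `append_row` and the `sheet` list are hypothetical stand-ins, not a Sheets API.

```python
sheet = []  # fake spreadsheet: a list of rows


def append_row(row):
    """Non-idempotent on purpose: every call adds a row."""
    sheet.append(row)


rows = [("r1", "Ada"), ("r2", "Grace")]

# Calling the API directly on a rerun duplicates everything.
for row in rows + rows:
    append_row(row)
print(len(sheet))  # 4

# Guarding the same calls with dedup on the row key: the rerun adds nothing.
sheet.clear()
seen = set()
for _ in range(2):  # simulate two runs sharing the same seen set
    for row in rows:
        key = row[0]
        if key in seen:
            continue
        seen.add(key)
        append_row(row)
print(len(sheet))  # 2
```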