Webhook providers retry. If your endpoint returns 500, or your network drops the response, the provider sends the same event again — possibly multiple times. Without idempotency you'd process every retry as if it were new.
```python
processed = set()

events = [
    {"id": "e_001", "type": "item.created"},
    {"id": "e_002", "type": "item.created"},
    {"id": "e_001", "type": "item.created"},  # retry of e_001
    {"id": "e_003", "type": "item.created"},
]

results = []
for e in events:
    eid = e["id"]
    if eid in processed:
        results.append("skip")
        continue
    processed.add(eid)
    results.append("process")

print(results)
```

Expected: `['process', 'process', 'skip', 'process']`. The retry of e_001 is detected and skipped.
This is just dedup with a different name?
Same shape. The different name matters because providers explicitly guarantee a unique event.id per event — they're telling you "use this for idempotency". You don't have to invent a key; just record the IDs you've processed.
What about across script restarts?
The in-memory set survives only this run. Production webhook handlers persist processed in a Sheet, a database, or Redis. The next lesson combines it with state writing. For today: get the in-memory shape right.
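Persisting the set can be as simple as a JSON file next to the script. A minimal sketch, assuming an illustrative file name (`processed_ids.json` is not from this lesson):

```python
import json
import os

STATE_FILE = "processed_ids.json"  # illustrative path, not from the lesson

def load_processed():
    """Return the set of event ids recorded by a previous run, if any."""
    if os.path.exists(STATE_FILE):
        with open(STATE_FILE) as f:
            return set(json.load(f))
    return set()

def save_processed(processed):
    """Write the set back out so the next run resumes where this one stopped."""
    with open(STATE_FILE, "w") as f:
        json.dump(sorted(processed), f)

processed = load_processed()
processed.add("e_001")
save_processed(processed)
# After a restart, load_processed() returns a set containing "e_001" again.
```

A Sheet, a database, or Redis plays the same role, with better concurrency guarantees than a flat file.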
Webhook delivery semantics are at-least-once. Providers retry until they get a 2xx response, with backoff. Stripe retries for up to 3 days. GitHub gives up after a few attempts. Custom providers vary.
The upshot: every webhook handler will, eventually, receive the same event twice. If your handler isn't idempotent, every retry duplicates the side effect.
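The duplicated-side-effect failure mode is easy to simulate. A minimal sketch (the handler name, the welcome-email side effect, and the delivery list are invented for illustration):

```python
# At-least-once delivery: the provider re-sends the event because it
# never saw a 2xx for the first attempt. The handler has no dedup.
emails_sent = []

def naive_handle(event):
    # Side effect with no idempotency check.
    emails_sent.append(f"welcome email for {event['id']}")

deliveries = [
    {"id": "e_001"},
    {"id": "e_001"},  # retry: the first response was a 500 or timed out
]
for event in deliveries:
    naive_handle(event)

print(len(emails_sent))  # 2 — the user got the welcome email twice
```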
```python
processed_ids = set()  # or load from a persistent store at startup

def handle(event):
    eid = event["id"]
    if eid in processed_ids:
        log_skip(eid)
        return  # already processed — return 200 to the provider
    process(event)
    processed_ids.add(eid)  # mark only after successful process
```

Key rule: mark as processed only after successful processing. If you mark first and crash mid-process, the retry will skip the event and you'll have a partial side effect.
```python
# wrong — marks before doing the work
processed_ids.add(eid)
process(event)  # if this raises, the retry will skip — silent data loss

# right — mark after the work succeeds
process(event)
processed_ids.add(eid)
```

What if process is itself non-idempotent?

Processing usually involves side effects — sending email, writing rows. If process succeeds halfway and crashes, marking after means the retry will redo the whole thing. One resolution: make process itself idempotent — use idempotency keys on writes (e.g., iCalUID on Calendar create). Then re-running is safe. Production systems often combine both levels: the webhook handler is idempotent at the event level and at the step level inside process.
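Step-level idempotency can be sketched with a store keyed by an idempotency key. Everything here is illustrative: the dict stands in for a real database or API, and the key format is an invented convention, not a real Calendar or email interface:

```python
# Writes keyed by an idempotency key, so re-running process() after a
# partial failure doesn't duplicate any single write.
store = {}  # idempotency_key -> row

def create_row(key, data):
    if key in store:        # already written on a previous attempt
        return store[key]
    store[key] = data
    return data

def process(event):
    # Two side-effecting steps, each keyed by event id + step name.
    create_row(f"{event['id']}:email", {"step": "email"})
    create_row(f"{event['id']}:row", {"step": "row"})

process({"id": "e_001"})
process({"id": "e_001"})  # full re-run after a crash: still one of each
print(len(store))         # 2
```

If process had crashed between the two steps, the retry would skip the completed email write and perform only the missing row write.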
Providers document the field that's unique per event:

- Stripe: `event.id`
- GitHub: the `X-GitHub-Delivery` header
- Some providers: an `event_id` field in the payload
- Custom providers: often an `event.id` or `request_id`

Always use the provider's documented unique key. Generated keys (timestamps, hashes of the body) work but rely on your assumptions about the provider's behavior.
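A small extraction helper keeps the rest of the handler provider-agnostic. A sketch under the assumptions above (GitHub's key in the `X-GitHub-Delivery` header, Stripe's in the event's `id` field); verify against each provider's docs:

```python
def extract_event_id(provider, headers, payload):
    """Return the provider's documented unique key for this delivery."""
    if provider == "github":
        return headers["X-GitHub-Delivery"]  # unique per delivery
    # Stripe and many others put the unique id in the event body.
    return payload["id"]

print(extract_event_id("github", {"X-GitHub-Delivery": "abc-123"}, {}))  # abc-123
print(extract_event_id("stripe", {}, {"id": "evt_001"}))                 # evt_001
```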
For production, the set becomes a database lookup. Common shapes:
- Postgres: `INSERT ... ON CONFLICT (event_id) DO NOTHING` — atomic dedup
- Redis: `SET` with `NX` and a TTL — fast dedup

The in-memory set is the right pedagogical shape — same logic, different store.
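The insert-is-the-dedup-check shape can be sketched with SQLite's `INSERT OR IGNORE`, which plays the same role as Postgres's `ON CONFLICT DO NOTHING` (the table name is an illustrative choice):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE processed (event_id TEXT PRIMARY KEY)")

def claim(eid):
    # rowcount is 1 if we inserted the row (first delivery),
    # 0 if the primary key already existed (retry).
    cur = conn.execute(
        "INSERT OR IGNORE INTO processed (event_id) VALUES (?)", (eid,)
    )
    conn.commit()
    return cur.rowcount == 1

print(claim("e_001"))  # True  — first delivery
print(claim("e_001"))  # False — retry, deduped atomically
```

Note that this claims the id before processing; in a real database you'd typically run the claim and the processing in one transaction, so a crash rolls both back and the retry starts clean.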