Yesterday you processed a queue of items, all succeeding. In production some items always fail — malformed input, account suspended, downstream API permanently 4xx-ing. Two bad answers: crash the whole run on the first failure, or retry the bad item forever while everything behind it waits.
The right answer is a dead-letter queue (DLQ): a separate list where unprocessable items go. The main queue keeps moving; you investigate the DLQ later.
```python
from collections import deque

queue = deque([
    {"id": "a", "value": 1},
    {"id": "b", "value": -1},  # invalid — will fail
    {"id": "c", "value": 3},
    {"id": "d", "value": -2},  # invalid — will fail
    {"id": "e", "value": 5},
])

processed = []
dead_letter = []

def process(item):
    if item["value"] < 0:
        raise ValueError(f"negative value: {item['value']}")
    return item["value"] * 2

MAX_RETRIES = 2

for item in queue:
    last_error = None
    for attempt in range(MAX_RETRIES + 1):
        try:
            result = process(item)
            processed.append({"id": item["id"], "result": result})
            break
        except Exception as e:
            last_error = e
    else:
        dead_letter.append({"id": item["id"], "error": str(last_error)})

print(f"processed: {len(processed)}, dead-letter: {len(dead_letter)}")
```

Expected: 3 processed (a, c, e), 2 dead-letter (b, d).
Why a for/else instead of an if?
for/else runs the else only when the loop didn't hit break. It's a clean Python idiom for "if no attempt succeeded, fall through here". Without it you'd track a succeeded boolean.
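For comparison, here is a sketch of the same retry-then-dead-letter flow written without for/else, tracking success with an explicit flag (self-contained; `process` and the lists mirror the example above, with a shorter input list):

```python
def process(item):
    if item["value"] < 0:
        raise ValueError(f"negative value: {item['value']}")
    return item["value"] * 2

MAX_RETRIES = 2
processed, dead_letter = [], []

for item in [{"id": "a", "value": 1}, {"id": "b", "value": -1}]:
    last_error = None
    succeeded = False  # the bookkeeping that for/else lets you avoid
    for attempt in range(MAX_RETRIES + 1):
        try:
            processed.append({"id": item["id"], "result": process(item)})
            succeeded = True
            break
        except Exception as e:
            last_error = e
    if not succeeded:  # replaces the for/else fall-through
        dead_letter.append({"id": item["id"], "error": str(last_error)})
```

Both versions behave identically; for/else just removes the flag.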
And what do you do with the dead-letter items later?
Read them. Manually if it's a dozen; with a separate consumer if it's many. The DLQ is triage, not a black hole — items there are waiting for a human to look at them. In production the DLQ is often a Sheet, a database table, or a real queue with longer retention.
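A minimal re-drive sketch, assuming the record shape with an "item" key used later in this lesson and a known fix for the root cause (both assumptions, not part of the original example): once the cause is addressed, dead-lettered items go back onto the main queue.

```python
from collections import deque

queue = deque()
dead_letter = [
    {"item": {"id": "b", "value": -1}, "error": "negative value: -1"},
    {"item": {"id": "d", "value": -2}, "error": "negative value: -2"},
]

# Root cause fixed (here, pretend the values should have been positive),
# so re-drive each dead-lettered item back onto the main queue.
while dead_letter:
    record = dead_letter.pop()
    fixed = {**record["item"], "value": abs(record["item"]["value"])}
    queue.append(fixed)

print(f"re-queued: {len(queue)}, dead-letter: {len(dead_letter)}")
```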
Any non-trivial queue will eventually face poison messages — items that fail every retry. Without a DLQ, a poison message either blocks the queue (retried forever) or vanishes (dropped silently, losing data).
Dead-lettering separates the successfully processed from the unprocessable, lets the queue keep flowing, and preserves the bad items for later analysis.
The retry loop from before, now with exponential backoff between attempts:

```python
import time

for item in queue:
    last_error = None
    for attempt in range(MAX_RETRIES + 1):
        try:
            process(item)
            break  # success — exit retry loop
        except Exception as e:
            last_error = e
            time.sleep(2 ** attempt)  # back off: 1s, 2s, 4s, ...
    else:
        # all retries exhausted — dead-letter it
        dead_letter.append({"item": item, "error": str(last_error)})
```

A richer dead-letter record pays off at triage time:

```python
from datetime import datetime, timezone

dead_letter.append({
    "item": item,                            # the original
    "error": str(last_error),                # what went wrong
    "error_type": type(last_error).__name__,
    "attempts": MAX_RETRIES + 1,
    "timestamp": datetime.now(timezone.utc).isoformat(),
})
```

More context = faster triage. Future-you will thank present-you for the timestamp and error_type.
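This is where error_type earns its keep. A sketch of an informal triage pass over the DLQ, grouping dead-lettered items by error type (the sample records are made up for illustration):

```python
from collections import Counter

dead_letter = [
    {"item": {"id": "b"}, "error": "negative value: -1", "error_type": "ValueError"},
    {"item": {"id": "d"}, "error": "negative value: -2", "error_type": "ValueError"},
    {"item": {"id": "f"}, "error": "timed out", "error_type": "TimeoutError"},
]

# Most common failure mode first: fix that one and most of the DLQ drains.
by_type = Counter(record["error_type"] for record in dead_letter)
for error_type, count in by_type.most_common():
    print(f"{error_type}: {count} item(s)")
```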
Should every failure be retried? Depends on the failure mode: transient errors (network blips, timeouts) are worth retrying, while permanent ones (bad input, missing keys) will fail identically every time. Distinguish by exception class:

```python
for attempt in range(MAX_RETRIES + 1):
    try:
        process(item)
        break
    except (ValueError, KeyError, ValidationError) as e:
        # permanent — don't retry; dead-letter immediately,
        # since break also skips the for/else below
        dead_letter.append({"item": item, "error": str(e)})
        break
    except (ConnectionError, TimeoutError) as e:
        # transient — retry with backoff
        last_error = e
        time.sleep(2 ** attempt)
else:
    # transient failures exhausted every retry
    dead_letter.append({"item": item, "error": str(last_error)})
```

(ValidationError stands in for whatever your validation library raises.)

For production: a database table where dead items are marked status='dead' — queryable. For today's lesson: an in-memory list. Same logical shape.
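That production shape can be sketched with SQLite (the table and column names here are illustrative, not from the lesson): dead items are flagged rather than deleted, so the DLQ is just a query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE items (
        id     TEXT PRIMARY KEY,
        status TEXT NOT NULL,   -- 'pending', 'done', or 'dead'
        error  TEXT             -- populated only when status = 'dead'
    )
""")
conn.executemany(
    "INSERT INTO items VALUES (?, ?, ?)",
    [("a", "done", None), ("b", "pending", None)],
)

# Dead-letter item 'b': flag it instead of deleting it.
conn.execute(
    "UPDATE items SET status = 'dead', error = ? WHERE id = ?",
    ("negative value: -1", "b"),
)

# The DLQ is now a query, not a separate data structure.
dead = conn.execute(
    "SELECT id, error FROM items WHERE status = 'dead'"
).fetchall()
print(dead)
```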
A DLQ that nobody reads is just a slower black hole. Set up a recurring check, even informal — "review the DLQ every Monday" goes a long way.
The alternative — letting failures kill the entire script — means a single bad item stops the whole run. With a DLQ, the script processes everything it can, dead-letters what it can't, and you triage the rest at your own pace.