Scheduled at 9am, fails at 9:00:03, no human watching. By the time you notice — tomorrow morning when the report you expected isn't there — 24 hours of nothing. The fix: the script alerts you when it crashes.
Wrap the chain in try/except. On unrecoverable failure, send yourself an email with the error class:
import os

recipient = os.environ.get("USER_EMAIL", "you@example.com")

try:
    # ... the chain ...
    raise RuntimeError("forced for demo")  # simulate failure
except Exception as e:
    toolset.execute_action(Action.GMAIL_SEND_EMAIL, {
        "recipient_email": recipient,
        "subject": f"[zuzu-day-18] script failed: {type(e).__name__}",
        "body": f"error: {e}\nclass: {type(e).__name__}",
    })
    raise  # re-raise so the run is marked failed

The send happens before re-raising — alert lands in your inbox, then the original exception propagates. Future-you reads the email at 9:01am and starts debugging.
Why send to yourself and not Slack?
Email and Slack both work. We use Gmail self-send here because (a) Gmail is in your stack already, (b) email is universally accessible — Slack workspaces come and go, your email doesn't, (c) the lesson stays within the 7 supported tools.
In a real workflow you might use whatever notification channel you live in. The pattern is identical — send a structured failure message to a channel you watch.
And the raise after sending — why both?
The send is additional; the original error must still terminate the run. Without raise, the script ends with status "success" because we caught everything. The scheduler thinks the run worked. With raise, the original exception propagates after the alert is sent — the scheduler logs the failure, retries appropriately, and you have a notification plus a real failure status.
try:
    do_the_chain()
except Exception as e:
    notify_failure(e)
    raise

Four lines around the chain. The notify_failure is a tool call (Gmail self-send, Calendar event create, Tasks create — pick your channel). The trailing raise re-throws the original error.
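Here is a minimal notify_failure sketch built on the same Gmail self-send as above. It assumes the toolset and Action objects from the earlier snippet are already set up, and SCRIPT_NAME is a constant of your own choosing (both are assumptions, not part of the lesson's code):

import os

SCRIPT_NAME = "zuzu-day-18"  # assumed constant; use your script's name
RECIPIENT = os.environ.get("USER_EMAIL", "you@example.com")

def notify_failure(e):
    # same Gmail self-send as the snippet above; toolset and Action
    # come from the earlier setup and are not redefined here
    toolset.execute_action(Action.GMAIL_SEND_EMAIL, {
        "recipient_email": RECIPIENT,
        "subject": f"[{SCRIPT_NAME}] script failed: {type(e).__name__}",
        "body": f"error: {e}\nclass: {type(e).__name__}",
    })

Swap the body of the function for a Calendar or Tasks call if that is the channel you actually watch; the try/except around the chain does not change.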
Keep it parseable, even from your phone at 3am:
Subject: [script-name] failed: ClassName — at-a-glance triage. Body:

body = f"""
script: {SCRIPT_NAME}
run_at: {datetime.utcnow().isoformat()}Z
error_class: {type(e).__name__}
error_message: {e}
last_step: {last_step_seen}
""".strip()

Five lines of context tell you: which script failed, when, what kind of error, where in the chain it happened. Enough to know whether to fix tonight or tomorrow.
The alert is for unrecoverable errors — the ones that survived your retry loop. Transient errors that retry-and-succeed shouldn't ping you; they're noise.
try:
    with_retry(do_the_chain)  # retries internally on transient
except Exception as e:
    # only fires when retries exhausted
    notify_failure(e)
    raise

The retry loop is the gate. Alert only on what it lets through.
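with_retry isn't shown in the lesson's snippet; one possible shape, assuming you treat connection and timeout errors as transient, looks like this. The attempt count, backoff, and exception tuple are illustrative defaults, not prescriptions:

import time

def with_retry(fn, attempts=3, base_delay=2.0,
               transient=(ConnectionError, TimeoutError)):
    # retry only on exception types we consider transient; anything else
    # escapes immediately and reaches the alerting except block
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except transient:
            if attempt == attempts:
                raise  # retries exhausted: let it through to the alert
            time.sleep(base_delay * attempt)  # simple linear backoff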
If notify_failure itself fails (Gmail down, auth expired), don't infinite-loop:
def notify_failure(e):
    try:
        send_email(subject=..., body=...)
    except Exception:
        # alert system is down — log and give up
        print(f"COULDNT_ALERT: {type(e).__name__}: {e}")

Log to stdout as a last-resort breadcrumb. The scheduler's run log still captures it.
A partial-failure loop processes 100 items and 7 of them fail. Don't send 7 emails — send one with the count and the failed IDs.
if results["fail"] > 0:
notify(f"{results['fail']} of {len(items)} failed: {failed_ids}")One actionable notification beats 7 panicky ones.