When the primary model fails — outage, rate limit, parse error — you don't crash. You fall back. The pattern is a list of strategies tried in order:
def answer_with_fallback(query):
for strategy in [primary, secondary, from_cache, refuse]:
try:
result = strategy(query)
if result is not None:
return result
except Exception:
continue
raise RuntimeError("all fallback tiers exhausted")4 tiers: primary model → secondary model → cached response → refusal message. Each tier knows how to fail (returns None or raises) so the next can take over.
And the order matters?
From most-preferred to least-preferred. Primary is best (cheapest, freshest). Secondary is good (different provider, expensive). Cache is fast but possibly stale. Refusal is last — better than a crash.
What's the trigger for falling back?
Anything that signals "this tier failed" — a raised exception, a None return, a low-quality output (e.g., parsing failure). The chain is defensive: each tier reports failure, the next takes over.
def try_chain(query, strategies):
"""Try each strategy in order. Return first successful (non-None, non-raising) result."""
last_err = None
for strategy in strategies:
try:
result = strategy(query)
if result is not None:
return strategy.__name__, result
except Exception as e:
last_err = e
continue
raise RuntimeError(f"all strategies failed; last error: {last_err}")| Tier | Speed | Cost | Quality | Failure mode |
|---|---|---|---|---|
| Primary model | Slow | High | Best | Outage, rate limit |
| Secondary model | Slow | High | Good | Outage |
| Cached response | Instant | Free | Stale | Cache miss |
| Static refusal | Instant | Free | Generic | None — always works |
A real chain has 3-4 tiers. More than that, you're papering over a deeper architectural issue.
None, the chain advancesRetry is within a tier (retry the same strategy a few times before giving up). Fallback is between tiers (try a different strategy entirely). They compose:
def primary(query):
return with_retry(lambda: call_primary_model(query), max_attempts=3)primary retries 3 times. If all 3 fail with rate-limit errors, the exception bubbles, and the fallback chain advances to secondary.
KeyError, TypeError aren't transient. Don't catch broadly.When the primary model fails — outage, rate limit, parse error — you don't crash. You fall back. The pattern is a list of strategies tried in order:
def answer_with_fallback(query):
for strategy in [primary, secondary, from_cache, refuse]:
try:
result = strategy(query)
if result is not None:
return result
except Exception:
continue
raise RuntimeError("all fallback tiers exhausted")4 tiers: primary model → secondary model → cached response → refusal message. Each tier knows how to fail (returns None or raises) so the next can take over.
And the order matters?
From most-preferred to least-preferred. Primary is best (cheapest, freshest). Secondary is good (different provider, expensive). Cache is fast but possibly stale. Refusal is last — better than a crash.
What's the trigger for falling back?
Anything that signals "this tier failed" — a raised exception, a None return, a low-quality output (e.g., parsing failure). The chain is defensive: each tier reports failure, the next takes over.
def try_chain(query, strategies):
"""Try each strategy in order. Return first successful (non-None, non-raising) result."""
last_err = None
for strategy in strategies:
try:
result = strategy(query)
if result is not None:
return strategy.__name__, result
except Exception as e:
last_err = e
continue
raise RuntimeError(f"all strategies failed; last error: {last_err}")| Tier | Speed | Cost | Quality | Failure mode |
|---|---|---|---|---|
| Primary model | Slow | High | Best | Outage, rate limit |
| Secondary model | Slow | High | Good | Outage |
| Cached response | Instant | Free | Stale | Cache miss |
| Static refusal | Instant | Free | Generic | None — always works |
A real chain has 3-4 tiers. More than that, you're papering over a deeper architectural issue.
None, the chain advancesRetry is within a tier (retry the same strategy a few times before giving up). Fallback is between tiers (try a different strategy entirely). They compose:
def primary(query):
return with_retry(lambda: call_primary_model(query), max_attempts=3)primary retries 3 times. If all 3 fail with rate-limit errors, the exception bubbles, and the fallback chain advances to secondary.
KeyError, TypeError aren't transient. Don't catch broadly.Create a free account to get started. Paid plans unlock all tracks.