Generic retry-on-anything wastes attempts on errors that won't resolve. Rate-limit-specific retry catches just 429-shaped errors and backs off, and lets everything else fail fast.
```python
import time

class RateLimitError(Exception):
    pass

def call_with_retry(call, max_attempts=3, base_delay=1.0):
    for attempt in range(max_attempts):
        try:
            return call()
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise  # out of attempts; surface the error
            delay = base_delay * (2 ** attempt)  # 1s, 2s, 4s
            print(f"rate limited; retry in {delay}s")
            time.sleep(delay)
```

Exponential backoff: 1s → 2s → 4s. Each retry waits longer, giving the rate-limit window a chance to refresh.
And other errors?
Re-raise immediately. A 401 (auth) won't fix itself by waiting. A 500 might, but that calls for a separate retry policy for transient server errors. Different errors, different policies.
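As a sketch of what that separate policy might look like: the same loop shape, but catching a different exception class. `ServerError` here is a hypothetical stand-in for whatever your client library raises on 5xx responses.

```python
import time

class ServerError(Exception):
    """Hypothetical transient 5xx-style error, for illustration."""
    pass

def retry_transient(fn, max_attempts=3, base_delay=1.0):
    """A separate policy: retry only transient server errors.

    Anything else (auth errors, bugs) propagates immediately.
    """
    for attempt in range(max_attempts):
        try:
            return fn()
        except ServerError:
            if attempt == max_attempts - 1:
                raise
            time.sleep(base_delay * (2 ** attempt))
```

Because each policy catches exactly one class of error, you can reason about them independently: tightening the rate-limit policy never changes how server errors are handled.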
What if the API returns a Retry-After header?
Use it. If the server tells you when it'll be ready, that beats your guess. Catching the error and reading error.retry_after (when available) is the gold standard.
```python
import time

class RateLimitError(Exception):
    def __init__(self, retry_after=None):
        self.retry_after = retry_after
        super().__init__("rate limited")

def call_with_retry(fn, max_attempts=3, base_delay=1.0):
    last_err = None
    for attempt in range(max_attempts):
        try:
            return fn()
        except RateLimitError as e:
            last_err = e
            if attempt == max_attempts - 1:
                break
            # prefer the server's hint; fall back to exponential backoff
            delay = e.retry_after or base_delay * (2 ** attempt)
            time.sleep(delay)
    raise last_err
```

The `except RateLimitError` clause is the filter. Other exceptions propagate immediately — your script crashes loud on bugs (KeyError, ValueError) and on non-retryable API errors (AuthenticationError, InvalidRequestError).
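Populating `retry_after` is the caller's job when translating a raw HTTP response. A minimal sketch, assuming a hypothetical helper name and the seconds form of `Retry-After` (the header can also be an HTTP date, which this ignores):

```python
class RateLimitError(Exception):
    def __init__(self, retry_after=None):
        self.retry_after = retry_after
        super().__init__("rate limited")

def to_rate_limit_error(status_code, headers):
    """Translate an HTTP status + headers into a RateLimitError.

    Returns None for non-429 responses. Only handles Retry-After
    given in seconds, not the HTTP-date form.
    """
    if status_code != 429:
        return None
    raw = headers.get("Retry-After")
    retry_after = float(raw) if raw is not None else None
    return RateLimitError(retry_after=retry_after)
```

With a helper like this, the retry loop stays ignorant of HTTP details; it only ever sees `RateLimitError` with an optional hint.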
| Strategy | When |
|---|---|
| Constant (always 1s) | Simple, but doesn't adapt to load |
| Linear (1s, 2s, 3s) | Mild adaptation |
| Exponential (1s, 2s, 4s) | Standard — forgiving on transient bursts, gives up before forever |
| Exponential + jitter (1±0.5s, 2±1s, ...) | Production — prevents thundering herd from sync'd retries |
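The table's last row can be sketched with "full jitter": each wait is a uniform random draw up to the exponential cap, so clients that were all rate-limited at the same instant don't retry in lockstep. Function name and cap are illustrative.

```python
import random

def backoff_with_jitter(attempt, base_delay=1.0, max_delay=30.0):
    """'Full jitter' backoff: uniform random delay up to the exponential cap.

    Desynchronizes retries across clients to avoid a thundering herd.
    """
    cap = min(max_delay, base_delay * (2 ** attempt))
    return random.uniform(0, cap)
```

`max_delay` matters in practice: without it, attempt 10 would wait up to 1024s.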
```python
for attempt in range(MAX_ATTEMPTS):
```

Without `MAX_ATTEMPTS`, a permanently rate-limited script retries forever. Pick 3–5 for foreground calls, 10+ for background batch jobs.
From AI Patterns: retry on bad content. Today: retry on bad transport. They're separate policies and they nest:
```python
def get_classification(text):
    @retry_on_rate_limit(max_attempts=3)
    def network_call():
        return Agent(model).run_sync(...).output
    return retry_on_bad_output(network_call, validate=...)  # different policy
```

Network-layer retry handles 429s. Output-layer retry handles malformed JSON. Either can fail; each has its own cap.
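The `retry_on_rate_limit` decorator in that snippet is pseudocode. One hedged way to implement it, reusing the `RateLimitError` class from earlier (the decorator name and defaults are assumptions, not a library API):

```python
import functools
import time

class RateLimitError(Exception):
    def __init__(self, retry_after=None):
        self.retry_after = retry_after
        super().__init__("rate limited")

def retry_on_rate_limit(max_attempts=3, base_delay=1.0):
    """Decorator form of the rate-limit retry policy (illustrative sketch)."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            for attempt in range(max_attempts):
                try:
                    return fn(*args, **kwargs)
                except RateLimitError as e:
                    if attempt == max_attempts - 1:
                        raise
                    # prefer the server's hint, else exponential backoff
                    time.sleep(e.retry_after or base_delay * (2 ** attempt))
        return wrapper
    return decorator
```

Because the policy lives in the decorator, the wrapped function stays a plain network call and the output-layer retry never needs to know about 429s.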