Most well-behaved APIs send your remaining quota in the response headers. You don't have to guess — just read.
A fixture response (provided as a dict for clarity, since the engine doesn't hit live endpoints with rate-limit semantics):
response_headers = {
    "X-RateLimit-Limit": "100",
    "X-RateLimit-Remaining": "73",
    "X-RateLimit-Reset": "1730000000",
    "Content-Type": "application/json",
}
remaining = int(response_headers.get("X-RateLimit-Remaining", "0"))
print(f"remaining: {remaining}")

Are these header names standard?
Common but not universal. X-RateLimit-* is widely used. GitHub uses X-RateLimit-Remaining. Stripe uses... nothing in headers — they prefer body fields. Twitter/X uses x-rate-limit-remaining. Some APIs send only a Retry-After header when they 429.
The pattern is the same regardless of the exact header name: check the API's docs once, then read the documented header in your code.
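One way to keep that docs lookup in a single place is a small table of header names per API. The sketch below is illustrative only: `REMAINING_HEADER`, its keys, and `read_remaining` are hypothetical names, and the header spellings are the ones mentioned above, so check them against each API's current documentation before relying on them.

```python
from typing import Mapping, Optional

# Hypothetical lookup table: fill it in once from each API's documentation.
REMAINING_HEADER = {
    "github": "X-RateLimit-Remaining",
    "twitter": "x-rate-limit-remaining",
}


def read_remaining(api: str, headers: Mapping[str, str]) -> Optional[int]:
    """Return the remaining quota for `api`, or None if the header is absent."""
    name = REMAINING_HEADER[api]
    value = headers.get(name)
    return int(value) if value is not None else None


fixture = {"X-RateLimit-Remaining": "73", "Content-Type": "application/json"}
print(read_remaining("github", fixture))   # 73
print(read_remaining("twitter", fixture))  # None: this fixture has no such key
```

Returning `None` rather than `0` when the header is missing keeps "no information" distinct from "quota exhausted".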
Why parse from a fixture instead of a live endpoint?
Because lessons need deterministic behavior. A live endpoint's rate-limit count varies with prior usage and shared infrastructure. The skill being taught is reading the headers, and that part is the same whether you read them from a real requests.Response or a dict.
| Header | Meaning |
|---|---|
| X-RateLimit-Limit | Total calls allowed in the window |
| X-RateLimit-Remaining | Calls left before hitting the limit |
| X-RateLimit-Reset | Unix timestamp when the window resets |
| Retry-After | Seconds to wait (only when 429'd, sometimes other 4xx/5xx) |
Not every API sends all four. Some send a subset; some send their own variants. Pattern: read the API's docs once, write a small helper to extract the headers, reuse.
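A minimal sketch of such a helper, assuming the four header names from the table above. `RateLimitInfo`, `parse_rate_limit`, and `_to_int` are hypothetical names chosen for illustration. Missing or non-numeric values become `None` instead of raising.

```python
import time
from dataclasses import dataclass
from typing import Mapping, Optional


def _to_int(value: Optional[str]) -> Optional[int]:
    """Convert a header value to int; None if missing or not a plain integer."""
    if value is None:
        return None
    try:
        return int(value.strip())
    except ValueError:
        return None


@dataclass
class RateLimitInfo:
    limit: Optional[int]
    remaining: Optional[int]
    reset_ts: Optional[int]
    retry_after: Optional[int]

    def seconds_until_reset(self, now: Optional[float] = None) -> Optional[float]:
        """Seconds until the window resets, never negative; None if unknown."""
        if self.reset_ts is None:
            return None
        current = time.time() if now is None else now
        return max(0.0, float(self.reset_ts - current))


def parse_rate_limit(headers: Mapping[str, str]) -> RateLimitInfo:
    """Extract whichever of the four headers are present."""
    return RateLimitInfo(
        limit=_to_int(headers.get("X-RateLimit-Limit")),
        remaining=_to_int(headers.get("X-RateLimit-Remaining")),
        reset_ts=_to_int(headers.get("X-RateLimit-Reset")),
        # A date-valued Retry-After is not an integer, so it ends up as None here.
        retry_after=_to_int(headers.get("Retry-After")),
    )


response_headers = {
    "X-RateLimit-Limit": "100",
    "X-RateLimit-Remaining": "73",
    "X-RateLimit-Reset": "1730000000",
    "Content-Type": "application/json",
}
info = parse_rate_limit(response_headers)
print(info.remaining)                            # 73
print(info.seconds_until_reset(now=1729999940))  # 60.0
```

Passing `now` explicitly keeps the example deterministic; in real use you would leave it out and let the helper read the clock.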
Reading from a real requests.Response:
import requests

url = "https://api.example.com/items"  # placeholder; substitute the endpoint you are calling
r = requests.get(url, timeout=10)
remaining = int(r.headers.get("X-RateLimit-Remaining", "0"))
reset_ts = int(r.headers.get("X-RateLimit-Reset", "0"))

r.headers is a case-insensitive dict-like object: r.headers["X-RATELIMIT-REMAINING"] works the same as r.headers["x-ratelimit-remaining"]. Don't worry about case.
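That case-insensitivity belongs to the `requests` headers object. A plain dict fixture, like the one used earlier in this lesson, is case-sensitive. One possible way to make a fixture behave similarly is to lowercase its keys up front. `lowercase_keys` is a hypothetical helper, not part of `requests`.

```python
from typing import Dict, Mapping


def lowercase_keys(headers: Mapping[str, str]) -> Dict[str, str]:
    """Copy a headers mapping with every key lowercased."""
    return {key.lower(): value for key, value in headers.items()}


fixture = {"X-RateLimit-Remaining": "73"}
print(fixture.get("x-ratelimit-remaining"))  # None: plain dict lookups are case-sensitive

normalized = lowercase_keys(fixture)
remaining = int(normalized.get("x-ratelimit-remaining", "0"))
print(remaining)  # 73
```

With this approach every lookup must also use the lowercase spelling.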
Headers are strings. "73" - 1 is a TypeError. Always cast:
remaining = int(r.headers.get("X-RateLimit-Remaining", "0"))

Three natural reactions, in increasing sophistication:
Logging is the cheap baseline — even one print per call lets you spot when you're approaching limits.
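The logging baseline could look roughly like the sketch below, which uses the standard `logging` module instead of `print`. The function name `log_quota` and the 10% warning threshold are assumptions made for illustration, not values any API prescribes.

```python
import logging
from typing import Mapping

logger = logging.getLogger("ratelimit")


def log_quota(headers: Mapping[str, str], warn_fraction: float = 0.1) -> str:
    """Log the remaining quota and return the name of the log level used."""
    remaining = headers.get("X-RateLimit-Remaining")
    limit = headers.get("X-RateLimit-Limit")
    if remaining is None or limit is None:
        logger.debug("no rate-limit headers in response")
        return "DEBUG"
    remaining_n, limit_n = int(remaining), int(limit)
    # Assumed threshold: warn once 10% or less of the window is left.
    if remaining_n <= limit_n * warn_fraction:
        logger.warning("quota low: %d of %d calls left", remaining_n, limit_n)
        return "WARNING"
    logger.info("quota: %d of %d calls left", remaining_n, limit_n)
    return "INFO"


logging.basicConfig(level=logging.INFO)
print(log_quota({"X-RateLimit-Limit": "100", "X-RateLimit-Remaining": "73"}))  # INFO
print(log_quota({"X-RateLimit-Limit": "100", "X-RateLimit-Remaining": "5"}))   # WARNING
```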
A single API often has multiple rate-limit dimensions:
The headers usually describe the most-restrictive limit you're hitting. If you're seeing X-RateLimit-Remaining: 0, it's whichever bucket you exhausted first.
Fall back to: respect 429s with Retry-After, and rate-limit yourself client-side (e.g., time.sleep(0.5) between calls = max 2 req/sec). Less elegant but works for any API.
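Both fallbacks can be sketched without touching a live endpoint. The code below is a rough illustration under stated assumptions: `send` is a hypothetical callable returning a `(status, headers)` pair, and `call_with_backoff`, `throttled`, `max_attempts`, and `default_wait` are made-up names and defaults. `sleep` is injectable so the demo runs instantly.

```python
import time
from typing import Callable, List, Mapping, Sequence, Tuple

Response = Tuple[int, Mapping[str, str]]  # (status code, headers)


def call_with_backoff(
    send: Callable[[], Response],
    max_attempts: int = 3,
    default_wait: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Response:
    """Call send(); on a 429, wait Retry-After seconds and try again."""
    for attempt in range(max_attempts):
        status, headers = send()
        if status != 429 or attempt == max_attempts - 1:
            return status, headers
        raw = headers.get("Retry-After", "")
        # Retry-After can also be an HTTP date; this sketch handles only the seconds form.
        wait = float(raw) if raw.isdigit() else default_wait
        sleep(wait)
    raise ValueError("max_attempts must be at least 1")


def throttled(
    calls: Sequence[Callable[[], int]],
    min_interval: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> List[int]:
    """Run calls in order, pausing min_interval seconds between them."""
    results = []
    for i, call in enumerate(calls):
        if i > 0:
            sleep(min_interval)
        results.append(call())
    return results


# Scripted responses stand in for a server: one 429, then success.
scripted = iter([(429, {"Retry-After": "2"}), (200, {})])
waits: List[float] = []
status, _ = call_with_backoff(lambda: next(scripted), sleep=waits.append)
print(status, waits)  # 200 [2.0]

pauses: List[float] = []
print(throttled([lambda: 1, lambda: 2, lambda: 3], sleep=pauses.append))  # [1, 2, 3]
print(pauses)  # [0.5, 0.5]
```

In real use you would leave `sleep` at its default, so the waits actually happen.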