Rate Limiting & Quotas
Implement per-user rate limits and token budgets to control API usage.
If my AI API costs money per call, how do I prevent one user from burning through my entire budget?
Rate limiting. You track how many requests each user makes in a time window and reject requests that exceed the limit:
from datetime import datetime, timedelta

class RateLimiter:
    def __init__(self, max_requests: int, window_seconds: int):
        self.max_requests = max_requests
        self.window = timedelta(seconds=window_seconds)
        self.requests = {}  # user -> list of timestamps

    def is_allowed(self, user: str) -> bool:
        now = datetime.utcnow()
        cutoff = now - self.window
        # Drop timestamps that have fallen out of the window
        if user in self.requests:
            self.requests[user] = [
                t for t in self.requests[user] if t > cutoff
            ]
        else:
            self.requests[user] = []
        # Reject if the user has already hit the limit in this window
        if len(self.requests[user]) >= self.max_requests:
            return False
        self.requests[user].append(now)
        return True
This is a sliding window rate limiter: it keeps timestamps of recent requests and counts how many fall within the window. One caveat: it stores state in process memory, so counts reset on restart and aren't shared across workers. Production deployments usually back the counters with a shared store like Redis.
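To see the window behave, here's a quick standalone check. The class is repeated so the snippet runs on its own, and the limit of 3 requests is arbitrary:

```python
from datetime import datetime, timedelta

class RateLimiter:
    def __init__(self, max_requests: int, window_seconds: int):
        self.max_requests = max_requests
        self.window = timedelta(seconds=window_seconds)
        self.requests = {}  # user -> list of timestamps

    def is_allowed(self, user: str) -> bool:
        now = datetime.utcnow()
        cutoff = now - self.window
        # Keep only timestamps still inside the window
        self.requests[user] = [
            t for t in self.requests.get(user, []) if t > cutoff
        ]
        if len(self.requests[user]) >= self.max_requests:
            return False
        self.requests[user].append(now)
        return True

limiter = RateLimiter(max_requests=3, window_seconds=60)

# First three requests pass, the fourth is rejected
results = [limiter.is_allowed("alice") for _ in range(4)]
print(results)  # [True, True, True, False]

# A different user has an independent counter
print(limiter.is_allowed("bob"))  # True
```

Because the window slides, "alice" gets capacity back one request at a time as old timestamps age out, rather than all at once at a fixed boundary.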
How does this integrate with FastAPI?
As a dependency, naturally:
from fastapi import Depends, FastAPI, HTTPException

app = FastAPI()
limiter = RateLimiter(max_requests=10, window_seconds=60)

def check_rate_limit(user: str = Depends(require_auth)):
    if not limiter.is_allowed(user):
        raise HTTPException(
            status_code=429,
            detail="Rate limit exceeded. Try again later."
        )
    return user

@app.post("/chat")
def chat(user: str = Depends(check_rate_limit)):
    return {"message": "Response here"}
HTTP 429 means "Too Many Requests" — the standard status for rate limiting.
What about token budgets? LLM costs depend on how many tokens you process, not just how many requests.
Track tokens per user in addition to request counts:
class TokenBudget:
    def __init__(self, daily_limit: int):
        self.daily_limit = daily_limit
        self.usage = {}  # user -> {"date": str, "tokens": int}

    def check_and_deduct(self, user: str, tokens: int) -> bool:
        today = datetime.utcnow().strftime("%Y-%m-%d")
        # Reset the counter when the UTC day rolls over
        if user not in self.usage or self.usage[user]["date"] != today:
            self.usage[user] = {"date": today, "tokens": 0}
        if self.usage[user]["tokens"] + tokens > self.daily_limit:
            return False
        self.usage[user]["tokens"] += tokens
        return True
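A quick standalone check of the budget logic (class repeated so it runs on its own; the 1,000-token limit is arbitrary):

```python
from datetime import datetime

class TokenBudget:
    def __init__(self, daily_limit: int):
        self.daily_limit = daily_limit
        self.usage = {}  # user -> {"date": str, "tokens": int}

    def check_and_deduct(self, user: str, tokens: int) -> bool:
        today = datetime.utcnow().strftime("%Y-%m-%d")
        # Reset the counter when the UTC day rolls over
        if user not in self.usage or self.usage[user]["date"] != today:
            self.usage[user] = {"date": today, "tokens": 0}
        if self.usage[user]["tokens"] + tokens > self.daily_limit:
            return False
        self.usage[user]["tokens"] += tokens
        return True

budget = TokenBudget(daily_limit=1000)

print(budget.check_and_deduct("alice", 600))  # True  (600/1000 used)
print(budget.check_and_deduct("alice", 600))  # False (would exceed 1000)
print(budget.check_and_deduct("alice", 400))  # True  (exactly 1000)
```

In a real endpoint you don't know the exact token count until the LLM responds, so a common pattern is to check against an estimate before the call, then reconcile with the provider's reported usage afterward.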
Should I rate limit by IP address or by user identity?
Both. IP-based limits protect against unauthenticated abuse — someone hammering your login endpoint. User-based limits protect against authenticated abuse — one user consuming all your API budget.
Layer them:
- IP rate limit on all endpoints (100 requests/minute)
- User rate limit on authenticated endpoints (20 requests/minute)
- Token budget on AI endpoints (10,000 tokens/day)
Each layer catches different kinds of abuse. The tighter limits go on the most expensive operations.
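The layering above can be sketched as a single admission check, ordered from cheapest to most expensive. This is a minimal illustration, not a full implementation: `make_limiter` and `admit` are hypothetical names, the limits are the example numbers from the list, and the daily reset for the token counter is omitted for brevity:

```python
from datetime import datetime, timedelta

# Minimal sliding-window limiter (same idea as RateLimiter above),
# returned as a closure so we can key one by IP and one by user
def make_limiter(max_requests: int, window_seconds: int):
    window = timedelta(seconds=window_seconds)
    seen = {}  # key -> list of timestamps
    def is_allowed(key: str) -> bool:
        now = datetime.utcnow()
        seen[key] = [t for t in seen.get(key, []) if t > now - window]
        if len(seen[key]) >= max_requests:
            return False
        seen[key].append(now)
        return True
    return is_allowed

ip_allowed = make_limiter(100, 60)   # 100 requests/minute per IP
user_allowed = make_limiter(20, 60)  # 20 requests/minute per user
token_used = {}                      # user -> tokens spent (daily reset omitted)
DAILY_TOKENS = 10_000

def admit(ip: str, user: str, est_tokens: int) -> bool:
    # Check the broadest, cheapest layer first
    if not ip_allowed(ip):
        return False
    if not user_allowed(user):
        return False
    if token_used.get(user, 0) + est_tokens > DAILY_TOKENS:
        return False
    token_used[user] = token_used.get(user, 0) + est_tokens
    return True

print(admit("203.0.113.5", "alice", 500))   # True
print(admit("203.0.113.5", "alice", 9800))  # False (token budget exhausted)
```

Ordering the checks this way means an abusive IP is rejected before you spend any per-user or per-token bookkeeping on it.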