Day 27 · ~12m

Rate Limiting & Quotas

Implement per-user rate limits and token budgets to control API usage.

🧑‍💻

If my AI API costs money per call, how do I prevent one user from burning through my entire budget?

👩‍🏫

Rate limiting. You track how many requests each user makes in a time window and reject requests that exceed the limit:

from datetime import datetime, timedelta, timezone

class RateLimiter:
    def __init__(self, max_requests: int, window_seconds: int):
        self.max_requests = max_requests
        self.window = timedelta(seconds=window_seconds)
        self.requests = {}  # user -> list of request timestamps (UTC)

    def is_allowed(self, user: str) -> bool:
        # Timezone-aware UTC; datetime.utcnow() is deprecated since Python 3.12
        now = datetime.now(timezone.utc)
        cutoff = now - self.window
        # Drop timestamps that have fallen out of the window
        recent = [t for t in self.requests.get(user, []) if t > cutoff]
        self.requests[user] = recent
        # Reject if the user has already used up the window
        if len(recent) >= self.max_requests:
            return False
        recent.append(now)
        return True

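This is a sliding window rate limiter. It keeps a timestamp for every recent request and counts how many fall within the window. Rejected requests are not recorded, so they don't count against the limit. The state lives in process memory, so it resets on restart and is not shared between worker processes.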
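
To watch the window slide without waiting a full minute, it helps to control the clock. Here is a minimal sketch of the same idea, assuming a variant class with a hypothetical `clock` parameter (neither `SlidingWindowLimiter` nor `clock` is part of the class above) and a deque per user so that expired timestamps can be dropped from the left:

```python
import time
from collections import defaultdict, deque
from typing import Callable


class SlidingWindowLimiter:
    """Sliding-window limiter with an injectable clock (illustrative)."""

    def __init__(self, max_requests: int, window_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.clock = clock  # hypothetical parameter, added to make testing easy
        self.requests = defaultdict(deque)  # user -> deque of timestamps

    def is_allowed(self, user: str) -> bool:
        now = self.clock()
        cutoff = now - self.window_seconds
        timestamps = self.requests[user]
        # Timestamps are appended in order, so expired ones sit on the left.
        while timestamps and timestamps[0] <= cutoff:
            timestamps.popleft()
        if len(timestamps) >= self.max_requests:
            return False
        timestamps.append(now)
        return True


# Simulate time with a fake clock instead of sleeping.
fake_now = [0.0]
demo_limiter = SlidingWindowLimiter(max_requests=2, window_seconds=60,
                                    clock=lambda: fake_now[0])

first = demo_limiter.is_allowed("alice")    # allowed
second = demo_limiter.is_allowed("alice")   # allowed
third = demo_limiter.is_allowed("alice")    # rejected: window is full
fake_now[0] = 61.0                          # both earlier requests expire
fourth = demo_limiter.is_allowed("alice")   # allowed again
```

Injecting the clock is a testing convenience, not a requirement. In production you would leave the default in place.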

🧑‍💻

How does this integrate with FastAPI?

👩‍🏫

As a dependency, naturally:

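from fastapi import Depends, FastAPI, HTTPException

app = FastAPI()
limiter = RateLimiter(max_requests=10, window_seconds=60)

# require_auth is assumed to be your existing auth dependency,
# defined elsewhere, that returns the authenticated user's identifier.
def check_rate_limit(user: str = Depends(require_auth)):
    if not limiter.is_allowed(user):
        raise HTTPException(
            status_code=429,
            detail="Rate limit exceeded. Try again later."
        )
    return user

@app.post("/chat")
def chat(user: str = Depends(check_rate_limit)):
    return {"message": "Response here"}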

HTTP 429 means "Too Many Requests" — the standard status for rate limiting.
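
A 429 is more useful when it tells the client how long to wait. One common way is the standard `Retry-After` header. Here is a rough sketch of how you might compute that value from a user's recent timestamps; `retry_after_seconds` is a hypothetical helper, not something FastAPI provides:

```python
import math
from datetime import datetime, timedelta, timezone


def retry_after_seconds(timestamps: list[datetime], window: timedelta,
                        now: datetime) -> int:
    """Seconds until the oldest request in the window expires, rounded up."""
    if not timestamps:
        return 0
    expires_at = min(timestamps) + window
    remaining = (expires_at - now).total_seconds()
    return max(0, math.ceil(remaining))


now = datetime(2024, 1, 1, 12, 0, 30, tzinfo=timezone.utc)
recent = [
    datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc),
    datetime(2024, 1, 1, 12, 0, 20, tzinfo=timezone.utc),
]
# The oldest request leaves the 60-second window at 12:01:00.
wait = retry_after_seconds(recent, timedelta(seconds=60), now)

# In the dependency, the value could travel with the 429:
# raise HTTPException(
#     status_code=429,
#     detail="Rate limit exceeded. Try again later.",
#     headers={"Retry-After": str(wait)},
# )
```

Well-behaved clients can use the header to back off instead of retrying immediately.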

🧑‍💻

What about token budgets? LLM costs depend on how many tokens you process, not just how many requests.

👩‍🏫

Track tokens per user in addition to request counts:

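from datetime import datetime, timezone

class TokenBudget:
    def __init__(self, daily_limit: int):
        self.daily_limit = daily_limit
        self.usage = {}  # user -> {"date": str, "tokens": int}

    def check_and_deduct(self, user: str, tokens: int) -> bool:
        # Budgets reset at midnight UTC; utcnow() is deprecated since Python 3.12
        today = datetime.now(timezone.utc).strftime("%Y-%m-%d")
        # Start a fresh counter for new users and on the first request of a new day
        if user not in self.usage or self.usage[user]["date"] != today:
            self.usage[user] = {"date": today, "tokens": 0}
        # Refuse the request if it would push the user over the daily limit
        if self.usage[user]["tokens"] + tokens > self.daily_limit:
            return False
        self.usage[user]["tokens"] += tokens
        return True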
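
One wrinkle: you usually don't know the exact token count until the model has responded, because the output length isn't known up front. A possible approach is to reserve an estimate before the call and correct it afterwards. The sketch below assumes a variant class with hypothetical `reserve`, `settle`, and `remaining` methods (none of these names come from the class above):

```python
from datetime import datetime, timezone


class ReservingTokenBudget:
    """Daily token budget with reserve-then-settle accounting (illustrative)."""

    def __init__(self, daily_limit: int):
        self.daily_limit = daily_limit
        self.usage = {}  # user -> {"date": str, "tokens": int}

    def _today_usage(self, user: str) -> dict:
        today = datetime.now(timezone.utc).strftime("%Y-%m-%d")
        if user not in self.usage or self.usage[user]["date"] != today:
            self.usage[user] = {"date": today, "tokens": 0}
        return self.usage[user]

    def reserve(self, user: str, estimated_tokens: int) -> bool:
        """Deduct an estimate up front; refuse if it would exceed the limit."""
        entry = self._today_usage(user)
        if entry["tokens"] + estimated_tokens > self.daily_limit:
            return False
        entry["tokens"] += estimated_tokens
        return True

    def settle(self, user: str, estimated_tokens: int, actual_tokens: int) -> None:
        """Swap the estimate for the real count once the call finishes."""
        entry = self._today_usage(user)
        entry["tokens"] = max(0, entry["tokens"] - estimated_tokens + actual_tokens)

    def remaining(self, user: str) -> int:
        return max(0, self.daily_limit - self._today_usage(user)["tokens"])


budget = ReservingTokenBudget(daily_limit=10_000)
ok = budget.reserve("alice", estimated_tokens=4_000)        # fits in the budget
budget.settle("alice", estimated_tokens=4_000, actual_tokens=2_500)
left = budget.remaining("alice")                            # 10_000 - 2_500
too_big = budget.reserve("alice", estimated_tokens=8_000)   # would exceed limit
```

Reserving first means a user can't start a request that might blow past the limit. Settling afterwards keeps the count accurate when the estimate was too high or too low.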

🧑‍💻

Should I rate limit by IP address or by user identity?

👩‍🏫

Both. IP-based limits protect against unauthenticated abuse — someone hammering your login endpoint. User-based limits protect against authenticated abuse — one user consuming all your API budget.

Layer them:

  1. IP rate limit on all endpoints (100 requests/minute)
  2. User rate limit on authenticated endpoints (20 requests/minute)
  3. Token budget on AI endpoints (10,000 tokens/day)

Each layer catches different kinds of abuse. The tighter limits go on the most expensive operations.
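
The three layers above can be sketched as a single check that runs them in order, cheapest first. This is a simplified illustration, not a production design: `WindowLimit` and `check_layers` are hypothetical names, the limits mirror the list above, and the daily reset of the token counter is left out to keep it short.

```python
import time
from collections import defaultdict, deque
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class WindowLimit:
    max_requests: int
    window_seconds: float
    hits: dict = field(default_factory=lambda: defaultdict(deque))

    def allow(self, key: str, now: float) -> bool:
        q = self.hits[key]
        # Drop hits that have fallen out of the window.
        while q and q[0] <= now - self.window_seconds:
            q.popleft()
        if len(q) >= self.max_requests:
            return False
        q.append(now)
        return True


ip_limit = WindowLimit(max_requests=100, window_seconds=60)
user_limit = WindowLimit(max_requests=20, window_seconds=60)
DAILY_TOKENS = 10_000
tokens_used = defaultdict(int)  # user -> tokens today (daily reset omitted)


def check_layers(ip: str, user: Optional[str], tokens: int,
                 now: Optional[float] = None) -> str:
    """Return "ok" or the name of the first layer that rejects the request."""
    now = time.monotonic() if now is None else now
    if not ip_limit.allow(ip, now):
        return "ip_limit"
    if user is None:
        return "ok"  # unauthenticated endpoint: only the IP layer applies
    if not user_limit.allow(user, now):
        return "user_limit"
    if tokens_used[user] + tokens > DAILY_TOKENS:
        return "token_budget"
    tokens_used[user] += tokens
    return "ok"


# 21 small requests from one user at the same instant: the 21st trips layer 2.
results = [check_layers("203.0.113.7", "alice", tokens=100, now=0.0)
           for _ in range(21)]
# One oversized request from another user trips layer 3.
big = check_layers("203.0.113.7", "bob", tokens=20_000, now=0.0)
```

Returning the name of the layer that rejected the request makes it easier to log which limit is doing the work and to tune the numbers later.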
