Rate Limiting and Throttling in FastAPI
Rate limiting is how an API protects itself: it bounds how many requests any one client can make in a window, so a single heavy or misbehaving caller cannot exhaust shared resources or degrade the service for everyone else.
This topic is part of Async, Background Tasks and Observability. It is the edge defense in front of the async database pool and works naturally as middleware or a dependency.
Core Mechanics: A Shared, Distributed Counter
Because an API spans many workers and machines, the limit must be enforced from a shared store. Redis holds one authoritative counter per client so the policy is global, not per process. The check below is a fixed-window counter built on an atomic increment; a token bucket is a common alternative.
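```python
from fastapi import HTTPException, Request
from redis.asyncio import Redis


async def enforce_limit(request: Request, redis: Redis, limit: int, window: int) -> None:
    # Prefer the API key; fall back to the peer address (request.client can be None).
    client_id = request.headers.get("x-api-key") or (
        request.client.host if request.client else "unknown"
    )
    key = f"rl:{client_id}"
    # Atomic increment + first-hit expiry gives a fixed-window counter across workers.
    count = await redis.incr(key)
    if count == 1:
        await redis.expire(key, window)
    if count > limit:
        ttl = await redis.ttl(key)
        if ttl < 0:
            # No expiry set (e.g. a crash between INCR and EXPIRE): re-arm the window.
            await redis.expire(key, window)
            ttl = window
        raise HTTPException(
            status_code=429,
            detail="rate limit exceeded",
            headers={"Retry-After": str(max(ttl, 1))},
        )
```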
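For comparison with the fixed-window counter, here is a minimal in-process sketch of the token-bucket policy this article also mentions. The `TokenBucket` class and its field names are hypothetical. A production version would keep this state in the shared store and update it atomically rather than in local memory.

```python
from dataclasses import dataclass


@dataclass
class TokenBucket:
    capacity: float     # maximum burst size, in tokens
    refill_rate: float  # tokens added per second
    tokens: float       # tokens currently available
    updated_at: float   # time of the last refill, in seconds

    def allow(self, now: float, cost: float = 1.0) -> bool:
        """Refill for the elapsed time, then spend `cost` tokens if available."""
        elapsed = max(0.0, now - self.updated_at)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.updated_at = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


# Illustrative numbers: bursts of up to 2 requests, refilling 1 token per second.
bucket = TokenBucket(capacity=2, refill_rate=1.0, tokens=2, updated_at=0.0)
```

Passing `now` explicitly keeps the sketch deterministic. A real caller would supply a monotonic clock reading.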
Production Implementation: Policies and Responses
Apply a broad default and stricter per-route or per-tier limits, and always return a well-formed 429 so clients can back off.
```python
from fastapi import APIRouter, Depends, Request


def rate_limit(limit: int, window: int):
    """Build a dependency that enforces one policy via enforce_limit (defined above)."""

    async def _dep(request: Request) -> None:
        # Assumes a shared Redis client was stored on app.state.redis at startup.
        await enforce_limit(request, request.app.state.redis, limit, window)

    return _dep


router = APIRouter()


# Expensive endpoint gets a tighter, per-route policy as a dependency.
@router.post("/exports", dependencies=[Depends(rate_limit(limit=5, window=60))])
async def create_export() -> dict[str, str]:
    return {"status": "queued"}
```
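Per-tier limits can be expressed as a small lookup with a broad default. The sketch below is one possible shape. The `Policy` class, the tier names, and the numbers are all hypothetical and only illustrate the fallback logic.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Policy:
    limit: int   # maximum requests per window
    window: int  # window length in seconds


# Hypothetical tiers and numbers, for illustration only.
DEFAULT_POLICY = Policy(limit=100, window=60)
TIER_POLICIES = {
    "free": Policy(limit=60, window=60),
    "pro": Policy(limit=600, window=60),
}


def policy_for(tier: Optional[str]) -> Policy:
    """Return the tier's policy, falling back to the broad default."""
    if tier is None:
        return DEFAULT_POLICY
    return TIER_POLICIES.get(tier, DEFAULT_POLICY)
```

A dependency could then resolve the caller's tier and pass `policy.limit` and `policy.window` to the limiter check.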
The library-based approach using SlowAPI and a token bucket is detailed in FastAPI Rate Limiting with Redis and SlowAPI.
Async and Performance Notes
The limiter runs on every request, so its store calls must be async and cheap, ideally a single round trip per check. INCR is atomic on its own, but a separate read followed by a write is not: two workers can both read a count under the limit and both admit. Where the check needs more than one command (increment plus expiry, or a token-bucket refill), prefer a Lua script or a library that performs the read-modify-write atomically. Keep keys short-lived so memory stays bounded.
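One way to fold increment, expiry, and TTL lookup into a single atomic round trip is a short Lua script run through the client's `eval(script, numkeys, *keys_and_args)` call. This is a sketch under assumptions: the script, the `fixed_window_hit` helper, and the `(count, ttl)` return shape are illustrative, and `redis` is expected to be an async client such as `redis.asyncio.Redis`.

```python
from typing import Any, Tuple

# Runs atomically inside Redis: no other command can interleave between
# the INCR and the EXPIRE, so a key can never be left without an expiry.
FIXED_WINDOW_LUA = """
local count = redis.call('INCR', KEYS[1])
if count == 1 then
  redis.call('EXPIRE', KEYS[1], ARGV[1])
end
return {count, redis.call('TTL', KEYS[1])}
"""


async def fixed_window_hit(redis: Any, key: str, window: int) -> Tuple[int, int]:
    """Record one hit and return (count, seconds until the window resets)."""
    count, ttl = await redis.eval(FIXED_WINDOW_LUA, 1, key, window)
    return int(count), int(ttl)
```

The caller compares `count` with the limit and uses `ttl` for the `Retry-After` header, without issuing further commands.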
Testing Strategy
Assert that the limit triggers and that the response is well-formed:
```python
def test_rate_limit_returns_429(client):
    # The first five requests fit inside the limit=5 policy on /exports.
    for _ in range(5):
        assert client.post("/exports").status_code == 200
    blocked = client.post("/exports")
    assert blocked.status_code == 429
    assert "Retry-After" in blocked.headers  # Lets clients back off correctly.
```
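To keep such tests fast and deterministic, the store can be swapped for an in-memory stand-in. The `FakeRedis` class below is a hypothetical test helper, not part of any library. It covers only the three commands the limiter uses, and its clock is advanced by hand.

```python
class FakeRedis:
    """In-memory stand-in covering only incr, expire and ttl."""

    def __init__(self) -> None:
        self.now = 0.0  # manual clock in seconds, advanced by tests
        self._values: dict[str, int] = {}
        self._expires_at: dict[str, float] = {}

    def _evict_if_expired(self, key: str) -> None:
        expires_at = self._expires_at.get(key)
        if expires_at is not None and self.now >= expires_at:
            self._values.pop(key, None)
            self._expires_at.pop(key, None)

    async def incr(self, key: str) -> int:
        self._evict_if_expired(key)
        self._values[key] = self._values.get(key, 0) + 1
        return self._values[key]

    async def expire(self, key: str, seconds: int) -> bool:
        self._evict_if_expired(key)
        if key not in self._values:
            return False
        self._expires_at[key] = self.now + seconds
        return True

    async def ttl(self, key: str) -> int:
        self._evict_if_expired(key)
        if key not in self._values:
            return -2  # Redis convention: key does not exist
        if key not in self._expires_at:
            return -1  # Redis convention: key has no expiry
        return int(self._expires_at[key] - self.now)
```

Assigning an instance to `app.state.redis` in a fixture would let a test set `now` past the window and assert that requests are admitted again.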
Failure Modes and Debugging
- Per-process counters. In-memory limits count only the requests each worker sees, so the effective limit grows with the number of workers; use a shared store.
- Race conditions. Non-atomic increment-and-check lets bursts slip through; make the check atomic.
- Missing Retry-After. A bare 429 leaves clients guessing; always include the header.
- Limiting by spoofable identity. Trusting an unauthenticated header lets clients evade limits; key on the authenticated principal where possible.
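The last point can be sketched as a key-derivation helper that prefers the authenticated principal and falls back to the client address only for anonymous traffic. The function name and key prefixes are hypothetical, and how `principal_id` is obtained depends on the application's authentication layer.

```python
from typing import Optional


def rate_limit_key(principal_id: Optional[str], client_ip: Optional[str]) -> str:
    """Build the counter key from the most trustworthy identity available."""
    if principal_id:
        # Authenticated callers are limited per principal, whatever address they use.
        return f"rl:user:{principal_id}"
    # Anonymous callers fall back to the peer address, which may be shared or absent.
    return f"rl:ip:{client_ip or 'unknown'}"
```

Separate prefixes keep user and address counters from colliding in the shared store.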
Related Reading
- Up to the section: Async, Background Tasks and Observability.
- Hands-on guide: FastAPI Rate Limiting with Redis and SlowAPI.
- Composes with: Middleware Implementation and Async Database Sessions.