Rate Limiting and Throttling in FastAPI

Rate limiting is how an API protects itself: bounding how many requests any one client can make in a window so a single heavy or misbehaving caller cannot exhaust shared resources or degrade the service for everyone else.

This topic is part of Async, Background Tasks and Observability. It is the edge defense in front of the async database pool and works naturally as middleware or a dependency.

[Figure: A rate limiter backed by a shared store. Client → Rate limiter (token bucket) ↔ Redis counter; under limit → handler; over limit → 429 with Retry-After.]
The limiter consults a shared Redis counter so the limit holds across all workers, admitting requests under quota and rejecting the rest with a 429.

Core Mechanics: A Shared, Distributed Counter

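Because an API typically runs on many workers and machines, the limit must be enforced from a shared store. Redis holds one authoritative counter per client so the policy is global, not per process. The example below uses the simplest such policy, a fixed-window counter: each request atomically increments the client's counter, the first hit in a window sets the expiry, and any request beyond the limit is rejected with a 429. A token bucket follows the same pattern with different state (a token count and a last-refill timestamp).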

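from fastapi import HTTPException, Request
from redis.asyncio import Redis


async def enforce_limit(request: Request, redis: Redis, limit: int, window: int) -> None:
    # request.client may be None when the server supplies no peer address.
    peer = request.client.host if request.client else "unknown"
    # An unvalidated x-api-key header is client-controlled; see Failure Modes.
    client_id = request.headers.get("x-api-key", peer)
    key = f"rl:{client_id}"
    # INCR is atomic, so concurrent workers never lose a count. The first hit in a
    # window sets the expiry, which gives a fixed-window counter across workers.
    count = await redis.incr(key)
    if count == 1:
        await redis.expire(key, window)
    if count > limit:
        ttl = await redis.ttl(key)
        if ttl < 0:
            # TTL is -1 (no expiry, e.g. the EXPIRE call was lost) or -2 (key gone).
            # Repair the expiry and fall back to the full window.
            await redis.expire(key, window)
            ttl = window
        raise HTTPException(status_code=429, detail="rate limit exceeded",
                            headers={"Retry-After": str(ttl)})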
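
The listing above is a fixed-window counter. For comparison, the token-bucket logic named in this section can be sketched in-process as follows. This is a minimal illustration, not a shared implementation: the `TokenBucket` class and its field names are hypothetical, the clock is passed in explicitly so the behavior is easy to test, and in production the same state would have to live in the shared store and be updated atomically.

```python
from dataclasses import dataclass


@dataclass
class TokenBucket:
    """In-process token bucket; illustrative only, not shared across workers."""

    capacity: float      # maximum burst size, in tokens
    refill_rate: float   # tokens added per second
    tokens: float        # tokens currently available
    updated_at: float    # timestamp of the last refill, in seconds

    def allow(self, now: float, cost: float = 1.0) -> bool:
        # Refill in proportion to elapsed time, capped at capacity.
        elapsed = max(0.0, now - self.updated_at)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.updated_at = now
        # Admit the request only if enough tokens remain to pay for it.
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

Unlike the fixed window, a bucket admits a burst up to `capacity` and then settles to the steady `refill_rate`, so it avoids the double-burst that can occur at a window boundary.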

Production Implementation: Policies and Responses

Apply a broad default and stricter per-route or per-tier limits, and always return a well-formed 429 so clients can back off.

from typing import Annotated

from fastapi import APIRouter, Depends, Request


def rate_limit(limit: int, window: int):
    async def _dep(request: Request) -> None:
        await enforce_limit(request, request.app.state.redis, limit, window)
    return _dep


router = APIRouter()


# Expensive endpoint gets a tighter, per-route policy as a dependency.
@router.post("/exports", dependencies=[Depends(rate_limit(limit=5, window=60))])
async def create_export() -> dict[str, str]:
    return {"status": "queued"}
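
Per-tier limits can be layered on the same mechanism by resolving a policy before the check runs. The sketch below is one possible shape, under stated assumptions: the `Policy` class, the `policy_for` helper, the tier names, and every number in it are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Policy:
    limit: int    # maximum requests per window
    window: int   # window length in seconds


# Hypothetical tiers and numbers, for illustration only.
DEFAULT_POLICY = Policy(limit=100, window=60)
TIER_POLICIES = {
    "free": Policy(limit=20, window=60),
    "pro": Policy(limit=600, window=60),
}


def policy_for(tier: Optional[str]) -> Policy:
    """Return the tier's policy, falling back to the broad default."""
    return TIER_POLICIES.get(tier or "", DEFAULT_POLICY)
```

A dependency could then look up the caller's tier, call `policy_for`, and pass the resulting `limit` and `window` to `enforce_limit`.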

The library-based approach using SlowAPI and a token bucket is detailed in FastAPI Rate Limiting with Redis and SlowAPI.

Async and Performance Notes

The limiter runs on every request, so its store calls must be async and cheap: ideally a single atomic Redis operation per check. Prefer a Lua script or a token-bucket library that performs the read-modify-write atomically, avoiding a race where two workers both read a count under the limit and both admit the request. Give every key an expiry so memory stays bounded.
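
One way to get a single atomic round trip is to move the increment, the first-hit expiry, and the TTL read into a Lua script. The following is a sketch rather than a definitive implementation: `FIXED_WINDOW_LUA`, `SupportsEval`, and `check_limit` are hypothetical names, and the code assumes a client exposing `eval(script, numkeys, *keys_and_args)`, as redis-py's asyncio client does. It implements a fixed window, not a token bucket.

```python
from typing import Any, Protocol

# Redis runs a Lua script atomically: increment, set the expiry on the
# first hit of a window, then return the count and the remaining TTL.
FIXED_WINDOW_LUA = """
local count = redis.call('INCR', KEYS[1])
if count == 1 then
  redis.call('EXPIRE', KEYS[1], ARGV[1])
end
return {count, redis.call('TTL', KEYS[1])}
"""


class SupportsEval(Protocol):
    """Minimal client interface assumed by this sketch."""

    async def eval(self, script: str, numkeys: int, *keys_and_args: Any) -> Any: ...


async def check_limit(
    redis: SupportsEval, key: str, limit: int, window: int
) -> tuple[bool, int]:
    """Return (allowed, retry_after_seconds) for one request."""
    count, ttl = await redis.eval(FIXED_WINDOW_LUA, 1, key, window)
    # A non-positive TTL should not occur here; fall back to the full window.
    retry_after = ttl if ttl > 0 else window
    return count <= limit, retry_after
```

Because the expiry is set inside the same script as the increment, a worker that dies between the two steps can no longer leave behind a counter that never expires.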

Testing Strategy

Assert that the limit triggers and that the response is well-formed:

def test_rate_limit_returns_429(client):
    # Assumes the `client` fixture starts from an empty counter (fresh or flushed Redis).
    for _ in range(5):
        assert client.post("/exports").status_code == 200
    blocked = client.post("/exports")            # Sixth call exceeds limit=5.
    assert blocked.status_code == 429
    assert "Retry-After" in blocked.headers      # Lets clients back off correctly.

Failure Modes and Debugging

  • Per-process counters. Each worker counts only the requests it handles, so in-memory limits under-count and the effective limit grows with the number of workers; use a shared store.
  • Race conditions. Non-atomic increment-and-check lets bursts slip through; make the check atomic.
  • Missing Retry-After. A bare 429 leaves clients guessing; always include the header.
  • Limiting by spoofable identity. Trusting an unauthenticated header lets clients evade limits; key on the authenticated principal where possible.
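
The last point can be sketched as a small key-selection helper. This is an illustration under assumptions: `rate_limit_key` and its parameters are hypothetical, and the caller is assumed to have already authenticated the principal and obtained the peer address from the connection rather than from a request header.

```python
from typing import Optional


def rate_limit_key(principal_id: Optional[str], peer_ip: Optional[str]) -> str:
    """Build a counter key, preferring the authenticated principal."""
    if principal_id:
        return f"rl:user:{principal_id}"
    # Unauthenticated traffic: fall back to the connection's peer address,
    # not a client-supplied header.
    return f"rl:ip:{peer_ip or 'unknown'}"
```

Separate `user` and `ip` prefixes keep the two key spaces from colliding, so an anonymous caller cannot consume an authenticated user's quota by guessing an identifier.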