Middleware Implementation in FastAPI

Middleware is code that wraps the entire application. It runs before a request reaches routing and again after the response is produced, so cross-cutting concerns such as correlation IDs, CORS, timing, and body-size limits apply uniformly to every request.

Middleware is the outermost layer of the architecture described in Core Architecture and Routing Patterns: it sees every request before the router does. It supplies the correlation ID that error handling and observability depend on, and it is deliberately distinct from dependency injection, which handles per-route concerns.

[Figure: The nested middleware chain wrapping the route handler. Concentric layers, from outermost to innermost: tracing / timing, CORS, body-size / security gate, route handler. Flow: request in → layers wrap → handler → layers unwrap → response out.]
Each middleware nests inside the previous one. The request travels inward to the handler, and the response travels back outward through the same layers in reverse order.

Core Mechanics: The Ordered Chain

Middleware forms nested layers. Each piece runs some logic, calls the next layer (call_next), and then runs more logic on the response. The order in which you add middleware determines the nesting: the last one added is the outermost, so it runs first on the way in and last on the way out. This ordering changes behavior, not just style: a timing middleware must be outermost to measure everything inside it.
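The nesting rule can be imitated with plain ASGI callables and no framework at all. The sketch below is only an illustration: Layer, build_stack, and the layer names are hypothetical, and the real stack is assembled internally by the framework when you register middleware. It shows that wrapping in registration order leaves the last-added layer outermost.

```python
import asyncio

events = []  # Records the order in which layers run.


class Layer:
    """Hypothetical do-nothing ASGI middleware that logs entry and exit."""

    def __init__(self, app, name):
        self.app = app
        self.name = name

    async def __call__(self, scope, receive, send):
        events.append(f"{self.name} in")
        await self.app(scope, receive, send)  # Call the next layer inward.
        events.append(f"{self.name} out")


async def handler(scope, receive, send):
    # Stand-in for the route handler at the center of the chain.
    events.append("handler")


def build_stack(app, names_in_add_order):
    # Each newly added layer wraps everything added before it,
    # so the last name in the list ends up outermost.
    for name in names_in_add_order:
        app = Layer(app, name)
    return app


stack = build_stack(handler, ["gate", "cors", "timing"])
asyncio.run(stack({"type": "http"}, None, None))
print(events)
```

Here "timing" is added last, so it logs first on the way in and last on the way out, which is what a timing layer needs in order to measure everything inside it.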

import time
import uuid

from fastapi import Request
from starlette.middleware.base import BaseHTTPMiddleware


class RequestContextMiddleware(BaseHTTPMiddleware):
    async def dispatch(self, request: Request, call_next):
        request_id = request.headers.get("x-request-id") or str(uuid.uuid4())  # Reuse the caller's ID; mint one if missing or empty.
        request.state.request_id = request_id   # Make it available to handlers.
        start = time.perf_counter()
        response = await call_next(request)      # Hand off to inner layers + route.
        response.headers["x-request-id"] = request_id
        response.headers["x-response-time-ms"] = f"{(time.perf_counter() - start) * 1000:.1f}"
        return response

Production Implementation: Pure ASGI for the Hot Path

BaseHTTPMiddleware is convenient but wraps each request in an abstraction that can buffer the body and add overhead. For high-throughput or streaming endpoints, a pure ASGI middleware operating on the raw scope, receive, and send is leaner because it never materializes the response body.

class TimingASGIMiddleware:
    def __init__(self, app) -> None:
        self.app = app

    async def __call__(self, scope, receive, send) -> None:
        if scope["type"] != "http":      # Pass through lifespan and websocket scopes.
            await self.app(scope, receive, send)
            return
        start = time.perf_counter()

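        async def send_wrapper(message) -> None:
            if message["type"] == "http.response.start":
                elapsed = f"{(time.perf_counter() - start) * 1000:.1f}".encode()
                # Copy the headers: the key may be absent or not a mutable list.
                headers = list(message.get("headers", []))
                headers.append((b"x-response-time-ms", elapsed))
                message = {**message, "headers": headers}
            await send(message)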

        await self.app(scope, receive, send_wrapper)

The request-tracing build-out, including propagating the ID into logs, is detailed in Implementing Custom Middleware for Request Tracing.
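The same request ID logic can also be written against the raw ASGI interface. The following is a minimal sketch under stated assumptions, not the build-out referenced above: RequestIdASGIMiddleware, stub_app, and call are hypothetical names, and the stub stands in for the real application so the example runs without a server. It reads the incoming header from scope and attaches the ID to the response start message.

```python
import asyncio
import uuid


class RequestIdASGIMiddleware:
    """Sketch: echo or mint an x-request-id without BaseHTTPMiddleware."""

    def __init__(self, app) -> None:
        self.app = app

    async def __call__(self, scope, receive, send) -> None:
        if scope["type"] != "http":  # Pass through lifespan and websocket scopes.
            await self.app(scope, receive, send)
            return
        # ASGI delivers headers as (name, value) byte pairs with lowercase names.
        incoming = dict(scope.get("headers", []))
        request_id = incoming.get(b"x-request-id") or str(uuid.uuid4()).encode()

        async def send_wrapper(message) -> None:
            if message["type"] == "http.response.start":
                headers = list(message.get("headers", []))
                headers.append((b"x-request-id", request_id))
                message = {**message, "headers": headers}
            await send(message)

        await self.app(scope, receive, send_wrapper)


async def stub_app(scope, receive, send):
    # Hypothetical stand-in for the wrapped application.
    await send({"type": "http.response.start", "status": 200, "headers": []})
    await send({"type": "http.response.body", "body": b"ok"})


async def call(headers):
    """Drive the middleware once and return the response headers as a dict."""
    sent = []

    async def receive():
        return {"type": "http.request", "body": b"", "more_body": False}

    async def send(message):
        sent.append(message)

    scope = {"type": "http", "method": "GET", "path": "/health", "headers": headers}
    await RequestIdASGIMiddleware(stub_app)(scope, receive, send)
    return dict(sent[0]["headers"])


echoed = asyncio.run(call([(b"x-request-id", b"abc-123")]))
minted = asyncio.run(call([]))
print(echoed[b"x-request-id"], minted[b"x-request-id"])
```

A caller-supplied ID is echoed back unchanged; when none is sent, a fresh UUID is generated, mirroring the behavior of the BaseHTTPMiddleware version earlier in this section.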

Async and Performance Notes

Middleware runs on every request, so its cost is multiplied across your entire traffic. Avoid awaiting additional I/O in middleware unless it is essential; a per-request database write or remote call in middleware adds its latency to every endpoint. BaseHTTPMiddleware also places an extra layer between the handler and the client that can buffer the body and interfere with streaming responses, so reach for pure ASGI when you stream.
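To illustrate why pure ASGI suits streaming, here is a sketch of a middleware that observes a streamed response one chunk at a time and forwards each message immediately, keeping only a running byte total. ByteCountASGIMiddleware, its on_done callback, and streaming_app are hypothetical names introduced for the example; the app is a stand-in that emits three chunks.

```python
import asyncio


class ByteCountASGIMiddleware:
    """Sketch: count streamed bytes chunk by chunk, never holding the body."""

    def __init__(self, app, on_done) -> None:
        self.app = app
        self.on_done = on_done  # Hypothetical callback that receives the total.

    async def __call__(self, scope, receive, send) -> None:
        if scope["type"] != "http":
            await self.app(scope, receive, send)
            return
        total = 0

        async def send_wrapper(message) -> None:
            nonlocal total
            is_body = message["type"] == "http.response.body"
            if is_body:
                total += len(message.get("body", b""))
            await send(message)  # Forward right away; nothing is accumulated.
            if is_body and not message.get("more_body", False):
                self.on_done(total)  # Final chunk has been sent.

        await self.app(scope, receive, send_wrapper)


async def streaming_app(scope, receive, send):
    # Stand-in for a streaming endpoint.
    await send({"type": "http.response.start", "status": 200, "headers": []})
    for chunk in (b"alpha", b"beta", b"gamma"):
        await send({"type": "http.response.body", "body": chunk, "more_body": True})
    await send({"type": "http.response.body", "body": b"", "more_body": False})


forwarded = []  # Message types the downstream "server" received, in order.
totals = []     # Byte totals reported by the middleware.


async def main():
    async def send(message):
        forwarded.append(message["type"])

    app = ByteCountASGIMiddleware(streaming_app, totals.append)
    await app({"type": "http"}, None, send)


asyncio.run(main())
print(forwarded, totals)
```

Every body message reaches the downstream send as soon as the application emits it, so the client keeps receiving chunks while the middleware holds only an integer.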

Testing Strategy

Assert the cross-cutting effect on an arbitrary route, since middleware applies everywhere:

def test_request_id_header_present(client):
    resp = client.get("/health")
    assert "x-request-id" in resp.headers          # Set uniformly by middleware.
    assert "x-response-time-ms" in resp.headers

Failure Modes and Debugging

  • Wrong ordering. A tracing middleware added before others will not wrap them; add cross-cutting observers last so they sit outermost.
  • Body consumed twice. Reading the request body in middleware can starve the handler. Keep body inspection out of middleware.
  • Blocking work per request. Synchronous calls in middleware stall the event loop for every request; offload or remove them.
  • Swallowed exceptions. Middleware that catches exceptions can bypass your global exception handlers; let exceptions propagate so the handlers can build the error envelope.
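One way to avoid the swallowed-exception failure mode while still observing failed requests is try/finally with no except clause. The sketch below assumes a pure ASGI middleware; ObservingASGIMiddleware, failing_app, and the durations_ms list are hypothetical, and a real application would log or export the measurement instead of appending to a list.

```python
import asyncio
import time

durations_ms = []  # Hypothetical sink for measurements.


class ObservingASGIMiddleware:
    """Sketch: record timing even on failure, without swallowing the error."""

    def __init__(self, app) -> None:
        self.app = app

    async def __call__(self, scope, receive, send) -> None:
        if scope["type"] != "http":
            await self.app(scope, receive, send)
            return
        start = time.perf_counter()
        try:
            await self.app(scope, receive, send)
        finally:
            # Runs on success and on error. There is no `except`, so any
            # exception keeps propagating to the handlers further out.
            durations_ms.append((time.perf_counter() - start) * 1000)


async def failing_app(scope, receive, send):
    raise RuntimeError("boom")  # Stand-in for a handler that fails.


async def main():
    try:
        await ObservingASGIMiddleware(failing_app)({"type": "http"}, None, None)
    except RuntimeError as exc:
        return str(exc)  # The error reached the caller intact.
    return None


propagated = asyncio.run(main())
print(propagated, len(durations_ms))
```

The measurement is recorded and the original exception still surfaces, so whatever builds the error response further out sees it unchanged.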