Handling Deeply Nested JSON Models Efficiently
Key takeaways:
- Serialize with
model_dump_json()in a single pass instead of dumping to a dict then re-encoding. - Return explicit response models so only needed fields are serialized.
- Paginate large nested collections rather than embedding them whole.
- Use
model_constructonly for trusted internal data to skip re-validation. - Profile a representative payload before optimizing so you target the real cost.
This guide makes the cost model in Nested Model Serialization concrete. Read that page for how composition and model_dump work.
The Problem This Solves
A nested response feels free until traffic grows. Serialization walks every node, so a response with thousands of nested items spends real CPU on the event loop and allocates large intermediate structures. Left unchecked, this shows up as latency on a hot endpoint and as memory pressure under load.
Prerequisites
- Pydantic v2 (the Rust core is what makes single-pass serialization fast).
- A representative payload to measure against.
Step-by-Step Implementation
1. Serialize directly to JSON
# Efficient: one pass to a JSON string on the Rust core.
body = order.model_dump_json()
# Wasteful on large graphs: builds a dict, then re-encodes it in Python.
# body = json.dumps(order.model_dump())
2. Trim with an explicit response model
from pydantic import BaseModel
class OrderSummary(BaseModel):
id: int
total_quantity: int # Only what the list view needs — not full line items.
@router.get("/orders", response_model=list[OrderSummary])
async def list_orders() -> list[OrderSummary]:
return await fetch_order_summaries()
3. Paginate nested collections
@router.get("/orders/{order_id}/items")
async def list_items(order_id: int, limit: int = 50, offset: int = 0):
# Page the children instead of embedding all of them in the order response.
return await fetch_items(order_id, limit=limit, offset=offset)
4. Skip re-validation for trusted data
# Data your own pipeline already validated — construct without re-running validators.
trusted = OrderInternal.model_construct(**already_validated_row)
Edge Cases and Gotchas
- model_construct and defaults. It does not apply default values or run validators, so supply every field explicitly or you will get partially-initialized models.
- Computed fields on hot paths. A
computed_fieldruns on every serialization; if it is expensive, cache the underlying value. - By-alias cost. Serializing
by_alias=Trueis cheap, but mixing alias conventions across nested models causes confusing output; standardize.
Verification
Measure serialization time on a realistic payload so the optimization is evidence-based:
import time
def test_serialization_under_budget(big_order):
start = time.perf_counter()
big_order.model_dump_json()
elapsed_ms = (time.perf_counter() - start) * 1000
assert elapsed_ms < 25 # Guardrail; tune to your payload and SLA.
Related Reading
- Up to the topic: Nested Model Serialization.
- Related patterns: Performance Optimization for Models and Caching Strategies for caching hot serialized responses.