BigQuery Adaptive Repartitioning
How BigQuery Adaptive Repartitioning Works

The Core Problem
During a shuffle, rows are routed to partitions by key. If the key distribution is uneven:
Some partitions are huge
Some are tiny
The slowest partition determines the stage's runtime
This causes:
Memory pressure
Spill
Stragglers
Stage reattempts
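The effect can be made concrete with a minimal Python simulation. This is not BigQuery's actual shuffle. It hash-partitions a hypothetical key column in which one key (`"hot"`) holds half the rows, then reports the partition sizes and the max/mean ratio. If processing time is roughly proportional to rows per partition (an assumption), that ratio approximates how much longer the stage runs than a perfectly balanced one would. The helper names (`partition_of`, `partition_sizes`, `skew_ratio`) are illustrative only.

```python
import zlib

def partition_of(key, num_partitions):
    # Deterministic hash (CRC32) so runs are reproducible; Python's built-in
    # hash() of a str is randomized per process.
    return zlib.crc32(key.encode("utf-8")) % num_partitions

def partition_sizes(keys, num_partitions):
    # Count how many rows the shuffle routes to each partition.
    sizes = [0] * num_partitions
    for key in keys:
        sizes[partition_of(key, num_partitions)] += 1
    return sizes

def skew_ratio(sizes):
    # Largest partition divided by the mean: 1.0 means perfectly balanced.
    # If time per partition is proportional to its rows, this is roughly how
    # much longer the stage runs than a balanced one would.
    return max(sizes) / (sum(sizes) / len(sizes))

# Hypothetical key column: one hot key holds half of the 100,000 rows.
keys = ["hot"] * 50_000 + [f"user_{i}" for i in range(50_000)]
sizes = partition_sizes(keys, num_partitions=8)
print(sizes, round(skew_ratio(sizes), 2))
```

Adding partitions does not help here: every `"hot"` row hashes to the same partition however many partitions exist, so at least 50,000 rows always land together.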
Adaptive Repartitioning (Conceptually)
BigQuery performs dynamic repartitioning during execution when:
A partition grows too large
A worker becomes a straggler
Memory pressure crosses a threshold
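The three trigger conditions can be sketched as a simple check over per-partition statistics. This is a hypothetical model: `PartitionStats`, `needs_repartition`, and the thresholds (1 GiB, 3x the median runtime, 80% of the memory budget) are made-up illustrations, and this sketch does not claim to know BigQuery's actual internal thresholds.

```python
from dataclasses import dataclass
from statistics import median

@dataclass
class PartitionStats:
    partition_id: int
    bytes_in: int           # bytes received from the shuffle so far
    runtime_s: float        # elapsed processing time on its worker
    memory_fraction: float  # share of the worker's memory budget in use

# Hypothetical thresholds chosen for illustration only.
MAX_PARTITION_BYTES = 1 << 30  # 1 GiB
STRAGGLER_FACTOR = 3.0         # runtime above 3x the median runtime
MEMORY_THRESHOLD = 0.8         # more than 80% of the memory budget

def needs_repartition(stats):
    """Return the IDs of partitions that meet any of the three trigger conditions."""
    median_runtime = median(s.runtime_s for s in stats)
    flagged = []
    for s in stats:
        too_large = s.bytes_in > MAX_PARTITION_BYTES
        straggler = s.runtime_s > STRAGGLER_FACTOR * median_runtime
        memory_pressure = s.memory_fraction > MEMORY_THRESHOLD
        if too_large or straggler or memory_pressure:
            flagged.append(s.partition_id)
    return flagged

stats = [
    PartitionStats(0, 200 << 20, 10.0, 0.30),
    PartitionStats(1, 3 << 30, 95.0, 0.90),    # large, slow, and memory-bound
    PartitionStats(2, 150 << 20, 12.0, 0.85),  # memory pressure only
    PartitionStats(3, 100 << 20, 9.0, 0.20),
]
print(needs_repartition(stats))  # [1, 2]
```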
What Happens Internally
The runtime detects skew
The heavy partition is split into sub-partitions
Work is redistributed across additional slots
The slow stage rebalances
This is sometimes called dynamic fan-out.
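The split-and-redistribute steps can be sketched for a decomposable aggregate such as SUM: the rows of one heavy partition are dealt round-robin into `fan_out` sub-partitions, each extra slot computes a partial aggregate, and the partials are merged. This is an illustrative model, not BigQuery's implementation; the function names and the round-robin policy are assumptions.

```python
from collections import defaultdict

def split_partition(rows, fan_out):
    # Deal the rows of one heavy partition round-robin into sub-partitions.
    subs = [[] for _ in range(fan_out)]
    for i, row in enumerate(rows):
        subs[i % fan_out].append(row)
    return subs

def partial_sum(rows):
    # Work done by one extra slot: aggregate its sub-partition locally.
    acc = defaultdict(int)
    for key, value in rows:
        acc[key] += value
    return acc

def merge(partials):
    # Final step: combine the partial aggregates from every sub-partition.
    total = defaultdict(int)
    for acc in partials:
        for key, value in acc.items():
            total[key] += value
    return dict(total)

# Hypothetical heavy partition dominated by one key.
heavy = [("hot", 1)] * 10_000 + [("warm", 2)] * 500
subs = split_partition(heavy, fan_out=4)
result = merge(partial_sum(s) for s in subs)
print([len(s) for s in subs], result)
```

Round-robin splitting is only safe when partial results can be combined afterwards (SUM, COUNT, MIN, MAX). For a hot join key, the matching rows from the other join input would also have to be copied to every sub-partition.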
When It Triggers
GROUP BY on high-cardinality keys
Hot join keys
Window functions on skewed keys
Large DISTINCT operations
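These patterns usually come down to a few dominant key values, so one way to check for them ahead of time is to measure each key's share of rows. A minimal Python sketch over a sample of key values follows; `hot_keys` and its 5% threshold are illustrative choices, not BigQuery settings. In BigQuery itself, the equivalent check is a GROUP BY on the key ordered by COUNT(*) descending.

```python
from collections import Counter

def hot_keys(sample_keys, share_threshold=0.05, top_n=10):
    """Return (key, share) pairs for the most common keys above share_threshold."""
    counts = Counter(sample_keys)
    total = sum(counts.values())
    return [
        (key, count / total)
        for key, count in counts.most_common(top_n)
        if count / total > share_threshold
    ]

# Hypothetical sample of a join or GROUP BY key column.
sample = ["US"] * 700 + ["DE"] * 100 + [f"country_{i}" for i in range(200)]
print(hot_keys(sample))  # [('US', 0.7), ('DE', 0.1)]
```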
Tradeoffs
Adaptive repartitioning:
✅ Reduces worst-case skew
❌ Increases shuffle traffic
❌ Consumes more slots
❌ Increases slot-ms cost
So even when “fixed,” skew is still expensive.
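A toy cost model shows why the fix is not free. It assumes (hypothetically) that every partition is processed at a fixed rate, that the heavy partition's data is shuffled a second time when it is split, and that every sub-partition pays a fixed startup overhead. None of these parameters come from BigQuery; they only illustrate the direction of each tradeoff.

```python
from dataclasses import dataclass

@dataclass
class StageCost:
    wall_clock_s: float
    slot_ms: float
    shuffle_mb: float
    slots: int

def without_repartition(sizes_mb, mb_per_s):
    # One slot per partition; the stage finishes when its largest partition does.
    times = [size / mb_per_s for size in sizes_mb]
    return StageCost(max(times), sum(times) * 1000, sum(sizes_mb), len(times))

def with_repartition(sizes_mb, mb_per_s, heavy_index, fan_out, overhead_s):
    # Untouched partitions keep their original processing time.
    times = [size / mb_per_s for i, size in enumerate(sizes_mb) if i != heavy_index]
    # The heavy partition is split into `fan_out` sub-partitions on extra slots,
    # each paying a fixed (hypothetical) startup overhead.
    heavy = sizes_mb[heavy_index]
    times += [heavy / fan_out / mb_per_s + overhead_s] * fan_out
    # The heavy partition's data crosses the network a second time.
    return StageCost(max(times), sum(times) * 1000, sum(sizes_mb) + heavy, len(times))

sizes = [10_000, 1_000, 1_000, 1_000]  # MB per partition; partition 0 is hot
base = without_repartition(sizes, mb_per_s=100)
fixed = with_repartition(sizes, mb_per_s=100, heavy_index=0, fan_out=10, overhead_s=2)
print(base)
print(fixed)
```

With these made-up inputs, wall-clock time drops from 100 s to 12 s, while the stage uses 13 slots instead of 4, slot-ms rises from 130,000 to 150,000, and shuffled data grows from 13,000 MB to 23,000 MB.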
Important Insight
At PB scale:
Preventing skew is often around 10x cheaper than letting adaptive repartitioning fix it, because repartitioning sends the heavy data across the network again, multiplying network I/O.
