Google BigQuery Shuffling

How Shuffle Memory Pressure Causes Stage Reattempts

Admin

5 min•Feb 28, 2026

Shuffle is often the most expensive step in a BigQuery query, because it moves data between workers.

It happens during:

JOIN
GROUP BY
ORDER BY
DISTINCT
Window functions

What Actually Happens Internally

During shuffle:

Data is hash-partitioned by key.
Each slot is responsible for a hash bucket.
All rows for a key land on the same slot, so that slot must process them within its own memory budget.
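To get a feel for how hash partitioning distributes rows, you can imitate it in SQL. This is only a rough sketch: BigQuery's internal hash function and bucket count are not exposed, so FARM_FINGERPRINT and the bucket count of 8 are stand-in assumptions, and `events` is the example table used later in this post.

```sql
-- Approximate a hash-partitioned shuffle on user_id.
-- FARM_FINGERPRINT and the 8 buckets are illustrative assumptions,
-- not what BigQuery uses internally.
SELECT
  ABS(MOD(FARM_FINGERPRINT(CAST(user_id AS STRING)), 8)) AS bucket,
  COUNT(*) AS row_count
FROM events
GROUP BY bucket
ORDER BY row_count DESC;
```

If one bucket holds far more rows than the others, a real shuffle on the same key is likely to be uneven too.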

Each slot has:

Fixed memory
Fixed CPU
Spill-to-disk capability (limited)

Memory Pressure Scenario

Imagine:

SELECT user_id, COUNT(*)
FROM events
GROUP BY user_id

If:

One user_id accounts for 50% of all rows
That key hashes to one slot

That slot receives:

Huge data volume
Memory overload

Result:

Slot spills to disk
Performance degrades
If spill exceeds limits → stage fails
BigQuery reattempts the stage
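Before running the heavy aggregation, it can help to check whether such a dominant key exists. A minimal sketch, assuming the same `events` table: APPROX_TOP_COUNT returns an approximate list of the most frequent values without a full GROUP BY on user_id. The limit of 10 is an arbitrary choice.

```sql
-- List the (approximately) most frequent user_id values.
-- Counts are approximate; the limit of 10 is arbitrary.
SELECT
  k.value AS user_id,
  k.count AS approx_row_count
FROM (
  SELECT APPROX_TOP_COUNT(user_id, 10) AS top_keys
  FROM events
), UNNEST(top_keys) AS k
ORDER BY approx_row_count DESC;
```

A key whose approximate count is a large fraction of the table is a candidate for the skew handling described later.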

What Is a Stage Reattempt?

BigQuery:

Detects worker failure or OOM
Restarts the stage
May repartition data differently
Consumes more slots

This increases:

Slot-ms
Runtime
Cost

Symptoms in Execution Plan

You’ll see:

Stage retry count > 1
Large shuffle bytes
High spill to disk
One worker significantly slower
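Some of these symptoms can also be looked up after the fact in the job metadata. The query below is a sketch, assuming your jobs run in the `region-us` location and that you are allowed to read INFORMATION_SCHEMA.JOBS_BY_PROJECT. It lists stages that spilled shuffle data, plus a max-to-average compute ratio as a rough signal that one worker was much slower than the rest.

```sql
-- Find recent query stages that spilled shuffle data to disk.
-- `region-us` and the 1-day window are assumptions; adjust to your setup.
SELECT
  job_id,
  stage.name AS stage_name,
  stage.shuffle_output_bytes,
  stage.shuffle_output_bytes_spilled,
  stage.compute_ms_avg,
  stage.compute_ms_max,
  -- A ratio far above 1 suggests one worker did much more work than average.
  SAFE_DIVIDE(stage.compute_ms_max, stage.compute_ms_avg) AS compute_skew_ratio
FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT,
  UNNEST(job_stages) AS stage
WHERE creation_time > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 DAY)
  AND stage.shuffle_output_bytes_spilled > 0
ORDER BY stage.shuffle_output_bytes_spilled DESC
LIMIT 20;
```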

How to Prevent Shuffle Memory Pressure

✅ 1. Reduce cardinality before shuffle

Bad:

SELECT user_id, COUNT(DISTINCT session_id)
FROM huge_table
GROUP BY user_id

Better:

Deduplicate (user_id, session_id) pairs first
Then count the pairs per user_id
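One way to write the "better" version, as a sketch against the same `huge_table`: the inner GROUP BY collapses repeated (user_id, session_id) rows, so the outer aggregation only counts one row per session.

```sql
-- Step 1: collapse duplicates to one row per (user_id, session_id).
WITH user_sessions AS (
  SELECT user_id, session_id
  FROM huge_table
  GROUP BY user_id, session_id
)
-- Step 2: count sessions per user.
-- COUNT(session_id) skips NULLs, matching COUNT(DISTINCT session_id).
SELECT
  user_id,
  COUNT(session_id) AS session_count
FROM user_sessions
GROUP BY user_id;
```

If an approximate answer is acceptable, APPROX_COUNT_DISTINCT(session_id) on the original table is another option worth testing.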

✅ 2. Break large queries into steps

Instead of 1 giant query:

Create intermediate table
Reduce row width
Reduce row count
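As a sketch of this pattern, a multi-statement query can materialize a slimmer temporary table first and aggregate from it afterwards. The `event_date` column and the 7-day filter are hypothetical; use whatever columns and filters your table actually has.

```sql
-- Step 1: keep only the columns and rows the later steps need.
-- event_date is a hypothetical column used for illustration.
CREATE TEMP TABLE slim_events AS
SELECT user_id, session_id
FROM huge_table
WHERE event_date >= DATE_SUB(CURRENT_DATE(), INTERVAL 7 DAY);

-- Step 2: aggregate over the smaller intermediate table.
SELECT
  user_id,
  COUNT(DISTINCT session_id) AS session_count
FROM slim_events
GROUP BY user_id;
```

The intermediate table has narrower rows and fewer of them, so the shuffle in step 2 moves less data.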

✅ 3. Avoid extreme skew keys
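A common technique here is key salting: spread a hot key over several sub-keys, aggregate partially, then combine the partial results. The sketch below applies it to the `events` example; the salt range of 16 is an arbitrary assumption, and BigQuery may already apply similar partial aggregation on its own, so measure before adopting it.

```sql
-- Attach a random salt so one hot user_id spreads over up to 16 groups.
WITH salted AS (
  SELECT
    user_id,
    CAST(FLOOR(RAND() * 16) AS INT64) AS salt
  FROM events
),
-- First pass: partial counts per (user_id, salt).
partial_counts AS (
  SELECT user_id, salt, COUNT(*) AS partial_count
  FROM salted
  GROUP BY user_id, salt
)
-- Second pass: combine at most 16 partial rows per user_id.
SELECT
  user_id,
  SUM(partial_count) AS event_count
FROM partial_counts
GROUP BY user_id;
```

The final result matches a plain COUNT(*) per user_id, but no single group in the first pass has to absorb all rows of the hot key.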
