BigQuery Query Execution
How BigQuery Query Execution Stages Use Slots Internally?

BigQuery is built on the Google BigQuery distributed execution engine (Dremel-based architecture).
Every query is broken into stages, and each stage runs in parallel using slots.
? High-Level Execution Flow
A query like:
FROM events
WHERE event_date = '2026-01-01'
GROUP BY user_id
Becomes:
Scan stage
Filter stage
Shuffle stage
Aggregation stage
Final combine stage
Each stage consumes slots differently.
? Stage Types & Slot Usage
1️⃣ Scan Stage
Reads columnar storage
Highly parallel
Each slot scans a shard of data
If 1TB scanned:
1 slot might scan ~300MB/sec
More slots = faster scan
2️⃣ Shuffle Stage (Most Expensive)
Triggered by:
GROUP BYJOINORDER BYDISTINCT
Data is redistributed across workers based on hash keys.
⚠ This is where most slot pressure happens.
If you group by user_id:
Data is hash-distributed
All rows for same user must go to same worker
Slots are heavily used for memory + network IO
3️⃣ Aggregation Stage
Each slot aggregates its partition
Then partial results merge
Low-cardinality → cheap
High-cardinality → expensive
? Parallelism Model
Each stage:
Uses many slots in parallel
Releases them after completion
Next stage requests slots
Slots are allocated per stage dynamically.
Important:
BigQuery does NOT assign fixed slots to a query permanently — they are allocated stage-by-stage.
? Slot Timeline Example
Query runtime = 40 sec
| Stage | Duration | Slots Used |
|---|---|---|
| Scan | 10s | 500 |
| Shuffle | 20s | 1000 |
| Aggregate | 8s | 400 |
| Final | 2s | 50 |
Peak slot usage = 1000
You are billed based on slot-seconds.
Comments (0)
No comments yet.
