Google BQ Join
How to Design Join Keys to Avoid Skew
AdminFollow
5 min•Feb 28, 2026
Views - 14

? What Is Join Skew?
Example:
JOIN users u
ON e.country = u.country
ON e.country = u.country
If:
60% of rows = "US"
All US rows hash to one partition
That slot:
Becomes hotspot
Runs 10x slower than others
Entire stage waits for slowest slot.
? Why Skew Is Worse at Scale
At 100TB scale:
Even 5% skew is huge
Shuffle network IO explodes
Memory pressure increases
? Anti-Skew Design Strategies
✅ 1. Join on high-cardinality keys
Good:
ON e.user_id = u.user_id
Bad:
ON e.country = u.country
✅ 2. Salting Technique
Add artificial randomness:
CONCAT(user_id, CAST(FARM_FINGERPRINT(event_id) % 10 AS STRING))
Distributes heavy keys across partitions.
✅ 3. Pre-filter before join
Instead of:
JOIN large_table
Do:
WITH filtered AS (
SELECT * FROM large_table WHERE date = '2026-01-01'
)
SELECT * FROM large_table WHERE date = '2026-01-01'
)
Reduce shuffle volume.
✅ 4. Broadcast Joins (Small Table Optimization)
If one table < ~100MB:
BigQuery broadcasts it
No shuffle required
This is ideal.
Comments (0)
No comments yet.
