logo

Google BQ Join

How to Design Join Keys to Avoid Skew

AdminFollow
5 minFeb 28, 2026
Views - 14
Google BQ Join

? What Is Join Skew?

Example:

 
JOIN users u
ON e.country = u.country
 

If:

  • 60% of rows = "US"

  • All US rows hash to one partition

That slot:

  • Becomes hotspot

  • Runs 10x slower than others

Entire stage waits for slowest slot.


? Why Skew Is Worse at Scale

At 100TB scale:

  • Even 5% skew is huge

  • Shuffle network IO explodes

  • Memory pressure increases


? Anti-Skew Design Strategies

✅ 1. Join on high-cardinality keys

Good:

 
ON e.user_id = u.user_id
 

Bad:

 
ON e.country = u.country
 

✅ 2. Salting Technique

Add artificial randomness:

 
CONCAT(user_id, CAST(FARM_FINGERPRINT(event_id) % 10 AS STRING))
 

Distributes heavy keys across partitions.


✅ 3. Pre-filter before join

Instead of:

 
JOIN large_table
 

Do:

 
WITH filtered AS (
  SELECT * FROM large_table WHERE date = '2026-01-01'
)
 

Reduce shuffle volume.


✅ 4. Broadcast Joins (Small Table Optimization)

If one table < ~100MB:

  • BigQuery broadcasts it

  • No shuffle required

This is ideal.

Comments (0)

No comments yet.

© Copyright 2024. All Rights Reserved by Learningdhara Community LLP