BigQuery columnar storage
How BigQuery Internally Stores Column Chunks

BigQuery stores table data in Capacitor, a columnar format that builds on the design described for Dremel.
Storage Model
Each table is:
Column-oriented
Split into storage blocks
Compressed
Distributed
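The four properties above can be illustrated with a small sketch. It is a toy model, not BigQuery's actual on-disk format: the row data, the `to_columns`, `to_blocks`, and `read_block` helpers, and the use of JSON plus zlib as the encoding are all assumptions made for illustration.

```python
import json
import zlib
from typing import Dict, List

# Hypothetical sample rows; any list of flat dicts would do.
rows = [
    {"country": "US", "amount": 10},
    {"country": "US", "amount": 25},
    {"country": "DE", "amount": 7},
    {"country": "DE", "amount": 12},
]


def to_columns(rows: List[dict]) -> Dict[str, list]:
    """Pivot row-oriented records into one value list per column."""
    columns: Dict[str, list] = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


def to_blocks(values: list, block_size: int) -> List[bytes]:
    """Split one column into fixed-size blocks and compress each block."""
    blocks = []
    for start in range(0, len(values), block_size):
        part = values[start:start + block_size]
        blocks.append(zlib.compress(json.dumps(part).encode("utf-8")))
    return blocks


def read_block(block: bytes) -> list:
    """Decompress a single block back into its values."""
    return json.loads(zlib.decompress(block).decode("utf-8"))


columns = to_columns(rows)
country_blocks = to_blocks(columns["country"], block_size=2)

# Each block can be stored and read independently of the others,
# which is what makes distributing them across machines possible.
print(read_block(country_blocks[1]))
```

Because every block is self-contained, a reader that needs only `country` never touches the `amount` blocks at all.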
Column Chunks
Each column:
Stored separately
Broken into chunks
Metadata tracks min/max values
This enables:
Column pruning
Predicate pushdown
Partition and block pruning (skipping data whose min/max range cannot match the filter)
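As a rough sketch of how min/max metadata lets a scan skip data, the example below splits one column into chunks and evaluates a `value > threshold` filter. The `Chunk` class, the `make_chunks` and `scan_greater_than` helpers, and the sample values are hypothetical; BigQuery's real metadata layout is not public in this detail.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Chunk:
    values: List[int]
    min_value: int  # metadata kept alongside the chunk
    max_value: int


def make_chunks(values: List[int], chunk_size: int) -> List[Chunk]:
    """Break a column into chunks and record min/max for each one."""
    chunks = []
    for start in range(0, len(values), chunk_size):
        part = values[start:start + chunk_size]
        chunks.append(Chunk(part, min(part), max(part)))
    return chunks


def scan_greater_than(chunks: List[Chunk], threshold: int) -> Tuple[List[int], int]:
    """Return matching values and the number of chunks actually read."""
    matches: List[int] = []
    chunks_read = 0
    for chunk in chunks:
        if chunk.max_value <= threshold:
            continue  # no value in this chunk can match, so skip it unread
        chunks_read += 1
        matches.extend(v for v in chunk.values if v > threshold)
    return matches, chunks_read


order_totals = [5, 8, 12, 40, 45, 51, 90, 95, 99]
chunks = make_chunks(order_totals, chunk_size=3)
matches, chunks_read = scan_greater_than(chunks, threshold=50)
print(matches, chunks_read)
```

With these values the first chunk (maximum 12) is skipped from its metadata alone, so only two of the three chunks are read. Skipping works best when similar values are stored close together, which is why sorted or clustered data benefits most.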
Why SELECT * Is Expensive
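Because:
BigQuery reads every column file
Including columns the query never uses
This increases I/O and the bytes the query processes
Better: name only the columns the query needs.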
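To make the cost difference concrete, here is a minimal model of bytes scanned in a columnar table. The column names and sizes in `COLUMN_BYTES` and the `bytes_scanned` helper are invented for illustration; real figures come from the table itself.

```python
from typing import Dict, List, Optional

# Hypothetical stored size of each column, in bytes.
COLUMN_BYTES: Dict[str, int] = {
    "user_id": 8_000_000,
    "event_time": 8_000_000,
    "country": 2_000_000,
    "payload": 900_000_000,  # one wide column dominates the table
}


def bytes_scanned(column_bytes: Dict[str, int],
                  selected: Optional[List[str]] = None) -> int:
    """Sum the sizes of the columns a query touches.

    selected=None models SELECT *, which touches every column.
    """
    if selected is None:
        return sum(column_bytes.values())
    return sum(column_bytes[name] for name in selected)


select_star = bytes_scanned(COLUMN_BYTES)
select_two = bytes_scanned(COLUMN_BYTES, ["user_id", "country"])

print(select_star, select_two)
```

In this model, selecting `user_id` and `country` scans 10,000,000 bytes instead of 918,000,000, because the wide `payload` column is never opened.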
Nested & Repeated Fields
BigQuery stores nested fields in a flattened columnar structure using repetition/definition levels.
Benefits:
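Avoids joins, because child records are stored with their parent row
Reduces shuffle, since no join keys have to be redistributed across workers
Improves query performance as a result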
Denormalization is encouraged for this reason.
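The repetition/definition-level idea can be sketched for the simplest case: a single repeated string field at the top level of each record. This follows the scheme described in the Dremel paper, where the repetition level marks whether a value starts a new record (0) or continues the current list (1), and the definition level marks whether a value is present (1) or the list was empty (0). The `shred_repeated` and `assemble` helpers are hypothetical names, and deeper nesting needs more levels than this sketch handles.

```python
from typing import List, Optional, Tuple

# (value, repetition level, definition level)
Triple = Tuple[Optional[str], int, int]


def shred_repeated(records: List[List[str]]) -> List[Triple]:
    """Flatten a repeated field into one column of (value, r, d) triples."""
    column: List[Triple] = []
    for values in records:
        if not values:
            # Empty list: emit a placeholder so the record is not lost.
            column.append((None, 0, 0))
            continue
        for i, value in enumerate(values):
            repetition = 0 if i == 0 else 1  # 0 starts a new record
            column.append((value, repetition, 1))
    return column


def assemble(column: List[Triple]) -> List[List[str]]:
    """Rebuild the original records from the flattened column."""
    records: List[List[str]] = []
    for value, repetition, definition in column:
        if repetition == 0:
            records.append([])  # a new record begins here
        if definition == 1:
            records[-1].append(value)
    return records


tags_per_row = [["a", "b"], [], ["c"]]
tags_column = shred_repeated(tags_per_row)
print(tags_column)
print(assemble(tags_column) == tags_per_row)
```

The flattened column holds everything needed to rebuild each row's list, so the nested data is read back without joining against a separate child table.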
