GCP Data Architect Series - Part II

Dimensional modeling in BigQuery follows classic data warehouse principles (Kimball-style) — but you optimize for cloud-native, columnar, massively parallel processing.

AdminFollow

5 min•Feb 28, 2026

Views - 24

1️⃣ Start With the Business, Not the Tables

Before touching BigQuery:

Identify business processes (orders, payments, shipments, marketing events, etc.)
Define grain clearly
“One row represents what?”

Example:

Orders fact → 1 row per order line
Web events fact → 1 row per event

If grain is unclear, everything downstream breaks.

2️⃣ Design the Star Schema

In BigQuery, I strongly favor star schemas over snowflake schemas because:

Joins are cheap in BigQuery
Storage is inexpensive
Simpler models perform well
BI tools work better with stars

Typical Structure

Fact tables → metrics + foreign keys
Dimension tables → descriptive attributes

Example:

fact_orders
  - order_id
  - order_date_key
  - customer_key
  - product_key
  - quantity
  - revenue

dim_customer
dim_product
dim_date

3️⃣ Fact Table Design in BigQuery

Best Practices

✔ Define Grain Explicitly

Never mix grains in the same fact table.

✔ Use Surrogate Keys (When Needed)

BigQuery doesn’t require integer keys for performance, but they help:

With Slowly Changing Dimensions
With model stability
With BI tool consistency

✔ Partition Large Fact Tables

Use:

PARTITION BY DATE(order_timestamp)
or
PARTITION BY order_date

This reduces scan costs significantly.

✔ Cluster Smartly

Cluster by:

Foreign keys
Frequently filtered columns

Example:

PARTITION BY order_date
CLUSTER BY customer_key, product_key

4️⃣ Dimension Table Design

Dimensions contain descriptive context.

Example:

Customer
Product
Location
Campaign
Sales rep

Slowly Changing Dimensions (SCD)

In BigQuery, I commonly implement:

? Type 1 (Overwrite)

For non-critical history.

? Type 2 (History Tracking)

Add:

effective_start_date
effective_end_date
is_current_flag

Because storage is cheap in BigQuery, Type 2 is very common.

5️⃣ Use Denormalization Strategically

BigQuery is columnar. So:

Wide tables are okay
Denormalized dimensions perform well
Avoid over-normalizing small dimensions

But:

Don’t duplicate large high-cardinality fields unnecessarily.

6️⃣ Handle BigQuery-Specific Considerations

? Nested & Repeated Fields

BigQuery supports nested data.

When to use:

JSON ingestion
Event-level semi-structured data

When NOT to use:

Core dimensional model exposed to BI users
→ Flatten for analytics layer

? Storage vs Compute Model

BigQuery charges per data scanned.

So:

Partition aggressively
Filter on partition columns
Avoid SELECT *

7️⃣ Layered Architecture Approach

I recommend 3 logical layers:

1️⃣ Raw / Bronze

Exact copy of source
Append-only
No transformations

2️⃣ Staging / Silver

Cleaned
Conformed data
Standardized formats
Surrogate keys added

3️⃣ Presentation / Gold

Star schemas
Business-friendly names
Ready for BI tools

8️⃣ Conformed Dimensions

If multiple fact tables exist:

Ensure shared dimensions use same keys
Centralize common attributes
Maintain a single source for dimensions like:
- customer
- date
- product

This prevents metric inconsistencies.

9️⃣ Testing & Governance

In BigQuery, I implement:

Row count tests
Null checks on foreign keys
SCD validity checks
Referential integrity validation

Tools often used:

dbt
Dataform
Great Expectations

? Performance Optimization Patterns

✔ Materialized Views

For aggregated facts.

✔ Aggregate Fact Tables

For dashboard speed.

Example:

fact_orders_daily
fact_orders_monthly

✔ Precompute Heavy Metrics

Instead of calculating at query time.

1️⃣1️⃣ Handling Very Large Event Data

For event-heavy workloads:

Partition by event_date
Cluster by user_id
Store raw events nested
Build flattened fact table for BI

1️⃣2️⃣ Anti-Patterns to Avoid

? Snowflaking everything
? Mixing multiple grains in one fact
? Using natural keys for SCD Type 2
? No partitioning on billion-row tables
? Letting BI users query raw layer

Example Real-World Pattern

E-commerce model:

Facts

fact_orders
fact_payments
fact_shipments
fact_web_events

Dimensions

dim_customer (SCD2)
dim_product
dim_date
dim_campaign
dim_location

Final Philosophy

In BigQuery:

Simplicity + Partitioning + Clear Grain > Complex Normalization

Design for:

Query cost efficiency
Scalability
BI usability
Maintainability

Comments (0)

No comments yet.

GCP Data Architect Series - Part II

Dimensional modeling in BigQuery follows classic data warehouse principles (Kimball-style) — but you optimize for cloud-native, columnar, massively parallel processing.

1️⃣ Start With the Business, Not the Tables

2️⃣ Design the Star Schema

Typical Structure

3️⃣ Fact Table Design in BigQuery

Best Practices

✔ Define Grain Explicitly

✔ Use Surrogate Keys (When Needed)

✔ Partition Large Fact Tables

✔ Cluster Smartly

4️⃣ Dimension Table Design

Slowly Changing Dimensions (SCD)

? Type 1 (Overwrite)

? Type 2 (History Tracking)

5️⃣ Use Denormalization Strategically

6️⃣ Handle BigQuery-Specific Considerations

? Nested & Repeated Fields

? Storage vs Compute Model

7️⃣ Layered Architecture Approach

1️⃣ Raw / Bronze

2️⃣ Staging / Silver

3️⃣ Presentation / Gold

8️⃣ Conformed Dimensions

9️⃣ Testing & Governance

? Performance Optimization Patterns

✔ Materialized Views

✔ Aggregate Fact Tables

✔ Precompute Heavy Metrics

1️⃣1️⃣ Handling Very Large Event Data

1️⃣2️⃣ Anti-Patterns to Avoid

Example Real-World Pattern

Final Philosophy

Comments (0)

Course

Service

Get In Touch

Technical Skills

Analytical Skills

Business Skills

Career Resources