BQ Event Architecture

Designing a 1PB Event Architecture

Admin
5 min read, Feb 28, 2026
Assumptions

  • 1PB raw events

  • Billions of events/day

  • Real-time + batch analytics

  • 100+ concurrent users


Layered Architecture

Layer 1: Raw Events

Table:

 
events_raw
 

Design:

  • Partition by event_date

  • Cluster by user_id

  • Nested schema

  • No joins required

Keep raw immutable.
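
The design points above could be expressed as BigQuery DDL along the following lines. This is a minimal sketch, not the author's schema: the dataset name `analytics` and every column other than `event_date` and `user_id` are assumptions made for illustration.

```sql
-- Sketch of a raw events table: date-partitioned, clustered by user, nested schema.
-- Dataset name and all columns except event_date and user_id are hypothetical.
CREATE TABLE IF NOT EXISTS analytics.events_raw (
  event_id    STRING    NOT NULL,
  event_ts    TIMESTAMP NOT NULL,
  event_date  DATE      NOT NULL,   -- partition column
  user_id     STRING,               -- clustering column
  session_id  STRING,
  event_name  STRING,
  product_id  STRING,
  campaign_id STRING,
  -- Nested and repeated fields keep context inside the event row,
  -- so reading an event needs no join.
  device STRUCT<os STRING, app_version STRING>,
  params ARRAY<STRUCT<key STRING, value STRING>>
)
PARTITION BY event_date
CLUSTER BY user_id
OPTIONS (
  -- Rejects queries that do not filter on event_date.
  require_partition_filter = TRUE
);
```

Immutability is a convention rather than something this DDL enforces: in practice it means the pipeline only appends to this table and applies corrections in downstream layers.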


Layer 2: Enriched Events

Denormalize here.

Instead of repeating these joins at query time:

 
JOIN users
JOIN products
JOIN campaigns
 

Flatten during ingestion, so each lookup is resolved once at write time.

This reduces shuffle in every downstream query.
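
One way to flatten is a scheduled batch step that joins each day's raw partition against the dimension tables once and writes the result to the enriched layer. The sketch below assumes hypothetical tables `analytics.events_enriched`, `analytics.users`, `analytics.products` and `analytics.campaigns`, and hypothetical attribute columns; only the three join targets come from the text.

```sql
-- Pay the join cost once per day instead of once per query.
-- All table and column names here are illustrative assumptions.
INSERT INTO analytics.events_enriched (
  event_id, event_ts, event_date, user_id, session_id, event_name,
  user_country, user_plan, product_category, campaign_channel
)
SELECT
  e.event_id,
  e.event_ts,
  e.event_date,
  e.user_id,
  e.session_id,
  e.event_name,
  u.country  AS user_country,
  u.plan     AS user_plan,
  p.category AS product_category,
  c.channel  AS campaign_channel
FROM analytics.events_raw AS e
LEFT JOIN analytics.users     AS u ON u.user_id     = e.user_id
LEFT JOIN analytics.products  AS p ON p.product_id  = e.product_id
LEFT JOIN analytics.campaigns AS c ON c.campaign_id = e.campaign_id
-- Restrict the scan to yesterday's partition.
WHERE e.event_date = DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY);
```

LEFT JOIN is used so that events with a missing dimension row are kept rather than silently dropped.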


Layer 3: Aggregation Tables

Create:

  • Daily user metrics

  • Session-level aggregates

  • Campaign performance rollups

BI tools query these tables and never hit the raw layer.
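
As an illustration, a daily user metrics rollup might be built with a small script like the one below. It is a sketch under assumptions: the table `analytics.daily_user_metrics`, the source table `analytics.events_enriched`, and the columns `session_id` and `event_ts` are hypothetical names.

```sql
-- Rebuild one day of a daily user metrics table.
-- Table names, column names and the date are illustrative assumptions.
DECLARE run_date DATE DEFAULT DATE '2026-02-27';

CREATE TABLE IF NOT EXISTS analytics.daily_user_metrics (
  event_date     DATE NOT NULL,
  user_id        STRING,
  event_count    INT64,
  session_count  INT64,
  first_event_ts TIMESTAMP,
  last_event_ts  TIMESTAMP
)
PARTITION BY event_date
CLUSTER BY user_id;

-- Delete then insert, so that rerunning the script for the same day
-- does not double count.
DELETE FROM analytics.daily_user_metrics
WHERE event_date = run_date;

INSERT INTO analytics.daily_user_metrics (
  event_date, user_id, event_count, session_count, first_event_ts, last_event_ts
)
SELECT
  event_date,
  user_id,
  COUNT(*)                   AS event_count,
  COUNT(DISTINCT session_id) AS session_count,
  MIN(event_ts)              AS first_event_ts,
  MAX(event_ts)              AS last_event_ts
FROM analytics.events_enriched
WHERE event_date = run_date
GROUP BY event_date, user_id;
```

Dashboards then read a table with one row per user per day instead of scanning billions of raw events.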


Storage Strategy

At 1PB:

  • Partition by date (mandatory)

  • Consider multi-column clustering (BigQuery allows up to four clustering columns; order them by how often they are filtered on)

  • Monitor partition size (avoid tiny partitions)
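
Partition sizes can be checked from the dataset's `INFORMATION_SCHEMA.PARTITIONS` view. The query below is a rough sketch: the dataset name `analytics` and the 10 GiB threshold are assumptions chosen for illustration, not recommendations from the source.

```sql
-- List partitions of events_raw that fall below an illustrative size threshold.
SELECT
  partition_id,
  total_rows,
  ROUND(total_logical_bytes / POW(1024, 3), 2) AS logical_gib
FROM analytics.INFORMATION_SCHEMA.PARTITIONS
WHERE table_name = 'events_raw'
  -- Skip the special partitions for NULL keys and unpartitioned rows.
  AND partition_id NOT IN ('__NULL__', '__UNPARTITIONED__')
  AND total_logical_bytes < 10 * POW(1024, 3)   -- hypothetical "tiny" cutoff
ORDER BY total_logical_bytes;
```

If many partitions show up here, a coarser partitioning granularity may be worth considering.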


Streaming vs Batch

Streaming:

  • Higher cost

  • Lower latency

  • Rows land in a write-optimized streaming buffer before conversion to columnar storage

Batch load:

  • Cheaper

  • Better compression

For PB-scale:
→ Prefer batch ingestion where possible.
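
A batch load can be written directly in SQL with the `LOAD DATA` statement. In this sketch the bucket name, path layout and file format are assumptions; the files are also assumed to match the target table's schema.

```sql
-- Append one day of Parquet files from Cloud Storage to the raw table.
-- The bucket and path are hypothetical.
LOAD DATA INTO analytics.events_raw
FROM FILES (
  format = 'PARQUET',
  uris = ['gs://example-events-bucket/events/dt=2026-02-27/*.parquet']
);
```

`LOAD DATA INTO` appends to the table, which fits the append-only treatment of the raw layer.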


Schema Design Principles

Guidelines:

  • Use nested/repeated fields

  • Avoid snowflake schemas

  • Avoid joins against small dimension tables; store those attributes on the event row

Denormalization removes most join-driven shuffle at query time.
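
To illustrate how nested and repeated fields replace joins, the query below reads a struct field and looks up a value in a repeated key/value field within the same row. The columns `device` (a STRUCT with an `os` field) and `params` (an ARRAY of key/value STRUCTs), the event name, and the `page` key are all assumed for the example.

```sql
-- Read nested attributes without joining to any other table.
-- Column names, the event name and the 'page' key are hypothetical.
SELECT
  e.user_id,
  e.device.os AS device_os,
  -- Correlated lookup inside the row's own repeated field.
  (SELECT p.value
   FROM UNNEST(e.params) AS p
   WHERE p.key = 'page'
   LIMIT 1) AS page
FROM analytics.events_raw AS e
WHERE e.event_date = DATE '2026-02-27'
  AND e.event_name = 'page_view';
```

The `UNNEST` operates on data already stored in the row, so it does not trigger the cross-table shuffle that a join against a separate parameters table would.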


Biggest Cost Driver at 1PB

Not storage.

It’s shuffle-heavy ad-hoc joins on raw data.
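
One way to find such queries is to rank recent jobs by shuffle volume using the `INFORMATION_SCHEMA.JOBS_BY_PROJECT` view. This is a sketch under assumptions: the `region-us` qualifier, the 7-day window and the limit of 20 are illustrative, and the caller needs permission to list jobs in the project.

```sql
-- Rank the last week's queries by total shuffle output.
-- Region, time window and limit are illustrative choices.
SELECT
  job_id,
  user_email,
  total_slot_ms,
  total_bytes_processed,
  (SELECT SUM(stage.shuffle_output_bytes)
   FROM UNNEST(job_stages) AS stage) AS shuffle_bytes
FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
WHERE creation_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 7 DAY)
  AND job_type = 'QUERY'
  AND state = 'DONE'
ORDER BY shuffle_bytes DESC
LIMIT 20;
```

Queries at the top of this list that join directly against raw tables are candidates for moving to the enriched or aggregated layers.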

© Copyright 2024. All Rights Reserved by Learningdhara Community LLP