Event-Driven Near-Real-Time Global Lakehouse

Now we combine streaming, lakehouse, and warehouse into one architecture.
🔹 Global Streaming Layer
Each region:
Real-time ingestion (Pub/Sub, Kafka, or an equivalent)
Micro-batch landing tables
Partition by event_time
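As a rough illustration of partitioning by event_time, the sketch below derives a partition key from the event's own timestamp rather than from its arrival time, so a late-arriving event still lands in the partition it belongs to. The function name `partition_for` and the `dt=/hour=` layout are assumptions made for this example, not something the design above prescribes.

```python
from datetime import datetime, timezone

def partition_for(event_time: datetime) -> str:
    """Return an hourly partition key based on the event's own timestamp.

    Hypothetical helper: the dt=/hour= layout is an assumed convention.
    """
    # Normalize to UTC so every region produces comparable partition keys.
    utc = event_time.astimezone(timezone.utc)
    return utc.strftime("dt=%Y-%m-%d/hour=%H")

# An event stamped just before midnight keeps its original partition,
# no matter when it is actually ingested.
late_event = datetime(2024, 3, 1, 23, 59, 30, tzinfo=timezone.utc)
print(partition_for(late_event))
```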
🔹 Micro-Batch Strategy
Avoid pure row-by-row streaming at 100PB scale.
Instead:
1–5 minute micro-batches
Merge into partitioned tables
Periodic compaction
This reduces storage fragmentation: fewer, larger writes mean fewer small files to scan.
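The three steps above can be sketched in plain Python, with dictionaries and lists standing in for tables and files. Everything here is an assumption for illustration: the 2-minute window, the `(epoch_seconds, key, value)` event shape, the latest-timestamp-wins merge rule, and the function names. A real engine would do the same work with its own merge and compaction commands.

```python
from collections import defaultdict

BATCH_SECONDS = 120  # assumed 2-minute window, inside the 1–5 minute range

def micro_batches(events):
    """Group (epoch_seconds, key, value) events into fixed time windows."""
    batches = defaultdict(list)
    for event in events:
        batches[int(event[0] // BATCH_SECONDS)].append(event)
    return dict(batches)

def merge_batch(table, batch):
    """Upsert one batch into the table; the latest timestamp per key wins."""
    for ts, key, value in sorted(batch, key=lambda e: e[0]):
        current = table.get(key)
        if current is None or ts >= current[0]:
            table[key] = (ts, value)

def compact(files, target_rows):
    """Rewrite many small files (lists of rows) into files of up to target_rows."""
    compacted, current = [], []
    for small_file in files:
        for row in small_file:
            current.append(row)
            if len(current) == target_rows:
                compacted.append(current)
                current = []
    if current:
        compacted.append(current)
    return compacted

events = [(0, "a", 1), (60, "a", 2), (130, "b", 3)]
table = {}
for _, batch in sorted(micro_batches(events).items()):
    merge_batch(table, batch)
print(table)
print(compact([[1], [2], [3, 4], [5]], target_rows=3))
```

Each merge touches the table once per window instead of once per row, and compaction later folds the small per-batch files into larger ones.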
🔹 Near-Real-Time Serving Pattern
Raw stream → landing table
Transform → enriched table
Update aggregates incrementally
BI queries hit the incremental aggregates
Target SLA:
2–10 minute freshness
Not sub-second
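A minimal sketch of the incremental-aggregate step and the freshness target, assuming enriched rows arrive as dictionaries with hypothetical `region`, `minute`, and `amount` fields. Each micro-batch is folded into running totals rather than recomputing from the raw data, and a small check compares the last update time against the upper bound of the 2–10 minute SLA.

```python
from collections import Counter

def apply_batch(aggregates, enriched_rows):
    """Fold one micro-batch of enriched rows into the running aggregates."""
    for row in enriched_rows:
        aggregates[(row["region"], row["minute"])] += row["amount"]
    return aggregates

def is_fresh(last_update_epoch, now_epoch, sla_seconds=600):
    """True if the aggregates were updated within the SLA (default 10 minutes)."""
    return now_epoch - last_update_epoch <= sla_seconds

aggregates = Counter()
apply_batch(aggregates, [
    {"region": "eu", "minute": "12:00", "amount": 5},
    {"region": "eu", "minute": "12:00", "amount": 7},
])
apply_batch(aggregates, [{"region": "us", "minute": "12:00", "amount": 3}])
print(aggregates[("eu", "12:00")])
```

The BI layer would read only `aggregates`, so query cost stays independent of how many raw events have landed.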
🔥 Global Coordination Pattern
Each region publishes:
Aggregated hourly metrics
Feature deltas
Business KPIs
Global reporting aggregates regional outputs.
Never stream raw events globally.
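The global rollup can be sketched as a merge of small regional summaries. The input shape below (region, then hour, then an `events` count and a `revenue` sum) and the name `global_rollup` are assumptions for illustration. Note that this only works directly for additive metrics such as counts and sums; an average would need its sum and count published separately.

```python
from collections import defaultdict

def global_rollup(regional_outputs):
    """Combine per-region hourly metrics into global hourly totals.

    regional_outputs: {region: {hour: {"events": int, "revenue": float}}}
    """
    totals = defaultdict(lambda: {"events": 0, "revenue": 0.0})
    for hourly in regional_outputs.values():
        for hour, metrics in hourly.items():
            totals[hour]["events"] += metrics["events"]
            totals[hour]["revenue"] += metrics["revenue"]
    return dict(totals)

regional_outputs = {
    "eu": {"2024-03-01T12": {"events": 100, "revenue": 10.0}},
    "us": {"2024-03-01T12": {"events": 250, "revenue": 5.5},
           "2024-03-01T13": {"events": 40, "revenue": 2.0}},
}
print(global_rollup(regional_outputs))
```

Only a handful of rows per region per hour cross regional boundaries, which is the point of publishing aggregates instead of raw events.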
