Hi r/analytics,

If you evaluated ClickHouse a few years ago, you were likely told to avoid JOINs at all costs. The standard "tribal knowledge" was to denormalize everything into massive, flat tables.

In 2020, that criticism was fair. ClickHouse had a single hash join algorithm. If your right-side table exceeded memory, the query just crashed.

That advice is now officially outdated. I recently dug into the commit history. Between 2022 and 2026, the engineering team merged over 50 pull requests that dismantled almost every limitation of the join engine. ClickHouse now has the planning sophistication of a mature RDBMS operating inside a vectorized model.

I mapped out the most impactful changes shipping by default today:

Grace Hash Join: Inactive buckets now spill to disk. OOM crashes for memory-bound joins are completely solved.
Equivalence-set Filter Pushdown: Filters now propagate across join sides automatically to prune data before it reaches the join.
Global Join Reordering: It finally uses column statistics and dynamic programming to find the optimal join order automatically.
Runtime Bloom Filters: Dynamically prunes probe-side data at the storage scan level.
Parallel Hash Join: This scales linearly across CPU cores and is now the default execution mode.

The Verdict:

Denormalization still makes sense if you have hyper-strict sub-10ms p99 latency targets. But for standard star schema workloads, you no longer need to flatten your data.

Are you still building wide flat tables out of habit/fear, or have you started moving back to normalized joins in ClickHouse?

Hi SQL folks,

If you evaluated ClickHouse a few years ago, you were likely told to avoid JOINs at all costs. The standard industry advice was to denormalize everything into massive, flat tables.

In 2020, that criticism was fair. ClickHouse had a single hash join algorithm; if your right-side table exceeded memory, the query simply crashed. That advice is now officially outdated.

I recently dug into the commit history, and between 2022 and 2026, the engineering team merged over 50 pull requests that dismantled almost every limitation of the join engine. ClickHouse now has the planning sophistication of a mature RDBMS operating inside a columnar vectorized model.

5 Impactful Changes Shipping by Default Today:

Grace Hash Join: Inactive buckets now spill to disk. OOM (Out of Memory) crashes for memory-bound joins are completely solved.
Equivalence-set Filter Pushdown: Filters propagate across join sides automatically to prune data before it even reaches the join stage.
Global Join Reordering: Uses column statistics and dynamic programming to find the optimal join order automatically.
Runtime Bloom Filters: Dynamically prunes probe-side data at the storage scan level, significantly reducing I/O.
Parallel Hash Join: This scales linearly across CPU cores and is now the default execution mode.

The Bottom Line

Denormalization still makes sense if you have strict sub-10ms p99 latency targets. However, for standard star schema workloads, you no longer need to flatten your data and deal with the associated ETL complexity.

I wrote a full PR-by-PR analysis of this evolution here: https://dataanalyticsguide.substack.com/p/clickhouse-native-json-support-2026

Are you still building wide flat tables out of habit, or have you moved back to normalized joins in ClickHouse recently?

u/FarRub2855

5 Impactful Changes Shipping by Default Today:

The Bottom Line