Data Modeling for Analytics: Optimize for Queries, Not Transactions

Published at 03:00 PM

OLTP normalized model vs. OLAP denormalized model side by side

The data model that runs your production application is almost never the right model for analytics. Transactional systems are designed for fast writes — inserting orders, updating inventory, processing payments. Analytics systems are designed for fast reads — scanning millions of rows, aggregating across dimensions, filtering by date ranges.

Using a transactional model for analytics is like using a filing cabinet when you need a search engine. The data is there, but finding answers takes too long.

Transactions vs. Analytics: Two Different Problems

Transactional (OLTP) workloads process many small operations: insert one order, update one account balance, delete one expired session. These models are normalized to Third Normal Form (3NF) or beyond — every piece of data stored once, redundancy eliminated, consistency enforced through constraints.

Analytical (OLAP) workloads process few large operations: scan all orders for the last year, aggregate revenue by region and product category, calculate year-over-year growth. These models are denormalized — data is pre-joined, attributes are flattened, and the structure is optimized for scans rather than updates.

| Aspect | OLTP Model | OLAP Model |
| --- | --- | --- |
| Optimization target | Write speed | Read speed |
| Normalization | 3NF or higher | Denormalized |
| Table structure | Narrow and many | Wide and few |
| Joins per query | Many (10-20) | Few (3-5) |
| Storage format | Row-oriented | Columnar |
| Typical query | UPDATE one row | SUM across millions |

Why Normalized Models Slow Down Analytics

A normalized 3NF model might have 15 tables involved in answering “What was revenue by product category by month?” The query engine must join orders to order_items to products to categories to dates, applying filters and aggregations across each join.

Chain of joins through normalized tables versus one wide scan through a denormalized table

Each join adds latency. Each join also adds a point of failure — wrong join condition, missing foreign key, ambiguous column name. An AI agent generating SQL against a 15-table normalized model has far more opportunities to make a mistake than against a 4-table star schema.

The fix is not to abandon normalization. Keep your OLTP model normalized for your application. But create a separate analytical model — denormalized, structured for queries, with pre-built joins and business-friendly column names — for reporting and analytics.

Designing for Read Performance

Analytical data models follow several patterns that optimize for read performance:

Wide tables reduce joins. Instead of orders → customers → addresses → cities → states, create a single fact_orders view with customer_name, customer_city, customer_state included. Every join you eliminate saves query time and reduces complexity.
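A minimal sketch of this pattern, using Python's built-in sqlite3 for illustration. The schema and names (fact_orders, customer_city, customer_state) are hypothetical; the point is that the joins happen once, in the view definition, rather than in every report query:

```python
import sqlite3

# Hypothetical normalized OLTP schema: orders -> customers -> cities -> states.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE states    (state_id INTEGER PRIMARY KEY, state_name TEXT);
CREATE TABLE cities    (city_id INTEGER PRIMARY KEY, city_name TEXT, state_id INTEGER);
CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, customer_name TEXT, city_id INTEGER);
CREATE TABLE orders    (order_id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);

INSERT INTO states    VALUES (1, 'California');
INSERT INTO cities    VALUES (1, 'San Jose', 1);
INSERT INTO customers VALUES (1, 'Acme Corp', 1);
INSERT INTO orders    VALUES (100, 1, 250.0);

-- Pre-join once in the analytical layer instead of in every report query.
CREATE VIEW fact_orders AS
SELECT o.order_id,
       o.amount,
       c.customer_name,
       ci.city_name AS customer_city,
       s.state_name AS customer_state
FROM orders o
JOIN customers c ON c.customer_id = o.customer_id
JOIN cities ci   ON ci.city_id    = c.city_id
JOIN states s    ON s.state_id    = ci.state_id;
""")

# Analysts (and AI agents) now query the wide view with zero joins.
row = con.execute(
    "SELECT customer_state, SUM(amount) FROM fact_orders GROUP BY customer_state"
).fetchone()
print(row)  # ('California', 250.0)
```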

Pre-computed columns reduce repeated calculations. If every report calculates quantity * unit_price * (1 - discount) as “net revenue,” compute it once in the model and expose it as a column. This eliminates repeated formula definitions and ensures consistency.
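The net-revenue formula above can be pushed into the model the same way. A sketch with a hypothetical order_items table, again using sqlite3:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE order_items (order_id INTEGER, quantity INTEGER, unit_price REAL, discount REAL);
INSERT INTO order_items VALUES (1, 10, 5.0, 0.2);

-- Define the business metric once, in the model, not in every report.
CREATE VIEW fact_order_items AS
SELECT order_id,
       quantity,
       unit_price,
       discount,
       quantity * unit_price * (1 - discount) AS net_revenue
FROM order_items;
""")

net = con.execute("SELECT net_revenue FROM fact_order_items").fetchone()[0]
print(net)  # 10 * 5.0 * (1 - 0.2) = 40.0
```

Every downstream query now selects net_revenue instead of re-deriving it, so the definition cannot drift between reports.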

Consistent naming improves discoverability. Use order_date instead of dt. Use customer_email instead of email. When column names are self-explanatory, analysts find the right data faster, and AI agents generate more accurate SQL.

Date dimensions enable time-based analysis. A date dimension with fiscal_quarter, is_weekend, is_holiday, and week_of_year makes time-based filtering trivial. Without it, every analyst writes a different CASE WHEN MONTH(date) IN (1,2,3) THEN 'Q1' expression.
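A date dimension is cheap to generate programmatically. A sketch in Python (fiscal year assumed to match the calendar year; is_holiday is omitted because it requires an organization-specific calendar):

```python
from datetime import date, timedelta

def date_dimension(start: date, end: date) -> list[dict]:
    """One row per calendar day with common analysis attributes.
    Assumes fiscal quarters align with calendar quarters."""
    rows = []
    d = start
    while d <= end:
        rows.append({
            "date": d.isoformat(),
            "fiscal_quarter": f"Q{(d.month - 1) // 3 + 1}",
            "is_weekend": d.weekday() >= 5,   # Saturday=5, Sunday=6
            "week_of_year": d.isocalendar()[1],
        })
        d += timedelta(days=1)
    return rows

dim = date_dimension(date(2024, 3, 30), date(2024, 4, 1))
print(dim[0])   # 2024-03-30 is a Saturday in Q1
print(dim[-1])  # 2024-04-01 is a Monday in Q2
```

Load the output into a dim_date table once, and every time-based filter becomes a simple join instead of a hand-written CASE expression.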

Pre-Aggregation and Summary Tables

Not every query needs to scan raw data. For frequently run aggregations, pre-aggregated summary tables reduce query time from minutes to milliseconds.

Common patterns include daily revenue rolled up by region and product category, monthly snapshots for period-over-period comparisons, and customer-level aggregates such as order counts and lifetime spend.

The tradeoff is maintenance. Every summary table needs a refresh pipeline, and stale summaries produce outdated numbers.
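A minimal sketch of a summary-table refresh, using sqlite3 with hypothetical table names. A full rebuild keeps the logic simple; incremental refresh is the usual optimization once volumes grow:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE fact_orders (order_date TEXT, region TEXT, amount REAL);
INSERT INTO fact_orders VALUES
  ('2024-05-01', 'west', 100.0),
  ('2024-05-01', 'west',  50.0),
  ('2024-05-02', 'east',  75.0);
""")

def refresh_daily_revenue(con: sqlite3.Connection) -> None:
    # Drop-and-rebuild refresh: simple, correct, and fine at small scale.
    con.executescript("""
    DROP TABLE IF EXISTS daily_revenue;
    CREATE TABLE daily_revenue AS
    SELECT order_date, region, SUM(amount) AS revenue, COUNT(*) AS order_count
    FROM fact_orders
    GROUP BY order_date, region;
    """)

refresh_daily_revenue(con)
rows = con.execute("SELECT * FROM daily_revenue ORDER BY order_date").fetchall()
print(rows)  # [('2024-05-01', 'west', 150.0, 2), ('2024-05-02', 'east', 75.0, 1)]
```

Dashboards hitting daily_revenue scan two rows instead of the raw fact table; the cost is that refresh_daily_revenue must run on a schedule, and the table is only as fresh as its last run.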

Platforms like Dremio handle this automatically with Reflections — pre-computed aggregations and materializations that the query optimizer uses transparently. Users query the logical views; Dremio substitutes the fastest Reflection without the user knowing. No manual summary table management required.

Columnar Storage and Physical Layout

Analytics models benefit from columnar storage formats like Parquet. Columnar files store each column's values together, so a query reads only the columns it references, compression works far better on homogeneous values, and engines can skip entire blocks using min/max statistics.

Physical layout decisions that matter include partitioning (typically by date, so queries touching one month never scan the rest), sort order within files (clustering commonly filtered values improves block skipping), and file sizing (thousands of tiny files slow scans; fewer, larger files are usually faster).
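Date partitioning is usually expressed as a directory convention. A sketch of the Hive-style layout many engines recognize (the table root and file name here are hypothetical):

```python
from datetime import date

def partition_path(table_root: str, order_date: date) -> str:
    """Hive-style partition layout: one directory per day.
    Engines that understand this convention prune whole directories
    when a filter matches the partition column."""
    return f"{table_root}/order_date={order_date.isoformat()}/part-0.parquet"

path = partition_path("s3://lake/orders", date(2024, 5, 1))
print(path)  # s3://lake/orders/order_date=2024-05-01/part-0.parquet
```

A query filtering on order_date = '2024-05-01' reads only that one directory, regardless of how many years of data sit alongside it.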

What to Do Next

Analytics model with wide tables, pre-aggregations, and columnar storage feeding dashboards

Find your slowest dashboard. Look at the queries behind it. Count the joins, measure the scan size, and check whether the model is normalized 3NF or denormalized for analytics. If it’s still using the transactional model, create an analytical view layer on top — a denormalized star schema with pre-computed columns, clear naming, and a date dimension. The dashboard performance improvement is usually immediate and significant.

Try Dremio Cloud free for 30 days