How data is stored, compressed, and optimized in Microsoft Fabric
Column Store • Delta Tables • Optimizations • Direct Lake
How analytical data is stored and compressed
This section explains how analytical data is physically stored — column by column rather than row by row — and the compression techniques that make it fast and efficient.
Every Delta table stores its data column by column — not row by row — so analytical queries scan only the values they need.
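To make the difference concrete, here is a toy sketch (not Fabric's actual storage engine) of the same records laid out row by row versus column by column; the names are illustrative only.

```python
# Toy illustration: the same rows laid out row-wise and column-wise.
rows = [
    {"country": "DE", "amount": 10},
    {"country": "DE", "amount": 20},
    {"country": "US", "amount": 30},
]

# Row store: each record is stored together, so summing one column
# still walks every full record.
row_store = rows

# Column store: each column is a contiguous array, so a query that
# only needs "amount" never touches "country".
column_store = {
    "country": [r["country"] for r in rows],
    "amount": [r["amount"] for r in rows],
}

total = sum(column_store["amount"])  # scans one array, not whole records
print(total)  # 60
```

The column layout is also what makes the per-column compression techniques below possible, since values of the same type and distribution sit next to each other.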
Instead of storing every repeated value, run-length encoding (RLE) stores each run of identical consecutive values as a single (value, count) pair, collapsing repetitive columns into a handful of tiny pairs.
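A minimal sketch of the idea in plain Python (Parquet's real RLE implementation works on encoded bit-packed values, but the principle is the same):

```python
from itertools import groupby

def rle_encode(values):
    """Collapse each run of identical consecutive values into (value, count)."""
    return [(value, len(list(run))) for value, run in groupby(values)]

def rle_decode(pairs):
    """Expand (value, count) pairs back into the original sequence."""
    return [value for value, count in pairs for _ in range(count)]

column = ["DE"] * 4 + ["US"] * 2 + ["DE"]
encoded = rle_encode(column)
print(encoded)  # [('DE', 4), ('US', 2), ('DE', 1)]

assert rle_decode(encoded) == column  # lossless round trip
```

Seven stored values become three pairs; a sorted or low-cardinality column shrinks dramatically.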
Dictionary encoding replaces repeated string values with compact integer IDs — dramatically reducing the storage footprint of text-heavy columns.
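The same idea as a small sketch (the function name is illustrative, not a Parquet API): each distinct string gets an integer ID on first sight, and the column stores only the IDs.

```python
def dictionary_encode(values):
    """Replace each string with a compact integer ID, assigned on first occurrence."""
    dictionary = {}
    ids = []
    for value in values:
        if value not in dictionary:
            dictionary[value] = len(dictionary)
        ids.append(dictionary[value])
    return dictionary, ids

column = ["Berlin", "Munich", "Berlin", "Berlin", "Munich"]
dictionary, ids = dictionary_encode(column)
print(dictionary)  # {'Berlin': 0, 'Munich': 1}
print(ids)         # [0, 1, 0, 0, 1]

# Decoding: invert the dictionary and look each ID back up.
reverse = {i: v for v, i in dictionary.items()}
assert [reverse[i] for i in ids] == column
```

Five strings shrink to two dictionary entries plus five small integers, and the integer IDs themselves compress well with RLE when values repeat in runs.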
Parquet can compress each column chunk with a codec chosen at write time. Fabric uses Snappy by default, a fast, lightweight codec that trades some compression ratio for speed.
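Snappy itself is not in the Python standard library, so the sketch below uses zlib as a stand-in to show the general trade-off: repetitive columnar data compresses well, and codecs balance ratio against speed (a low zlib level is the Snappy-like "fast and light" end of the spectrum).

```python
import zlib

# A repetitive column serialized to bytes, as columnar data often is.
raw = ("DE" * 500 + "US" * 500).encode()

fast = zlib.compress(raw, level=1)   # lighter, quicker (Snappy-like trade-off)
small = zlib.compress(raw, level=9)  # heavier, smaller

print(len(raw), len(fast), len(small))
assert zlib.decompress(fast) == raw  # lossless round trip
```

Both levels shrink the 2,000-byte input to a tiny fraction of its size precisely because columnar layout puts similar values next to each other.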
Parquet files plus a transaction log
This section covers the Parquet file format and how Delta builds on top of it — adding a transaction log that enables versioning, ACID guarantees, and time travel for every table in OneLake.
Parquet is the open-source columnar file format underlying every Delta table. It is a hybrid layout built for analytical workloads: rows are grouped into row groups, each column within a group is stored and compressed independently, and the format is optimized for fast reads and append-only writes.
Delta is Parquet plus a transaction log. Every write is recorded as a versioned commit, which gives each table in OneLake ACID guarantees, time travel, and lineage tracking on top of the columnar format.
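A minimal sketch of that mechanism (the function names are illustrative, not the real Delta or delta-rs API): each write appends a JSON commit to the log, and reading "as of" a version simply replays commits up to that point.

```python
import json

log = []  # stand-in for the _delta_log: one JSON entry per versioned commit

def commit(action, files):
    """Record a write as the next versioned commit in the log."""
    log.append(json.dumps({"version": len(log), "action": action, "files": files}))

def files_as_of(version):
    """Time travel: replay the log up to `version` to see the table's live files."""
    live = set()
    for entry in log:
        c = json.loads(entry)
        if c["version"] > version:
            break
        if c["action"] == "add":
            live |= set(c["files"])
        elif c["action"] == "remove":
            live -= set(c["files"])
    return sorted(live)

commit("add", ["part-0.parquet"])     # version 0
commit("add", ["part-1.parquet"])     # version 1
commit("remove", ["part-0.parquet"])  # version 2, e.g. a compaction rewrite

print(files_as_of(1))  # ['part-0.parquet', 'part-1.parquet']
print(files_as_of(2))  # ['part-1.parquet']
```

Because the Parquet data files are immutable and only the log changes, old versions stay readable until their files are vacuumed away.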
Write-time and storage-level performance tuning
This section covers the optimization techniques that keep Delta tables fast — from file skipping and vacuum to V-Order and compression strategies applied at write time and storage level.
Delta stores min/max statistics for every Parquet file in the transaction log — so queries can skip entire files that can't contain matching rows, dramatically reducing the data scanned.
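File skipping is easy to see in miniature. In this sketch (illustrative names, not Delta's actual log schema), a filter like `amount = 250` only needs to scan files whose min/max range could contain the value:

```python
# Per-file min/max statistics, as the transaction log records them.
file_stats = {
    "part-0.parquet": {"min": 0,   "max": 100},
    "part-1.parquet": {"min": 200, "max": 300},
    "part-2.parquet": {"min": 400, "max": 500},
}

def files_to_scan(target):
    """Keep only files whose [min, max] range can contain the target value."""
    return [f for f, s in file_stats.items() if s["min"] <= target <= s["max"]]

print(files_to_scan(250))  # ['part-1.parquet']
```

Two of three files are ruled out without opening them; on a large, well-sorted table the same pruning can skip most of the data.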
Vacuum removes outdated Parquet files that are no longer referenced by the transaction log — reclaiming storage and keeping tables lean.
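A simplified sketch of the vacuum step (real vacuum also respects a retention window so time travel keeps working for recent versions; the names here are illustrative):

```python
import os
import tempfile

# Files the transaction log still references.
referenced = {"part-1.parquet"}

# A throwaway "table directory" with one live and one orphaned file.
table_dir = tempfile.mkdtemp()
for name in ("part-0.parquet", "part-1.parquet"):
    open(os.path.join(table_dir, name), "w").close()

def vacuum(directory, live_files):
    """Delete Parquet files on disk that the log no longer references."""
    removed = []
    for name in os.listdir(directory):
        if name.endswith(".parquet") and name not in live_files:
            os.remove(os.path.join(directory, name))
            removed.append(name)
    return removed

removed = vacuum(table_dir, referenced)
print(removed)  # ['part-0.parquet']
```

After vacuuming, only files the log can still reach remain, which is why versions older than the retention window can no longer be time-traveled to.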
Import-speed analytics with live OneLake data
This section introduces Direct Lake — how Power BI reads Delta tables straight from OneLake into the analysis engine, giving you import-mode speed without data copying or scheduled refreshes.
A semantic model is the analytical layer between raw data and the end user: it pairs the data (imported as a copy, or read in place with Direct Lake) with metadata, expressed in TMDL, that defines relationships, measures, and business logic.
Direct Lake reads Delta tables straight from OneLake into the analysis engine — no data copying, no scheduled refreshes — giving you import-mode speed with live data.
How it all fits together
This section brings it all together — how Bronze, Silver, Gold, and Semantic layers organize your data from raw ingestion to business-ready analytics.
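The layered flow can be sketched end to end (illustrative data and names, not a Fabric API): raw records land in Bronze, are cleaned into Silver, and are aggregated into a business-ready Gold table that a semantic model then exposes to reports.

```python
# Bronze: raw records, exactly as ingested, flaws and all.
bronze = [
    {"country": "de", "amount": "10"},
    {"country": "DE", "amount": "20"},
    {"country": None, "amount": "5"},  # bad record, dropped in Silver
]

# Silver: validated and standardized (types cast, casing fixed, bad rows removed).
silver = [
    {"country": r["country"].upper(), "amount": int(r["amount"])}
    for r in bronze
    if r["country"]
]

# Gold: aggregated to the grain the business asks about.
gold = {}
for r in silver:
    gold[r["country"]] = gold.get(r["country"], 0) + r["amount"]

print(gold)  # {'DE': 30}
```

Each layer is itself a Delta table in OneLake, so everything above, columnar storage, the transaction log, file skipping, and vacuum, applies at every stage.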