~30 min read

Delta Format

How data is stored, compressed, and optimized in Microsoft Fabric

Column Store • Delta Tables • Optimizations • Direct Lake


Column Store

How analytical data is stored and compressed

Overview

This section explains how analytical data is physically stored — column by column rather than row by row — and the compression techniques that make it fast and efficient.

Column Store

Every Delta table stores its data column by column — not row by row — so analytical queries scan only the values they need.
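Stored column-wise, a query that aggregates one column never has to touch the others. A minimal pure-Python sketch of the idea (table contents and column names are illustrative, not Fabric internals):

```python
# The same table stored row-wise vs column-wise.
rows = [
    {"city": "Oslo",   "year": 2023, "sales": 100},
    {"city": "Oslo",   "year": 2024, "sales": 120},
    {"city": "Bergen", "year": 2024, "sales": 80},
]

# Column store: one contiguous list per column.
columns = {key: [r[key] for r in rows] for key in rows[0]}

# An analytical query like SUM(sales) scans only the "sales" column,
# never the "city" or "year" values.
total_sales = sum(columns["sales"])
print(total_sales)  # 300
```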

Run Length Encoding

Instead of storing every repeated value, RLE records each distinct value once alongside how many times it appears — turning repetitive columns into tiny pairs.

Values
3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 5, 5, 5, 5, 5
Described
Ten 3s, five 5s
Encoded
(10, 3) (5, 5)
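The (count, value) pairs above can be produced in a few lines; a toy sketch of RLE in Python (not Parquet's actual encoder):

```python
from itertools import groupby

def rle_encode(values):
    """Collapse runs of repeated values into (count, value) pairs."""
    return [(len(list(run)), v) for v, run in groupby(values)]

def rle_decode(pairs):
    """Expand (count, value) pairs back into the original sequence."""
    return [v for count, v in pairs for _ in range(count)]

values = [3] * 10 + [5] * 5
encoded = rle_encode(values)
print(encoded)  # [(10, 3), (5, 5)]
assert rle_decode(encoded) == values
```

Fifteen stored values shrink to two pairs; the more repetitive the column, the bigger the win.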
Dictionary Encoding

Dictionary encoding replaces repeated string values with compact integer IDs — dramatically reducing the storage footprint of text-heavy columns.
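A toy sketch of dictionary encoding in Python (the column values are illustrative, and this is not Parquet's actual implementation):

```python
def dict_encode(values):
    """Replace repeated strings with compact integer IDs plus a lookup dictionary."""
    dictionary, ids = {}, []
    for v in values:
        if v not in dictionary:
            dictionary[v] = len(dictionary)  # assign the next ID
        ids.append(dictionary[v])
    return dictionary, ids

colors = ["red", "blue", "red", "red", "blue", "green"]
dictionary, ids = dict_encode(colors)
print(dictionary)  # {'red': 0, 'blue': 1, 'green': 2}
print(ids)         # [0, 1, 0, 0, 1, 2]
```

Each string is stored once in the dictionary; the column itself becomes a list of small integers, which compresses far better than repeated text.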

Snappy Compression

Parquet files can be compressed with a choice of codecs when written. Fabric uses Snappy by default: a fast, lightweight codec that trades some compression ratio for speed.

Delta

Parquet files plus a transaction log

Overview

This section covers the Parquet file format and how Delta builds on top of it — adding a transaction log that enables versioning, ACID guarantees, and time travel for every table in OneLake.

Parquet

Parquet is the open-source columnar file format underlying every Delta table — it organizes data by column and compresses each one independently for fast, efficient analytics.

Parquet Format

Parquet is an open-source, hybrid columnar format built for analytical workloads — optimized for compression and fast reads, but designed for append-only writes.

Open-source format
Hybrid, column-based layout
Great compression available
Better at OLAP than OLTP
Not easily editable; new data can only be added as new files
Delta

Delta adds a transaction log on top of Parquet files, giving every table in OneLake versioning, ACID guarantees, and time travel.

Delta Format

Delta is Parquet with a transaction log — every write is recorded as a versioned commit, adding ACID guarantees, time travel, and lineage tracking on top of the columnar format.

Provides ACID guarantees
Allows 'time travel' across versions
Data lineage and debugging
Data optimization
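The log-as-versioned-commits idea can be sketched in pure Python. This is a toy model, not the real Delta protocol: the file names and JSON shape are illustrative.

```python
import json

class MiniDeltaLog:
    """Toy transaction log: each write appends a versioned JSON commit
    listing which Parquet files were added or removed."""

    def __init__(self):
        self.commits = []  # commit N corresponds to table version N

    def commit(self, add=(), remove=()):
        self.commits.append(json.dumps({"add": list(add), "remove": list(remove)}))

    def files_at_version(self, version):
        """Time travel: replay commits 0..version to list the live files."""
        live = set()
        for entry in self.commits[: version + 1]:
            c = json.loads(entry)
            live |= set(c["add"])
            live -= set(c["remove"])
        return sorted(live)

log = MiniDeltaLog()
log.commit(add=["part-000.parquet"])                               # version 0
log.commit(add=["part-001.parquet"])                               # version 1
log.commit(add=["part-002.parquet"], remove=["part-000.parquet"])  # version 2

print(log.files_at_version(0))  # ['part-000.parquet']
print(log.files_at_version(2))  # ['part-001.parquet', 'part-002.parquet']
```

Because the Parquet files themselves are never edited in place, any past version can be reconstructed just by replaying fewer commits.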

Optimizations

Write-time and storage-level performance tuning

Overview

This section covers the optimization techniques that keep Delta tables fast — from file skipping and vacuum to V-Order and compression strategies applied at write time and storage level.

File Skipping

Delta stores min/max statistics for every Parquet file in the transaction log — so queries can skip entire files that can't contain matching rows, dramatically reducing the data scanned.
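A toy model of min/max file skipping (the stats and file names are illustrative, not real transaction-log entries):

```python
# Each "file" carries min/max statistics for a column; a filter such as
# `value == 42` can skip any file whose range cannot contain the target.
files = [
    {"path": "part-000.parquet", "min": 1,  "max": 40},
    {"path": "part-001.parquet", "min": 41, "max": 90},
    {"path": "part-002.parquet", "min": 91, "max": 150},
]

def files_to_scan(files, target):
    """Return only the files whose min/max range could contain the target."""
    return [f["path"] for f in files if f["min"] <= target <= f["max"]]

print(files_to_scan(files, 42))  # ['part-001.parquet'] -- the other two are skipped
```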

Vacuum

Vacuum removes outdated Parquet files that are no longer referenced by the transaction log — reclaiming storage and keeping tables lean.
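A toy sketch of the vacuum decision (the file ages and retention window are illustrative; real vacuum uses file timestamps and the Delta protocol's retention rules):

```python
# Files still referenced by the transaction log must never be deleted.
referenced = {"part-001.parquet", "part-002.parquet"}

# Files on disk, with their age in days (illustrative).
on_disk = {
    "part-000.parquet": 10,
    "part-001.parquet": 3,
    "part-002.parquet": 1,
}

def vacuum(on_disk, referenced, retention_days=7):
    """Delete files that are unreferenced AND older than the retention window."""
    removable = [path for path, age in on_disk.items()
                 if path not in referenced and age > retention_days]
    for path in removable:
        del on_disk[path]
    return sorted(removable)

print(vacuum(on_disk, referenced))  # ['part-000.parquet']
```

The retention window matters: deleting unreferenced files too aggressively would break time travel to the versions that still point at them.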

V-Order
V-Order is a write-time optimization to the Parquet file format that enables fast reads under the Microsoft Fabric compute engines, such as Power BI, SQL, Spark, and others.
Orders data efficiently when writing (saving), so that it can be read faster.
Optimizations

Write file:
RLE, dictionary & other algorithms: auto optimizations based on the open-source column-store format.
V-Order: optional algorithm for Microsoft-specific tools.
Compression: choice of how to save the file (e.g. Snappy, Gzip).

Optimize storage:
Optimize & Z-Order: reconstruct files to better compress.
Vacuum: remove old files to free storage over time.
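Z-ordering works by interleaving the bits of several column values into a single sort key (a Morton code), so rows that are close in every dimension land near each other on disk and file skipping stays effective for filters on any of those columns. A toy two-column sketch (real Delta Z-ordering is more sophisticated):

```python
def z_order_key(x, y, bits=8):
    """Interleave the bits of two column values into one Morton key."""
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (2 * i)      # even bit positions from x
        key |= ((y >> i) & 1) << (2 * i + 1)  # odd bit positions from y
    return key

# Sorting rows by the interleaved key clusters them in both dimensions at once.
rows = [(3, 7), (0, 0), (200, 5), (1, 1)]
rows.sort(key=lambda r: z_order_key(*r))
print(rows)  # [(0, 0), (1, 1), (3, 7), (200, 5)]
```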

Direct Lake

Import-speed analytics with live OneLake data

Overview

This section introduces Direct Lake — how Power BI reads Delta tables straight from OneLake into the analysis engine, giving you import-mode speed without data copying or scheduled refreshes.

What is a Semantic Model

A semantic model is the analytical layer that sits between raw data and the end user — it combines a copy of the data with metadata (TMDL) that defines relationships, measures, and business logic.

A semantic model contains:
Data: a cached copy of the source data
TMDL metadata: relationships, measures, and business logic
Direct Lake

Direct Lake reads Delta tables straight from OneLake into the analysis engine — no data copying, no scheduled refreshes — giving you import-mode speed with live data.

Architecture

How it all fits together

Overview

This section brings it all together — how Bronze, Silver, Gold, and Semantic layers organize your data from raw ingestion to business-ready analytics.

Medallion Architecture

Bronze (raw data)
File format: Parquet
How it is stored: Parquet files

Silver (clean data)
File format: Delta tables
How it is stored: Parquet files + delta log (metadata)

Gold (curated data)
File format: Delta tables
How it is stored: Parquet files + delta log (metadata)

Semantic layer (business logic)
File format: Semantic model
How it is stored (Direct Lake): Parquet files + delta log (metadata) + TMDL (metadata)
Data Warehouse
Scooby-Doo meme: Fred (Fabric) unmasking the Data Warehouse ghost to reveal Text Files underneath