~30 min read

Delta Format

How data is stored, compressed, and optimized in Microsoft Fabric

Column Store • Delta Tables • Optimizations • Direct Lake


Column Store

How analytical data is stored and compressed

Overview

This section explains how analytical data is physically stored — column by column rather than row by row — and the compression techniques that make it fast and efficient.

Column Store

Every Delta table stores its data column by column — not row by row — so analytical queries scan only the values they need.
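Stored column-wise, a query that aggregates one column never has to touch the others. A minimal pure-Python sketch of the idea (table contents and column names are illustrative, not Fabric internals):

```python
# The same table stored row-wise vs column-wise.
rows = [
    {"city": "Oslo",   "year": 2023, "sales": 100},
    {"city": "Oslo",   "year": 2024, "sales": 120},
    {"city": "Bergen", "year": 2024, "sales": 80},
]

# Column store: one contiguous list per column.
columns = {key: [r[key] for r in rows] for key in rows[0]}

# An analytical query like SUM(sales) scans only the "sales" column,
# never the "city" or "year" values.
total_sales = sum(columns["sales"])
print(total_sales)  # 300
```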

Run Length Encoding

Instead of storing every repeated value, RLE records each distinct value once alongside how many times it appears — turning repetitive columns into tiny pairs.

Values
3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 5, 5, 5, 5, 5
Described
Ten 3s, five 5s
Encoded
(10, 3) (5, 5)
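The (count, value) pairs above can be produced in a few lines; a toy sketch of RLE in Python (not Parquet's actual encoder):

```python
from itertools import groupby

def rle_encode(values):
    """Collapse runs of repeated values into (count, value) pairs."""
    return [(len(list(run)), v) for v, run in groupby(values)]

def rle_decode(pairs):
    """Expand (count, value) pairs back into the original sequence."""
    return [v for count, v in pairs for _ in range(count)]

values = [3] * 10 + [5] * 5
encoded = rle_encode(values)
print(encoded)  # [(10, 3), (5, 5)]
assert rle_decode(encoded) == values
```

Fifteen stored values shrink to two pairs; the more repetitive the column, the bigger the win.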
Dictionary Encoding

Dictionary encoding replaces repeated string values with compact integer IDs — dramatically reducing the storage footprint of text-heavy columns.
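A toy sketch of dictionary encoding in Python (the column values are illustrative, and this is not Parquet's actual implementation):

```python
def dict_encode(values):
    """Replace repeated strings with compact integer IDs plus a lookup dictionary."""
    dictionary, ids = {}, []
    for v in values:
        if v not in dictionary:
            dictionary[v] = len(dictionary)  # assign the next ID
        ids.append(dictionary[v])
    return dictionary, ids

colors = ["red", "blue", "red", "red", "blue", "green"]
dictionary, ids = dict_encode(colors)
print(dictionary)  # {'red': 0, 'blue': 1, 'green': 2}
print(ids)         # [0, 1, 0, 0, 1, 2]
```

Each string is stored once in the dictionary; the column itself becomes a list of small integers, which compresses far better than repeated text.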

Snappy Compression

Parquet files can be compressed with a choice of codecs when written. Fabric uses Snappy by default: a fast, lightweight codec that trades some compression ratio for speed.

Delta

Parquet files plus a transaction log

Overview

This section covers the Parquet file format and how Delta builds on top of it — adding a transaction log that enables versioning, ACID guarantees, and time travel for every table in OneLake.

Parquet

Parquet is the open-source columnar file format underlying every Delta table — it organizes data by column and compresses each one independently for fast, efficient analytics.

Parquet Format

Parquet is an open-source, hybrid columnar format built for analytical workloads — optimized for compression and fast reads, but designed for append-only writes.

Open-source format
Hybrid, column-based layout
Great compression available
Better at OLAP than OLTP
Not easily editable; new data can only be added as new files
Delta

Delta adds a transaction log on top of Parquet files, giving every table in OneLake versioning, ACID guarantees, and time travel.

Delta Format

Delta is Parquet with a transaction log — every write is recorded as a versioned commit, adding ACID guarantees, time travel, and lineage tracking on top of the columnar format.

Provides ACID guarantees
Allows 'time travel' across versions
Data lineage and debugging
Data optimization
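The log-as-versioned-commits idea can be sketched in pure Python. This is a toy model, not the real Delta protocol: the file names and JSON shape are illustrative.

```python
import json

class MiniDeltaLog:
    """Toy transaction log: each write appends a versioned JSON commit
    listing which Parquet files were added or removed."""

    def __init__(self):
        self.commits = []  # commit N corresponds to table version N

    def commit(self, add=(), remove=()):
        self.commits.append(json.dumps({"add": list(add), "remove": list(remove)}))

    def files_at_version(self, version):
        """Time travel: replay commits 0..version to list the live files."""
        live = set()
        for entry in self.commits[: version + 1]:
            c = json.loads(entry)
            live |= set(c["add"])
            live -= set(c["remove"])
        return sorted(live)

log = MiniDeltaLog()
log.commit(add=["part-000.parquet"])                               # version 0
log.commit(add=["part-001.parquet"])                               # version 1
log.commit(add=["part-002.parquet"], remove=["part-000.parquet"])  # version 2

print(log.files_at_version(0))  # ['part-000.parquet']
print(log.files_at_version(2))  # ['part-001.parquet', 'part-002.parquet']
```

Because the Parquet files themselves are never edited in place, any past version can be reconstructed just by replaying fewer commits.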

Optimizations

Write-time and storage-level performance tuning

Overview

This section covers the optimization techniques that keep Delta tables fast — from file skipping and vacuum to V-Order and compression strategies applied at write time and storage level.

File Skipping

Delta stores min/max statistics for every Parquet file in the transaction log — so queries can skip entire files that can't contain matching rows, dramatically reducing the data scanned.
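A toy model of min/max file skipping (the stats and file names are illustrative, not real transaction-log entries):

```python
# Each "file" carries min/max statistics for a column; a filter such as
# `value == 42` can skip any file whose range cannot contain the target.
files = [
    {"path": "part-000.parquet", "min": 1,  "max": 40},
    {"path": "part-001.parquet", "min": 41, "max": 90},
    {"path": "part-002.parquet", "min": 91, "max": 150},
]

def files_to_scan(files, target):
    """Return only the files whose min/max range could contain the target."""
    return [f["path"] for f in files if f["min"] <= target <= f["max"]]

print(files_to_scan(files, 42))  # ['part-001.parquet'] -- the other two are skipped
```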

Vacuum

Vacuum removes outdated Parquet files that are no longer referenced by the transaction log — reclaiming storage and keeping tables lean.
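A toy sketch of the vacuum decision (the file ages and retention window are illustrative; real vacuum uses file timestamps and the Delta protocol's retention rules):

```python
# Files still referenced by the transaction log must never be deleted.
referenced = {"part-001.parquet", "part-002.parquet"}

# Files on disk, with their age in days (illustrative).
on_disk = {
    "part-000.parquet": 10,
    "part-001.parquet": 3,
    "part-002.parquet": 1,
}

def vacuum(on_disk, referenced, retention_days=7):
    """Delete files that are unreferenced AND older than the retention window."""
    removable = [path for path, age in on_disk.items()
                 if path not in referenced and age > retention_days]
    for path in removable:
        del on_disk[path]
    return sorted(removable)

print(vacuum(on_disk, referenced))  # ['part-000.parquet']
```

The retention window matters: deleting unreferenced files too aggressively would break time travel to the versions that still point at them.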

V-Order
V-Order is a write-time optimization to the Parquet file format that enables fast reads under the Microsoft Fabric compute engines, such as Power BI, SQL, Spark, and others.
Orders data efficiently when writing (saving), so that it can be read faster.
Optimizations

Write file:
RLE, dictionary & other algorithms: auto optimizations based on the open-source column-store format.
V-Order: optional algorithm for Microsoft-specific tools.
Compression: choice of how to save the file (e.g. Snappy, Gzip).

Optimize storage:
Optimize & Z-Order: reconstruct files to better compress.
Vacuum: remove old files to free storage over time.
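Z-ordering works by interleaving the bits of several column values into a single sort key (a Morton code), so rows that are close in every dimension land near each other on disk and file skipping stays effective for filters on any of those columns. A toy two-column sketch (real Delta Z-ordering is more sophisticated):

```python
def z_order_key(x, y, bits=8):
    """Interleave the bits of two column values into one Morton key."""
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (2 * i)      # even bit positions from x
        key |= ((y >> i) & 1) << (2 * i + 1)  # odd bit positions from y
    return key

# Sorting rows by the interleaved key clusters them in both dimensions at once.
rows = [(3, 7), (0, 0), (200, 5), (1, 1)]
rows.sort(key=lambda r: z_order_key(*r))
print(rows)  # [(0, 0), (1, 1), (3, 7), (200, 5)]
```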

Direct Lake

Import-speed analytics with live OneLake data

Overview

This section introduces Direct Lake — how Power BI reads Delta tables straight from OneLake into the analysis engine, giving you import-mode speed without data copying or scheduled refreshes.

What is a Semantic Model

A semantic model is the analytical layer that sits between raw data and the end user — it combines a copy of the data with metadata (TMDL) that defines relationships, measures, and business logic.

A semantic model contains:
Data: a cached copy of the source data
TMDL metadata: relationships, measures, and business logic
Direct Lake

Direct Lake reads Delta tables straight from OneLake into the analysis engine — no data copying, no scheduled refreshes — giving you import-mode speed with live data.

Architecture

How it all fits together

Overview

This section brings it all together — how Bronze, Silver, Gold, and Semantic layers organize your data from raw ingestion to business-ready analytics.

Medallion Architecture

Bronze (raw data)
File format: Parquet
How it is stored: Parquet files

Silver (clean data)
File format: Delta tables
How it is stored: Parquet files + delta log (metadata)

Gold (curated data)
File format: Delta tables
How it is stored: Parquet files + delta log (metadata)

Semantic layer (business logic)
File format: Semantic model
How it is stored (Direct Lake): Parquet files + delta log (metadata) + TMDL (metadata)
Data Warehouse
Scooby-Doo meme: Fred (Fabric) unmasking the Data Warehouse ghost to reveal Text Files underneath