~15 min read

Direct Lake

Understanding Connection Modes in Microsoft Fabric

From data chaos to unified analytics -- how OneLake and Direct Lake change the game

Compare Import, DirectQuery & Direct Lake

The Data Landscape -- Before OneLake

Data scattered across disconnected silos

The Challenge

Organizations often have data scattered across multiple disconnected systems -- each with its own format, security model, refresh schedule, and cost structure.

Azure Synapse
Analytics warehouse
Power BI
Business intelligence
ADLS Gen2
Data lake storage
Data Factory
ETL / orchestration
Databricks
Spark compute
Amazon S3
External cloud
Azure ML
Machine learning
Data Explorer
Log & time-series
SQL Server
On-prem database
Excel
Spreadsheets
SharePoint
Document storage
On-Prem DBs
Legacy systems
The Problem
Multiple disconnected platforms
Duplicated storage across systems
Separate security models per tool
Independent refresh schedules
Siloed access and governance
This is the problem OneLake solves. Instead of managing dozens of separate systems, OneLake unifies everything under a single storage layer with one security model, one copy of the data, and one governance framework.

OneLake -- The Unified Platform

One lake, multiple engines, all data

Foundation: OneLake
Storage
OneLake
Delta Lake Format · One Copy of Data · Shortcuts to External · ADLS Gen2 Compatible
OneLake is like OneDrive for data -- one copy of the data, accessible by many tools. All Fabric workloads read from and write to OneLake in open Delta/Parquet format.
Compute Engines
Compute Engines
Spark
T-SQL
KQL
Analysis Services
Storage
OneLake

Different compute engines can all query the same data in OneLake -- no copying needed.
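As a minimal sketch of the "one copy, many engines" idea, the PySpark snippet below writes a Delta table from a Fabric notebook and reads it back with Spark SQL. The table name and sample rows are illustrative assumptions, not part of the deck.

```python
# Minimal PySpark sketch (Fabric notebook). Table name and sample data are
# illustrative assumptions.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()  # pre-created as `spark` in Fabric notebooks

sales = spark.createDataFrame(
    [(1, "EMEA", "2024-01-05", 1200.0), (2, "AMER", "2024-01-06", 950.0)],
    ["OrderID", "Region", "Date", "Amount"],
)

# Saving to the lakehouse Tables area writes Delta/Parquet files into OneLake.
sales.write.format("delta").mode("overwrite").saveAsTable("sales_orders")

# The same files can now be read by Spark SQL here, by the T-SQL endpoint,
# or by a Direct Lake semantic model -- no additional copies are made.
spark.sql(
    "SELECT Region, SUM(Amount) AS Total FROM sales_orders GROUP BY Region"
).show()
```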

Fabric Items
Items
Lakehouse
Warehouse
Notebooks
Pipelines
Reports
Real-Time Analytics
Compute Engines
Spark
T-SQL
KQL
Analysis Services
Storage
OneLake

Connection Modes

Four ways Power BI connects to your data

Import -- The traditional approach
Sources
Scheduled copy
Semantic Model
Report
Data is copied from sources at scheduled intervals. The ETL process can apply complex transformations. Queries are fast (in-memory), but data can be stale between refreshes.
1GB model limit in shared capacity, up to 400GB in Premium. Storage is duplicated -- source + import cache.
DirectQuery -- Live queries to source
Source DB
Live query
Report
Every interaction generates a query back to the source database. Data is always real-time, but query performance depends entirely on the source.
Best for: real-time dashboards with simple models. Can overwhelm source systems under heavy report usage.
Direct Lake -- Fabric-native
Sources
Fabric
OneLake
Delta tables
Live by default
Semantic Model
Optional scheduled refresh
Report
Best of both worlds. The semantic model reads only the columns it needs directly from OneLake Lakehouses or Warehouses. Data flows live by default -- a "refresh" simply updates the delta version pointer so the model sees the latest files.
Falls back to DirectQuery if data exceeds capacity memory limits. Schedule a refresh after data pipeline loads to ensure the model reflects the complete, consistent dataset.
Composite -- Mix of storage modes
Sources
Mixed
Semantic Model
Import DL DQ
Report
Different tables use different storage modes within the same model. Import small lookup tables for speed, use Direct Lake for large fact tables, add DirectQuery for real-time dimensions.
Also allows extending a Live Connection model with local Import or DQ tables. Adds complexity -- use when a single mode doesn't fit all tables.

Direct Lake Mode

Best of Both Worlds: Import Performance with DirectQuery Freshness

How It Works
1

Data in OneLake

Delta tables stored in Parquet format. Data is ingested via pipelines, Dataflows, Spark, or Shortcuts.

2

Transcoding on Demand

When a query arrives, Parquet data is converted to VertiPaq columnar format. Only needed columns are loaded.

3

Targeted Caching

Transcoded columns are cached in memory. Subsequent queries on the same columns run at import-mode speed.
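To make the column-level behaviour concrete, here is a hedged sketch that runs a DAX query from a Fabric notebook using semantic link (sempy). The model name, table, and columns are assumptions for illustration.

```python
# Hedged sketch using semantic link (sempy) in a Fabric notebook.
# Model, table, and column names are illustrative assumptions.
import sempy.fabric as fabric

dax_query = """
EVALUATE
SUMMARIZECOLUMNS(
    'sales_orders'[Date],
    "Total Amount", SUM('sales_orders'[Amount])
)
"""

# First run: only [Date] and [Amount] are transcoded from Parquet into
# VertiPaq and cached; the other columns stay on disk in OneLake.
cold = fabric.evaluate_dax("Sales Direct Lake Model", dax_query)

# Second run: the same columns are already in memory, so the query is
# answered at import-mode speed.
warm = fabric.evaluate_dax("Sales Direct Lake Model", dax_query)
print(warm.head())
```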

4

Auto Refresh

When Delta tables change, the cache auto-invalidates. The next query triggers fresh transcoding -- data stays current. Schedule a refresh after incremental pipeline loads to ensure the model reflects the complete dataset.
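If you want that post-load refresh automated, one option is to call the Power BI REST API at the end of the pipeline. The sketch below is one possible approach, not the deck's prescribed method: the workspace and semantic model IDs are placeholders, and token acquisition via azure-identity is an assumption about your environment.

```python
# Hedged sketch: queue a semantic model refresh (a re-frame for Direct Lake)
# right after a pipeline load, using the Power BI REST API.
# WORKSPACE_ID / DATASET_ID are placeholders; adapt authentication to your setup.
import requests
from azure.identity import DefaultAzureCredential

WORKSPACE_ID = "<workspace-guid>"
DATASET_ID = "<semantic-model-guid>"

token = DefaultAzureCredential().get_token(
    "https://analysis.windows.net/powerbi/api/.default"
).token

response = requests.post(
    f"https://api.powerbi.com/v1.0/myorg/groups/{WORKSPACE_ID}"
    f"/datasets/{DATASET_ID}/refreshes",
    headers={"Authorization": f"Bearer {token}"},
    json={"notifyOption": "NoNotification"},
)
response.raise_for_status()  # HTTP 202 means the refresh request was accepted
```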

Key Benefits

Fast Queries

Near-import performance via in-memory caching and on-demand transcoding.

Transcoding takes milliseconds per column -- cached data is served at full VertiPaq speed.

Always Fresh

No scheduled refresh needed -- data stays current as Delta tables update.

Can also be set to snapshot mode for consistent point-in-time reporting. Ideal for daily or weekly board-level reports that need a stable view.

No Duplication

Single copy in OneLake -- no import cache consuming extra storage and memory.

Eliminates the storage cost of maintaining a separate VertiPaq copy of your data.

Massive Scale

No model size limits imposed by import. Scales with your Fabric capacity.

Automatically falls back to DirectQuery for data exceeding memory, so queries never fail.

Direct Lake vs Import

Import Mode

Power BI
Import
OneLake
Requires Refresh
vs

Direct Lake Mode -- Best of Both

Power BI
Import
OneLake
Always Fresh
Targeted Caching
OneLake Table
ID
Sales
Region
Date
Notes
Amount
Status
Ref
Only queried columns
In-Memory Cache
Columns not in your query stay on disk. Cached columns run at full import speed.
Schedule a refresh after pipeline loads to pick up new delta versions.

Storage Mode Comparison

Import
Speed: Fastest
Freshness: Scheduled
Storage: Duplicated
Refresh: Required
Data: In model
DirectQuery
Speed: Slower
Freshness: Real-time
Storage: None
Refresh: Not needed
Data: At source
Direct Lake
Speed: Near Import
Freshness: Real-time
Storage: None
Refresh: Optional
Data: OneLake
DirectQuery Variants
To Relational Source
SQL Server, Azure SQL, etc.
Report -> SQL Database
Sends live T-SQL queries directly to the database. Speed depends on source optimization.
Speed = Source DB
vs
Over Analysis Services
Chaining via published models
Report -> Semantic Model
Chains to a published model, inheriting its storage mode without duplication.
Inherits Mode

Requirements

What you need to use Direct Lake

Delta Tables

Data must be in Delta/Parquet format in OneLake. This is the native storage format for Fabric Lakehouses and Warehouses.

Use Fabric notebooks, pipelines, or Dataflows Gen2 to convert existing data to Delta format.
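A minimal notebook-based conversion might look like the sketch below; the source path `Files/raw/sales` and the target table name are assumptions.

```python
# Hedged sketch: converting existing CSV files in the lakehouse Files area
# into a Delta table that Direct Lake can use. Paths and names are assumptions.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

raw = (
    spark.read
    .option("header", True)
    .option("inferSchema", True)
    .csv("Files/raw/sales/*.csv")
)

# Landing the data as a managed table stores it as Delta/Parquet in OneLake.
raw.write.format("delta").mode("overwrite").saveAsTable("sales_orders")

# For data that is already Parquet, Delta Lake's CONVERT TO DELTA can register
# the existing files in place instead of rewriting them:
# spark.sql("CONVERT TO DELTA parquet.`Files/raw/sales_parquet`")
```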

Lakehouse or Warehouse

Delta tables must reside in a Fabric Lakehouse or Warehouse. Shortcuts to external Delta tables are also supported.

Shortcuts enable Direct Lake on data stored in ADLS Gen2 or S3 without moving it.
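Shortcuts are typically created through the Lakehouse UI, but they can also be scripted. The sketch below uses the Fabric OneLake Shortcuts REST API as I understand it; the endpoint, payload shape, IDs, storage URL, and connection GUID are all assumptions to verify against the current API reference.

```python
# Hedged sketch: creating an ADLS Gen2 shortcut under a lakehouse's Tables
# folder via the Fabric REST API. All IDs, URLs, and the payload shape are
# assumptions -- check the OneLake Shortcuts API docs before relying on this.
import requests
from azure.identity import DefaultAzureCredential

WORKSPACE_ID = "<workspace-guid>"
LAKEHOUSE_ID = "<lakehouse-item-guid>"

token = DefaultAzureCredential().get_token(
    "https://api.fabric.microsoft.com/.default"
).token

payload = {
    "path": "Tables",
    "name": "external_sales",
    "target": {
        "adlsGen2": {
            "location": "https://contosodata.dfs.core.windows.net",
            "subpath": "/lake/delta/sales",
            "connectionId": "<connection-guid>",
        }
    },
}

response = requests.post(
    f"https://api.fabric.microsoft.com/v1/workspaces/{WORKSPACE_ID}"
    f"/items/{LAKEHOUSE_ID}/shortcuts",
    headers={"Authorization": f"Bearer {token}"},
    json=payload,
)
response.raise_for_status()
```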

Fabric Capacity

A Microsoft Fabric capacity (F2 or higher) is required. Direct Lake is not available in shared/Pro-only workspaces.

F2 is the minimum SKU. Larger capacities allow more data to be cached in memory before fallback occurs.

V-Order Optimized

Tables should be V-Order optimized for best transcoding performance. Fabric applies V-Order by default.

V-Order pre-sorts data for faster VertiPaq transcoding. Run OPTIMIZE on existing tables.
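For existing tables, the OPTIMIZE step can be run from a notebook. A hedged sketch follows, assuming a table named `sales_orders`; the session-level V-Order setting name reflects the Fabric Spark documentation at the time of writing and may differ in your runtime.

```python
# Hedged sketch: applying V-Order to an existing Delta table from a Fabric
# notebook. The table name is an assumption; tables written by Fabric are
# V-Ordered by default, so this mainly matters for tables created elsewhere.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Ensure new writes in this session produce V-Ordered Parquet files.
spark.conf.set("spark.sql.parquet.vorder.enabled", "true")

# Rewrite existing files with V-Order so Direct Lake transcoding stays fast.
spark.sql("OPTIMIZE sales_orders VORDER")
```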
All four requirements must be met. F2 SKU is the minimum Fabric capacity. Without Delta tables in OneLake, a semantic model cannot use Direct Lake mode at all -- you are left with DirectQuery or Import instead.
Start by establishing the **problem** -- everyone has data everywhere. *Ask the audience: "How many data sources does your organization have?"* Usually gets laughs. The point is: **silos create friction**.
Let the issues sink in:
- Multiple platforms, **duplicated storage**
- Separate security models
- Independent refresh schedules
- **Siloed governance**
This is why organizations struggle with data strategy.
*Transition: "This is the problem OneLake solves."*
**OneLake is like OneDrive for data** -- one copy, many consumers. Emphasize: you don't move data *to* Fabric -- Fabric's tools all read *from* OneLake. Open **Delta/Parquet** format.
**Key point**: four different engines, same data, **no copies**.
- Data engineer uses **Spark**
- DBA uses **T-SQL**
- Analyst uses **Power BI**
All hitting the same OneLake tables.
**Fabric Items** create and consume data:
- **Lakehouse** for data engineering
- **Warehouse** for SQL
- **Notebooks** for data science
- **Reports** for visualization
All share OneLake storage underneath. **Semantic Model** is the 4th compute engine -- it powers Direct Lake.
**Import** -- the classic. Data is copied into **VertiPaq** (in-memory columnar). Queries are blazing fast, but data can be **stale**.
- 1GB model limit in shared capacity, up to 400GB in Premium
- You're duplicating storage
*This is what most people know.*
**DirectQuery** -- every click = a query to the source. Good for **real-time dashboards**, bad for complex models with many visuals. Performance depends entirely on the source database -- can **overwhelm the source**.
**Composite** -- the hybrid. Different tables can use different storage modes:
- **Import** your small lookup tables
- **Direct Lake** your large facts
- **DirectQuery** for real-time feeds
*Powerful but complex.*
**Direct Lake** -- the Fabric-native mode. No ETL needed for the semantic model. Reads Delta files directly from OneLake. Only the **columns needed** for the current query are transcoded to VertiPaq format. Data flows **live by default** -- a "refresh" just updates the delta version pointer. *Schedule refreshes after pipeline loads for consistency.*
**On screen:** Process card #1 -- "Data in OneLake" with Delta/Parquet description
- This is the **foundation**: data lives as Delta tables in OneLake, written by pipelines, Dataflows, Spark, or Shortcuts
- Emphasize the **open format** -- Parquet files with a Delta transaction log on top
- The semantic model doesn't copy this data; it reads from it
*Transition: "So the data is there -- what happens when someone opens a report?"*
**On screen:** Process card #2 -- "Transcoding on Demand"
- This is the key differentiator: transcoding happens **at query time**, not upfront
- Parquet columns are converted to **VertiPaq** columnar format on the fly
- Only the **specific columns** the DAX query touches get transcoded -- not the whole table
- This is why there's no traditional "refresh" -- no bulk import step
*Key message: "On demand" is the magic phrase here.*
**On screen:** Process card #3 -- "Targeted Caching"
- Once a column is transcoded, it's **cached in memory**
- Subsequent queries hitting those same columns run at full import speed
- The cache is column-level, not table-level -- granular and memory-efficient
- This is why the first query is slightly slower, but everything after that is fast
*Transition: "But what happens when the underlying data changes?"*
**On screen:** Process card #4 -- "Auto Refresh"
- When Delta tables are updated, the cache **auto-invalidates**
- The next query triggers fresh transcoding -- no manual intervention needed
- **Pro tip**: schedule a semantic model refresh after pipeline loads to ensure the model picks up the latest delta version immediately
- You *can* use snapshot mode for consistent point-in-time reporting (board reports, monthly closes)
*Key message: Data freshness is automatic, not a scheduled chore.*
**On screen:** Benefit card -- "Fast Queries" (teal, lightning icon) - Near-import performance via in-memory caching - Transcoding takes **milliseconds per column** -- users won't notice the difference - Cached columns serve at full VertiPaq speed - First-time column access is the only "cost" -- and it's barely noticeable *Transition: "Speed is great, but what about freshness?"*
**On screen:** Benefit card -- "Always Fresh" (green, refresh icon) - No scheduled refresh needed for data currency -- Delta changes flow through automatically - Mention the footnote: **snapshot mode** is available for when you *want* a stable view (weekly board reports, audit periods) - This is the DirectQuery benefit without the DirectQuery performance penalty *Key message: You get real-time freshness without sacrificing query speed.*
**On screen:** Benefit card -- "No Duplication" (orange, database-with-X icon) - Single copy of data lives in OneLake -- the semantic model doesn't create a second copy - Eliminates the **storage cost** of maintaining a separate VertiPaq dataset - For large datasets, this savings is significant -- no more 400GB import models duplicating your lakehouse *Transition: "And it scales beyond what import can handle."*
**On screen:** Benefit card -- "Massive Scale" (purple, expand icon) - No model size limits imposed by import mode - Scales with your Fabric capacity SKU - **Fallback behavior**: if data exceeds available memory, Direct Lake automatically falls back to DirectQuery for those columns -- queries never fail - This is a safety net that import mode doesn't have *Key message: Direct Lake grows with your data -- you don't hit a wall.*
**On screen:** Side-by-side comparison -- Import Mode panel (left side, with "Requires Refresh" badge)
- Walk through the Import flow: Power BI -> Import cache (red highlight) -> OneLake
- That middle layer -- the import cache -- is the **problem**: it creates staleness and storage duplication
- Data is only as fresh as the last scheduled refresh
- Point out the red highlight on the Import box -- that's the bottleneck
*Transition: "Now look at what Direct Lake does differently."*
**On screen:** Direct Lake Mode panel (right side, with bypass arrow and "Always Fresh" badge)
- The animated dashed arrow **bypasses** the import cache entirely -- Power BI reads straight from OneLake
- The crossed-out Import box shows what's been eliminated
- "Best of Both" badge reinforces: import speed + DirectQuery freshness
- Only the columns your current query needs are cached -- not the entire dataset
*Key message: The visual makes it obvious -- Direct Lake removes the middleman.*
**On screen:** Targeted Caching diagram -- OneLake table columns with 3 highlighted, arrow to In-Memory Cache
- This visual shows exactly how selective caching works: 8 columns in the table, only 3 (Sales, Date, Amount) are cached
- The dimmed columns (ID, Region, Notes, Status, Ref) stay on disk -- no memory wasted
- Cached columns run at **full import speed**
- Footnote reminds: schedule a refresh after pipeline loads to pick up new delta versions
*Key message: "Only what you need, when you need it" -- that's the efficiency of Direct Lake.*
**On screen:** Import mode feature card (orange) -- Speed: Fastest, Freshness: Scheduled, Storage: Duplicated, Refresh: Required
- Import is the **baseline** everyone knows -- fastest queries, but at a cost
- **Scheduled** freshness means data can be hours or days stale
- **Duplicated** storage means you're paying for the same data twice
- Refresh is **required** -- miss one and your dashboard is wrong
*Transition: "What if you flip those trade-offs?"*
**On screen:** DirectQuery feature card (blue) -- Speed: Slower, Freshness: Real-time, Storage: None, Refresh: Not needed
- DirectQuery solves the freshness problem but creates a **speed** problem
- Every click sends a live query to the source -- performance depends on the source database
- No storage duplication, no refresh needed -- but **complex models with many visuals can overwhelm the source**
- Good for operational dashboards with low visual counts
*Key message: DirectQuery trades speed for freshness -- the opposite of Import.*
**On screen:** Direct Lake feature card (teal, highlighted) -- Speed: Near Import, Freshness: Real-time, Storage: None, Refresh: Optional
- This is the **punchline**: Direct Lake gets the best of both columns
- **Near Import** speed -- first query slightly slower, subsequent queries match import
- **Real-time** freshness like DirectQuery
- **No storage duplication** -- single copy in OneLake
- Refresh is **Optional** -- you *can* schedule it for consistency, but you don't *have* to
*Key message: Point at the green checkmarks -- this is why Direct Lake matters.*
**On screen:** DirectQuery Variants panel -- Relational Source vs. Chaining over Analysis Services
- Two types of DirectQuery that audiences often confuse:
  - **Relational DQ**: Report sends live SQL to a database (SQL Server, Azure SQL) -- speed = source DB performance
  - **Chaining DQ**: Report connects to a published semantic model, inheriting its storage mode -- no data duplication
- In **composite models**, both types can coexist: import lookup tables for speed + DQ fact tables for freshness
*Key message: Chaining is how teams share a single source of truth without copying data.*
**On screen:** Checklist item -- "Delta Tables" with gold checkmark - **Hard requirement #1**: data must be in Delta/Parquet format in OneLake - This is the native format for Fabric Lakehouses and Warehouses - If your data isn't Delta yet, use notebooks, pipelines, or Dataflows Gen2 to convert - Shortcuts to external Delta tables (ADLS Gen2, S3) also count *Transition: "Where do those Delta tables need to live?"*
**On screen:** Checklist item -- "Lakehouse or Warehouse" with gold checkmark - **Hard requirement #2**: Delta tables must reside in a Fabric Lakehouse or Warehouse - **Shortcuts** are a powerful option -- point to Delta tables in ADLS Gen2 or S3 without moving them - This means you don't have to migrate everything into Fabric storage to use Direct Lake *Transition: "What infrastructure do you need?"*
**On screen:** Checklist item -- "Fabric Capacity" with gold checkmark - **Hard requirement #3**: Microsoft Fabric capacity, **F2 minimum** - Direct Lake is **not available** in shared or Pro-only workspaces - Larger SKUs allow more data to be cached in memory before fallback to DirectQuery occurs - This is often the blocker for smaller organizations -- plan for the capacity cost *Transition: "One more optimization to get the best performance."*
**On screen:** Checklist item -- "V-Order Optimized" with gold checkmark - **Hard requirement #4**: tables should be V-Order optimized for best transcoding performance - V-Order **pre-sorts** data in a way that aligns with VertiPaq's columnar compression - Fabric applies V-Order by default on new tables -- but for existing tables, run **OPTIMIZE** - Without V-Order, transcoding still works but takes longer -- noticeable on large tables *Key message: V-Order is the difference between "near import" and "noticeably slower."*
**On screen:** Summary callout -- "All four requirements must be met" with F2 minimum and fallback explanation
- Reinforce: **all four** are requirements, not optional extras
- Without Delta tables in OneLake, the semantic model **cannot** use Direct Lake mode
- Fallback behavior: the model drops to DirectQuery (slower) or requires Import (back to square one)
- Good closing question: *"Which of these four does your organization already have in place?"*
*Key message: This is a checklist -- if you can check all four, you're ready for Direct Lake.*