~14 min read

Fabric Engines & Items

Understanding the Building Blocks of Microsoft Fabric

From compute engines to data movers and storage -- the pieces that power your analytics

The Building Blocks of Fabric

Compute Engines

Four engines that power Microsoft Fabric

Fabric provides different compute engines for different workloads. Each engine is optimized for a specific type of processing, but they all read from the same unified storage layer -- OneLake.

Spark handles large-scale batch and streaming. T-SQL covers relational analytics. KQL targets time-series and log data. Analysis Services powers semantic modeling for Power BI.

[Diagram: Compute Engines (Spark, T-SQL, KQL, Analysis Services) sitting on top of the Storage layer (OneLake).]
All engines query the same data in OneLake -- no copying needed.
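To make that concrete, here is a minimal PySpark sketch of the idea: one Delta table in OneLake, queried through the Spark engine two ways, while the other engines read the same underlying files. The table and column names (sales, region, amount) are hypothetical.

```python
# Minimal sketch: one copy of data in OneLake, multiple ways to query it.
# Assumes a Fabric notebook attached to a Lakehouse that already contains
# a Delta table named "sales" with "region" and "amount" columns (hypothetical).
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()  # already provided in a Fabric notebook

# Spark DataFrame API over the Delta table.
sales = spark.read.table("sales")
sales.groupBy("region").agg(F.sum("amount").alias("total")).show()

# Spark SQL over the same files -- nothing was copied.
spark.sql("SELECT region, SUM(amount) AS total FROM sales GROUP BY region").show()

# The same table is also visible to the T-SQL engine through the Lakehouse's
# SQL analytics endpoint, and to Power BI through a semantic model.
```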

Engine Profiles

Each engine is optimized for different workloads -- and every Fabric item runs on one of these under the hood.

| Engine | Languages | Workflow | Best For | Powers |
| --- | --- | --- | --- | --- |
| Spark | PySpark, Scala, R | Notebook-first | Big data + ML | Lakehouse, Notebook |
| T-SQL | T-SQL | Schema-first | BI + Analytics | Warehouse, SQL Endpoint |
| KQL | KQL | Real-time | Streaming + Logs | Eventhouse |
| Analysis Services | DAX | In-memory | Semantic layer | Semantic Model |

Data Movers

Three tools for moving and transforming data

Dataflow Gen2

Low-Code

Visual ETL using Power Query (M language). Drag-and-drop transforms -- no code required.

Interface: Drag & Drop
Transforms: Merge, Filter, Pivot
Best For: Simple ETL

Pipeline

Orchestration

Workflow engine for multi-step data movement. Schedules, coordinates, and monitors jobs.

Interface: Visual Canvas
Actions: Copy, Branch, Loop
Best For: Multi-Step Workflows

Notebook

Code-Driven

Code-first data engineering with full programmatic control. Supports Python, PySpark, Scala, and SparkSQL in interactive notebooks.

Interface: Code Cells
Languages: Python, Spark, SQL
Best For: Complex Transforms
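As a flavor of the code-first path, here is a hedged sketch of a Notebook transform. The source path, column names, and table name are made up, and it assumes the notebook has a default Lakehouse attached so the relative Files/ path resolves.

```python
# Sketch of a code-first transform in a Fabric notebook (hypothetical paths/names).
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

# Read raw CSV from the Lakehouse Files area, clean it, and land it as a Delta table.
raw = spark.read.option("header", True).csv("Files/raw/orders.csv")
clean = (
    raw.withColumn("order_date", F.to_date("order_date"))
       .withColumn("amount", F.col("amount").cast("double"))
       .filter(F.col("amount") > 0)
)
clean.write.format("delta").mode("overwrite").saveAsTable("orders_clean")
```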
Each tool handles a different slice of the data movement problem.
Dataflow for visual transforms. Pipeline for orchestration. Notebook for code.

They're designed to work together -- a Pipeline can trigger a Notebook or Dataflow as a step in a larger workflow.
There's also Mirroring for continuous, real-time replication from external databases.

How They Overlap

Shared capabilities across data movers

| Capability | Dataflow | Pipeline | Notebook |
| --- | --- | --- | --- |
| UI-Driven | ✓ | ✓ | |
| Data Shaping | ✓ | | ✓ |
| Code-Driven | | ✓ | ✓ |
| Copies Data | ✓ | ✓ | ✓ |
| Orchestration | | ✓ | |
The overlaps are intentional -- Microsoft designed these tools to share capabilities so teams can pick the mover that fits their skill set without losing functionality. Pipeline stands alone with orchestration because that's its primary job: coordinating the other two.

Shortcuts

Reference external data without copying it

[Diagram: shortcuts connect ADLS Gen2 (Azure Data Lake Storage), AWS S3 (Amazon S3 buckets), and other workspaces (Fabric items + more) into OneLake as zero-copy references.]

| | Shortcut (Pointer) | Copy (Full Control) |
| --- | --- | --- |
| Data Movement | None | Full Transfer |
| Freshness | Always Current | As of Last Run |
| Storage Cost | None | Duplicated |
| Access | Read-Only | Read + Write |
Shortcuts are ideal when data already lives in a well-managed source. Copy when you need to transform, enrich, or own the data lifecycle.
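Once a shortcut exists (created through the Lakehouse UI or API), reading through it looks just like reading local Lakehouse data. A sketch, assuming a hypothetical shortcut named s3_landing under the Files area:

```python
# Sketch: querying external data through a OneLake shortcut (hypothetical names).
# The shortcut "s3_landing" points at an S3 bucket; the bytes stay in S3.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# The shortcut appears as an ordinary folder under Files/ -- no copy was made.
events = spark.read.parquet("Files/s3_landing/events/")
events.limit(10).show()

# Shortcuts are read-only references: if you need to transform, enrich, or own
# the data lifecycle, copy it into OneLake instead (e.g., a Pipeline Copy activity).
```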

ACID

Why traditional data lakes break -- and how Delta fixes it

| | Traditional Lake (Pre-Delta) | Delta Lake (ACID Compliant) |
| --- | --- | --- |
| Partial Writes | Possible | Prevented |
| Read Consistency | Unstable | Guaranteed |
| Rollback | None | Built-in |
| Versioning | Manual | Built-in |

Delta Lake means you no longer have to choose between fresh data and trustworthy data.

- Atomicity (A): All or nothing -- transactions complete fully or not at all
- Consistency (C): Data always moves from one valid state to another
- Isolation (I): Concurrent operations don't interfere with each other
- Durability (D): Once committed, data persists even through failures
Delta Lake brings warehouse-grade reliability to the Lakehouse.
No more choosing between fresh data and trustworthy data -- ACID compliance means you get both.
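To see the versioning and rollback story in practice, here is a short sketch using Delta Lake's history and time-travel features from a Spark notebook. The table name is hypothetical, and it assumes a recent Delta Lake runtime with SQL time-travel support.

```python
# Sketch: Delta's transaction log gives you history and time travel.
# Assumes a Delta table named "orders_clean" exists in the attached Lakehouse.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Every committed write is an entry in the transaction log.
history = spark.sql("DESCRIBE HISTORY orders_clean")
history.select("version", "timestamp", "operation").show(truncate=False)

# Time travel: read the table as it looked at an earlier version.
v0 = spark.sql("SELECT * FROM orders_clean VERSION AS OF 0")
print(v0.count())
```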

Copy Data Activity

Where your data lands depends on the tool

[Diagram: Dataflow --> Transform --> Tables (Delta); Pipeline --> Files or Tables (Delta); Notebook --> Transform --> Files or Tables (Delta).]
Dataflows load transformed data into Delta tables.
Pipelines copy raw data as files or tables.
Notebooks can transform and write to either destination.
| | Delta Tables | Files |
| --- | --- | --- |
| ACID | Yes | No |
| Queryable | SQL + Spark | Limited |
| Best For | Analytics | Staging |
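From a Notebook's point of view, the two landing zones are just different write calls. A sketch with hypothetical paths and names:

```python
# Sketch: the two landing zones a Notebook can write to (hypothetical names).
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
readings = spark.read.option("header", True).csv("Files/raw/sensor_readings.csv")

# Files: plain storage, no ACID guarantees -- good for staging raw data.
readings.write.mode("overwrite").parquet("Files/staging/sensor_readings/")

# Tables: Delta format, ACID-compliant, queryable from Spark and the
# SQL analytics endpoint -- good for analytics-ready data.
readings.write.format("delta").mode("overwrite").saveAsTable("sensor_readings")
```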

Lakehouse

The best of both worlds

- Data Lake: Any File Format, Schema on Read, Cost Effective -- but Weak Governance
- Data Warehouse: Structured Tables, Schema on Write, Fast Queries -- but Rigid & Costly
- Data Lakehouse: Any Format, ACID, Fast Queries, Low Cost

How the Lakehouse Works

1. Delta Lake Storage -- Your tables are files: open-format Parquet in OneLake, readable by any engine.
2. Auto SQL Endpoint -- Run T-SQL queries against Delta tables through an endpoint auto-generated for every Lakehouse.
3. Dual Engines -- Spark and T-SQL both read the same underlying data. Pick your tool.
4. Zero Data Copies -- One storage layer, multiple engines. No ETL between lake and warehouse.
One storage format, multiple access patterns. That's the Lakehouse promise.
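Point 1 is easy to check from a notebook: the same Lakehouse table can be read by name (through the metastore) or by its folder path (as Delta/Parquet files). A sketch, assuming a default Lakehouse with a hypothetical sensor_readings table and that the relative Tables/ path resolves against it:

```python
# Sketch: "your tables are files." The same Lakehouse table, two ways in.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# 1) As a table, via the metastore name.
by_name = spark.read.table("sensor_readings")

# 2) As files, via the Delta/Parquet folder under the Lakehouse's Tables/ area.
by_path = spark.read.format("delta").load("Tables/sensor_readings")

assert by_name.count() == by_path.count()  # same data, one copy
```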

Lakehouse vs Warehouse

Same storage, different write patterns

| | Lakehouse (Code-Driven) | Warehouse (SQL-Driven) |
| --- | --- | --- |
| Write Engine | Spark | T-SQL |
| Read Engine | Spark + Auto SQL | T-SQL |
| Storage | Files + Tables | Tables Only |
| Schema | On Read (Flexible) | On Write (Strict) |
| Languages | Python, PySpark, Scala | Stored Procs, Views, DDL |

OneLake Storage: Delta / Parquet -- one copy, multiple engines.

Choose Lakehouse When:
- Code-first workflows
- Raw files + structured tables
- ML, data science, experimentation

Choose Warehouse When:
- SQL-first teams with T-SQL
- Strict governance + auditing
- Traditional BI + dashboards
Either works for Power BI. Both support Direct Lake. And every Lakehouse auto-generates a SQL Analytics Endpoint, so SQL queries work in both -- pick whichever matches your team.
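As a rough illustration of that last point, here is a hedged sketch of running plain T-SQL against a Lakehouse's SQL analytics endpoint from Python over ODBC. The server, database, and table names are placeholders you would copy from the Fabric portal, and it assumes the Microsoft ODBC Driver 18 for SQL Server and Azure AD sign-in are available on your machine.

```python
# Sketch: T-SQL against a Lakehouse's auto-generated SQL analytics endpoint.
# All connection values are placeholders -- copy the real ones from the Fabric portal.
import pyodbc

conn = pyodbc.connect(
    "Driver={ODBC Driver 18 for SQL Server};"
    "Server=<your-sql-analytics-endpoint>;"       # from the Lakehouse endpoint settings
    "Database=<your-lakehouse-name>;"
    "Authentication=ActiveDirectoryInteractive;"  # interactive Azure AD sign-in
    "Encrypt=yes;"
)

# The same Delta table written by Spark is queryable here with ordinary T-SQL.
for row in conn.execute("SELECT TOP 10 * FROM dbo.sensor_readings"):
    print(row)
```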

From Storage to Report

Completing the data journey

The Data Journey: OneLake --> Semantic Model --> Power BI Report
In Fabric, Direct Lake mode connects OneLake to Power BI without copies or compromises. It's the payoff of the entire Delta / OneLake architecture. For a deep dive, see our Direct Lake guide.
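For completeness, here is a hedged sketch of touching the semantic-model layer from a notebook with the semantic-link (sempy) library. The model name, table, and columns are hypothetical, and it assumes sempy and its evaluate_dax helper are available in your environment.

```python
# Sketch: querying a semantic model from a Fabric notebook via semantic-link.
# "Sales Model", 'sales'[region], and 'sales'[amount] are hypothetical names.
import sempy.fabric as fabric

result = fabric.evaluate_dax(
    "Sales Model",
    "EVALUATE SUMMARIZECOLUMNS('sales'[region], \"Total Amount\", SUM('sales'[amount]))",
)
print(result.head())
```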
Title slide. Welcome the audience, introduce the guide topic. *"Today we're looking at the building blocks of Microsoft Fabric -- the engines, the data movers, and the storage items that tie it all together."*

**On screen:** Intro text naming the four engines + stacked architecture diagram (Compute Engines frame above OneLake Storage frame).
- Set the frame: no single engine can handle batch, streaming, relational, AND in-memory BI
- Fabric gives you **one engine per workload**, all sharing OneLake storage
- Point out the diagram: **Compute Engines** frame on top, **Storage** frame below
- Emphasize: all four engines read from the **same OneLake data** -- no ETL between them

*Transition: "Let's look at each engine's profile."*

**On screen:** Spark profile card (orange) -- Languages, Workflow, Best For, Powers badges.
- **Apache Spark**: large-scale batch processing and streaming
- Languages: Python, PySpark, Scala, SparkSQL, R
- Powers the **Lakehouse** and **Notebook** items
- Best for: big data transforms, ML pipelines, ad-hoc data exploration

*Transition: "Next, the relational engine."*

**On screen:** T-SQL profile card (teal) -- the relational analytics engine.
- Familiar **SQL Server** syntax: SELECT, JOIN, stored procs, views
- Powers the **Warehouse** and **SQL Analytics Endpoint**
- Best for: structured queries, governed analytics, teams with SQL experience

*Transition: "Now the real-time engine."*

**On screen:** KQL profile card (blue) -- time-series and log analytics.
- **Kusto Query Language**: purpose-built for fast queries over streaming and time-series data
- Powers the **Eventhouse** (formerly KQL Database)
- Best for: IoT telemetry, application logs, real-time dashboards

*Transition: "And the last engine -- the one closest to Power BI."*

**On screen:** Analysis Services profile card (purple) -- the semantic modeling engine.
- The **VertiPaq** in-memory engine that powers every Power BI semantic model
- DAX measures, relationships, calculation groups -- all run here
- Powers the **Semantic Model** and **Power BI Report**
- *Audience check: "Which of these engines does your team use most today?"*

*Key message: every Fabric item runs on one of these four engines under the hood.*

**On screen:** Dataflow Gen2 card -- teal, "Low-Code" badge.
- **Power Query Online** -- same drag-and-drop as Power BI Desktop, now cloud-native
- Interface: visual transforms (merge, filter, pivot). No code required.
- Best for: **straightforward ETL** where a business analyst can self-serve

*Transition: "Next, the orchestrator."*

**On screen:** Pipeline card -- blue, "Orchestration" badge.
- Inherited from **Azure Data Factory** -- visual canvas for multi-step workflows
- Copy Activity moves data from 90+ sources. ForEach, If-Else, Switch for branching.
- Best for: **scheduling and coordinating** other tools (trigger a Notebook, chain a Dataflow)

*Transition: "And for full programmatic control..."*

**On screen:** Notebook card -- orange, "Code-Driven" badge.
- Interactive cells: **PySpark, Scala, SparkSQL, Python**
- Full library access, ML frameworks, visualizations in-cell
- Best for: **complex transforms**, ML feature engineering, ad-hoc exploration

*Transition: "So when do you pick which tool?"*

**On screen:** Summary callout -- tools work together.
- Dataflow = visual transforms. Pipeline = orchestration. Notebook = code.
- They **compose together**: a Pipeline can trigger a Notebook or Dataflow as a step
- Mention **Mirroring** for continuous real-time replication (no scheduling needed)

*Key message: pick the mover that matches your team's skillset, not the other way around.*

**On screen:** Overlap bands (colored bars) showing shared capabilities.
- Walk top to bottom: UI-Driven (DF+PL), Data Shaping (DF+NB), Code-Driven (PL+NB)
- **All three** can copy data from external sources -- that's the center band
- **Pipeline only** has orchestration -- its unique role is coordinating the other two

*Transition: "Let's see this as a feature matrix."*

**On screen:** Feature matrix grid -- checkmarks for each tool across five capabilities.
- Point out the matrix grid: checkmarks make the overlaps concrete
- Copies Data row: all three check -- that's the most common overlap
- Orchestration row: only Pipeline -- its unique differentiator

*Transition: "The overlaps are by design."*

**On screen:** Summary callout explaining the overlap philosophy.
- Microsoft **intentionally** built shared capabilities so teams aren't locked in
- Pipeline stands alone for orchestration because that IS its purpose

*Key message: Fabric gives you choice based on skillset, not artificial constraints.*

**On screen:** Shortcut flow diagram -- 3 sources (ADLS Gen2, AWS S3, Other Workspaces) with animated dashed arrows to OneLake.
- Dashed lines = **pointers, not physical copies**. Data stays at the source.
- Cross-cloud: even **AWS S3** can be shortcutted into OneLake
- "Other Workspaces" covers Fabric items, Dataverse, Google Cloud Storage

*Transition: "But when should you shortcut vs. copy?"*

**On screen:** Shortcut comparison panel -- the "pointer" approach.
- **No data movement**: data stays where it is, OneLake just references it
- **Always current**: no stale copies to worry about
- **No storage cost**: no duplication in OneLake
- Tradeoff: **read-only** access -- you can't write back through a shortcut

**On screen:** Copy comparison panel appears alongside the Shortcut panel.
- **Full transfer**: data physically moves into OneLake
- **Freshness depends on schedule**: only as fresh as the last pipeline run
- **Storage duplicated**: additional OneLake capacity cost
- Upside: **full read + write** -- you own the data lifecycle
- Decision rule: shortcut when the source is well-managed; copy when you need to transform or own the lifecycle

**On screen:** Summary callout.
- Shortcuts are the **alternative to data movers** -- reference data in place
- Ideal when the source is already well-governed and you just need to read

*Key message: not every piece of data needs to be physically copied into OneLake.*

**On screen:** Traditional Lake comparison panel (red) -- four problem rows.
- Start with the **problem**: traditional data lakes have no transaction guarantees
- **Partial writes**: a pipeline failure leaves behind corrupted data
- **Unstable reads**: queries during writes return mixed old/new rows
- **No rollback, manual versioning**: mistakes are expensive to fix

*Transition: "Delta Lake changes all of this."*

**On screen:** Delta Lake comparison panel (green) appears next to Traditional Lake.
- Every row flips from red to green: Prevented, Guaranteed, Built-in, Built-in
- Emphasize the italic line below: *"you no longer have to choose between fresh data and trustworthy data"*
- Delta Lake adds a **transaction log** on top of Parquet files -- that's the magic

*Transition: "Let's unpack what ACID actually means."*

**On screen:** Atomicity card (purple "A").
- **All or nothing** -- a write completes fully or rolls back entirely
- No more partial files from failed pipeline runs

**On screen:** Consistency card (purple "C").
- Data always moves from one **valid state** to another
- Schema enforcement prevents bad data from sneaking in

**On screen:** Isolation card (purple "I").
- Concurrent readers and writers see **consistent snapshots**
- An analyst querying during an ETL run gets stable results -- no interference

**On screen:** Durability card (purple "D").
- Once committed, data **survives crashes**, power loss, hardware failure
- The transaction log is the guarantee

**On screen:** Summary callout -- warehouse-grade reliability for the Lakehouse.
- Delta Lake brings the same ACID guarantees that traditional warehouses have always had
- No more choosing between fresh data and trustworthy data -- you get both

*Key message: Delta Lake is what makes the Lakehouse possible. Without ACID, it's just a data swamp.*

**On screen:** Three-row flow diagram -- Dataflow (teal), Pipeline (blue), Notebook (orange), each flowing to destinations.
- **Dataflow** transforms then lands into **Delta tables** (its natural target)
- **Pipeline** copies raw data to **Files or Tables** -- no built-in transform step
- **Notebook** can transform AND write to **either** destination (most flexible)
- Note the "Transform" gear icon on Dataflow and Notebook rows; Pipeline has none

*Transition: "Let's compare those two landing zones."*

**On screen:** Two-card comparison -- Delta Tables vs Files.
- **Delta Tables**: ACID-compliant, queryable by SQL + Spark, best for **analytics**
- **Files**: no ACID, limited queryability, best for **staging** raw/unstructured content
- Rule of thumb: land raw in Files, promote to Delta Tables once cleaned

*Key message: Delta Tables are the analytics format; Files are the staging zone.*

**On screen:** Data Lake frame (teal) -- Any File Format, Schema on Read, Cost Effective, Weak Governance.
- Start with the **lake**: flexible, cheap, stores anything
- Three strengths (any format, schema on read, low cost) and one weakness: **weak governance**
- Traditional lakes lack the structure needed for reliable analytics

**On screen:** Data Warehouse frame (blue) appears alongside the Data Lake.
- Now the **warehouse**: structured, governed, fast -- but rigid and expensive
- Three strengths (structured tables, schema on write, fast queries) and one weakness: **rigid and costly**
- For decades, orgs chose one or the other -- or maintained both at great expense

*Transition: "The Lakehouse combines the best of both."*

**On screen:** Converging arrows merge into a gold "Data Lakehouse" frame with combined attributes.
- Point out the converging arrows -- this is the **architecture evolution**
- The Lakehouse frame shows: Any Format + ACID + Fast Queries + Low Cost
- All of the strengths, neither of the weaknesses
- This isn't marketing -- it's what Delta Lake + OneLake actually enable

*Transition: "How does this actually work in practice?"*

**On screen:** Four numbered process cards + info callout.
- **1. Delta Lake Storage**: tables ARE files -- Parquet in OneLake
- **2. Auto SQL Endpoint**: every Lakehouse auto-generates a T-SQL endpoint, zero config
- **3. Dual Engines**: Spark + T-SQL on the same data, pick your tool
- **4. Zero Data Copies**: one storage layer, multiple engines, no ETL between them

*Key message: "One storage format, multiple access patterns." That's the Lakehouse promise.*

**On screen:** Lakehouse comparison panel (teal) -- Write Engine, Read Engine, Storage, Schema, Languages.
- **Lakehouse**: Spark writes, Spark + auto SQL reads, Files + Tables, schema on read, code-first
- Languages: Python, PySpark, Scala
- The "Code-Driven" badge is the key signal -- this is for data engineering teams

**On screen:** Warehouse comparison panel (blue) appears alongside -- same rows, different values.
- **Warehouse**: T-SQL writes, T-SQL reads, Tables only, schema on write, SQL-first
- Languages: stored procs, views, DDL
- Same foundation (OneLake + Delta/Parquet), different interfaces

*Transition: "So how do you choose?"*

**On screen:** OneLake Storage bar + "Choose Lakehouse When" / "Choose Warehouse When" cards + summary callout.
- Point out the **OneLake Storage** fieldset: both items share the same underlying storage
- **Lakehouse**: code-first workflows, raw files + tables, ML/data science
- **Warehouse**: SQL-first teams, strict governance, traditional BI
- **Critical**: either works for Power BI! Both support Direct Lake.
- Every Lakehouse auto-generates a SQL Analytics Endpoint, so SQL queries work in both

*Key message: pick based on team comfort, not capability limitations.*

**On screen:** Three-node flow diagram -- OneLake --> Semantic Model --> Power BI Report, plus Direct Lake callout.
- **OneLake**: where data lives (Delta/Parquet in unified storage)
- **Semantic Model**: the analytical brain (VertiPaq, DAX measures, relationships)
- **Power BI Report**: what end users see and interact with
- **Direct Lake** reads Delta tables from OneLake directly into VertiPaq -- no import copies, no DirectQuery latency
- This is the **payoff** of the entire Delta/OneLake architecture
- Point to the link: *"We have a dedicated Direct Lake guide for the deep dive."*

*Closing: "That's the full picture -- engines, movers, storage, and the path to reporting."*