Analytic Endeavors Design - Copyright 2024-2025 Analytic Endeavors Inc. Unauthorized use prohibited.
~14 min read
Fabric Engines & Items
Understanding the Building Blocks of Microsoft Fabric
From compute engines to data movers and storage -- the pieces that power your analytics
The Building Blocks of Fabric
Compute Engines
Four engines that power Microsoft Fabric
Fabric provides different compute engines for different workloads. Each engine is optimized for a specific type of processing, but they all read from the same unified storage layer -- OneLake.
Spark handles large-scale batch and streaming.
T-SQL covers relational analytics.
KQL targets time-series and log data.
Analysis Services powers semantic modeling for Power BI.
All engines query the same data in OneLake -- no copying needed.
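To make the shared-storage point concrete, here is a minimal sketch of reading one OneLake Delta table from a Fabric notebook. The workspace, lakehouse, and table names are hypothetical placeholders, and it assumes the notebook's built-in `spark` session; the T-SQL, KQL, and Analysis Services engines read the same underlying files.

```python
# Minimal sketch: one Delta table in OneLake, read from a Fabric notebook.
# Workspace, lakehouse, and table names are hypothetical placeholders.
onelake_path = (
    "abfss://SalesWorkspace@onelake.dfs.fabric.microsoft.com/"
    "SalesLakehouse.Lakehouse/Tables/orders"
)

# Spark reads the Delta files directly from OneLake storage.
orders = spark.read.format("delta").load(onelake_path)
orders.show(5)

# The T-SQL endpoint, KQL, and Analysis Services (via Direct Lake) read the
# same files -- no engine gets its own copy.
```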
Engine Profiles
Each engine is optimized for different workloads -- and every Fabric item runs on one of these under the hood.
Spark
Languages: PySpark, Scala, R
Workflow: Notebook-first
Best For: Big data + ML
Powers: Lakehouse, Notebook
T-SQL
Languages: T-SQL
Workflow: Schema-first
Best For: BI + Analytics
Powers: Warehouse, SQL Endpoint
KQL
Languages: KQL
Workflow: Real-time
Best For: Streaming + Logs
Powers: Eventhouse
Analysis Services
Languages: DAX
Workflow: In-memory
Best For: Semantic layer
Powers: Semantic Model
Data Movers
Three tools for moving and transforming data
Dataflow Gen2
Low-Code
Visual ETL using Power Query (M language). Drag-and-drop transforms -- no code required.
Interface: Drag & Drop
Transforms: Merge, Filter, Pivot
Best For: Simple ETL
Pipeline
Orchestration
Workflow engine for multi-step data movement. Schedules, coordinates, and monitors jobs.
Interface: Visual Canvas
Actions: Copy, Branch, Loop
Best For: Multi-Step Workflows
Notebook
Code-Driven
Code-first data engineering with full programmatic control. Supports Python, PySpark, Scala, and SparkSQL in interactive notebooks.
Interface: Code Cells
Languages: Python, Spark, SQL
Best For: Complex Transforms
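To show what code-first data engineering looks like in practice, here is a minimal PySpark sketch of the kind of multi-step transform the Notebook card describes. Table and column names are hypothetical, and it assumes a Fabric notebook with a default Lakehouse attached.

```python
# Minimal sketch of a code-first transform in a Fabric notebook.
# Table and column names are hypothetical; assumes a default Lakehouse.
from pyspark.sql import functions as F

orders = spark.read.table("orders")        # Delta tables in the Lakehouse
customers = spark.read.table("customers")

# Join, filter, and aggregate -- multi-step logic that is easier to express
# in code than in a visual designer.
revenue_by_region = (
    orders.join(customers, "customer_id")
          .filter(F.col("order_status") == "Completed")
          .groupBy("region")
          .agg(F.sum("amount").alias("total_revenue"))
)

# Land the result as a Delta table so any engine can query it.
revenue_by_region.write.format("delta").mode("overwrite").saveAsTable("revenue_by_region")
```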
Each tool handles a different slice of the data movement problem.
Dataflow for visual transforms. Pipeline for orchestration. Notebook for code.
They're designed to work together -- a Pipeline can trigger a Notebook or Dataflow as a step in a larger workflow.
There's also Mirroring for continuous, real-time replication from external databases.
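Pipelines are configured on a visual canvas rather than in code, but the same composition idea is visible from a notebook: one step can trigger another and pass parameters along. Below is a minimal sketch using `mssparkutils`, which Fabric notebooks preload; the child notebook name and parameters are hypothetical.

```python
# Minimal sketch: one step triggering another from code.
# In production a Pipeline would schedule and monitor this; mssparkutils
# (preloaded in Fabric notebooks) shows the same chaining idea.
# The child notebook name and parameters are hypothetical.
result = mssparkutils.notebook.run(
    "Load_Sales_Data",           # child notebook to execute
    600,                         # timeout in seconds
    {"run_date": "2025-01-31"},  # parameters passed to the child notebook
)

# Whatever the child returns via mssparkutils.notebook.exit(...) comes back here.
print(result)
```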
How They Overlap
Shared capabilities across data movers
UI-Driven: Dataflow + Pipeline
Data Shaping: Dataflow + Notebook
Code-Driven: Pipeline + Notebook
Copies Data: All Three
Orchestration: Pipeline Only
| Capability | Dataflow | Pipeline | Notebook |
| --- | --- | --- | --- |
| UI-Driven | ✓ | ✓ | |
| Data Shaping | ✓ | | ✓ |
| Code-Driven | | ✓ | ✓ |
| Copies Data | ✓ | ✓ | ✓ |
| Orchestration | | ✓ | |
The overlaps are intentional -- Microsoft designed these tools to share capabilities so teams can pick the mover that fits their skill set without losing functionality. Pipeline stands alone with orchestration because that's its primary job: coordinating the other two.
Shortcuts
Reference external data without copying it
ADLS Gen2 (Azure Data Lake Storage)
AWS S3 (Amazon S3 Buckets)
Other Workspaces (Fabric Items + More)
Shortcut (Pointer)
Data Movement: None
Freshness: Always Current
Storage Cost: None
Access: Read-Only

vs

Copy (Full Control)
Data Movement: Full Transfer
Freshness: As of Last Run
Storage Cost: Duplicated
Access: Read + Write
Shortcuts are ideal when data already lives in a well-managed source. Copy when you need to transform, enrich, or own the data lifecycle.
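As an illustration, here is a minimal sketch of reading through shortcuts from a Fabric notebook. The shortcut names and paths are hypothetical, and it assumes a default Lakehouse is attached so shortcuts surface under Tables/ and Files/.

```python
# Minimal sketch: reading through shortcuts from a Fabric notebook.
# Shortcut names and paths are hypothetical; assumes a default Lakehouse.

# A table shortcut surfaces like any other Lakehouse table.
external_sales = spark.read.table("sales_from_adls")          # points at ADLS Gen2

# A file shortcut surfaces as a folder under Files/.
s3_events = spark.read.parquet("Files/s3_landing/events/")    # points at an S3 bucket

# Nothing was copied into OneLake -- both reads resolve against the source,
# so the data is always current and there is no duplicate storage cost.
external_sales.join(s3_events, "customer_id").show(5)
```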
ACID
Why traditional data lakes break -- and how Delta fixes it
Traditional Lake (Pre-Delta)
Partial Writes: Possible
Read Consistency: Unstable
Rollback: None
Versioning: Manual

vs

Delta Lake (ACID Compliant)
Partial Writes: Prevented
Read Consistency: Guaranteed
Rollback: Built-in
Versioning: Built-in
Delta Lake means you no longer have to choose between fresh data and trustworthy data.
Atomicity (A): All or nothing -- transactions complete fully or not at all
Consistency (C): Data always moves from one valid state to another
Isolation (I): Concurrent operations don't interfere with each other
Durability (D): Once committed, data persists even through failures
Delta Lake brings warehouse-grade reliability to the Lakehouse.
No more choosing between fresh data and trustworthy data -- ACID compliance means you get both.
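A minimal sketch of what that looks like from a notebook: every Delta write is an atomic, versioned commit recorded in the transaction log. The table name is hypothetical, and it assumes a Fabric notebook with a Lakehouse attached.

```python
# Minimal sketch of Delta's transaction log in action.
# Table name is hypothetical; assumes a Lakehouse-attached Fabric notebook.
from pyspark.sql import Row

# Each write becomes one atomic, versioned commit -- all or nothing.
new_rows = spark.createDataFrame(
    [Row(event_id=1, status="ok"), Row(event_id=2, status="ok")]
)
new_rows.write.format("delta").mode("append").saveAsTable("events")

# The transaction log records every commit: version, timestamp, operation.
spark.sql("DESCRIBE HISTORY events").show(truncate=False)

# Built-in versioning: query the table as it looked at an earlier version.
spark.sql("SELECT * FROM events VERSION AS OF 0").show()
```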
Copy Data Activity
Where your data lands depends on the tool
Dataflow → Transform → Tables (Delta)
Pipeline → Files or Tables (Delta)
Notebook → Transform → Files or Tables (Delta)
Dataflows load transformed data into Delta tables. Pipelines copy raw data as files or tables. Notebooks can transform and write to either destination.
Delta Tables
ACID: Yes
Queryable: SQL + Spark
Best For: Analytics

Files
ACID: No
Queryable: Limited
Best For: Staging
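A minimal sketch of the two landing zones from a notebook: keep the raw extract in Files for staging, then promote the cleaned result to a Delta table for analytics. Paths, table names, and columns are hypothetical, and it assumes a default Lakehouse is attached.

```python
# Minimal sketch: staging in Files, promoting to a Delta table.
# Paths, table names, and columns are hypothetical; assumes a default Lakehouse.
raw = spark.read.option("header", True).csv("Files/landing/orders_2025-01.csv")

# Staging zone: park the raw extract as files (no ACID, limited querying).
raw.write.mode("overwrite").parquet("Files/staging/orders/")

# Analytics zone: promote the cleaned data to a Delta table (ACID, SQL + Spark).
cleaned = raw.dropDuplicates(["order_id"]).na.drop(subset=["order_id"])
cleaned.write.format("delta").mode("overwrite").saveAsTable("orders")
```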
Lakehouse
The best of both worlds
How the Lakehouse Works
1. Delta Lake Storage: Your tables are files -- open-format Parquet in OneLake, readable by any engine.
2. Auto SQL Endpoint: Run T-SQL queries against Delta tables -- auto-generated for every Lakehouse.
3. Dual Engines: Spark + T-SQL both reading the same underlying data. Pick your tool.
4. Zero Data Copies: One storage layer, multiple engines. No ETL between lake and warehouse.
One storage format, multiple access patterns. That's the Lakehouse promise.
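To see that a Lakehouse table really is just Delta files, here is a minimal sketch that reads the same data as a managed table and directly from its folder. The table name is hypothetical, and it assumes the relative Tables/ path resolves against an attached default Lakehouse.

```python
# Minimal sketch: a Lakehouse table is Delta/Parquet files in OneLake.
# Table name is hypothetical; assumes a default Lakehouse so the relative
# Tables/ path resolves.

by_name = spark.read.table("orders")                         # read as a managed table
by_path = spark.read.format("delta").load("Tables/orders")   # read the same files directly

assert by_name.count() == by_path.count()

# The auto-generated SQL analytics endpoint exposes the same files to T-SQL
# clients (e.g. SELECT COUNT(*) FROM orders) -- no copy, no sync job.
```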
Lakehouse vs Warehouse
Same storage, different write patterns
Lakehouse (Code-Driven)
Write Engine: Spark
Read Engine: Spark + Auto SQL
Storage: Files + Tables
Schema: On Read (Flexible)
Languages: Python, PySpark, Scala

vs

Warehouse (SQL-Driven)
Write Engine: T-SQL
Read Engine: T-SQL
Storage: Tables Only
Schema: On Write (Strict)
Languages: Stored Procs, Views, DDL
Choose Lakehouse When
Code-first workflows
Raw files + structured tables
ML, data science, experimentation
Choose Warehouse When
SQL-first teams with T-SQL
Strict governance + auditing
Traditional BI + dashboards
Either works for Power BI. Both support Direct Lake. And every Lakehouse auto-generates a SQL Analytics Endpoint, so SQL queries work in both -- pick whichever matches your team.
From Storage to Report
Completing the data journey
In Fabric, Direct Lake mode connects OneLake to Power BI without copies or compromises. It's the payoff of the entire Delta / OneLake architecture. For a deep dive, see our Direct Lake guide.
Title slide. Welcome the audience, introduce the guide topic.
*"Today we're looking at the building blocks of Microsoft Fabric -- the engines, the data movers, and the storage items that tie it all together."*
**On screen:** Intro text naming the four engines + stacked architecture diagram (Compute Engines frame above OneLake Storage frame).
- Set the frame: no single engine can handle batch, streaming, relational, AND in-memory BI
- Fabric gives you **one engine per workload**, all sharing OneLake storage
- Point out the diagram: **Compute Engines** frame on top, **Storage** frame below
- Emphasize: all four engines read from the **same OneLake data** -- no ETL between them
*Transition: "Let's look at each engine's profile."*
**On screen:** Spark profile card (orange) -- Languages, Workflow, Best For, Powers badges.
- **Apache Spark**: large-scale batch processing and streaming
- Languages: Python, PySpark, Scala, SparkSQL, R
- Powers the **Lakehouse** and **Notebook** items
- Best for: big data transforms, ML pipelines, ad-hoc data exploration
*Transition: "Next, the relational engine."*
**On screen:** T-SQL profile card (teal) -- the relational analytics engine.
- Familiar **SQL Server** syntax: SELECT, JOIN, stored procs, views
- Powers the **Warehouse** and **SQL Analytics Endpoint**
- Best for: structured queries, governed analytics, teams with SQL experience
*Transition: "Now the real-time engine."*
**On screen:** KQL profile card (blue) -- time-series and log analytics.
- **Kusto Query Language**: purpose-built for fast queries over streaming and time-series data
- Powers the **Eventhouse** (formerly KQL Database)
- Best for: IoT telemetry, application logs, real-time dashboards
*Transition: "And the last engine -- the one closest to Power BI."*
**On screen:** Analysis Services profile card (purple) -- the semantic modeling engine.
- The **VertiPaq** in-memory engine that powers every Power BI semantic model
- DAX measures, relationships, calculation groups -- all run here
- Powers the **Semantic Model** and **Power BI Report**
- *Audience check: "Which of these engines does your team use most today?"*
*Key message: every Fabric item runs on one of these four engines under the hood.*
**On screen:** Dataflow Gen2 card -- teal, "Low-Code" badge.
- **Power Query Online** -- same drag-and-drop as Power BI Desktop, now cloud-native
- Interface: visual transforms (merge, filter, pivot). No code required.
- Best for: **straightforward ETL** where a business analyst can self-serve
*Transition: "Next, the orchestrator."*
**On screen:** Pipeline card -- blue, "Orchestration" badge.
- Inherited from **Azure Data Factory** -- visual canvas for multi-step workflows
- Copy Activity moves data from 90+ sources. ForEach, If-Else, Switch for branching.
- Best for: **scheduling and coordinating** other tools (trigger a Notebook, chain a Dataflow)
*Transition: "And for full programmatic control..."*
**On screen:** Notebook card -- orange, "Code-Driven" badge.
- Interactive cells: **PySpark, Scala, SparkSQL, Python**
- Full library access, ML frameworks, visualizations in-cell
- Best for: **complex transforms**, ML feature engineering, ad-hoc exploration
*Transition: "So when do you pick which tool?"*
**On screen:** Summary callout -- tools work together.
- Dataflow = visual transforms. Pipeline = orchestration. Notebook = code.
- They **compose together**: a Pipeline can trigger a Notebook or Dataflow as a step
- Mention **Mirroring** for continuous real-time replication (no scheduling needed)
*Key message: pick the mover that matches your team's skillset, not the other way around.*
**On screen:** Overlap bands (colored bars) showing shared capabilities.
- Walk top to bottom: UI-Driven (DF+PL), Data Shaping (DF+NB), Code-Driven (PL+NB)
- **All three** can copy data from external sources -- that's the center band
- **Pipeline only** has orchestration -- its unique role is coordinating the other two
*Transition: "Let's see this as a feature matrix."*
**On screen:** Feature matrix grid -- checkmarks for each tool across five capabilities.
- Point out the matrix grid: checkmarks make the overlaps concrete
- Copies Data row: all three check -- that's the most common overlap
- Orchestration row: only Pipeline -- its unique differentiator
*Transition: "The overlaps are by design."*
**On screen:** Summary callout explaining the overlap philosophy.
- Microsoft **intentionally** built shared capabilities so teams aren't locked in
- Pipeline stands alone for orchestration because that IS its purpose
*Key message: Fabric gives you choice based on skillset, not artificial constraints.*
**On screen:** Shortcut flow diagram -- 3 sources (ADLS Gen2, AWS S3, Other Workspaces) with animated dashed arrows to OneLake.
- Dashed lines = **pointers, not physical copies**. Data stays at the source.
- Cross-cloud: even **AWS S3** can be shortcutted into OneLake
- "Other Workspaces" covers Fabric items, Dataverse, Google Cloud Storage
*Transition: "But when should you shortcut vs. copy?"*
**On screen:** Shortcut comparison panel -- the "pointer" approach.
- **No data movement**: data stays where it is, OneLake just references it
- **Always current**: no stale copies to worry about
- **No storage cost**: no duplication in OneLake
- Tradeoff: **read-only** access -- you can't write back through a shortcut
**On screen:** Copy comparison panel appears alongside the Shortcut panel.
- **Full transfer**: data physically moves into OneLake
- **Freshness depends on schedule**: only as fresh as the last pipeline run
- **Storage duplicated**: additional OneLake capacity cost
- Upside: **full read + write** -- you own the data lifecycle
- Decision rule: shortcut when the source is well-managed; copy when you need to transform or own the lifecycle
**On screen:** Summary callout.
- Shortcuts are the **alternative to data movers** -- reference data in place
- Ideal when the source is already well-governed and you just need to read
*Key message: not every piece of data needs to be physically copied into OneLake.*
**On screen:** Traditional Lake comparison panel (red) -- four problem rows.
- Start with the **problem**: traditional data lakes have no transaction guarantees
- **Partial writes**: a pipeline failure leaves behind corrupted data
- **Unstable reads**: queries during writes return mixed old/new rows
- **No rollback, manual versioning**: mistakes are expensive to fix
*Transition: "Delta Lake changes all of this."*
**On screen:** Delta Lake comparison panel (green) appears next to Traditional Lake.
- Every row flips from red to green: Prevented, Guaranteed, Built-in, Built-in
- Emphasize the italic line below: *"you no longer have to choose between fresh data and trustworthy data"*
- Delta Lake adds a **transaction log** on top of Parquet files -- that's the magic
*Transition: "Let's unpack what ACID actually means."*
**On screen:** Atomicity card (purple "A").
- **All or nothing** -- a write completes fully or rolls back entirely
- No more partial files from failed pipeline runs
**On screen:** Consistency card (purple "C").
- Data always moves from one **valid state** to another
- Schema enforcement prevents bad data from sneaking in
**On screen:** Isolation card (purple "I").
- Concurrent readers and writers see **consistent snapshots**
- An analyst querying during an ETL run gets stable results -- no interference
**On screen:** Durability card (purple "D").
- Once committed, data **survives crashes**, power loss, hardware failure
- The transaction log is the guarantee
**On screen:** Summary callout -- warehouse-grade reliability for the Lakehouse.
- Delta Lake brings the same ACID guarantees that traditional warehouses have always had
- No more choosing between fresh data and trustworthy data -- you get both
*Key message: Delta Lake is what makes the Lakehouse possible. Without ACID, it's just a data swamp.*
**On screen:** Three-row flow diagram -- Dataflow (teal), Pipeline (blue), Notebook (orange), each flowing to destinations.
- **Dataflow** transforms then lands into **Delta tables** (its natural target)
- **Pipeline** copies raw data to **Files or Tables** -- no built-in transform step
- **Notebook** can transform AND write to **either** destination (most flexible)
- Note the "Transform" gear icon on Dataflow and Notebook rows; Pipeline has none
*Transition: "Let's compare those two landing zones."*
**On screen:** Two-card comparison -- Delta Tables vs Files.
- **Delta Tables**: ACID-compliant, queryable by SQL + Spark, best for **analytics**
- **Files**: no ACID, limited queryability, best for **staging** raw/unstructured content
- Rule of thumb: land raw in Files, promote to Delta Tables once cleaned
*Key message: Delta Tables are the analytics format; Files are the staging zone.*
**On screen:** Data Lake frame (teal) -- Any File Format, Schema on Read, Cost Effective, Weak Governance.
- Start with the **lake**: flexible, cheap, stores anything
- Three strengths (any format, schema on read, low cost) and one weakness: **weak governance**
- Traditional lakes lack the structure needed for reliable analytics
**On screen:** Data Warehouse frame (blue) appears alongside the Data Lake.
- Now the **warehouse**: structured, governed, fast -- but rigid and expensive
- Three strengths (structured tables, schema on write, fast queries) and one weakness: **rigid and costly**
- For decades, orgs chose one or the other -- or maintained both at great expense
*Transition: "The Lakehouse combines the best of both."*
**On screen:** Converging arrows merge into a gold "Data Lakehouse" frame with combined attributes.
- Point out the converging arrows -- this is the **architecture evolution**
- The Lakehouse frame shows: Any Format + ACID + Fast Queries + Low Cost
- All of the strengths, neither of the weaknesses
- This isn't marketing -- it's what Delta Lake + OneLake actually enable
*Transition: "How does this actually work in practice?"*
**On screen:** Four numbered process cards + info callout.
- **1. Delta Lake Storage**: tables ARE files -- Parquet in OneLake
- **2. Auto SQL Endpoint**: every Lakehouse auto-generates a T-SQL endpoint, zero config
- **3. Dual Engines**: Spark + T-SQL on the same data, pick your tool
- **4. Zero Data Copies**: one storage layer, multiple engines, no ETL between them
*Key message: "One storage format, multiple access patterns." That's the Lakehouse promise.*
**On screen:** Lakehouse comparison panel (teal) -- Write Engine, Read Engine, Storage, Schema, Languages.
- **Lakehouse**: Spark writes, Spark + auto SQL reads, Files + Tables, schema on read, code-first
- Languages: Python, PySpark, Scala
- The "Code-Driven" badge is the key signal -- this is for data engineering teams
**On screen:** Warehouse comparison panel (blue) appears alongside -- same rows, different values.
- **Warehouse**: T-SQL writes, T-SQL reads, Tables only, schema on write, SQL-first
- Languages: stored procs, views, DDL
- Same foundation (OneLake + Delta/Parquet), different interfaces
*Transition: "So how do you choose?"*
**On screen:** OneLake Storage bar + "Choose Lakehouse When" / "Choose Warehouse When" cards + summary callout.
- Point out the **OneLake Storage** fieldset: both items share the same underlying storage
- **Lakehouse**: code-first workflows, raw files + tables, ML/data science
- **Warehouse**: SQL-first teams, strict governance, traditional BI
- **Critical**: either works for Power BI! Both support Direct Lake.
- Every Lakehouse auto-generates a SQL Analytics Endpoint, so SQL queries work in both
*Key message: pick based on team comfort, not capability limitations.*
**On screen:** Three-node flow diagram -- OneLake --> Semantic Model --> Power BI Report, plus Direct Lake callout.
- **OneLake**: where data lives (Delta/Parquet in unified storage)
- **Semantic Model**: the analytical brain (VertiPaq, DAX measures, relationships)
- **Power BI Report**: what end users see and interact with
- **Direct Lake** reads Delta tables from OneLake directly into VertiPaq -- no import copies, no DirectQuery latency
- This is the **payoff** of the entire Delta/OneLake architecture
- Point to the link: *"We have a dedicated Direct Lake guide for the deep dive."*
*Closing: "That's the full picture -- engines, movers, storage, and the path to reporting."*