The Unified Data Platform, Explained
A practical introduction for data professionals coming from the Microsoft ecosystem
Why organizations ended up with two systems for one job
The same data is copied across warehouse, lake, and staging areas, and every copy drifts over time.
The warehouse and the lake enforce different access controls. Who has the "right" version?
You pay for storage twice, compute twice, and for ETL pipelines to keep everything in sync.
Lake flexibility with warehouse reliability
A platform layer, not a cloud replacement
Databricks orchestrates the workspace, compute, and governance layers. Your storage stays in your cloud account, while compute is decoupled so you can scale it independently.
Data lives in your own ADLS Gen2 (Azure), S3 (AWS), or GCS (Google) storage. Databricks reads from it and writes to it, but never takes ownership of it.
Spin up clusters when you need processing power, shut them down when you don't. Storage and compute are fully decoupled.
Storage is yours, sitting in your cloud account. Databricks runs the workspace, compute, and governance on top.
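To make the split concrete, here is a minimal Python sketch, not an official Databricks helper. `storage_uri` and `cluster_spec` are hypothetical names, and the account, container, runtime version, and node type values are placeholders. The URI schemes are the standard ones for ADLS Gen2, S3, and GCS. The cluster fields follow the Databricks Clusters API as we understand it, so check the names against the current API reference before relying on them.

```python
def storage_uri(cloud: str, container: str, path: str, account: str = "") -> str:
    """Build the URI of data that lives in your own cloud storage account."""
    path = path.lstrip("/")
    if cloud == "azure":
        if not account:
            raise ValueError("ADLS Gen2 URIs need a storage account name")
        return f"abfss://{container}@{account}.dfs.core.windows.net/{path}"
    if cloud == "aws":
        return f"s3://{container}/{path}"
    if cloud == "gcp":
        return f"gs://{container}/{path}"
    raise ValueError(f"unknown cloud: {cloud}")


def cluster_spec(name: str, min_workers: int, max_workers: int, idle_minutes: int) -> dict:
    """Describe a disposable cluster that scales and shuts itself down when idle."""
    return {
        "cluster_name": name,
        "spark_version": "15.4.x-scala2.12",  # placeholder runtime version
        "node_type_id": "Standard_DS3_v2",    # placeholder VM size
        "autoscale": {"min_workers": min_workers, "max_workers": max_workers},
        "autotermination_minutes": idle_minutes,
    }


# The data location and the compute definition are independent of each other:
# deleting the cluster does not touch anything behind the URI.
orders_uri = storage_uri("azure", "lake", "sales/orders", account="contosodata")
nightly = cluster_spec("nightly-etl", min_workers=2, max_workers=8, idle_minutes=20)
print(orders_uri)
print(nightly["autoscale"], nightly["autotermination_minutes"])
```

Nothing in the cluster definition refers to the storage path, and the reverse is also true. That independence is what "decoupled" means here.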
What you can actually do on the Databricks platform
Databricks organizes its capabilities into four pillars. Most organizations start with one or two, then expand as their data maturity grows.
Databricks runs on your cloud, not instead of it
| Area | Databricks Provides | Your Cloud Provides |
|---|---|---|
| Development | Workspace + notebooks | Object storage |
| Compute | Optimized Spark + Photon | Virtual machines |
| Governance | Unity Catalog | Identity provider |
| Pipelines | Lakeflow Spark Declarative Pipelines | Encryption + keys |
| Cost | Platform fee (DBUs) | Infrastructure billing |
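The Cost row means a single workload shows up on two bills. The sketch below is a rough illustration of that arithmetic. Every rate in it is a made-up placeholder, not a published price, and `estimate_job_cost` is a hypothetical helper. It assumes DBU consumption scales linearly with node count and runtime, which is a simplification.

```python
def estimate_job_cost(
    nodes: int,
    hours: float,
    dbus_per_node_hour: float,
    price_per_dbu: float,
    vm_price_per_hour: float,
) -> dict:
    """Split an estimated job cost into the Databricks bill and the cloud bill."""
    dbus_consumed = dbus_per_node_hour * nodes * hours
    platform_fee = dbus_consumed * price_per_dbu      # billed by Databricks
    infrastructure = vm_price_per_hour * nodes * hours  # billed by your cloud provider
    return {
        "dbus_consumed": dbus_consumed,
        "platform_fee": round(platform_fee, 2),
        "infrastructure": round(infrastructure, 2),
        "total": round(platform_fee + infrastructure, 2),
    }


# Placeholder numbers only: 4 nodes for 2 hours.
estimate = estimate_job_cost(
    nodes=4,
    hours=2,
    dbus_per_node_hour=0.75,
    price_per_dbu=0.25,
    vm_price_per_hour=0.50,
)
print(estimate)
```

With these placeholder inputs the job consumes 6 DBUs, so the platform fee is 1.50 and the infrastructure charge is 4.00, for a total of 5.50. Real rates vary by cloud, region, compute type, and pricing tier.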
Both/And, not Either/Or
Databricks and Microsoft Fabric are complementary platforms. Many organizations use both: Databricks for heavy engineering and ML, Fabric and Power BI for analytics and reporting. The real question is not "which one?" but "where does each fit?"
Databricks: Processing + ML
Fabric: OneLake + Analytics
Power BI: Reporting + Dashboards
Four personas, one platform
Different roles interact with different parts of the platform. Here are the four primary personas you will encounter in a Databricks environment.
Data engineer: Builds and orchestrates pipelines using notebooks, Lakeflow Spark Declarative Pipelines, and Lakeflow Jobs. Python and SQL are the primary languages.
Data analyst: Queries data through SQL Warehouses and the SQL Editor. Builds dashboards and connects BI tools. No Python required.
Data scientist: Trains and deploys ML models using MLflow, Feature Store, and notebook experiments. Uses GPU clusters for deep learning.
Platform administrator: Manages Unity Catalog, workspace access, cluster policies, and cost controls. The person who keeps the platform secure and efficient.
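As an illustration of the governance work in that last role, the sketch below generates the Unity Catalog SQL statements that would give a group read-only access to a few tables. `read_only_grants` is a hypothetical helper, and the catalog, schema, table, and group names are invented. The `GRANT ... ON ... TO` statements use Unity Catalog's `USE CATALOG`, `USE SCHEMA`, and `SELECT` privileges. The code only builds strings and does not execute anything against a workspace.

```python
def read_only_grants(catalog: str, schema: str, tables: list[str], principal: str) -> list[str]:
    """Return the GRANT statements a principal needs to read specific tables."""
    statements = [
        # A reader must be able to use the parent catalog and schema first.
        f"GRANT USE CATALOG ON CATALOG {catalog} TO `{principal}`",
        f"GRANT USE SCHEMA ON SCHEMA {catalog}.{schema} TO `{principal}`",
    ]
    for table in tables:
        statements.append(
            f"GRANT SELECT ON TABLE {catalog}.{schema}.{table} TO `{principal}`"
        )
    return statements


# Invented names for illustration.
for statement in read_only_grants("main", "sales", ["orders", "customers"], "analysts"):
    print(statement + ";")
```

An administrator could review the generated statements and run them in the SQL Editor. Granting to a group rather than to individual users keeps access manageable as teams change.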
One of the most common misconceptions about Databricks is that you need to know Python to use it. SQL-first users can work entirely within SQL Warehouses and the SQL Editor. If you are comfortable writing T-SQL in SQL Server or Fabric, you already have the foundation to query data in Databricks.
A common Databricks architecture from source systems to business consumption
Data source systems
Ingest, transform, and store with Databricks
Query the data
Business and data science consumption
Unified data governance across all layers
Whether you are evaluating Databricks alongside Microsoft Fabric or planning a hybrid architecture, we can help you design the right approach for your organization.