~15 min read

Medallion Architecture

Organizing Data for Analytics in Microsoft Fabric

From raw ingestion to curated reporting -- how Bronze, Silver, and Gold layers structure your data

From Raw to Refined

The Layers

Data Engineering -- get data ready for reporting

Bronze: Raw Data -- source format preserved, append-only history
→ Clean
Silver: Clean Data -- validated, de-duplicated, conformed types
→ Model
Gold: Curated Data -- star schema modeled, optimized for reporting

Where Each Layer Gets Used

Different layers serve different audiences and purposes

Bronze Layer

Personas

Data Engineers
Platform Admins
DevOps Engineers

Use Cases

Audit & Compliance
Data Lineage
Source Recovery

Silver Layer

Personas

Data Scientists
Data Analysts
ML Engineers

Use Cases

Feature Engineering
Exploratory Analysis
Data Quality Validation

Gold Layer

Personas

Report Consumers
Self-service Analysts
Business Leaders

Use Cases

Power BI Dashboards
AI / Copilot
Excel & Paginated

The Three Pillars

How the medallion layers map to the broader architecture

Data Engineering

The medallion process: Bronze → Silver → Gold

Bronze — ingest raw
Silver — clean & validate
Gold — model & curate
This is the process, not a product. Any tool that moves data through these stages is doing data engineering.

Semantic Layer

Connection points to the data

Semantic Model
SQL Endpoint
KQL Queryset
These are connection inputs -- the bridge between data engineering output and business consumption.

Reporting and BI

Consumption devices for end users

Power BI
Excel
Paginated Reports
AI / Copilot
These tools read from the semantic layer -- they don't touch Bronze/Silver/Gold directly.

Under the Hood

File formats, storage characteristics, and the refining analogy

Bronze (Iron Ore)
Format: Source
Schema: Source system
Write: Append-only
History: Full

Silver (Forged Metal)
Format: Delta
Schema: Conformed
Write: Versioned
Quality: ACID

Gold (Refined Product)
Format: Delta
Schema: Star schema
Content: Facts + dims
KPIs: Pre-computed

Extended Patterns

Some organizations add more granularity to the medallion layers

The core medallion is Bronze, Silver, and Gold. Some organizations add layers before and after for more granular control.

Landing (Staging)

Unchanged copy from source. Same format, same schema.

Raw (Iron Ore)

Source format preserved. Original schema intact.

Standardized (Forged Metal)

Delta tables. Validated, de-duplicated, ACID-guaranteed.

Curated (Refined Product)

Star schema. Business logic enforced.

Semantic (Platinum)

TMDL metadata. Measures and logic defined.

Legend: Core Layer / Extension Layer
  • The Semantic layer can either exist as its own step, or its logic can fold into the Gold layer.
  • Bronze, Silver, and Gold are conventions, not standards, and organizations often use their own names.

Example Data Flow

The full picture -- from source systems to business consumption

Source

Data source systems

ERP
CRM
Files

Data Engineering

Get data ready for reporting

Ingest Layer: Data Factory
Transform Layer: Prepare Data (Cleanse + Validate) → Curate Data (Warehouse Build)
Store Layer: Bronze (Raw Data) → Silver (Clean Data) → Gold (Curated Data)

Semantic Layer

Access the data

Semantic Model
SQL Endpoint

Reporting and BI

Business and end user consumption

Data Science: Notebooks
Insights: AI / Copilot
Reporting: Reports, Excel, Paginated

Governance & Lifecycle

Secure and control data across all layers

Plan
Implement
Monitor
Govern
Secure
Deploy

**On screen:** Bronze layer node — Raw Data, source format preserved, append-only history
- This is the landing zone for everything coming into the platform
- Data keeps its original shape — CSV, JSON, database tables, API payloads
- Append-only means we never delete or overwrite; full history is preserved
- Think of it as the "safety net" — if anything goes wrong downstream, you can always come back here

*Transition:* Arrow labeled "Clean" points to Silver — that's the next transformation step
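
To make this concrete, here is a minimal PySpark sketch of an append-only Bronze load, assuming a Fabric notebook where `spark` is the ambient session; the path, table name, and column names are hypothetical:

```python
from pyspark.sql import functions as F

# Read the raw extract exactly as the source delivered it -- no casting yet
raw = (
    spark.read
    .option("header", "true")
    .csv("Files/landing/crm/customers/2024-06-01/")  # hypothetical drop path
)

# Stamp ingestion metadata, then append; nothing is ever overwritten
(
    raw
    .withColumn("_ingested_at", F.current_timestamp())
    .withColumn("_source_file", F.input_file_name())
    .write
    .mode("append")  # append-only: full history preserved
    .saveAsTable("bronze_crm_customers")
)
```
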
**On screen:** Silver layer node — Clean Data, validated, de-duplicated, conformed types
- This is where the heavy lifting happens: cleaning, deduplication, type conformance
- Business rules start being applied — nulls handled, dates standardized, keys aligned
- The data is now trustworthy enough for analysts and data scientists to query directly
- Still not shaped for reporting — that's Gold's job

*Transition:* Arrow labeled "Model" points to Gold — the final curation step
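
A hedged sketch of the Silver pass over that same hypothetical table: validate, conform types, standardize values, and de-duplicate on the business key:

```python
from pyspark.sql import functions as F

bronze = spark.read.table("bronze_crm_customers")  # hypothetical Bronze table

silver = (
    bronze
    .filter(F.col("customer_id").isNotNull())                        # drop unusable rows
    .withColumn("customer_id", F.col("customer_id").cast("bigint"))  # conform types
    .withColumn("signup_date", F.to_date("signup_date", "yyyy-MM-dd"))  # standardize dates
    .withColumn("email", F.lower(F.trim("email")))                   # normalize values
    .dropDuplicates(["customer_id"])                                 # de-duplicate on key
)

silver.write.mode("overwrite").format("delta").saveAsTable("silver_customers")
```
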
**On screen:** Gold layer node — Curated Data, star schema modeled, optimized for reporting
- Star schema: fact tables surrounded by dimension tables
- Pre-computed aggregations and KPIs live here
- This is the layer Power BI, Excel, and other reporting tools should connect to
- Optimized for read performance — queries are fast because the modeling work is already done

*Key message:* Data flows Bronze to Silver to Gold through transformation steps between each layer
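
And a sketch of the Gold modeling step, splitting hypothetical Silver tables into one dimension and one fact:

```python
orders = spark.read.table("silver_orders")        # hypothetical Silver tables
customers = spark.read.table("silver_customers")

# Dimension: one row per customer
dim_customer = customers.select("customer_id", "name", "segment", "country")
dim_customer.write.mode("overwrite").format("delta").saveAsTable("gold_dim_customer")

# Fact: one row per order, with a foreign key pointing at the dimension
fact_sales = orders.select("order_id", "customer_id", "order_date", "quantity", "amount")
fact_sales.write.mode("overwrite").format("delta").saveAsTable("gold_fact_sales")
```
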
**On screen:** Bronze Layer frame appears — fieldset with orange border
- We're now looking at who actually uses each layer
- Bronze isn't just a dumping ground — specific roles depend on it
- The frame groups personas on the left and use cases on the right

*Transition:* Personas appear next

**On screen:** Bronze Personas — Data Engineers, Platform Admins, DevOps Engineers
- Data Engineers build and maintain the ingestion pipelines that feed Bronze
- Platform Admins manage the infrastructure — storage accounts, access policies
- DevOps Engineers handle CI/CD for the data pipelines themselves
- These are all technical roles — Bronze is engineering territory

**On screen:** Bronze Use Cases — Audit & Compliance, Data Lineage, Source Recovery
- Audit teams need the unaltered source record for regulatory compliance
- Data Lineage traces where every value originated — Bronze is the starting point
- Source Recovery: if a downstream transform goes wrong, Bronze is your rollback

*Transition:* Moving to the Silver layer and its audience
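
Before moving on, here is what Source Recovery can look like in practice: a hedged sketch, reusing the illustrative names above, that replays the affected Bronze slice and merges the corrected rows back into Silver (the Silver typing rules are elided for brevity):

```python
from pyspark.sql import functions as F

# Pull the affected day's records straight from the untouched Bronze history
replay = (
    spark.read.table("bronze_crm_customers")
    .filter(F.to_date("_ingested_at") == "2024-06-01")
    .dropDuplicates(["customer_id"])  # re-run the Silver rules on just this slice
)
replay.createOrReplaceTempView("fixed_slice")

# Merge the corrected rows back over the damaged ones (Delta MERGE)
spark.sql("""
    MERGE INTO silver_customers AS s
    USING fixed_slice AS f
    ON s.customer_id = f.customer_id
    WHEN MATCHED THEN UPDATE SET *
    WHEN NOT MATCHED THEN INSERT *
""")
```
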
**On screen:** Silver Layer frame appears — fieldset with silver/gray border
- Silver serves a different audience: people who need clean, queryable data
- Not yet modeled for reporting, but trustworthy enough for exploration and ML

*Transition:* Personas appear next

**On screen:** Silver Personas — Data Scientists, Data Analysts, ML Engineers
- Data Scientists explore Silver for feature engineering and model training
- Data Analysts run ad-hoc queries — Silver is clean enough for reliable answers
- ML Engineers build training pipelines that pull directly from Silver tables

**On screen:** Silver Use Cases — Feature Engineering, Exploratory Analysis, Data Quality Validation
- Feature Engineering: building ML features from cleaned, conformed data
- Exploratory Analysis: ad-hoc querying to find patterns before formal reporting
- Data Quality Validation: Silver is where you verify cleaning rules actually worked

*Transition:* Moving to the Gold layer — the business consumption tier
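
A small sketch of what Data Quality Validation on Silver can look like, with illustrative rules that fail the run loudly when a cleaning step didn't hold:

```python
from pyspark.sql import functions as F

silver = spark.read.table("silver_customers")

total = silver.count()
null_keys = silver.filter(F.col("customer_id").isNull()).count()
dupes = total - silver.dropDuplicates(["customer_id"]).count()

# Fail fast if the Silver guarantees were violated
assert null_keys == 0, f"{null_keys} rows with a null customer_id"
assert dupes == 0, f"{dupes} duplicate customer_id values"
```
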
**On screen:** Gold Layer frame appears — fieldset with gold border
- Gold is the business-facing layer — everything here is optimized for consumption
- This is where the non-technical audience enters the picture

*Transition:* Personas appear next

**On screen:** Gold Personas — Report Consumers, Self-service Analysts, Business Leaders
- Report Consumers view dashboards and subscribe to scheduled reports
- Self-service Analysts build their own Power BI reports and Excel analyses
- Business Leaders make decisions based on KPIs and executive dashboards

**On screen:** Gold Use Cases — Power BI Dashboards, AI / Copilot, Excel & Paginated
- Power BI Dashboards: interactive visuals connected to the semantic model
- AI / Copilot: natural language queries against curated, well-modeled data
- Excel & Paginated: traditional reporting formats, pulling from the same Gold layer

*Key message:* The layer determines the audience — raw for engineering, clean for analysis, curated for business

**On screen:** Data Engineering pillar card — Bronze (ingest raw), Silver (clean & validate), Gold (model & curate)
- Data Engineering is the *process*, not a product
- Any tool that moves data through Bronze, Silver, and Gold stages is doing data engineering
- Callout reinforces: "This is the process, not a product"
- Fabric uses Dataflows, Notebooks, and Pipelines — but the concept is tool-agnostic

*Transition:* Next pillar shows what sits between engineering and consumption

**On screen:** Semantic Layer pillar card — Semantic Model, SQL Endpoint, KQL Queryset
- These are the *products* — the connection points between data and consumers
- Semantic Model is the primary path: measures, relationships, DAX logic
- SQL Endpoint provides T-SQL access for notebooks and ad-hoc queries
- KQL Queryset enables real-time analytics scenarios
- Callout: "These are connection inputs — the bridge between data engineering output and business consumption"

*Transition:* Final pillar shows who actually consumes the data
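
To illustrate the SQL Endpoint path, a hedged pyodbc sketch from Python; the server string, database, and table names are placeholders, and the right authentication option depends on your environment:

```python
import pyodbc

# Connection details come from the lakehouse's SQL endpoint settings (placeholders here)
conn = pyodbc.connect(
    "Driver={ODBC Driver 18 for SQL Server};"
    "Server=your-workspace.datawarehouse.fabric.microsoft.com;"
    "Database=YourLakehouse;"
    "Authentication=ActiveDirectoryInteractive;"
)

# Plain T-SQL against the Gold tables
for row in conn.execute(
    "SELECT TOP 10 order_date, SUM(amount) AS revenue "
    "FROM gold_fact_sales GROUP BY order_date ORDER BY revenue DESC"
):
    print(row.order_date, row.revenue)
```
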
**On screen:** Reporting and BI pillar card — Power BI, Excel, Paginated Reports, AI / Copilot
- These are the *consumers* — the tools end users actually interact with
- Power BI for interactive dashboards, Excel for spreadsheet analysis
- Paginated Reports for pixel-perfect, printable output
- AI / Copilot for natural language querying
- Callout: "These tools read from the semantic layer — they don't touch Bronze/Silver/Gold directly"

*Key message:* Three pillars: the process (engineering), the products (semantic), the consumers (reporting)

**On screen:** Bronze feature card — "Iron Ore" subtitle; Format: Source, Schema: Source system, Write: Append-only, History: Full
- The metal refining analogy: Bronze is like iron ore — raw, unprocessed
- Format preserves whatever the source system sent — CSV, JSON, database dumps
- Schema matches the source system exactly — no transformation yet
- Append-only write pattern means nothing is ever overwritten or deleted
- Full history is the key value: complete audit trail from day one

*Transition:* Silver card shows what happens after the first refinement pass

**On screen:** Silver feature card — "Forged Metal" subtitle; Format: Delta, Schema: Conformed, Write: Versioned, Quality: ACID
- Forged Metal analogy: raw material has been refined into something useful
- Delta format = Parquet files + transaction log — enables versioning and time travel
- Schema is now conformed: consistent data types, naming conventions, key alignment
- Versioned writes mean you can roll back to any previous state
- ACID guarantees: Atomicity, Consistency, Isolation, Durability

*Transition:* Gold card reveals the final refinement stage
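
The versioning and time-travel claims are easy to demonstrate; a sketch against the hypothetical Silver table (the version number is illustrative):

```python
from delta.tables import DeltaTable

# Every write is a new version in the Delta transaction log
(
    DeltaTable.forName(spark, "silver_customers")
    .history()
    .select("version", "timestamp", "operation")
    .show(truncate=False)
)

# Time travel: read the table as it looked at an earlier version
previous = spark.sql("SELECT * FROM silver_customers VERSION AS OF 3")
```
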
**On screen:** Gold feature card — "Refined Product" subtitle; Format: Delta, Schema: Star schema, Content: Facts + dims, KPIs: Pre-computed
- Refined Product analogy: the finished goods ready for the consumer
- Still Delta format, but with an enforced star schema on top
- Star schema: fact tables (transactions, events) surrounded by dimension tables (products, dates, customers)
- Pre-computed KPIs: aggregations calculated in advance so reports load fast

*Key message:* Each layer maps to a real file format progression — from raw source files through Delta tables to modeled star schemas
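
A sketch of a pre-computed KPI table built from the hypothetical fact table, so dashboards read a small aggregate instead of scanning the fact at query time:

```python
from pyspark.sql import functions as F

fact = spark.read.table("gold_fact_sales")

(
    fact
    .groupBy("order_date")
    .agg(
        F.sum("amount").alias("revenue"),
        F.countDistinct("order_id").alias("order_count"),
    )
    .write.mode("overwrite").format("delta")
    .saveAsTable("gold_kpi_daily_sales")
)
```
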
**On screen:** Core medallion layers appear — Raw (Bronze), Standardized (Silver), Curated (Gold) with arrows between them
- The intro text sets context: core medallion is Bronze, Silver, Gold
- These three cards show the standard pattern that every organization uses
- Raw = source format preserved, Standardized = Delta with validation, Curated = star schema with business logic
- Arrows show the left-to-right transformation flow

*Transition:* Extension layers will appear on either side

**On screen:** Landing card appears to the left of the core layers — "Staging" subtitle, dashed extension border
- Landing is a pre-Bronze extension: an exact, unchanged copy from the source system
- Same format, same schema — no transformation at all
- Purpose: decouple ingestion from processing so source systems aren't queried repeatedly
- The dashed border and legend distinguish extensions from core layers

*Transition:* Semantic extension appears on the right side
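
A Landing copy is deliberately simple; a hedged sketch using the Fabric/Synapse notebook file utilities, where both paths are placeholders and the import path can vary by runtime:

```python
from notebookutils import mssparkutils  # Fabric notebook utility; import path is an assumption

# Byte-for-byte copy of the source drop -- no parsing, no schema, no transformation
mssparkutils.fs.cp(
    "abfss://drops@sourceaccount.dfs.core.windows.net/crm/2024-06-01/",  # placeholder source
    "Files/landing/crm/2024-06-01/",                                     # placeholder landing path
    True,  # recurse: copy the whole folder as-is
)
```
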
**On screen:** Semantic card appears to the right of Gold — "Platinum" subtitle, dashed extension border
- Semantic / Platinum is a post-Gold extension: TMDL metadata layered on top of curated data
- Measures, relationships, hierarchies, and business-friendly names defined here
- This is where the semantic model lives — the bridge to reporting tools
- Some organizations fold this into Gold; others keep it separate for clarity

*Transition:* Info box summarizes the pattern

**On screen:** Info box with two bullet points about Semantic layer flexibility and naming conventions
- First point: the Semantic layer can be its own step or fold into Gold — it depends on org preference
- Second point: Bronze, Silver, Gold are conventions, not standards — organizations often rename them
- This is a good moment to ask the audience what their org calls these layers

*Key message:* Core three are the standard medallion; Landing and Semantic are common extensions that add granularity

**On screen:** Sources zone — ERP, CRM, and Files icons in the leftmost column
- This is the full end-to-end architecture diagram — everything we've discussed comes together here
- ERP systems (SAP, Dynamics), CRM platforms (Salesforce, HubSpot), flat files (CSV, Excel, JSON)
- These are the origin points — data flows from left to right across the diagram
- Every organization has multiple source systems; the architecture handles them all the same way

*Transition:* Data Engineering zone shows how source data gets processed

**On screen:** Data Engineering zone — three sub-layers: Ingest (Data Factory), Transform (Prepare + Curate), Store (Bronze, Silver, Gold)
- Ingest Layer: Data Factory pulls from source systems on a schedule or trigger
- Transform Layer has two stages: Prepare (cleanse + validate) then Curate (warehouse build)
- Store Layer maps directly to the medallion: Bronze (raw), Silver (clean), Gold (curated)
- This is the most complex zone — it's where all the data engineering work happens
- Flow overlay arrows will connect these nodes to downstream consumers

*Transition:* Semantic Layer shows the access points

**On screen:** Semantic Layer zone — Semantic Model (primary, filled) and SQL Endpoint (bordered)
- Two access paths to the data, each serving different consumers
- Semantic Model is the primary path: DAX measures, relationships, hierarchies for Power BI
- SQL Endpoint provides T-SQL access for notebooks, ad-hoc queries, and cross-platform tools
- The visual weight (filled vs bordered) signals which path is primary

*Transition:* Reporting zone shows the consumer groups

**On screen:** Reporting and BI zone — three sub-layers: Data Science (Semantic Link + Notebooks), Insights (AI / Copilot), Reporting (Reports, Excel, Paginated)
- Data Science: Semantic Link connects notebooks to the semantic model; multiple notebooks shown for parallel workloads
- Insights: AI / Copilot provides natural language querying against the curated data
- Reporting: Power BI reports, Excel workbooks, and Paginated reports for traditional BI
- All three sub-layers read from the Semantic Layer — none bypass it to hit raw tables

*Transition:* Governance wraps around the entire pipeline
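
Before the governance wrap-up, a sketch of the Semantic Link path from a notebook using sempy; the dataset, table, and measure names are placeholders:

```python
import sempy.fabric as fabric  # Semantic Link library in Fabric notebooks

# Evaluate a DAX query against the published semantic model
df = fabric.evaluate_dax(
    dataset="Sales Model",  # placeholder semantic model name
    dax_string="""
        EVALUATE SUMMARIZECOLUMNS(
            'Date'[Year],
            "Revenue", [Total Revenue]
        )
    """,
)
print(df.head())
```
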
**On screen:** Governance & Lifecycle bar — Plan, Implement, Monitor, Govern, Secure, Deploy
- Governance isn't a separate step — it wraps the entire pipeline from end to end
- Plan: define data policies, ownership, and quality standards before building
- Implement: build the pipelines, models, and reports following those standards
- Monitor: track data freshness, pipeline failures, and usage patterns
- Govern: enforce access controls, data classification, and retention policies
- Secure: encrypt at rest and in transit, manage identity and credentials
- Deploy: CI/CD for data assets — version-controlled promotions across environments

*Key message:* The full picture ties together sources, engineering, semantic, reporting, and governance into one coherent architecture