Skip to content

Building Medallion pipelines with ELTMaestro

How to design Bronze โ†’ Silver โ†’ Gold ("medallion") data pipelines in ELTMaestro, mapped to the actual step types you place on the Job editor canvas. Part of the User Guide.

What "medallion" means here

The medallion pattern organizes a data platform into three quality tiers:

Layer Purpose Grain
๐Ÿฅ‰ Bronze Raw landing โ€” capture source data as-is, append-only Source-shaped, untransformed
๐Ÿฅˆ Silver Cleaned, conformed, deduplicated, validated Conformed/entity-level
๐Ÿฅ‡ Gold Business marts & serving aggregates Business-grain, read-ready

ELTMaestro implements medallion by convention, not as an enforced construct. The engine has no built-in notion of layers โ€” you impose the discipline through job naming / tags, schema-per-layer naming, and JobStep chaining. Each layer is one (or more) workflow; the tiers are how you organize and promote between them.

Where the layers live โ€” your target data warehouse

All three layers โ€” Bronze, Silver, and Gold โ€” live inside the chosen target ("vendor") data-warehouse platform: Redshift, Snowflake, ClickHouse, SparkSQL (HDFS/Hive), Synapse, Databricks, Greenplum, and the rest. ELTMaestro does not host the medallion itself โ€” it builds and orchestrates it on the warehouse, typically as a schema or database per layer (bronze_* / silver_* / gold_*) all within that one platform. You pick the platform per workflow via its Job Type + platform connection (Creating a workflow).

ELTMaestro's job is the movement and the logic on top of it:

  1. Land (Bronze / ingest). Extraction steps pull from heterogeneous sources and land the raw data into an ingest / landing area on the warehouse, as-is and append-only:
  2. Jdbcsource โ€” extract from any JDBC source database;
  3. Onstagegroup โ€” stage files and register them as external tables in the warehouse catalog (Spark/Hive);
  4. General-connection extracts โ€” Datasource (Corelli streaming) and the connector extracts (Mongo, Salesforce, SFTP, local/remote files) that reach a source through a General / Cloud connection. For MPP targets the landing typically stages through object storage ($OBJECT_STORAGE) and loads into the warehouse.
  5. Build up (Silver โ†’ Gold). From that landing area you run the data-quality gates and the upper layers entirely inside the warehouse โ€” conform/clean/validate into Silver, aggregate into Gold โ€” using ELTMaestro's transform + load steps, which push the SQL/Spark work down to the platform.

So the shape is always: extract โ†’ land on the warehouse โ†’ build DQ + Silver + Gold there. Per-platform wiring is in Redshift / SparkSQL / ClickHouse / MPP setup.

Layer โ†’ step mapping

Every item below is a real step type from the step palette.

๐Ÿฅ‰ Bronze โ€” raw landing / ingestion

Need Steps
Pull / extract Jdbcsource, Datasource (Corelli streaming source), Local_file, Sftp, Pullfile; connector extracts (Mongo / Salesforce)
Land into the warehouse Onstagegroup (stages files โ†’ external tables in the Spark/Hive catalog), FileStage / File_Stage, Jdbctargethdfs (object-storage staging via $OBJECT_STORAGE); for MPP targets the staged data is then loaded into the warehouse's landing area
Arrival triggers FileWatch, FileScanner, RemoteFileScanner, JdbcWatch
Incremental capture Watermark + the isCDC / $QUERY_CDC_* delta primitive

Principle: capture faithfully, transform minimally, append-only. Use a Watermark so each run lands only new data; write to insert-only targets so Bronze stays immutable.

โ†’ Deep dive: Bronze โ€” ingestion & landing

๐Ÿฅˆ Silver โ€” cleaned, conformed, validated

Need Steps
Clean / shape Filter, Dedupe, Function / Function2 (derivations), Datamask (PII), View, Pivot
Conform / integrate Join, Union, Minus, Transformer, SparkDataCache
Historize dimensions SCD1, SCD2
Merge / upsert OnstageDelta; onstage group loaders with the isCDC / $QUERY_CDC_DELETE + $QUERY_CDC_INSERT delete-then-insert primitive
Quality gate ControlTest, SQLMetrics, MetaProfile, WriteIncident (quarantine bad rows)

Principle: Silver is idempotent and re-runnable โ€” dedupe, conform to canonical types/keys, and gate promotion on data quality (see the Silver quality gate). Column work happens in each step's expression builder.

โ†’ Deep dive: Silver โ€” clean, conform, validate

๐Ÿฅ‡ Gold โ€” business marts / serving

Need Steps
Aggregate / model Aggregate, Aggregate2, Table (GeneralSQLTable), group loaders
Publish to serving MPP JdbcTarget{Redshift,Snowflake,Synapse,Clickhouse,Greenplum,Netezza,Yellowbrick,Exasol,Databricks}, WriteJdbc, WriteFile, Export
ML feature/serving MLFeatureEngineer, MLFeatureTrainer, MLFeaturePredictor

Principle: Gold is business-grain and serving-optimized โ€” pre-aggregated facts/marts the BI and ML layers read directly.

โ†’ Deep dive: Gold โ€” aggregate & serve

The orchestration spine

What turns three layers into one end-to-end pipeline:

  • Chaining layers โ€” JobStep lets a Bronze workflow invoke Silver, which invokes Gold (one orchestrating DAG, or three workflows wired by JobStep). Switch / SetVariable / Sync handle conditional flow and barriers.
  • Scheduling per layer โ€” Administration โ–ธ Scheduler (cron): e.g. Bronze hourly, Silver/Gold on completion. See Menus.
  • Alerting โ€” Email Alerts on success/failure per layer.
  • Layer labeling โ€” Job Metadata tags (bronze / silver / gold) so the Workflow(s) list organizes by tier.

The Silver quality gate

The Bronze โ†’ Silver promotion is where you enforce quality โ€” the medallion pattern's core discipline:

  • ControlTest โ€” assert row counts, key uniqueness, referential rules before promoting.
  • WriteIncident โ€” route failing rows to a quarantine/incident table instead of silently dropping or poisoning Silver.
  • SQLMetrics / MetaProfile โ€” capture column profiles and run metrics for observability.
  • Watermark โ€” drive incremental Silver builds so only new/changed Bronze rows are reprocessed.

Governance & observability

  • Lineage โ€” the OpenLineage emitter โ†’ Marquez gives column-level lineage spanning all three layers automatically; open it from the Job editor's View Lineage button. This is medallion's strongest built-in story: Bronze โ†’ Silver โ†’ Gold provenance for free. See LINEAGE.md.
  • Metrics โ€” per-step SQLMetrics, surfaced through Runtime & Logs โ–ธ Logging โ–ธ Workflow Logs.

Design principles (the conventions to enforce)

  1. Append-only Bronze. Insert-only targets + watermarks; never mutate raw.
  2. One layer per job, chained by JobStep; tag each job by layer.
  3. Gate Silver on quality (ControlTest + WriteIncident quarantine) before promoting.
  4. Idempotent, re-runnable steps โ€” watermarks + upsert (OnstageDelta / SCD) so re-runs converge.
  5. Schema-per-layer naming (e.g. bronze_* / silver_* / gold_* schemas or databases) to keep tiers separable.
  6. Honest gap: ELTMaestro enforces none of this โ€” there is no first-class "medallion" object. It's design discipline, and these conventions are how you keep the tiers clean.

A worked example โ€” HR headcount mart

A small end-to-end medallion pipeline, built as three workflows chained by JobStep:

1. Bronze โ€” land raw HR data (bronze_hr, tagged bronze) - Jdbcsource (or Datasource) pulls Employee and EmployeeDepartmentHistory from the source system. - Watermark scopes each run to new/changed rows; targets are insert-only (Jdbctargethdfs / Onstagegroup), so Bronze stays append-only.

2. Silver โ€” conform & validate (silver_hr, tagged silver) - Join the two Bronze tables into a conformed employee_department_joined; Dedupe removes duplicates; a Function2 step derives/cleans columns; SCD2 historizes the department dimension. - A ControlTest asserts key uniqueness and row counts; WriteIncident quarantines failing rows rather than letting them into Silver. - This is exactly the shape of the join โ†’ dedupe โ†’ function transform DAG shown in Designing the DAG:

Silver-layer transform DAG โ€” Employee + EmployeeDepartmentHistory joined into employee_department_joined, passed through a FUNCTION2 step, with a parallel dedupe โ†’ filter branch

3. Gold โ€” build the mart (gold_hr, tagged gold) - Aggregate2 rolls Silver up to headcount-by-department; Table (or a group loader) writes the gold_headcount mart; JdbcTargetClickhouse (or your serving MPP) publishes it for BI.

Spine: a master workflow uses JobStep to run bronze_hr โ†’ silver_hr โ†’ gold_hr in order; the Scheduler runs Bronze hourly and triggers the rest on completion; Email Alerts notify on failure. The whole chain's column lineage is visible end-to-end in the Lineage Viewer.

Where to build a runnable reference

A local ClickHouse + Spark/HDFS + Postgres stack supports an end-to-end medallion demo: files โ†’ Bronze (lake external tables) โ†’ Silver (conform in ClickHouse/Spark, gated) โ†’ Gold (Aggregate2 + JdbcTargetClickhouse), chained by JobStep, scheduled + alerted, with the whole chain visible in the Lineage Viewer.