Skip to content

๐Ÿฅˆ Silver layer โ€” clean, conform, validate

Deep-dive companion to Medallion architecture with ELTMaestro. Silver turns raw Bronze into clean, conformed, deduplicated, validated data that the business can trust. Everything here should be idempotent and re-runnable.

Cleaning & shaping

Need Step Notes
Filter rows Filter Drop/keep rows by a condition expression.
Deduplicate Dedupe Remove duplicates by key โ€” define the dedupe key carefully (it decides which row wins).
Derive / clean columns Function / Function2 Per-row scalar (F(x)) and aggregate (ฮฃ(x)) expressions in the expression builder; functions come from the function catalog.
Mask PII Datamask Redact/obfuscate sensitive columns as data enters the trusted zone.
Re-expose View, Pivot Logical views / pivot reshaping.

Conforming & integrating

Need Step
Join sources Join (column mapping + join condition in the expression builder)
Combine sets Union, Minus
Generic transform Transformer, SparkDataCache (cache an intermediate for reuse)

Conform to canonical types and keys here โ€” cast the raw Bronze strings to real types, standardize codes, and resolve natural keys so every downstream join lines up.

Historization (slowly-changing dimensions)

Step Use
SCD1 Overwrite-in-place โ€” keep only the current value (no history).
SCD2 Versioned history โ€” new row per change, with effective dating.

SCD steps detect change via the platform's hash template ($HASH_TEMPLATE / $HASH_RETURNS / $HASH_EXPRESSION, e.g. NVL(MD5($ARG::TEXT),'')) โ€” only rows whose hash changed are versioned, so reruns are stable.

Upsert / merge into Silver

Step Pattern
OnstageDelta Apply a staged delta to the Silver table.
Onstage group loaders + isCDC Parallel delete-then-insert merge via $QUERY_CDC_DELETE + $QUERY_CDC_INSERT (the engine applies it through its internal parallel JDBC processor).

These are the idempotent merge primitives โ€” re-applying the same delta converges to the same Silver state.

The quality gate (the heart of Silver)

The Bronze โ†’ Silver promotion is where you enforce trust:

Step / feature Role
ControlTest Assert invariants before promoting โ€” row counts, key uniqueness, referential integrity, value ranges.
Data Quality Management (Administration โ–ธ Metrics Configuration) Define and manage reusable DQ rules.
WriteIncident Quarantine failing rows to an incident table instead of dropping them or letting them poison Silver.
SQLMetrics Capture per-run metrics (counts, timings) for observability.
MetaProfile / column profiler Profile column distributions to catch drift.

Quarantine, don't drop. Route rejects through WriteIncident so bad data is auditable and fixable โ€” silently dropping rows hides upstream problems.

Incremental Silver

Drive Silver from a Watermark so each run only reprocesses new/changed Bronze rows, then merge with OnstageDelta / SCD. This keeps Silver cheap and idempotent.

Gotchas

  • Dedupe key correctness โ€” the wrong key silently keeps the wrong row; verify with a ControlTest on uniqueness.
  • SCD2 effective-dating โ€” make sure change detection (hash) covers exactly the tracked columns, or you'll over- or under-version.
  • Idempotency โ€” a rerun must converge; rely on watermarks + upsert/SCD, never blind inserts.

Next

Build marts in ๐Ÿฅ‡ Gold โ€” aggregate & serve. Back to the Medallion overview ยท ๐Ÿฅ‰ Bronze.