Skip to content

🥉 Bronze layer — ingestion & landing

Deep-dive companion to Medallion architecture with ELTMaestro. The Bronze layer captures source data as-is and lands it for downstream processing. Goal: faithful, append-only, replayable raw history — minimal transformation.

Ingestion patterns

Pick the pattern that matches the source and the change rate:

Pattern Steps When
Full / snapshot load Jdbcsource / Datasource → land Small/reference tables, or sources with no reliable change marker.
Incremental (watermark) Watermark + Jdbcsource/Datasource Source has a monotonic column (id / updated_at). The Watermark step persists the last position between runs and pulls only newer rows.
CDC / upsert isCDC + $QUERY_CDC_DELETE / $QUERY_CDC_INSERT Source emits inserts/updates/deletes you want to apply as a delete-then-insert merge.
File drop Local_file, Sftp, Pullfile Files delivered to a folder / SFTP / S3.
Event-driven FileWatch, FileScanner, RemoteFileScanner, JdbcWatch, ScriptWatch Trigger a landing run when a file appears or a source condition is met.
Connector extract Mongo / Salesforce extract paths NoSQL / SaaS sources.

Incremental is the default for Bronze. A Watermark keeps each run cheap and the lake append-only. Reserve full loads for small/reference data.

Landing targets

Where raw data comes to rest:

Target Step Notes
Object storage (S3 / Azure Blob) staged via $OBJECT_STORAGE (S3_CONNECTION / BLOB_CONNECTION) The lake. See Administration â–¸ cloud connections.
HDFS Jdbctargethdfs On-cluster lake storage.
Hive external tables Onstagegroup Registers landed files as Spark/Hive external tables ($HIVE_CONNECTION/$HIVE_DATABASE/$HIVE_SCHEMA) so Silver can query them as tables.
Onstage staging tables Onstagegroup / file loaders Staging tables in the MPP for the next merge.

Append-only discipline

Bronze must stay immutable and replayable:

  1. Insert-only targets — never update/delete in Bronze; corrections happen in Silver.
  2. Watermark every incremental source so reruns don't double-land and only new rows arrive.
  3. Keep raw fidelity — land columns as-received (string/raw types are fine); defer type-casting and cleaning to Silver. Base64-encode awkward payloads if a staging format needs it (decoded later via $B64EXPRESSION).
  4. Partition by load time / batch ($GUID scopes a run's staged objects) so you can replay or audit a specific landing.

Gotchas

  • Late-arriving data — a high-water mark on updated_at can skip rows committed out of order; prefer an id-based or overlap-window watermark when the source allows.
  • Idempotency — a rerun of the same window must not duplicate rows; rely on the Watermark position, not wall-clock.
  • Schema drift — new/renamed source columns should land without breaking; keep Bronze permissive and resolve shape in Silver.

Next

Promote to 🥈 Silver — clean, conform, validate. Back to the Medallion overview.