Skip to content

Step reference

Catalog of every step you can place on the Job editor canvas, derived from JobEditor.populateListItems(). Part of the User Guide.

Status: complete β€” all in-scope steps documented. Three remain open: Datasource and Meta Profile are placeholders pending input, and Write Incident is deferred. (Living doc β€” update as the UI evolves.)

Scope. This reference covers the actively-documented platforms: Redshift, Snowflake, ClickHouse, SparkSQL. (All other platforms β€” Netezza, Yellowbrick, Greenplum, DashDB, Firebolt, Databricks, SQLWarehouse, Exasol, Synapse β€” are out of scope.)

Steps fall into three tiers: - Common steps β€” available on all documented platforms. - MPP-common steps β€” on every MPP platform (all of the above except SparkSQL). - Platform-specific steps β€” only on one platform (mostly the per-platform load target).

Legend: 🟒 documented · 🟑 placeholder (to expand) · ⬜ to do.


Common steps

Present on all documented platforms (the global control/transfer/script set is added to every job type; the data/transform/DQ items below are common to both the MPP and SparkSQL palettes).

Sources & input

⬜ Step (palette label) Type What it does
🟒 Jdbcsource JDBCSOURCE Parallel JDBC extractor β€” pulls source data into staged flat files for a JdbcTarget; auto-partitioned parallel reads + watermark-based incremental.
🟑 Datasource DATASOURCE Placeholder. Corelli pipeline source for non-JDBC sources via a general connection (MongoDB, Salesforce, other connectors). Like Jdbcsource, it stages data for a downstream JdbcTarget β€” the engine requires the data-provider step to be a Jdbcsource or Datasource. Full details TBD.
🟒 Onstagegroup ONSTAGEGROUP Bulk schema/group loader β€” loads multiple source tables into a target schema in one step, staged through object storage, with per-table watermark/key incremental merge. Platform-specific dialog.

Jdbcsource

Type JDBCSOURCE Β· WPF Steps/Pipeline/JdbcSource.cs β†’ JdbcSourceWindow.xaml ("Corelli Parallel Jdbc Extractor") Β· pairs with a JdbcTarget<Platform>

Extracts data from a source database and writes it to staged flat files that a platform-specific JdbcTarget… step then loads (JdbcTargetRedshift, JdbcTargetClickhouse, JdbcTargetSnowflake, …). On its own it only extracts and stages β€” it must be wired to a JdbcTarget.

Parallel extraction (auto-partitioning). Turn on Auto partitioning over a numeric or timestamp incrementing column. Maestro runs a MIN/MAX query and generates multiple WHERE-condition "buckets" β€” e.g. 100 rows with Rows/Partition = 10 β†’ 10 buckets β€” and runs Threads of them at a time (default 4 = the parallelism factor). Leave Delim as pipe (|). Use Meta Data / Preview to inspect the dataset first.

Jdbcsource β€” Source tab (Auto partitioning): connection, partition columns, key columns, and the Delta Column for incremental loads

Source SQL tab β€” the SELECT that runs on the source; $EXTRACT_CONDITION is filled in at runtime (the incremental watermark filter).

Jdbcsource β€” Source SQL tab showing the generated SELECT with the $EXTRACT_CONDITION placeholder

Incremental loads. After the initial full parallel load, set a Delta Column (e.g. log_ts). On each scheduled run, the target's Delta Watermark MAX value is pushed back as the source's $EXTRACT_CONDITION, so only new rows are pulled.

  • Auto Partitioning Reference tab β€” for very large tables where MIN/MAX is slow, point partition sampling at a smaller reference table; leave "ref table same as source" checked if not needed.
  • Manual Partitions tab β€” when Manual partitioning is selected you must supply the extraction-criteria conditions here (the per-bucket WHERE expressions β€” one partition per condition); in Auto mode, any expressions added here override the auto-partitioning logic.
  • Performance & Quality tab β€” date/timestamp formats, fetch size, optional SAP zero-padding, Standardize Data (strip CR/LF/delimiter/quotes), and an Encode list that base64-encodes selected columns so special characters survive the flat file β€” the target decodes them with the platform's base64 function.

Jdbcsource β€” Performance & Quality tab: formats, Standardize Data, and the base64 Encode column list

Onstagegroup

Type ONSTAGEGROUP Β· "Onstage Schema Loader" Β· platform-specific class/dialog:

Platform WPF
Redshift Steps/Redshift/RSGroupLoader.cs β†’ RSGroupLoaderWindow.xaml
Snowflake Steps/Snowflake/SnowGroupLoader.cs β†’ SnowGroupLoaderWindow.xaml
ClickHouse Steps/Clickhouse/CHGroupLoader.cs β†’ CHGroupLoaderWindow.xaml
SparkSQL / Synapse Steps/Spark/SparkGroupLoader.cs β†’ SparkGroupLoaderWindow.xaml

Loads multiple tables from a source schema into a target schema in a single step. The dialog (shown here for Redshift) has two tabs.

Schema tab β€” set the Source (connection, catalog/database, schema) and Target (catalog/database, schema), plus platform-specific Configurations. For Redshift those are the S3 Connection (Maestro staging), Initial Refresh Partitions, and the S3 Upload / S3 Clean / Copy-Load commands. (Configurations differ by platform.)

Onstagegroup β€” Schema tab (Redshift): source & target schema plus S3 staging configuration

Table(s) tab β€” select one or more tables from Available Source Table(s) on the left and click > to add them. Each target table is named database/schema/table. For each table set: - Watermark(s) β€” a timestamp, date, or incrementing numeric column (e.g. a sequence). The watermark value drives incremental loads β€” only rows past the last watermark are pulled and merged into the target. - Key(s) β€” the column(s) the incremental merge keys on.

If the Watermark/Key dropdowns are empty, click Refresh Metadata.

Onstagegroup β€” Table(s) tab: source tables added to targets, each with Watermark(s) and Key(s)

Static Filter Condition (per table) β€” overrides the delta filter that runs on the source. Use it for a custom delta window (e.g. modified_date > current_date - 3) or a full load with 1=1 (you still pick a watermark/key, but the static filter forces a full extract).

Onstagegroup β€” Static Filter Condition tab for a table (1=1 forces a full load)

Caveat: a source table whose name contains special characters will not load.

SparkSQL variant β€” "Onstage Schema Loader for Spark". On SparkSQL the target is an HDFS folder, not a database schema. Pre-create the destination directory on HDFS, e.g.:

hdfs dfs -mkdir -p /ingest/adventureworks/hr

Then on the Schema tab the Target group shows Catalog = none (disabled) and a Folder field β€” click Browse to pick that directory from the HDFS tree (browsed over the Spark driver connection, e.g. ELTM_SPARK_DRIVER). The Source group and the Table(s) tab work the same as the other platforms. Load Configurations: Initial Load Partitions Β· Initial Load Threads Β· Run Count Metrics.

Onstage Schema Loader for Spark β€” Schema tab: Source schema + HDFS Folder target + Load Configurations

Browse β€” picking the target HDFS folder from the Spark driver connection tree

On run, the loader pulls the selected source tables into the target folder, creating one subdirectory per table and writing the data as Parquet files (e.g. /ingest/adventureworks/hr/<table>/…). A separate Parquet Exports tab can additionally push that output to S3 / Azure Blob via a CLI command.

Transforms

⬜ Step Type What it does
🟒 Datamask DATAMASK Masks/obfuscates PII columns β€” applies the platform hash function ($HASH_TEMPLATE β†’ $HASH_RETURNS type) to columns you tag mask, with an optional name postfix. 1 input. (PII / GDPR / HIPAA redaction.)
🟒 Dedupe DEDUPE Removes duplicate rows β€” pick key column(s) + a sort/order expression; keeps one row per key (the first by the sort order) via ROW_NUMBER() OVER (PARTITION BY keys ORDER BY sort). 1 input.
🟒 Filter FILTER Row filter β€” keeps only rows matching a boolean condition built in the expression editor ("Filter Condition"). 1 input β†’ same columns out.
🟒 Aggregate2 AGGREGATE2 Same dialog as Function2 (aggregate emphasis) β€” group-by + aggregate output columns. 1 input.
🟒 Function2 FUNCTION2 Column expression editor β€” define output columns as scalar F(x) or aggregate Ξ£(x) with alias/type/expression, plus WHERE/HAVING. 1 input.
🟒 Join JOIN Joins multiple sources ($N aliases) β€” pick the first source, add INNER/LEFT/… conditions + ON expressions, define aliased output columns. Multi-input.
🟒 Minus MINUS Set difference β€” outputs rows in source A that are not in source B (you pick which input is A). 2 inputs.
🟒 Union UNION Combines two sources β€” toggle Union (distinct) vs Union All (keep duplicates); map the output columns. 2 inputs.
🟒 Pivot PIVOT Pivots rows β†’ columns β€” distinct values of the pivot column become new columns, aggregated from a value column. Needs a Key column marked on the source. Output table drop-and-created each run in $SYSTEM_DEFAULT_SCHEMA. 1 input.

Function2 & Aggregate2

Type FUNCTION2 / AGGREGATE2 Β· WPF Steps/CommonTransform/Function2.cs β†’ Function2Window.xaml ("Function Expression Version 2.0") Β· 1 input

The column expression editor β€” both palette entries open the same dialog (Aggregate2 is the same step surfaced with aggregate emphasis). For each output column pick a Mode: - F(x) β€” a per-row scalar expression (these columns form the GROUP BY when any aggregate is present); - Ξ£(x) β€” an aggregate (e.g. COUNT(*), SUM(...)),

…each with an Alias, Type, and Expression. Functions / operators / constants come from the function catalog; the Filters tab adds WHERE / HAVING conditions. Full walkthrough: Configuring a step β€” the expression builder.

Function2 / Aggregate2 β€” the Function Expression editor: source columns plus scalar F(x) and aggregate Ξ£(x) output columns with alias/type/expression

Join

Type JOIN Β· WPF Steps/CommonTransform/MPPJoin.cs β†’ MPPJoinWindow.xaml Β· multi-input

Joins multiple input sources into one output. Each inbound source is referenced by a $N alias, where N is the inbound arrow's step ID (e.g. $4 = Employee, $5 = EmployeeDepartmentHistory). - Input β€” Sources lists the connected steps by alias; Columns lists their columns (Add To Output to emit one). - Join Condition(s) β€” pick the First Join Source, then add conditions: Type (INNER / LEFT / …), Join With ($N), and the ON expression (e.g. $4.`BusinessEntityID` = $5.`BusinessEntityID`). - Output Columns β€” each output's Name ($N.column), Type, and Alias; Fix Alias repairs broken $N references.

Check Mapping validates that every join / output / input alias resolves to a connected source.

Join β€” input sources by $N alias, an INNER JOIN ON $4.BusinessEntityID = $5.BusinessEntityID, and aliased output columns

Pivot

Type PIVOT Β· WPF Steps/CommonTransform/Pivot.cs β†’ PivotWindow.xaml ("Pivot Transformation") Β· 1 input

Pivots rows into columns β€” the distinct values of the Pivot ON column become new output columns, populated by an aggregate of the Pivot Value column. In the dialog: - Pivot Table Name β€” unique name for the generated pivot table. - Pivot ON Column or Expression β€” the column whose distinct values become new columns (e.g. jobtitle). - Pivot Value Column or Expression β€” the value to aggregate (e.g. businessentityid). - Aggregate function β€” applied to each generated column (e.g. count($COLUMN), sum). - Pivot-Output Column Name Prefix and Max Generated Column(s) (default 256).

Pivot Transformation dialog β€” Pivot ON jobtitle, value businessentityid, count aggregate, max 256 columns

Requires a Key on the source. The upstream source/Table step must have a Key column checked (e.g. businessentityid marked Key on the employee Table step) β€” the pivot groups on that key.

The upstream Table step with businessentityid marked as Key

The result turns each distinct pivot value into its own column (one row per key):

Pivoted output β€” one row per businessentityid, a column per job title

⚠️ Output table is drop-and-created on every run in the system default schema ($SYSTEM_DEFAULT_SCHEMA in system.cfg, e.g. integrator), so newly-arriving pivot values can add columns.

Data quality

⬜ Step Type What it does
🟒 Control Test CONTROLTEST Data-quality test β€” measures correctness/reasonableness of data. Defined first in Administration β–Έ Metrics Configuration β–Έ Control Test (name, type, tolerance, an expected-value query vs an actual-value query β†’ pass/fail), then selected in the workflow. Test type M = a measurement probe for trending (DB-size growth, counts, …) rather than pass/fail.
🟑 Meta Profile METAPROFILE Placeholder. Profiles column metadata / statistics (types, distributions, counts) for data observability. Full details TBD.

Control

⬜ Step Type What it does
🟒 Sync SYNC A barrier / sync point (aka dummy step) β€” does nothing itself. Use it to control parallelism: point arrows from N parallel steps/threads into one Sync step so it acts as a stopper that waits for all N to finish, then the next steps fire from its output. No config.
🟒 Switch SWITCH Routes flow on the success/failure status of the incoming pipeline β€” assign downstream steps to a success path vs a failure/recovery path. 1 input, β‰₯2 outputs.
🟒 File Watch FILEWATCH File sensor β€” watches an SSH server's directory for a file to arrive, polling until found or timeout.
🟒 Jdbc Watch JDBCWATCH Database sensor β€” re-runs a SQL query until its result matches a value (success) or times out (failure unless enforced).
🟒 Script Watch SCRIPTWATCH Script sensor β€” re-runs a shell/script until its output matches a value (success) or times out (configurable state).
🟒 Set Variable SETVARIABLE Sets a job variable ($VAR_n) from a Fixed value, a SQL query result, or the first line of a Shell script's output.
🟒 Watermark WATERMARK Persists a high/low watermark at Job or Batch scope; the next run picks the values up automatically for incremental delta logic.
🟒 Smart Script SMARTSCRIPT An item-list processor: INPUT = a list of values from a shell script (stdout split by newline) or SQL (first-column rows); DECLARATIONS = echo-based per-item transforms; ACTIONS = run a SQL/shell script per item with a parallel-call count (e.g. list files β†’ gzip them 8-way). Variables use a $ prefix.

Switch

Type SWITCH Β· WPF Steps/Common/SWITCH.cs β†’ SwitchWindow.xaml ("Switch Process Flow") Β· 1 input, β‰₯2 outputs

Routes the pipeline flow based on the success/failure status of the pipeline leading into it. Assign each connected downstream step to Run Following On Success or Run Following On Failure (use > to move a step between the lists). Typical use is a recovery path β€” e.g. a Jdbcsource β†’ JdbcTarget pipeline; if it fails, the Switch fires a recovery step (a SqlScript or any other step) instead of just failing the job.

Switch β€” Run Following On Success vs Run Following On Failure (recovery) path lists

Watermark

Type WATERMARK Β· WPF Steps/Common/Watermark.cs β†’ WatermarkWindow.xaml ("Update Watermark") Β·

Persists a High and/or Low watermark value at Job or Batch scope. The values come from variables set during a successful run; on the next run of the same job/batch they're picked up automatically, so you can build incremental delta expressions and other pipeline logic from them. Set either High or Low (or both).

Set Watermark β€” Job/Batch type with a High Value of $VAR_5

Set Variable

Type SETVARIABLE Β· WPF Steps/Common/SetVariable.cs β†’ SetVariableWindow.xaml ("Set Variable") Β·

Sets a job variable ($VAR_n) from one of three Value Type sources: - Fixed β€” a static value (or another variable); - SQL β€” the result of a SQL query on a connection (e.g. SELECT current_timestamp::date); - Shell β€” the first line of a shell script's output.

Set Variable β€” SQL mode setting $VAR_5 to SELECT current_timestamp::date

File Watch

Type FILEWATCH Β· WPF Steps/Common/FileWatch.cs β†’ FileWatchWindow.xaml ("File Watch") Β·

A file sensor β€” watches an SSH server for a file to arrive. Supply the SSH Connection, Directory, and File Name; the step polls every File Check Interval minutes up to the Max Wait Timeout, succeeding when the file appears (or timing out). Options: Enforce SUCCESS after Timeout and DELETE File On Normal Success.

File Watch β€” SSH connection, directory, file name, timeout + check interval

Jdbc Watch

Type JDBCWATCH Β· WPF Steps/Common/JDBCWatch.cs β†’ JdbcWatchWindow.xaml ("Watch JDBC Data State") Β·

A database sensor β€” runs a SQL query on a connection and waits until the first column of the result matches one of the Match Values (e.g. 1, SUCCESS, COMPLETED, READY, LOADED). The query (often a data-quality check returning a count/status) re-runs every Check Interval minutes until either a match β†’ success (the step completes and flow continues), or Max Wait Timeout β†’ failure by default (unless Enforce SUCCESS after Timeout is checked). First-column match limit 100K rows.

Jdbc Watch β€” database connection, match values, SQL query, timeout + check interval

Script Watch

Type SCRIPTWATCH Β· WPF Steps/Common/ScriptWatch.cs β†’ ScriptWatchWindow.xaml ("Watch Script Output") Β·

A script sensor β€” runs a shell script on the local engine server (it can invoke bash, Python, etc.) and waits until its standard output matches one of the comma-separated Value to watch for entries. Re-runs every Check Interval minutes until a match β†’ success, or Timeout β†’ the configurable State After Timeout (default FAILED).

Script Watch β€” shell script, watch values, timeout + check interval + state-after-timeout

Transfer

⬜ Step Type What it does
🟒 Sftp SFTP Downloads files from an SFTP/SSH server to the local Maestro server (paths can use variables); supports watermark + caches the downloaded-file list.
🟒 File Scanner FILESCANNER Scans the Maestro server for files via a Linux find command β†’ outputs an array of files to the next step; optional last-modified watermark (skip already-scanned) + an ignore-list DB diff.
🟒 Push File PUSHFILE Pushes locally-scanned files up to a cloud/general connection. Wire a File Scanner β†’ Push File.
🟒 Remote File Scanner REMOTEFILESCANNER Cloud equivalent of File Scanner β€” scans a cloud connection (e.g. S3) under a prefix β†’ outputs an array of files; same last-modified watermark behavior.
🟒 Pull File PULLFILE Pulls files down from a cloud/general connection to local. Wire a Remote File Scanner β†’ Pull File.

File Scanner

Type FILESCANNER Β· WPF Steps/Common/FileScanner.cs β†’ FileScannerWindow.xaml ("Local File Scanner") Β·

Scans the Maestro engine server for files by running a Linux find command (editable; Test previews the result), and outputs the matched array of file paths to the next step. - File Scanner tab β€” the Find Command (bash), a Watermark Name, and Use File-Last-Modified value as watermark: when checked, Maestro tracks the scanned files' last-modified timestamps so it won't rescan the same files next run β€” each successful run returns only new files.

File Scanner β€” the bash find command, watermark name, and use-last-modified-as-watermark

  • Ignore List tab β€” optionally diff against an external "already-loaded" registry: pick a Connection and a Query that returns loaded file names, choose a match Condition (Matches / Contains / Starts With / Ends With), and Maestro removes files already in that list from the output (toggle with Ignore File(s)).

File Scanner β€” Ignore List tab: a connection + query of loaded files, diffed against the scan

Remote File Scanner

Type REMOTEFILESCANNER Β· WPF Steps/Common/RemoteFileScanner.cs β†’ RemoteFileScannerWindow.xaml ("Remote File(s) Scanner") Β·

The cloud equivalent of File Scanner β€” scans a cloud connection (e.g. an S3 connection) under a Scan Prefix [key] (path) and outputs an array of files to the next step. Same Use File-Last-Modified value as watermark behavior (only new files on the next run). Browse opens the connection to pick/preview the prefix and its files.

Remote File Scanner β€” S3 connection, scan prefix, and last-modified watermark

Browsing an S3_CONNECTION prefix to preview the matched files

Push File

Type PUSHFILE Β· WPF Steps/Common/PushFile.cs β†’ PushFileWindow.xaml ("Push/Upload File(s)") Β· requires a File Scanner (or Sftp / Write File) upstream

Pushes the locally-scanned files up to a cloud/general connection.

A File Scanner feeding a Push File step

  • Destination tab β€” the target Connection (e.g. S3_CONNECTION) and Destination Path, with Auto-Clean at destination on success/failure.
  • Flag Options tab β€” optionally send a flag file after the upload.

Push File β€” Destination tab: S3 connection + destination path + auto-clean

Pull File

Type PULLFILE Β· WPF Steps/Common/PullFile.cs β†’ PullFileWindow.xaml ("Pull/Download File(s)") Β· requires a Remote File Scanner (or Sftp) upstream

Pulls files down from a cloud/general connection to the local server.

A Remote File Scanner feeding a Pull File step

  • Server Folder Path β€” the local destination folder, with Auto-Clean at destination on success/failure.

Pull File β€” Server Folder Path destination

Sftp

Type SFTP Β· WPF Steps/Common/Sftp.cs β†’ SftpWindow.xaml ("SFTP Pull/Download File(s)") Β·

Downloads files from an SFTP/SSH server to the local Maestro server, and caches the list of downloaded files for downstream file-ordering steps; supports watermark downloads (only new files next run). - Source β€” the SSH Connection, Directory1 (and an optional Directory2), and File/Pattern (e.g. *.csv). Paths can use variables: set $VAR_5/$VAR_6 to (say) a date pattern via a Set Variable step and reference them in the directory (/data/$VAR_5) so the scan targets a date-driven path. - Target Directory (Maestro Server) β€” the local folder, with Create directory if not exists. - Other Options β€” Use Watermark ($JOB_LOW_WATERMARK_VALUE) and Auto-Clean.

Sftp β€” source directories using $VAR_5/$VAR_6, target dir, and watermark options

Scripting & jobs

⬜ Step Type What it does
🟒 Sql Script SQLSCRIPT Runs SQL/DML statements (truncate/insert/update/…) on a connection, each ending with ;. Not for large SELECTs (no output written).
🟒 Ssh SSH Runs a shell script (bash/Python/Perl/…) on an SSH connection; success by exit code or by scanning output for a normal/abnormal string.
🟒 Job Step JOBSTEP Runs another workflow (job) as a step β€” orchestration, repetition, recovery. Nests to any depth (no loops). Actions: DO_NOTHING / RESTART / RUN_LIMIT_LOOP / RUN_FOREVER.

Sql Script

Type SQLSCRIPT Β· WPF Steps/Common/SQLScript.cs β†’ SQLScriptWindow.xaml ("SQL Script") Β·

Runs SQL / DML statements (truncate / delete / insert / update / vacuum …) on a selected database connection. Paste any number of statements, each ending with a semicolon ;. Not for large SELECTs β€” the output isn't written anywhere.

Sql Script β€” DML statements (truncate/insert/vacuum) separated by semicolons on a Redshift connection

⚠️ Known caveat: Maestro splits the script on ;, so every statement must end with ; β€” and a ; embedded inside a statement (e.g. in a string literal) can break the split.

Ssh

Type SSH Β· WPF Steps/Common/SSH.cs β†’ SSHWindow.xaml ("Execute Shell Command (SSH)") Β·

Runs a shell script on an SSH connection β€” you can invoke bash, Python, Ruby, Perl, etc. (Maestro writes the script to a temporary .sh file and runs it with bash). - Shell Script tab β€” the SSH Connection and the script. - Exit Status tab β€” how the step decides success: - Use script return value β€” exit code 0 = success, non-zero = failure; - Scan normal value in script output β€” success when the output contains the Scan Following Text (e.g. COMPLETED SUCCESSFULLY); - Scan abnormal value in script output β€” failure when the output contains the text (e.g. CRITICAL ERROR).

Ssh β€” Shell Script tab with the script and SSH connection

Ssh β€” Exit Status tab: scan the script output for a normal-success string

Job Step

Type JOBSTEP Β· WPF Steps/Common/JobStep.cs β†’ JobStepWindow.xaml ("Run Workflow (Job)") Β·

Runs another workflow (job) as a step β€” the basis for orchestration, repetition, and recovery. Pick a job from Available Jobs; one Job Step runs one job, and that job can itself contain Job Steps, so jobs nest to any depth (parent β†’ children β†’ grandchildren).

No loops. Maestro forbids recursive job graphs β€” a child (or any descendant) cannot point a Job Step back at an ancestor.

Automatic Recovery Options β€” the ACTION controls how the sub-job runs:

Action Behavior Requires Terminates when
DO_NOTHING Runs once in default mode and returns (no recovery). β€” automatically
RESTART Re-runs until the expected outcome is reached. UNTIL state, Max Run Count, Sleep Interval parent terminated manually Β· or max run count reached Β· or last run state matches the UNTIL value (e.g. COMPLETE)
RUN_LIMIT_LOOP Loops a fixed number of times. Max Run Count, Sleep Interval parent terminated manually Β· or max run count reached
RUN_FOREVER Runs perpetually. Sleep Interval parent terminated manually

Run Workflow (Job) β€” select a job, ACTION, UNTIL state, Max Run Count + Sleep Interval

Export

⬜ Step Type What it does
🟒 Write Jdbc WRITEJDBC Exports the input rows to an existing table on another JDBC database (does not create the table β€” use the DDL Template to build it first). 1 input.

Write Jdbc

Type WRITEJDBC Β· WPF Steps/CommonTransform/WriteJdbc.cs β†’ WriteJdbcWindow.xaml ("Write to JDBC Target") Β· 1 input

Exports the incoming rows into another database over JDBC. It needs an upstream data source (a Table step or the output of any transform) and writes to an existing target table.

The target table must already exist β€” Write Jdbc does not create it. Click DDL Template to preview a CREATE TABLE for the incoming columns, create that table manually on the target connection (adjusting types per the template's notes β€” e.g. timestamp β†’ datetime for SQL Server), then Browse to select it.

  • Target Load Options β€” Records per batch commit, Print verbose message on batch count, Attribute enclosed by (quote char), and the DDL Template preview button.
  • Target Table (Existing) β€” Connection, Catalog, Schema, Table (via Browse), and Truncate Target Table.

Write to JDBC Target β€” batch options, DDL Template, and the existing target-table selection

DDL Template β€” a CREATE TABLE preview of the incoming columns, used to build the target table

⚠️ Tested targets: export is validated against SQL Server, Postgres, and Oracle targets; other JDBC targets are not yet tested.


MPP-common steps

On the in-scope MPP platforms (Redshift, Snowflake, ClickHouse) but not SparkSQL.

⬜ Step Type What it does Notes
🟒 Table TABLE Reads from / writes to a referenced table. Construct: Existing · Create · Temp. Load Options: Truncate · Run Statistics (GENERATE STATISTICS) · Run Vacuum (GROOM) · Upsert (update+insert by Key). Column flags: Cluster/Dist/Sort · Key (needed for Upsert) · Ident. Add/remove columns only for Create/Temp
🟒 View VIEW References an existing database view (database / schema / view) and brings its columns into the pipeline as a read source β€” no input; output columns = the view's columns.
🟒 Sql Metrics SQLMETRICS Runs a selected set of data-quality / rowcount metrics (from the metrics catalog β€” metrics.dq_definition) against the source and/or target. Pick the metric list and toggle Run Source / Run Target. DQ / observability
🟒 Write File WRITEFILE Exports a SQL query result (on the platform connection) to delimited flat files on the server β€” set the SQL, output directory, file prefix / extension (default .psv), delimiter (\|), per-file size MB (split, default 128), gzip, auto-clean, sanitize, and fetch size. Export
🟒 SCD1 SCD1 Slowly Changing Dimension Type-1 β€” upserts into a target dimension, overwriting changed attributes in place (no history, unlike SCD2). Set target catalog / schema / table and mark each column Match / Change / Ignore. 1 input. All MPP except ClickHouse
🟒 SCD2 SCD2 Slowly Changing Dimension Type-2 β€” keeps history via effective/expiration timestamps (current row's exp_ts = 9999-12-31). Mark each column M (match/identity) Β· C (track change) Β· X (ignore), and designate the dim-id / eff-ts / exp-ts columns. On a change it expires the old row and inserts a new current row. All MPP (ClickHouse included)

Platform-specific steps

Each platform has its own load target (JdbcTarget<Platform>) and, in some cases, a staging/file step. Same pattern, per-platform implementation.

Redshift

⬜ Step Type What it does
🟒 Jdbc Target Redshift JDBCTARGETREDSHIFT Corelli loader β€” loads a Jdbcsource's staged files into Redshift via S3 + COPY; incremental delta-watermark.
🟒 Redshift Stage REDSHIFTSTAGE Stages a Local File's parquet output to S3, creates a temp Redshift table, bulk-loads the staged files, then cleans them up. Used in the Local File β†’ Redshift Stage β†’ Table pipeline; pick the S3 Connection + IAM or KEY auth.
🟒 Local File LOCAL_FILE Reads flat files from the local filesystem into the pipeline (a file source).
🟒 Dataframe DATAFRAME Reads HDFS file(s)/folder into a Spark data frame + a temp global view (Parquet/Orc/Csv/Json/Text); reader or writer per the arrow direction. Write Option: truncate / append / upsert-append. Output Mode: All Rows · Delta Passthrough (rows from this run) · Last Inserted Version (needs key columns).

Jdbc Target Redshift

Type JDBCTARGETREDSHIFT Β· WPF Steps/Pipeline/JdbcTargetRedShift.cs β†’ JdbcTargetRedShiftWindow.xaml ("Redshift Target β€” Corelli Parallel Loader") Β· requires an upstream Jdbcsource

The target half of the Corelli pipeline: it loads the flat files staged by a Jdbcsource into Redshift. It only works downstream of a Jdbcsource (not other step types).

A Corelli pipeline β€” SRC.t_step_run_detail (a Jdbcsource) feeding a JDBCTARGETREDSHIFT step

Target Table tab β€” connection, catalog/db, schema, table. Check Table Exists if the target already exists; leave it unchecked to have Maestro create it. Truncate and Loader Threads as needed.

Jdbc Target Redshift β€” Target Table tab

Source Column(s) tab β€” click Refresh to pull the incoming columns; use the pencil icon to correct any data types that look wrong.

Jdbc Target Redshift β€” Source Column(s) tab with Refresh and the pencil data-type editor

Mapping Columns(s) tab β€” click Reset to auto-map source β†’ target columns, then adjust as needed.

Load Options tab β€” the staging commands (under $OBJECTSTORAGE): S3 Upload (aws s3 cp to S3), S3 Clean (aws s3 rm --recursive after load), and Copy/Load (Redshift COPY … iam_role … gzip). Maestro runs the AWS CLI to push the files to S3, runs COPY, then removes the staged files on success. Don't change the $VARIABLE placeholders.

Jdbc Target Redshift β€” Load Options tab: S3 upload / clean / copy-load commands

Delta Watermark tab β€” a query that runs on the target (SELECT MAX($COLUMN) FROM $TARGET). When a watermark/Delta column is set on the Jdbcsource and the job runs on a schedule (e.g. hourly), this MAX value becomes the source's $EXTRACT_CONDITION filter, so data loads incrementally.

Jdbc Target Redshift β€” Delta Watermark tab: SELECT MAX($COLUMN) FROM $TARGET

Snowflake

⬜ Step Type What it does
🟒 File Loader FILELOADER Loads flat files into an existing table whose structure matches the file (output β†’ a Table step). Source files come from an Onstage Group step, SFTP, or the ELT Maestro server; external-table load options are customizable.
🟒 Jdbc Target Snowflake JDBCTARGETSNOWFLAKE Corelli loader β€” the Snowflake target half of a Jdbcsource β†’ JdbcTargetSnowflake pipeline (requires an upstream Jdbcsource); same shape as the other JdbcTarget* loaders. Loads the source's staged files into Snowflake via COPY INTO $TARGET FROM $STAGEPATH with the CORELLI_FORMAT_PIPE0 file format. Tabs: Target Table (connection / catalog / schema / table Β· Truncate Β· Loader Threads) Β· Source Column(s) Β· Mapping Column(s) Β· Load Options (the COPY command) Β· Delta Watermark (SELECT MAX($COLUMN) FROM $TARGET, incremental).

ClickHouse

⬜ Step Type What it does
🟒 Jdbc Target Clickhouse JDBCTARGETCLICKHOUSE Corelli loader β€” the ClickHouse target half of a Jdbcsource β†’ JdbcTargetClickhouse pipeline (requires an upstream Jdbcsource); same shape as the other JdbcTarget* loaders. Difference: it loads files directly to the ClickHouse server via clickhouse-client (… INSERT INTO $TARGET FORMAT CSV < $LOCALFILE) rather than staging through S3 β€” so the ELT Maestro host must have clickhouse-client installed and the load command configured (the $MPP_HOST / $DELIMITER / $TARGET placeholders come from the integrator's system.cfg; see CLICKHOUSE-AUTH.md). Tabs: Target Table (connection / catalog / schema / table Β· Truncate Β· Loader Threads) Β· Source Column(s) Β· Mapping Column(s) Β· Load Options Β· Delta Watermark (SELECT MAX($COLUMN) FROM $TARGET, incremental).
⬜ Write Incident WRITEINCIDENT TBD
🟒 Dataframe DATAFRAME Reads HDFS file(s)/folder into a Spark data frame + a temp global view (Parquet/Orc/Csv/Json/Text); reader or writer per the arrow direction. Write Option: truncate / append / upsert-append. Output Mode: All Rows · Delta Passthrough (rows from this run) · Last Inserted Version (needs key columns).

SparkSQL

⬜ Step Type What it does
🟒 Dataframe DATAFRAME Reads HDFS file(s)/folder into a Spark data frame + a temp global view (Parquet/Orc/Csv/Json/Text); reader or writer per the arrow direction. Write Option: truncate / append / upsert-append. Output Mode: All Rows · Delta Passthrough (rows from this run) · Last Inserted Version (needs key columns).
🟒 Local File LOCAL_FILE Reads flat files from the local filesystem into the pipeline (a file source).
🟒 Jdbc Target HDFS JDBCTARGETHDFS Corelli HDFS Spark DataFrame loader β€” the SparkSQL target half of a Jdbcsource β†’ JdbcTargetHDFS pipeline (requires an upstream Jdbcsource). Writes the source's rows to an HDFS Target Path as snappy-compressed Parquet. Loader Threads Β· Truncate Β· Source Column(s) refresh (typed for SparkSQL) Β· Mapping Column(s) auto-map (Reset / Check).
🟒 Spark Data Cache SPARKDATACACHE Caches a data frame to disk/memory β€” a Spark persist (same output as input). Storage levels: DISK_ONLY, MEMORY_AND_DISK, MEMORY_ONLY(_SER/_2), …
🟒 Aggregate AGGREGATE Performs aggregations β€” pick Group column(s) and f(Agg) column(s) (sum/avg/…); the UI suggests output names/types. (Spark's classic aggregate; the expression-builder equivalent is Aggregate2.)
🟒 Onstage Delta ONSTAGEDELTA Continuous loop delta loader (SparkSQL-only) β€” runs in a perpetual loop, polling a source Table/View every Delay (Seconds) (default 60) and merging only rows past a watermark (WM Col/Expr) into the target, starting from a configurable StartWM Value. Source + Target each take connection / database / schema / table + watermark expression (DummySQL = a placeholder query); Mapping Column(s) maps sourceβ†’target.
🟒 Export EXPORT Exports the incoming Spark data frame to a local destination folder on the ELT Maestro server in a chosen File Format (default Parquet). 1 input; the column list lets you mark Key column(s).
🟒 ML Feature Trainer MLFEATURETRAINER Trains one or more Spark MLlib models on the input data frame β€” pick ID, Feature, and Label column(s), add the model(s) to train, and name the Model Topic. 1 input. (Step label: "Train <topic> with N model(s)".)
🟒 ML Feature Predictor MLFEATUREPREDICTOR Scores the input data frame with previously-trained Spark MLlib models β€” pick the Topic Path, the Model(s), Feature + ID column(s), and output file parts (default 4). Emits the ID columns plus features, rawPrediction, probability, topicDistribution, prediction, ml_model_name, ml_timestamp.
🟒 ML Feature Engineer MLFEATUREENGINEER Builds engineered feature columns via a list of actions β€” each applies a Spark MLlib transform (action type) to an input column, producing a new output column. Output = input columns + the new engineered columns.
🟒 ML Poly Feature Engineer MLPOLYFEATUREENGINEER Like ML Feature Engineer, but applies a single Spark MLlib feature-engineering rule (transName/transType) across multiple input columns at once, producing the corresponding output column(s) appended to the input data frame.

Tie-back to architecture

The steps above aren't an undifferentiated toolbox β€” each one belongs to a stage of a Bronze β†’ Silver β†’ Gold medallion pipeline. This is the step β†’ layer view (the reverse of MEDALLION.md, which goes layer β†’ step); use it to decide where a step belongs when you build a workflow. Scoped to the documented platforms (Redshift, Snowflake, ClickHouse, SparkSQL).

πŸ₯‰ Bronze β€” raw landing / ingestion

Capture faithfully, transform minimally, append-only.

Role Steps
Pull / extract Jdbcsource, Datasource, Local File, Sftp, Pull File
Land to the lake / target Jdbc Target HDFS (Parquet to HDFS), Onstage Group (multi-table β†’ parquet/external tables), Onstage Delta (continuous-loop landing)
Arrival triggers File Watch, JDBC Watch, Script Watch, File Scanner, Remote File Scanner
Incremental capture Watermark (+ the isCDC delta primitive), Set Variable

πŸ₯ˆ Silver β€” cleaned, conformed, validated

Idempotent and re-runnable; gate promotion on data quality.

Role Steps
Clean / shape Filter, Dedupe, Function2 (derivations), Datamask (PII), View, Pivot
Conform / integrate Join, Union, Minus, Spark Data Cache
Historize dimensions SCD1, SCD2
Quality gate (Bronze→Silver) Control Test, Sql Metrics, Meta Profile, Write Incident (quarantine bad rows)

πŸ₯‡ Gold β€” business marts / serving

Business-grain, serving-optimized.

Role Steps
Aggregate / model Aggregate, Aggregate2, Table (+ Onstage Group loaders)
Publish to serving MPP Jdbc Target Redshift / Snowflake / Clickhouse / HDFS, Redshift Stage, File Loader, Write Jdbc, Write File, Export
ML feature / serving ML Feature Engineer, ML Poly Feature Engineer, ML Feature Trainer, ML Feature Predictor

The orchestration spine (cross-layer)

These don't belong to one layer — they wire the layers together into one end-to-end pipeline: Job Step (chain Bronze→Silver→Gold workflows), Switch / Set Variable / Sync (conditional flow + barriers), SQL Script / SSH / Smart Script (custom logic + maintenance), and Push File / Pull File / Sftp (movement between systems). See the orchestration spine and the Silver quality gate for how scheduling, alerting, and DQ gating stitch a tiered pipeline together.