Step reference¶
Catalog of every step you can place on the Job editor canvas, derived from JobEditor.populateListItems(). Part of the User Guide.
Status: complete β all in-scope steps documented. Three remain open:
DatasourceandMeta Profileare placeholders pending input, andWrite Incidentis deferred. (Living doc β update as the UI evolves.)
Scope. This reference covers the actively-documented platforms: Redshift, Snowflake, ClickHouse, SparkSQL. (All other platforms β Netezza, Yellowbrick, Greenplum, DashDB, Firebolt, Databricks, SQLWarehouse, Exasol, Synapse β are out of scope.)
Steps fall into three tiers: - Common steps β available on all documented platforms. - MPP-common steps β on every MPP platform (all of the above except SparkSQL). - Platform-specific steps β only on one platform (mostly the per-platform load target).
Legend: π’ documented Β· π‘ placeholder (to expand) Β· β¬ to do.
Common steps¶
Present on all documented platforms (the global control/transfer/script set is added to every job type; the data/transform/DQ items below are common to both the MPP and SparkSQL palettes).
Sources & input¶
| β¬ | Step (palette label) | Type | What it does |
|---|---|---|---|
| π’ | Jdbcsource | JDBCSOURCE |
Parallel JDBC extractor β pulls source data into staged flat files for a JdbcTarget; auto-partitioned parallel reads + watermark-based incremental. |
| π‘ | Datasource | DATASOURCE |
Placeholder. Corelli pipeline source for non-JDBC sources via a general connection (MongoDB, Salesforce, other connectors). Like Jdbcsource, it stages data for a downstream JdbcTarget β the engine requires the data-provider step to be a Jdbcsource or Datasource. Full details TBD. |
| π’ | Onstagegroup | ONSTAGEGROUP |
Bulk schema/group loader β loads multiple source tables into a target schema in one step, staged through object storage, with per-table watermark/key incremental merge. Platform-specific dialog. |
Jdbcsource¶
Type JDBCSOURCE Β· WPF Steps/Pipeline/JdbcSource.cs β JdbcSourceWindow.xaml ("Corelli Parallel Jdbc Extractor") Β· pairs with a JdbcTarget<Platform>
Extracts data from a source database and writes it to staged flat files that a platform-specific JdbcTargetβ¦ step then loads (JdbcTargetRedshift, JdbcTargetClickhouse, JdbcTargetSnowflake, β¦). On its own it only extracts and stages β it must be wired to a JdbcTarget.
Parallel extraction (auto-partitioning). Turn on Auto partitioning over a numeric or timestamp incrementing column. Maestro runs a MIN/MAX query and generates multiple WHERE-condition "buckets" β e.g. 100 rows with Rows/Partition = 10 β 10 buckets β and runs Threads of them at a time (default 4 = the parallelism factor). Leave Delim as pipe (|). Use Meta Data / Preview to inspect the dataset first.

Source SQL tab β the SELECT that runs on the source; $EXTRACT_CONDITION is filled in at runtime (the incremental watermark filter).

Incremental loads. After the initial full parallel load, set a Delta Column (e.g. log_ts). On each scheduled run, the target's Delta Watermark MAX value is pushed back as the source's $EXTRACT_CONDITION, so only new rows are pulled.
- Auto Partitioning Reference tab β for very large tables where MIN/MAX is slow, point partition sampling at a smaller reference table; leave "ref table same as source" checked if not needed.
- Manual Partitions tab β when Manual partitioning is selected you must supply the extraction-criteria conditions here (the per-bucket
WHEREexpressions β one partition per condition); in Auto mode, any expressions added here override the auto-partitioning logic. - Performance & Quality tab β date/timestamp formats, fetch size, optional SAP zero-padding, Standardize Data (strip CR/LF/delimiter/quotes), and an Encode list that base64-encodes selected columns so special characters survive the flat file β the target decodes them with the platform's base64 function.

Onstagegroup¶
Type ONSTAGEGROUP Β· "Onstage Schema Loader" Β· platform-specific class/dialog:
| Platform | WPF |
|---|---|
| Redshift | Steps/Redshift/RSGroupLoader.cs β RSGroupLoaderWindow.xaml |
| Snowflake | Steps/Snowflake/SnowGroupLoader.cs β SnowGroupLoaderWindow.xaml |
| ClickHouse | Steps/Clickhouse/CHGroupLoader.cs β CHGroupLoaderWindow.xaml |
| SparkSQL / Synapse | Steps/Spark/SparkGroupLoader.cs β SparkGroupLoaderWindow.xaml |
Loads multiple tables from a source schema into a target schema in a single step. The dialog (shown here for Redshift) has two tabs.
Schema tab β set the Source (connection, catalog/database, schema) and Target (catalog/database, schema), plus platform-specific Configurations. For Redshift those are the S3 Connection (Maestro staging), Initial Refresh Partitions, and the S3 Upload / S3 Clean / Copy-Load commands. (Configurations differ by platform.)

Table(s) tab β select one or more tables from Available Source Table(s) on the left and click > to add them. Each target table is named database/schema/table. For each table set:
- Watermark(s) β a timestamp, date, or incrementing numeric column (e.g. a sequence). The watermark value drives incremental loads β only rows past the last watermark are pulled and merged into the target.
- Key(s) β the column(s) the incremental merge keys on.
If the Watermark/Key dropdowns are empty, click Refresh Metadata.

Static Filter Condition (per table) β overrides the delta filter that runs on the source. Use it for a custom delta window (e.g. modified_date > current_date - 3) or a full load with 1=1 (you still pick a watermark/key, but the static filter forces a full extract).

Caveat: a source table whose name contains special characters will not load.
SparkSQL variant β "Onstage Schema Loader for Spark". On SparkSQL the target is an HDFS folder, not a database schema. Pre-create the destination directory on HDFS, e.g.:
Then on the Schema tab the Target group shows Catalog = none (disabled) and a Folder field β click Browse to pick that directory from the HDFS tree (browsed over the Spark driver connection, e.g. ELTM_SPARK_DRIVER). The Source group and the Table(s) tab work the same as the other platforms. Load Configurations: Initial Load Partitions Β· Initial Load Threads Β· Run Count Metrics.


On run, the loader pulls the selected source tables into the target folder, creating one subdirectory per table and writing the data as Parquet files (e.g. /ingest/adventureworks/hr/<table>/β¦). A separate Parquet Exports tab can additionally push that output to S3 / Azure Blob via a CLI command.
Transforms¶
| β¬ | Step | Type | What it does |
|---|---|---|---|
| π’ | Datamask | DATAMASK |
Masks/obfuscates PII columns β applies the platform hash function ($HASH_TEMPLATE β $HASH_RETURNS type) to columns you tag mask, with an optional name postfix. 1 input. (PII / GDPR / HIPAA redaction.) |
| π’ | Dedupe | DEDUPE |
Removes duplicate rows β pick key column(s) + a sort/order expression; keeps one row per key (the first by the sort order) via ROW_NUMBER() OVER (PARTITION BY keys ORDER BY sort). 1 input. |
| π’ | Filter | FILTER |
Row filter β keeps only rows matching a boolean condition built in the expression editor ("Filter Condition"). 1 input β same columns out. |
| π’ | Aggregate2 | AGGREGATE2 |
Same dialog as Function2 (aggregate emphasis) β group-by + aggregate output columns. 1 input. |
| π’ | Function2 | FUNCTION2 |
Column expression editor β define output columns as scalar F(x) or aggregate Ξ£(x) with alias/type/expression, plus WHERE/HAVING. 1 input. |
| π’ | Join | JOIN |
Joins multiple sources ($N aliases) β pick the first source, add INNER/LEFT/β¦ conditions + ON expressions, define aliased output columns. Multi-input. |
| π’ | Minus | MINUS |
Set difference β outputs rows in source A that are not in source B (you pick which input is A). 2 inputs. |
| π’ | Union | UNION |
Combines two sources β toggle Union (distinct) vs Union All (keep duplicates); map the output columns. 2 inputs. |
| π’ | Pivot | PIVOT |
Pivots rows β columns β distinct values of the pivot column become new columns, aggregated from a value column. Needs a Key column marked on the source. Output table drop-and-created each run in $SYSTEM_DEFAULT_SCHEMA. 1 input. |
Function2 & Aggregate2¶
Type FUNCTION2 / AGGREGATE2 Β· WPF Steps/CommonTransform/Function2.cs β Function2Window.xaml ("Function Expression Version 2.0") Β· 1 input
The column expression editor β both palette entries open the same dialog (Aggregate2 is the same step surfaced with aggregate emphasis). For each output column pick a Mode:
- F(x) β a per-row scalar expression (these columns form the GROUP BY when any aggregate is present);
- Ξ£(x) β an aggregate (e.g. COUNT(*), SUM(...)),
β¦each with an Alias, Type, and Expression. Functions / operators / constants come from the function catalog; the Filters tab adds WHERE / HAVING conditions. Full walkthrough: Configuring a step β the expression builder.

Join¶
Type JOIN Β· WPF Steps/CommonTransform/MPPJoin.cs β MPPJoinWindow.xaml Β· multi-input
Joins multiple input sources into one output. Each inbound source is referenced by a $N alias, where N is the inbound arrow's step ID (e.g. $4 = Employee, $5 = EmployeeDepartmentHistory).
- Input β Sources lists the connected steps by alias; Columns lists their columns (Add To Output to emit one).
- Join Condition(s) β pick the First Join Source, then add conditions: Type (INNER / LEFT / β¦), Join With ($N), and the ON expression (e.g. $4.`BusinessEntityID` = $5.`BusinessEntityID`).
- Output Columns β each output's Name ($N.column), Type, and Alias; Fix Alias repairs broken $N references.
Check Mapping validates that every join / output / input alias resolves to a connected source.

Pivot¶
Type PIVOT Β· WPF Steps/CommonTransform/Pivot.cs β PivotWindow.xaml ("Pivot Transformation") Β· 1 input
Pivots rows into columns β the distinct values of the Pivot ON column become new output columns, populated by an aggregate of the Pivot Value column. In the dialog:
- Pivot Table Name β unique name for the generated pivot table.
- Pivot ON Column or Expression β the column whose distinct values become new columns (e.g. jobtitle).
- Pivot Value Column or Expression β the value to aggregate (e.g. businessentityid).
- Aggregate function β applied to each generated column (e.g. count($COLUMN), sum).
- Pivot-Output Column Name Prefix and Max Generated Column(s) (default 256).

Requires a Key on the source. The upstream source/Table step must have a Key column checked (e.g. businessentityid marked Key on the employee Table step) β the pivot groups on that key.

The result turns each distinct pivot value into its own column (one row per key):

β οΈ Output table is drop-and-created on every run in the system default schema (
$SYSTEM_DEFAULT_SCHEMAinsystem.cfg, e.g.integrator), so newly-arriving pivot values can add columns.
Data quality¶
| β¬ | Step | Type | What it does |
|---|---|---|---|
| π’ | Control Test | CONTROLTEST |
Data-quality test β measures correctness/reasonableness of data. Defined first in Administration βΈ Metrics Configuration βΈ Control Test (name, type, tolerance, an expected-value query vs an actual-value query β pass/fail), then selected in the workflow. Test type M = a measurement probe for trending (DB-size growth, counts, β¦) rather than pass/fail. |
| π‘ | Meta Profile | METAPROFILE |
Placeholder. Profiles column metadata / statistics (types, distributions, counts) for data observability. Full details TBD. |
Control¶
| β¬ | Step | Type | What it does |
|---|---|---|---|
| π’ | Sync | SYNC |
A barrier / sync point (aka dummy step) β does nothing itself. Use it to control parallelism: point arrows from N parallel steps/threads into one Sync step so it acts as a stopper that waits for all N to finish, then the next steps fire from its output. No config. |
| π’ | Switch | SWITCH |
Routes flow on the success/failure status of the incoming pipeline β assign downstream steps to a success path vs a failure/recovery path. 1 input, β₯2 outputs. |
| π’ | File Watch | FILEWATCH |
File sensor β watches an SSH server's directory for a file to arrive, polling until found or timeout. |
| π’ | Jdbc Watch | JDBCWATCH |
Database sensor β re-runs a SQL query until its result matches a value (success) or times out (failure unless enforced). |
| π’ | Script Watch | SCRIPTWATCH |
Script sensor β re-runs a shell/script until its output matches a value (success) or times out (configurable state). |
| π’ | Set Variable | SETVARIABLE |
Sets a job variable ($VAR_n) from a Fixed value, a SQL query result, or the first line of a Shell script's output. |
| π’ | Watermark | WATERMARK |
Persists a high/low watermark at Job or Batch scope; the next run picks the values up automatically for incremental delta logic. |
| π’ | Smart Script | SMARTSCRIPT |
An item-list processor: INPUT = a list of values from a shell script (stdout split by newline) or SQL (first-column rows); DECLARATIONS = echo-based per-item transforms; ACTIONS = run a SQL/shell script per item with a parallel-call count (e.g. list files β gzip them 8-way). Variables use a $ prefix. |
Switch¶
Type SWITCH Β· WPF Steps/Common/SWITCH.cs β SwitchWindow.xaml ("Switch Process Flow") Β· 1 input, β₯2 outputs
Routes the pipeline flow based on the success/failure status of the pipeline leading into it. Assign each connected downstream step to Run Following On Success or Run Following On Failure (use > to move a step between the lists). Typical use is a recovery path β e.g. a Jdbcsource β JdbcTarget pipeline; if it fails, the Switch fires a recovery step (a SqlScript or any other step) instead of just failing the job.

Watermark¶
Type WATERMARK Β· WPF Steps/Common/Watermark.cs β WatermarkWindow.xaml ("Update Watermark") Β·
Persists a High and/or Low watermark value at Job or Batch scope. The values come from variables set during a successful run; on the next run of the same job/batch they're picked up automatically, so you can build incremental delta expressions and other pipeline logic from them. Set either High or Low (or both).

Set Variable¶
Type SETVARIABLE Β· WPF Steps/Common/SetVariable.cs β SetVariableWindow.xaml ("Set Variable") Β·
Sets a job variable ($VAR_n) from one of three Value Type sources:
- Fixed β a static value (or another variable);
- SQL β the result of a SQL query on a connection (e.g. SELECT current_timestamp::date);
- Shell β the first line of a shell script's output.

File Watch¶
Type FILEWATCH Β· WPF Steps/Common/FileWatch.cs β FileWatchWindow.xaml ("File Watch") Β·
A file sensor β watches an SSH server for a file to arrive. Supply the SSH Connection, Directory, and File Name; the step polls every File Check Interval minutes up to the Max Wait Timeout, succeeding when the file appears (or timing out). Options: Enforce SUCCESS after Timeout and DELETE File On Normal Success.

Jdbc Watch¶
Type JDBCWATCH Β· WPF Steps/Common/JDBCWatch.cs β JdbcWatchWindow.xaml ("Watch JDBC Data State") Β·
A database sensor β runs a SQL query on a connection and waits until the first column of the result matches one of the Match Values (e.g. 1, SUCCESS, COMPLETED, READY, LOADED). The query (often a data-quality check returning a count/status) re-runs every Check Interval minutes until either a match β success (the step completes and flow continues), or Max Wait Timeout β failure by default (unless Enforce SUCCESS after Timeout is checked). First-column match limit 100K rows.

Script Watch¶
Type SCRIPTWATCH Β· WPF Steps/Common/ScriptWatch.cs β ScriptWatchWindow.xaml ("Watch Script Output") Β·
A script sensor β runs a shell script on the local engine server (it can invoke bash, Python, etc.) and waits until its standard output matches one of the comma-separated Value to watch for entries. Re-runs every Check Interval minutes until a match β success, or Timeout β the configurable State After Timeout (default FAILED).

Transfer¶
| β¬ | Step | Type | What it does |
|---|---|---|---|
| π’ | Sftp | SFTP |
Downloads files from an SFTP/SSH server to the local Maestro server (paths can use variables); supports watermark + caches the downloaded-file list. |
| π’ | File Scanner | FILESCANNER |
Scans the Maestro server for files via a Linux find command β outputs an array of files to the next step; optional last-modified watermark (skip already-scanned) + an ignore-list DB diff. |
| π’ | Push File | PUSHFILE |
Pushes locally-scanned files up to a cloud/general connection. Wire a File Scanner β Push File. |
| π’ | Remote File Scanner | REMOTEFILESCANNER |
Cloud equivalent of File Scanner β scans a cloud connection (e.g. S3) under a prefix β outputs an array of files; same last-modified watermark behavior. |
| π’ | Pull File | PULLFILE |
Pulls files down from a cloud/general connection to local. Wire a Remote File Scanner β Pull File. |
File Scanner¶
Type FILESCANNER Β· WPF Steps/Common/FileScanner.cs β FileScannerWindow.xaml ("Local File Scanner") Β·
Scans the Maestro engine server for files by running a Linux find command (editable; Test previews the result), and outputs the matched array of file paths to the next step.
- File Scanner tab β the Find Command (bash), a Watermark Name, and Use File-Last-Modified value as watermark: when checked, Maestro tracks the scanned files' last-modified timestamps so it won't rescan the same files next run β each successful run returns only new files.

- Ignore List tab β optionally diff against an external "already-loaded" registry: pick a Connection and a Query that returns loaded file names, choose a match Condition (Matches / Contains / Starts With / Ends With), and Maestro removes files already in that list from the output (toggle with Ignore File(s)).

Remote File Scanner¶
Type REMOTEFILESCANNER Β· WPF Steps/Common/RemoteFileScanner.cs β RemoteFileScannerWindow.xaml ("Remote File(s) Scanner") Β·
The cloud equivalent of File Scanner β scans a cloud connection (e.g. an S3 connection) under a Scan Prefix [key] (path) and outputs an array of files to the next step. Same Use File-Last-Modified value as watermark behavior (only new files on the next run). Browse opens the connection to pick/preview the prefix and its files.


Push File¶
Type PUSHFILE Β· WPF Steps/Common/PushFile.cs β PushFileWindow.xaml ("Push/Upload File(s)") Β· requires a File Scanner (or Sftp / Write File) upstream
Pushes the locally-scanned files up to a cloud/general connection.

- Destination tab β the target Connection (e.g.
S3_CONNECTION) and Destination Path, with Auto-Clean at destination on success/failure. - Flag Options tab β optionally send a flag file after the upload.

Pull File¶
Type PULLFILE Β· WPF Steps/Common/PullFile.cs β PullFileWindow.xaml ("Pull/Download File(s)") Β· requires a Remote File Scanner (or Sftp) upstream
Pulls files down from a cloud/general connection to the local server.

- Server Folder Path β the local destination folder, with Auto-Clean at destination on success/failure.

Sftp¶
Type SFTP Β· WPF Steps/Common/Sftp.cs β SftpWindow.xaml ("SFTP Pull/Download File(s)") Β·
Downloads files from an SFTP/SSH server to the local Maestro server, and caches the list of downloaded files for downstream file-ordering steps; supports watermark downloads (only new files next run).
- Source β the SSH Connection, Directory1 (and an optional Directory2), and File/Pattern (e.g. *.csv). Paths can use variables: set $VAR_5/$VAR_6 to (say) a date pattern via a Set Variable step and reference them in the directory (/data/$VAR_5) so the scan targets a date-driven path.
- Target Directory (Maestro Server) β the local folder, with Create directory if not exists.
- Other Options β Use Watermark ($JOB_LOW_WATERMARK_VALUE) and Auto-Clean.

Scripting & jobs¶
| β¬ | Step | Type | What it does |
|---|---|---|---|
| π’ | Sql Script | SQLSCRIPT |
Runs SQL/DML statements (truncate/insert/update/β¦) on a connection, each ending with ;. Not for large SELECTs (no output written). |
| π’ | Ssh | SSH |
Runs a shell script (bash/Python/Perl/β¦) on an SSH connection; success by exit code or by scanning output for a normal/abnormal string. |
| π’ | Job Step | JOBSTEP |
Runs another workflow (job) as a step β orchestration, repetition, recovery. Nests to any depth (no loops). Actions: DO_NOTHING / RESTART / RUN_LIMIT_LOOP / RUN_FOREVER. |
Sql Script¶
Type SQLSCRIPT Β· WPF Steps/Common/SQLScript.cs β SQLScriptWindow.xaml ("SQL Script") Β·
Runs SQL / DML statements (truncate / delete / insert / update / vacuum β¦) on a selected database connection. Paste any number of statements, each ending with a semicolon ;. Not for large SELECTs β the output isn't written anywhere.

β οΈ Known caveat: Maestro splits the script on
;, so every statement must end with;β and a;embedded inside a statement (e.g. in a string literal) can break the split.
Ssh¶
Type SSH Β· WPF Steps/Common/SSH.cs β SSHWindow.xaml ("Execute Shell Command (SSH)") Β·
Runs a shell script on an SSH connection β you can invoke bash, Python, Ruby, Perl, etc. (Maestro writes the script to a temporary .sh file and runs it with bash).
- Shell Script tab β the SSH Connection and the script.
- Exit Status tab β how the step decides success:
- Use script return value β exit code 0 = success, non-zero = failure;
- Scan normal value in script output β success when the output contains the Scan Following Text (e.g. COMPLETED SUCCESSFULLY);
- Scan abnormal value in script output β failure when the output contains the text (e.g. CRITICAL ERROR).


Job Step¶
Type JOBSTEP Β· WPF Steps/Common/JobStep.cs β JobStepWindow.xaml ("Run Workflow (Job)") Β·
Runs another workflow (job) as a step β the basis for orchestration, repetition, and recovery. Pick a job from Available Jobs; one Job Step runs one job, and that job can itself contain Job Steps, so jobs nest to any depth (parent β children β grandchildren).
No loops. Maestro forbids recursive job graphs β a child (or any descendant) cannot point a Job Step back at an ancestor.
Automatic Recovery Options β the ACTION controls how the sub-job runs:
| Action | Behavior | Requires | Terminates when |
|---|---|---|---|
| DO_NOTHING | Runs once in default mode and returns (no recovery). | β | automatically |
| RESTART | Re-runs until the expected outcome is reached. | UNTIL state, Max Run Count, Sleep Interval | parent terminated manually Β· or max run count reached Β· or last run state matches the UNTIL value (e.g. COMPLETE) |
| RUN_LIMIT_LOOP | Loops a fixed number of times. | Max Run Count, Sleep Interval | parent terminated manually Β· or max run count reached |
| RUN_FOREVER | Runs perpetually. | Sleep Interval | parent terminated manually |

Export¶
| β¬ | Step | Type | What it does |
|---|---|---|---|
| π’ | Write Jdbc | WRITEJDBC |
Exports the input rows to an existing table on another JDBC database (does not create the table β use the DDL Template to build it first). 1 input. |
Write Jdbc¶
Type WRITEJDBC Β· WPF Steps/CommonTransform/WriteJdbc.cs β WriteJdbcWindow.xaml ("Write to JDBC Target") Β· 1 input
Exports the incoming rows into another database over JDBC. It needs an upstream data source (a Table step or the output of any transform) and writes to an existing target table.
The target table must already exist β Write Jdbc does not create it. Click DDL Template to preview a
CREATE TABLEfor the incoming columns, create that table manually on the target connection (adjusting types per the template's notes β e.g.timestampβdatetimefor SQL Server), then Browse to select it.
- Target Load Options β Records per batch commit, Print verbose message on batch count, Attribute enclosed by (quote char), and the DDL Template preview button.
- Target Table (Existing) β Connection, Catalog, Schema, Table (via Browse), and Truncate Target Table.


β οΈ Tested targets: export is validated against SQL Server, Postgres, and Oracle targets; other JDBC targets are not yet tested.
MPP-common steps¶
On the in-scope MPP platforms (Redshift, Snowflake, ClickHouse) but not SparkSQL.
| β¬ | Step | Type | What it does | Notes |
|---|---|---|---|---|
| π’ | Table | TABLE |
Reads from / writes to a referenced table. Construct: Existing Β· Create Β· Temp. Load Options: Truncate Β· Run Statistics (GENERATE STATISTICS) Β· Run Vacuum (GROOM) Β· Upsert (update+insert by Key). Column flags: Cluster/Dist/Sort Β· Key (needed for Upsert) Β· Ident. |
Add/remove columns only for Create/Temp |
| π’ | View | VIEW |
References an existing database view (database / schema / view) and brings its columns into the pipeline as a read source β no input; output columns = the view's columns. | |
| π’ | Sql Metrics | SQLMETRICS |
Runs a selected set of data-quality / rowcount metrics (from the metrics catalog β metrics.dq_definition) against the source and/or target. Pick the metric list and toggle Run Source / Run Target. |
DQ / observability |
| π’ | Write File | WRITEFILE |
Exports a SQL query result (on the platform connection) to delimited flat files on the server β set the SQL, output directory, file prefix / extension (default .psv), delimiter (\|), per-file size MB (split, default 128), gzip, auto-clean, sanitize, and fetch size. |
Export |
| π’ | SCD1 | SCD1 |
Slowly Changing Dimension Type-1 β upserts into a target dimension, overwriting changed attributes in place (no history, unlike SCD2). Set target catalog / schema / table and mark each column Match / Change / Ignore. 1 input. |
All MPP except ClickHouse |
| π’ | SCD2 | SCD2 |
Slowly Changing Dimension Type-2 β keeps history via effective/expiration timestamps (current row's exp_ts = 9999-12-31). Mark each column M (match/identity) Β· C (track change) Β· X (ignore), and designate the dim-id / eff-ts / exp-ts columns. On a change it expires the old row and inserts a new current row. |
All MPP (ClickHouse included) |
Platform-specific steps¶
Each platform has its own load target (JdbcTarget<Platform>) and, in some cases, a staging/file step. Same pattern, per-platform implementation.
Redshift¶
| β¬ | Step | Type | What it does |
|---|---|---|---|
| π’ | Jdbc Target Redshift | JDBCTARGETREDSHIFT |
Corelli loader β loads a Jdbcsource's staged files into Redshift via S3 + COPY; incremental delta-watermark. |
| π’ | Redshift Stage | REDSHIFTSTAGE |
Stages a Local File's parquet output to S3, creates a temp Redshift table, bulk-loads the staged files, then cleans them up. Used in the Local File β Redshift Stage β Table pipeline; pick the S3 Connection + IAM or KEY auth. |
| π’ | Local File | LOCAL_FILE |
Reads flat files from the local filesystem into the pipeline (a file source). |
| π’ | Dataframe | DATAFRAME |
Reads HDFS file(s)/folder into a Spark data frame + a temp global view (Parquet/Orc/Csv/Json/Text); reader or writer per the arrow direction. Write Option: truncate / append / upsert-append. Output Mode: All Rows Β· Delta Passthrough (rows from this run) Β· Last Inserted Version (needs key columns). |
Jdbc Target Redshift¶
Type JDBCTARGETREDSHIFT Β· WPF Steps/Pipeline/JdbcTargetRedShift.cs β JdbcTargetRedShiftWindow.xaml ("Redshift Target β Corelli Parallel Loader") Β· requires an upstream Jdbcsource
The target half of the Corelli pipeline: it loads the flat files staged by a Jdbcsource into Redshift. It only works downstream of a Jdbcsource (not other step types).

Target Table tab β connection, catalog/db, schema, table. Check Table Exists if the target already exists; leave it unchecked to have Maestro create it. Truncate and Loader Threads as needed.

Source Column(s) tab β click Refresh to pull the incoming columns; use the pencil icon to correct any data types that look wrong.

Mapping Columns(s) tab β click Reset to auto-map source β target columns, then adjust as needed.
Load Options tab β the staging commands (under $OBJECTSTORAGE): S3 Upload (aws s3 cp to S3), S3 Clean (aws s3 rm --recursive after load), and Copy/Load (Redshift COPY β¦ iam_role β¦ gzip). Maestro runs the AWS CLI to push the files to S3, runs COPY, then removes the staged files on success. Don't change the $VARIABLE placeholders.

Delta Watermark tab β a query that runs on the target (SELECT MAX($COLUMN) FROM $TARGET). When a watermark/Delta column is set on the Jdbcsource and the job runs on a schedule (e.g. hourly), this MAX value becomes the source's $EXTRACT_CONDITION filter, so data loads incrementally.

Snowflake¶
| β¬ | Step | Type | What it does |
|---|---|---|---|
| π’ | File Loader | FILELOADER |
Loads flat files into an existing table whose structure matches the file (output β a Table step). Source files come from an Onstage Group step, SFTP, or the ELT Maestro server; external-table load options are customizable. |
| π’ | Jdbc Target Snowflake | JDBCTARGETSNOWFLAKE |
Corelli loader β the Snowflake target half of a Jdbcsource β JdbcTargetSnowflake pipeline (requires an upstream Jdbcsource); same shape as the other JdbcTarget* loaders. Loads the source's staged files into Snowflake via COPY INTO $TARGET FROM $STAGEPATH with the CORELLI_FORMAT_PIPE0 file format. Tabs: Target Table (connection / catalog / schema / table Β· Truncate Β· Loader Threads) Β· Source Column(s) Β· Mapping Column(s) Β· Load Options (the COPY command) Β· Delta Watermark (SELECT MAX($COLUMN) FROM $TARGET, incremental). |
ClickHouse¶
| β¬ | Step | Type | What it does |
|---|---|---|---|
| π’ | Jdbc Target Clickhouse | JDBCTARGETCLICKHOUSE |
Corelli loader β the ClickHouse target half of a Jdbcsource β JdbcTargetClickhouse pipeline (requires an upstream Jdbcsource); same shape as the other JdbcTarget* loaders. Difference: it loads files directly to the ClickHouse server via clickhouse-client (β¦ INSERT INTO $TARGET FORMAT CSV < $LOCALFILE) rather than staging through S3 β so the ELT Maestro host must have clickhouse-client installed and the load command configured (the $MPP_HOST / $DELIMITER / $TARGET placeholders come from the integrator's system.cfg; see CLICKHOUSE-AUTH.md). Tabs: Target Table (connection / catalog / schema / table Β· Truncate Β· Loader Threads) Β· Source Column(s) Β· Mapping Column(s) Β· Load Options Β· Delta Watermark (SELECT MAX($COLUMN) FROM $TARGET, incremental). |
| β¬ | Write Incident | WRITEINCIDENT |
TBD |
| π’ | Dataframe | DATAFRAME |
Reads HDFS file(s)/folder into a Spark data frame + a temp global view (Parquet/Orc/Csv/Json/Text); reader or writer per the arrow direction. Write Option: truncate / append / upsert-append. Output Mode: All Rows Β· Delta Passthrough (rows from this run) Β· Last Inserted Version (needs key columns). |
SparkSQL¶
| β¬ | Step | Type | What it does |
|---|---|---|---|
| π’ | Dataframe | DATAFRAME |
Reads HDFS file(s)/folder into a Spark data frame + a temp global view (Parquet/Orc/Csv/Json/Text); reader or writer per the arrow direction. Write Option: truncate / append / upsert-append. Output Mode: All Rows Β· Delta Passthrough (rows from this run) Β· Last Inserted Version (needs key columns). |
| π’ | Local File | LOCAL_FILE |
Reads flat files from the local filesystem into the pipeline (a file source). |
| π’ | Jdbc Target HDFS | JDBCTARGETHDFS |
Corelli HDFS Spark DataFrame loader β the SparkSQL target half of a Jdbcsource β JdbcTargetHDFS pipeline (requires an upstream Jdbcsource). Writes the source's rows to an HDFS Target Path as snappy-compressed Parquet. Loader Threads Β· Truncate Β· Source Column(s) refresh (typed for SparkSQL) Β· Mapping Column(s) auto-map (Reset / Check). |
| π’ | Spark Data Cache | SPARKDATACACHE |
Caches a data frame to disk/memory β a Spark persist (same output as input). Storage levels: DISK_ONLY, MEMORY_AND_DISK, MEMORY_ONLY(_SER/_2), β¦ |
| π’ | Aggregate | AGGREGATE |
Performs aggregations β pick Group column(s) and f(Agg) column(s) (sum/avg/β¦); the UI suggests output names/types. (Spark's classic aggregate; the expression-builder equivalent is Aggregate2.) |
| π’ | Onstage Delta | ONSTAGEDELTA |
Continuous loop delta loader (SparkSQL-only) β runs in a perpetual loop, polling a source Table/View every Delay (Seconds) (default 60) and merging only rows past a watermark (WM Col/Expr) into the target, starting from a configurable StartWM Value. Source + Target each take connection / database / schema / table + watermark expression (DummySQL = a placeholder query); Mapping Column(s) maps sourceβtarget. |
| π’ | Export | EXPORT |
Exports the incoming Spark data frame to a local destination folder on the ELT Maestro server in a chosen File Format (default Parquet). 1 input; the column list lets you mark Key column(s). |
| π’ | ML Feature Trainer | MLFEATURETRAINER |
Trains one or more Spark MLlib models on the input data frame β pick ID, Feature, and Label column(s), add the model(s) to train, and name the Model Topic. 1 input. (Step label: "Train <topic> with N model(s)".) |
| π’ | ML Feature Predictor | MLFEATUREPREDICTOR |
Scores the input data frame with previously-trained Spark MLlib models β pick the Topic Path, the Model(s), Feature + ID column(s), and output file parts (default 4). Emits the ID columns plus features, rawPrediction, probability, topicDistribution, prediction, ml_model_name, ml_timestamp. |
| π’ | ML Feature Engineer | MLFEATUREENGINEER |
Builds engineered feature columns via a list of actions β each applies a Spark MLlib transform (action type) to an input column, producing a new output column. Output = input columns + the new engineered columns. |
| π’ | ML Poly Feature Engineer | MLPOLYFEATUREENGINEER |
Like ML Feature Engineer, but applies a single Spark MLlib feature-engineering rule (transName/transType) across multiple input columns at once, producing the corresponding output column(s) appended to the input data frame. |
Tie-back to architecture¶
The steps above aren't an undifferentiated toolbox β each one belongs to a stage of a Bronze β Silver β Gold medallion pipeline. This is the step β layer view (the reverse of MEDALLION.md, which goes layer β step); use it to decide where a step belongs when you build a workflow. Scoped to the documented platforms (Redshift, Snowflake, ClickHouse, SparkSQL).
π₯ Bronze β raw landing / ingestion¶
Capture faithfully, transform minimally, append-only.
| Role | Steps |
|---|---|
| Pull / extract | Jdbcsource, Datasource, Local File, Sftp, Pull File |
| Land to the lake / target | Jdbc Target HDFS (Parquet to HDFS), Onstage Group (multi-table β parquet/external tables), Onstage Delta (continuous-loop landing) |
| Arrival triggers | File Watch, JDBC Watch, Script Watch, File Scanner, Remote File Scanner |
| Incremental capture | Watermark (+ the isCDC delta primitive), Set Variable |
π₯ Silver β cleaned, conformed, validated¶
Idempotent and re-runnable; gate promotion on data quality.
| Role | Steps |
|---|---|
| Clean / shape | Filter, Dedupe, Function2 (derivations), Datamask (PII), View, Pivot |
| Conform / integrate | Join, Union, Minus, Spark Data Cache |
| Historize dimensions | SCD1, SCD2 |
| Quality gate (BronzeβSilver) | Control Test, Sql Metrics, Meta Profile, Write Incident (quarantine bad rows) |
π₯ Gold β business marts / serving¶
Business-grain, serving-optimized.
| Role | Steps |
|---|---|
| Aggregate / model | Aggregate, Aggregate2, Table (+ Onstage Group loaders) |
| Publish to serving MPP | Jdbc Target Redshift / Snowflake / Clickhouse / HDFS, Redshift Stage, File Loader, Write Jdbc, Write File, Export |
| ML feature / serving | ML Feature Engineer, ML Poly Feature Engineer, ML Feature Trainer, ML Feature Predictor |
The orchestration spine (cross-layer)¶
These don't belong to one layer β they wire the layers together into one end-to-end pipeline: Job Step (chain BronzeβSilverβGold workflows), Switch / Set Variable / Sync (conditional flow + barriers), SQL Script / SSH / Smart Script (custom logic + maintenance), and Push File / Pull File / Sftp (movement between systems). See the orchestration spine and the Silver quality gate for how scheduling, alerting, and DQ gating stitch a tiered pipeline together.