MPP integrator setup + system.cfg adoption
How to stand up an MPP target (Redshift / Synapse / Snowflake / …) end to end, plus the polished system.cfg standard ($OBJECT_STORAGE etc.) and where the engine / WPF read each variable.
Status (2026-05-30): shipped. The $OBJECT_STORAGE standard is live: engine + WPF read $OBJECT_STORAGE with a $CORELLI_CONNECTION fallback (code e011b0611, hotswapped via build #294/#295), and all four MPP integrators' system.cfg have migrated (redshift 61fbc7805, sparksql 68251f540, snowflake + synapse f940cc04a). No integrator defines $CORELLI_CONNECTION anymore, so the fallback is now defensive-only (custom / un-migrated deploys). Connection-name standard: S3_CONNECTION for the S3 integrators (redshift / sparksql / snowflake) and BLOB_CONNECTION for synapse (Azure Blob). §2–§3 below document the diff and where each variable is consumed; §4 is the (now-complete) rollout checklist for reference.
1. MPP prerequisites (every MPP integrator)
A working MPP integrator needs all of:
- A JDBC service account (example: rsadmin) with:
  - Read + write to the default object-storage bucket (S3 for Redshift, Blob for Synapse), which is the staging area for load and unload files.
  - A schema where temp tables can be dynamically created and dropped (example: integrator). The engine creates/drops TEMP_TABLE objects here every run.
  - Read + write on the other schema objects it touches while ingesting, transforming, and building data marts (medallion: bronze → silver → gold).
- WPF connections (Administration & Setup):
  - A JDBC connection using the platform's MPP_Driver_* jar (why) and connection string, authenticated as the service account (rsadmin).
  - A cloud connection (setup) pointing at the default bucket, named to match the integrator's $OBJECT_STORAGE: S3_CONNECTION for the AWS-S3 integrators (Redshift / SparkSQL / Snowflake) and BLOB_CONNECTION for Synapse (Azure Blob). Firebolt is out of scope.
- On the Maestro engine server: the AWS CLI must be installed and configured. Run aws configure for the OS user the engine runs as (access key / secret / default region), then verify with aws s3 ls. The loader stages to object storage with aws s3 cp / aws s3 rm shell commands that run on this server, so they fail if the CLI isn't configured. Walk-through with screenshots: REDSHIFT-SETUP.md.
- base64 decode is native on Redshift: $B64EXPRESSION=FROM_VARBYTE(TO_VARBYTE($COLUMN,'base64'),'utf-8')::$DATATYPE, so there is no Python UDF to install (REDSHIFT-SETUP §3).
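The staging commands mentioned in the AWS CLI prerequisite can be sketched roughly as follows. This is an illustration in Python, not the engine's loader code: the helper names, bucket, and key are hypothetical, and only the aws s3 cp / aws s3 rm / aws s3 ls subcommands come from the text above.

```python
import shutil


def aws_cli_available() -> bool:
    """True when an `aws` executable is on PATH for the current OS user."""
    return shutil.which("aws") is not None


def preflight_cmd() -> list[str]:
    """Command used to verify that credentials and region are configured."""
    return ["aws", "s3", "ls"]


def stage_upload_cmd(local_path: str, bucket: str, key: str) -> list[str]:
    """Command that copies a local load file into the staging bucket."""
    return ["aws", "s3", "cp", local_path, f"s3://{bucket}/{key}"]


def stage_cleanup_cmd(bucket: str, key: str) -> list[str]:
    """Command that removes a staged file once the load has finished."""
    return ["aws", "s3", "rm", f"s3://{bucket}/{key}"]
```

Each list can be handed to subprocess.run(cmd, check=True) under the OS user the engine runs as; a non-zero exit from the preflight command is the symptom of an unconfigured CLI.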
2. Polished system.cfg: the diff
redshift/system.cfg (legacy → shipped polished). The other integrators took the same $OBJECT_STORAGE change (synapse uses BLOB_CONNECTION):
| Variable | Legacy | Shipped | Kind |
|---|---|---|---|
| $SYSTEM_DEFAULT_DATABASE | odyssey-dev | dev | data |
| $SYSTEM_DEFAULT_SCHEMA | maestro-integrator | integrator | data |
| $B64EXPRESSION | "maestro-integrator".b64_decode(...) | FROM_VARBYTE(TO_VARBYTE(…,'base64'),'utf-8')::$DATATYPE (native) | data |
| cloud connection var | $S3_CONNECTION=CORELLI_S3_CONNECTION + $CORELLI_CONNECTION=CORELLI_S3_CONNECTION | $OBJECT_STORAGE=S3_CONNECTION | rename (code shipped) |
| $AWS_CREDENTIALS_FILE_OUTPUT_COMMAND | cat ~/.aws/credentials | removed | removal (code shipped) |
| $SYSTEM_NAME | INTEGRATOR | REDSHIFT | cosmetic |
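To check whether a given system.cfg has taken the diff above, a small lint along these lines may help. It is a sketch under assumptions: it treats the file as one `$NAME=value` pair per line (the notation used in this document, not a confirmed grammar), and the function names are hypothetical.

```python
# Variables the shipped cfg no longer defines, per the table above.
LEGACY_VARS = ("$AWS_CREDENTIALS_FILE_OUTPUT_COMMAND", "$CORELLI_CONNECTION")


def parse_cfg(text):
    """Parse `$NAME=value` lines into a dict; other lines are skipped."""
    variables = {}
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("$") or "=" not in line:
            continue
        # Split on the first '=' only, so values may contain '='.
        name, value = line.split("=", 1)
        variables[name.strip()] = value.strip()
    return variables


def migration_findings(variables):
    """Return human-readable findings; an empty list means migrated."""
    findings = []
    for name in sorted(LEGACY_VARS):
        if name in variables:
            findings.append(f"legacy variable still defined: {name}")
    if "$OBJECT_STORAGE" not in variables:
        findings.append("missing $OBJECT_STORAGE")
    return findings
```

An un-migrated cfg still works because of the engine/WPF fallback, so findings here are advisory rather than blocking.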
3. Where each variable is consumed (post-migration)
A. Cloud-connection variable → $OBJECT_STORAGE: engine + WPF (shipped)
The S3/Blob connection is now resolved through BaseStep.getObjectStorageConnectionName() (root-engine-core/.../core/parts/BaseStep.java:678), which reads $OBJECT_STORAGE and falls back to $CORELLI_CONNECTION when an integrator hasn't migrated. getVarData passes an undefined variable through unchanged, so an unresolved $OBJECT_STORAGE still starts with $ — that's the fallback sentinel.
- engine-core: getObjectStorageConnectionName() at the active sites:
  - steps/pipeline/core/JdbcTargetRedShift.java:135
  - parallel/steps/ParallelJdbcProcessorNode.java:249, 532, 644
- WPF: same $OBJECT_STORAGE-then-$CORELLI_CONNECTION fallback in Steps/Spark/SparkGroupLoader.cs and the Steps/Pipeline/JdbcTargetRedShiftWindow.xaml header label.
- Cross-integrator: because every consumer falls back, an integrator migrates by editing only its own system.cfg, with no code change. All four are already migrated.
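The fallback described in this section can be modeled in a few lines. This is a Python illustration of the logic only; the real resolver is the Java method in BaseStep, and the function names, dict-based variable store, and behavior when neither variable is defined are assumptions made for the sketch.

```python
def get_var_data(name, variables):
    """Return the variable's value, or the name itself when it is undefined.

    Mirrors the pass-through behavior described above: an unresolved
    variable comes back unchanged, so it still starts with '$'.
    """
    return variables.get(name, name)


def get_object_storage_connection_name(variables):
    """Resolve the cloud connection name, preferring $OBJECT_STORAGE."""
    value = get_var_data("$OBJECT_STORAGE", variables)
    if value.startswith("$"):
        # Sentinel hit: $OBJECT_STORAGE was not defined, try the legacy name.
        value = get_var_data("$CORELLI_CONNECTION", variables)
    return value
```

With a migrated cfg the first lookup succeeds and the legacy variable is never consulted, which is why the fallback is now defensive-only.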
B. $AWS_CREDENTIALS_FILE_OUTPUT_COMMAND removed: WPF (shipped)
Steps/Redshift/RSFileLoaderWindow.xaml.cs previously aborted the loader dialog when this variable was empty. It now reads the variable best-effort inside a try/catch and no longer aborts (:361-364): if the variable is absent, the credentials box just stays empty and the dialog remains usable. The engine stages to object storage via the host's configured AWS CLI / IAM role instead of parsing ~/.aws/credentials. (The engine's reference at core/parts/Job.java:226 was already commented out.)
C. Schema maestro-integrator→integrator, db odyssey-dev→dev: data only
Both renames are data-driven through $SYSTEM_DEFAULT_SCHEMA / $SYSTEM_DEFAULT_DATABASE, which feed the get_* introspection templates' $SCHEMA / $DATABASE and temp-table creation. The only requirement is operational: the integrator schema must exist with create/drop temp-table rights.
D. $B64EXPRESSION → native VARBYTE decode: no install asset
Redshift decodes base64 natively (FROM_VARBYTE(TO_VARBYTE($COLUMN,'base64'),'utf-8')::$DATATYPE) — no Python UDF, so there's nothing to install on the cluster. See REDSHIFT-SETUP §3.
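For intuition, the two nested calls in the expression correspond to the two steps below. This is a Python analogy of the decode only, not something the integrator runs; the function name is hypothetical, and the trailing ::$DATATYPE cast is not modeled.

```python
import base64


def b64_to_text(encoded: str) -> str:
    """Decode a base64 string to text, mirroring the SQL expression's steps."""
    raw = base64.b64decode(encoded)  # analogous to TO_VARBYTE(col, 'base64')
    return raw.decode("utf-8")       # analogous to FROM_VARBYTE(bytes, 'utf-8')
```

For example, b64_to_text("aGVsbG8=") yields "hello".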
E. $SYSTEM_NAME INTEGRATOR→REDSHIFT: safe
Defined in every integrator's system.cfg but read by no engine / service / WPF code (grep clean). Cosmetic. (After a destructive redeploy, the live server is reseeded from the image, so its system.cfgs match the repo — no drift.)
F. WPF cloud connection name: S3_CONNECTION / BLOB_CONNECTION
$OBJECT_STORAGE=S3_CONNECTION and RedshiftStage.{java,cs} default connection = "S3_CONNECTION", so the AWS-S3 General/Cloud connection must be named exactly S3_CONNECTION. Synapse stages to Azure Blob, so its connection must be named BLOB_CONNECTION ($OBJECT_STORAGE=BLOB_CONNECTION).
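The naming rule can be captured as a lookup that an operator script might use when validating an environment. This is a sketch: the mapping restates the standard from this document, while the function name and the lowercase integrator keys are hypothetical.

```python
# Expected cloud connection name per integrator, per the naming standard.
EXPECTED_CONNECTION = {
    "redshift": "S3_CONNECTION",
    "sparksql": "S3_CONNECTION",
    "snowflake": "S3_CONNECTION",
    "synapse": "BLOB_CONNECTION",
}


def connection_name_matches(integrator: str, wpf_connection_name: str) -> bool:
    """True only on an exact, case-sensitive match with the standard name."""
    return EXPECTED_CONNECTION[integrator] == wpf_connection_name
```

The comparison is deliberately exact, since the text requires the connection to be named exactly S3_CONNECTION or BLOB_CONNECTION.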
Not affected
- Meta-service (root-engine-service): zero references to any of the renamed/removed variables.
4. Rollout checklist: complete
- [x] engine-core: $OBJECT_STORAGE resolver BaseStep.getObjectStorageConnectionName() with $CORELLI_CONNECTION fallback; all call sites updated (e011b0611).
- [x] WPF: same fallback in SparkGroupLoader.cs, JdbcTargetRedShiftWindow.xaml; RSFileLoaderWindow no longer hard-requires $AWS_CREDENTIALS_FILE_OUTPUT_COMMAND.
- [x] install: $OBJECT_STORAGE rolled into all four integrators' system.cfg (61fbc7805 / 68251f540 / f940cc04a). (base64 decode is native, so there is no UDF asset.)
- [x] build/deploy: engine-core jar hotswapped (#294/#295); WPF rebuilt via build-portable. Meta-service unaffected.
- [ ] operator (per environment): create the S3_CONNECTION (or BLOB_CONNECTION) cloud connection, the integrator schema (create/drop temp-table rights), and push the live system.cfg via the WPF Integrator Config admin tab. (The live redshift cfg push is still operator-driven; system.cfg has no hotswap path.)