
MPP integrator setup + system.cfg adoption

How to stand up an MPP target (Redshift / Synapse / Snowflake / …) end to end, plus the polished system.cfg standard ($OBJECT_STORAGE etc.) and where the engine / WPF read each variable.

Status (2026-05-30): shipped. The $OBJECT_STORAGE standard is live — engine + WPF read $OBJECT_STORAGE with a $CORELLI_CONNECTION fallback (code e011b0611, hotswapped via build #294/#295), and all four MPP integrators' system.cfg have migrated (redshift 61fbc7805, sparksql 68251f540, snowflake+synapse f940cc04a). No integrator defines $CORELLI_CONNECTION anymore, so the fallback is now defensive-only (custom / un-migrated deploys). Connection-name standard: S3_CONNECTION for the S3 integrators (redshift / sparksql / snowflake) and BLOB_CONNECTION for synapse (Azure Blob). §2–§3 below document the diff and where each variable is consumed; §4 is the (now-complete) rollout checklist for reference.

1. MPP prerequisites (every MPP integrator)

A working MPP integrator needs all of:

  • A JDBC service account (example: rsadmin) with:
      • Read + write to the default object-storage bucket (S3 for Redshift, Blob for Synapse), which is the staging area for load and unload files.
      • A schema where temp tables can be dynamically created and dropped (example: integrator). The engine creates/drops TEMP_TABLE objects here every run.
      • Read + write on the other schema objects it touches while ingesting, transforming, and building data marts (medallion: bronze → silver → gold).
  • WPF connections (Administration & Setup):
      • A JDBC connection using the platform's MPP_Driver_* jar and connection string, authenticated as the service account (rsadmin).
      • A cloud connection pointing at the default bucket, named to match the integrator's $OBJECT_STORAGE: S3_CONNECTION for the AWS-S3 integrators (Redshift / SparkSQL / Snowflake) and BLOB_CONNECTION for Synapse (Azure Blob). Firebolt is out of scope.
  • On the Maestro engine server: the AWS CLI must be installed and configured — run aws configure for the OS user the engine runs as (access key / secret / default region), then verify with aws s3 ls. The loader stages to object storage with aws s3 cp / aws s3 rm shell commands that run on this server, so they fail if the CLI isn't configured. Walk-through with screenshots: REDSHIFT-SETUP.md.
  • base64 decode is native on Redshift — $B64EXPRESSION=FROM_VARBYTE(TO_VARBYTE($COLUMN,'base64'),'utf-8')::$DATATYPE, so there is no Python UDF to install (REDSHIFT-SETUP §3).
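The staging calls described above can be pictured as plain argument vectors. This is a minimal sketch assuming the loader shells out to the stock aws s3 cp / aws s3 rm forms with no extra flags; the S3StagingCommands class, bucket, and keys are hypothetical, and the engine's real invocation may differ.

```java
import java.util.List;

/**
 * Hypothetical helper: builds the argv for the stock AWS CLI staging calls.
 * Assumes no extra flags; the engine's real invocation may differ.
 */
public final class S3StagingCommands {

    private S3StagingCommands() {}

    /** Preflight from the setup guide: lists the buckets the configured credentials can see. */
    public static List<String> verify() {
        return List.of("aws", "s3", "ls");
    }

    /** Upload a local load file to the staging bucket. */
    public static List<String> copyToStage(String localPath, String bucket, String key) {
        return List.of("aws", "s3", "cp", localPath, s3Uri(bucket, key));
    }

    /** Delete a staged object once the load has finished. */
    public static List<String> removeFromStage(String bucket, String key) {
        return List.of("aws", "s3", "rm", s3Uri(bucket, key));
    }

    private static String s3Uri(String bucket, String key) {
        // S3 keys are bucket-relative, so drop a leading slash if present.
        String k = key.startsWith("/") ? key.substring(1) : key;
        return "s3://" + bucket + "/" + k;
    }

    public static void main(String[] args) {
        List<String> cmd = copyToStage("/tmp/orders.csv", "example-staging-bucket", "integrator/orders.csv");
        System.out.println(String.join(" ", cmd));
        // To run it: new ProcessBuilder(cmd).inheritIO().start().waitFor();
        // it executes as the engine's OS user, so that user needs `aws configure`.
    }
}
```

Because the commands run as child processes of the engine, they pick up whatever credentials that OS user's CLI profile (or the host's IAM role) provides, which is why the per-user aws configure step matters.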

2. Polished system.cfg — the diff

redshift/system.cfg, legacy → shipped (polished) values. The other three integrators took the same $OBJECT_STORAGE change (synapse uses BLOB_CONNECTION):

| Variable | Legacy | Shipped | Kind |
|---|---|---|---|
| $SYSTEM_DEFAULT_DATABASE | odyssey-dev | dev | data |
| $SYSTEM_DEFAULT_SCHEMA | maestro-integrator | integrator | data |
| $B64EXPRESSION | "maestro-integrator".b64_decode(...) | FROM_VARBYTE(TO_VARBYTE(…,'base64'),'utf-8')::$DATATYPE (native) | data |
| cloud connection var | $S3_CONNECTION=CORELLI_S3_CONNECTION + $CORELLI_CONNECTION=CORELLI_S3_CONNECTION | $OBJECT_STORAGE=S3_CONNECTION | rename (code shipped) |
| $AWS_CREDENTIALS_FILE_OUTPUT_COMMAND | cat ~/.aws/credentials | removed | removal (code shipped) |
| $SYSTEM_NAME | INTEGRATOR | REDSHIFT | cosmetic |
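To spot an un-migrated deploy, the diff above can be checked mechanically. The sketch assumes system.cfg holds one $NAME=value assignment per line (the form shown for $OBJECT_STORAGE and $B64EXPRESSION) and, as a further assumption, treats '#' lines as comments; SystemCfgCheck and its messages are hypothetical, not part of the engine or WPF.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical checker for the shipped system.cfg standard. */
public final class SystemCfgCheck {

    private SystemCfgCheck() {}

    /** Parses "$NAME=value" lines (assumed syntax); blank and '#' lines are skipped. */
    public static Map<String, String> parse(List<String> lines) {
        Map<String, String> vars = new LinkedHashMap<>();
        for (String raw : lines) {
            String line = raw.strip();
            if (line.isEmpty() || line.startsWith("#")) continue;
            int eq = line.indexOf('=');  // split on the first '=' so values may contain '='
            if (!line.startsWith("$") || eq < 0) continue;
            vars.put(line.substring(0, eq).strip(), line.substring(eq + 1).strip());
        }
        return vars;
    }

    /** Lists deviations from the shipped standard; an empty list means migrated. */
    public static List<String> problems(Map<String, String> vars) {
        List<String> out = new ArrayList<>();
        if (!vars.containsKey("$OBJECT_STORAGE")) out.add("missing $OBJECT_STORAGE");
        if (vars.containsKey("$CORELLI_CONNECTION")) out.add("legacy $CORELLI_CONNECTION still defined");
        if (vars.containsKey("$S3_CONNECTION")) out.add("legacy $S3_CONNECTION still defined");
        if (vars.containsKey("$AWS_CREDENTIALS_FILE_OUTPUT_COMMAND")) {
            out.add("$AWS_CREDENTIALS_FILE_OUTPUT_COMMAND should be removed");
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> shipped = List.of(
                "$SYSTEM_NAME=REDSHIFT",
                "$SYSTEM_DEFAULT_DATABASE=dev",
                "$SYSTEM_DEFAULT_SCHEMA=integrator",
                "$OBJECT_STORAGE=S3_CONNECTION");
        List<String> legacy = List.of(
                "$S3_CONNECTION=CORELLI_S3_CONNECTION",
                "$CORELLI_CONNECTION=CORELLI_S3_CONNECTION",
                "$AWS_CREDENTIALS_FILE_OUTPUT_COMMAND=cat ~/.aws/credentials");
        System.out.println(problems(parse(shipped)));  // []
        System.out.println(problems(parse(legacy)));
    }
}
```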

3. Where each variable is consumed (post-migration)

A. Cloud-connection variable → $OBJECT_STORAGE: engine + WPF (shipped)

The S3/Blob connection is now resolved through BaseStep.getObjectStorageConnectionName() (root-engine-core/.../core/parts/BaseStep.java:678), which reads $OBJECT_STORAGE and falls back to $CORELLI_CONNECTION when an integrator hasn't migrated. getVarData passes an undefined variable through unchanged, so an unresolved $OBJECT_STORAGE still starts with $ — that's the fallback sentinel.
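A minimal reconstruction of the resolver logic described above, assuming getVarData behaves like a map lookup that returns the variable name itself when the variable is undefined. ObjectStorageResolver and its constructor are hypothetical; the real method lives in BaseStep, and its getVarData signature may differ.

```java
import java.util.Map;

/** Hypothetical stand-in for the BaseStep resolver, driven by a plain map. */
public final class ObjectStorageResolver {

    private final Map<String, String> cfg;

    public ObjectStorageResolver(Map<String, String> cfg) {
        this.cfg = cfg;
    }

    /** Mimics the described getVarData: an undefined variable comes back unchanged. */
    String getVarData(String name) {
        return cfg.getOrDefault(name, name);
    }

    /** Reads $OBJECT_STORAGE; if still unresolved (starts with '$'), falls back to $CORELLI_CONNECTION. */
    public String getObjectStorageConnectionName() {
        String name = getVarData("$OBJECT_STORAGE");
        if (name.startsWith("$")) {
            name = getVarData("$CORELLI_CONNECTION");
        }
        return name;
    }

    public static void main(String[] args) {
        // Migrated integrator.
        System.out.println(new ObjectStorageResolver(Map.of("$OBJECT_STORAGE", "S3_CONNECTION"))
                .getObjectStorageConnectionName());  // S3_CONNECTION
        // Un-migrated integrator: the fallback kicks in.
        System.out.println(new ObjectStorageResolver(Map.of("$CORELLI_CONNECTION", "CORELLI_S3_CONNECTION"))
                .getObjectStorageConnectionName());  // CORELLI_S3_CONNECTION
        // Neither defined: the literal variable name passes through.
        System.out.println(new ObjectStorageResolver(Map.of())
                .getObjectStorageConnectionName());  // $CORELLI_CONNECTION
    }
}
```

The last case shows why the fallback is defensive only: with neither variable defined, the caller receives the literal $CORELLI_CONNECTION, and how the engine reacts to that is outside this sketch.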

  • engine-core: getObjectStorageConnectionName() at the active sites:
      • steps/pipeline/core/JdbcTargetRedShift.java:135
      • parallel/steps/ParallelJdbcProcessorNode.java:249, 532, 644
  • WPF — same $OBJECT_STORAGE-then-$CORELLI_CONNECTION fallback in Steps/Spark/SparkGroupLoader.cs and the Steps/Pipeline/JdbcTargetRedShiftWindow.xaml header label.
  • Cross-integrator: because every consumer falls back, an integrator migrates by editing only its own system.cfg — no code change. All four are already migrated.

B. $AWS_CREDENTIALS_FILE_OUTPUT_COMMAND removed — WPF (shipped)

Steps/Redshift/RSFileLoaderWindow.xaml.cs previously aborted the loader dialog when this variable was empty. It now reads the variable best-effort inside a try/catch and no longer aborts (:361-364): if the variable is absent, the credentials box stays empty and the dialog remains usable. The engine stages to object storage through the host's configured AWS CLI / IAM role instead of parsing ~/.aws/credentials. (The engine's reference at core/parts/Job.java:226 was already commented out.)

C. Schema maestro-integrator → integrator, db odyssey-dev → dev: data only

Both values are data-driven through $SYSTEM_DEFAULT_SCHEMA / $SYSTEM_DEFAULT_DATABASE, which feed the get_* introspection templates' $SCHEMA / $DATABASE placeholders and temp-table creation, so no code changes. The only operational requirement is that the integrator schema exists with create/drop temp-table rights.
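The rename needs no code because the values reach SQL only through placeholder substitution. A minimal sketch, assuming plain literal replacement of $SCHEMA and $DATABASE; the IntrospectionTemplate class and the information_schema query are hypothetical illustrations, not the shipped get_* templates.

```java
import java.util.Map;

/** Hypothetical renderer for a get_* style introspection template. */
public final class IntrospectionTemplate {

    private IntrospectionTemplate() {}

    /** Fills $SCHEMA / $DATABASE from the integrator defaults in system.cfg. */
    public static String render(String template, Map<String, String> cfg) {
        String schema = cfg.get("$SYSTEM_DEFAULT_SCHEMA");
        String database = cfg.get("$SYSTEM_DEFAULT_DATABASE");
        if (schema == null || database == null) {
            throw new IllegalArgumentException(
                    "system.cfg must define $SYSTEM_DEFAULT_SCHEMA and $SYSTEM_DEFAULT_DATABASE");
        }
        // String.replace is literal, so '$' needs no escaping here.
        return template.replace("$SCHEMA", schema).replace("$DATABASE", database);
    }

    public static void main(String[] args) {
        // Hypothetical template text; the real templates ship with the integrator.
        String getTables = "SELECT table_name FROM information_schema.tables "
                + "WHERE table_catalog = '$DATABASE' AND table_schema = '$SCHEMA'";
        Map<String, String> cfg = Map.of(
                "$SYSTEM_DEFAULT_SCHEMA", "integrator",
                "$SYSTEM_DEFAULT_DATABASE", "dev");
        System.out.println(render(getTables, cfg));
    }
}
```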

D. $B64EXPRESSION → native VARBYTE decode — no install asset

Redshift decodes base64 natively (FROM_VARBYTE(TO_VARBYTE($COLUMN,'base64'),'utf-8')::$DATATYPE) — no Python UDF, so there's nothing to install on the cluster. See REDSHIFT-SETUP §3.
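For a concrete picture of what the native decode expands to, the sketch substitutes a column and type into the shipped $B64EXPRESSION value. The $COLUMN / $DATATYPE placeholders come from system.cfg; the B64Expression helper, the column name, and the table name are hypothetical, and the engine's own substitution code may work differently.

```java
/** Hypothetical expander for the Redshift $B64EXPRESSION template. */
public final class B64Expression {

    /** The shipped Redshift value of $B64EXPRESSION, as it appears in system.cfg. */
    public static final String REDSHIFT_B64 =
            "FROM_VARBYTE(TO_VARBYTE($COLUMN,'base64'),'utf-8')::$DATATYPE";

    private B64Expression() {}

    /** Substitutes the placeholders for one column. */
    public static String expand(String template, String column, String dataType) {
        return template.replace("$COLUMN", column).replace("$DATATYPE", dataType);
    }

    public static void main(String[] args) {
        String expr = expand(REDSHIFT_B64, "payload_b64", "VARCHAR(256)");
        // Hypothetical staging table; the decode runs entirely inside Redshift.
        System.out.println("SELECT " + expr + " AS payload FROM integrator.stage_orders");
    }
}
```

Since the expansion is pure SQL built from native functions, there is nothing to deploy on the cluster beyond the system.cfg value itself.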

E. $SYSTEM_NAME INTEGRATOR→REDSHIFT — safe

Defined in every integrator's system.cfg but read by no engine / service / WPF code (grep clean). Cosmetic. (After a destructive redeploy, the live server is reseeded from the image, so its system.cfgs match the repo — no drift.)

F. WPF cloud connection name — S3_CONNECTION / BLOB_CONNECTION

$OBJECT_STORAGE=S3_CONNECTION, and RedshiftStage.{java,cs} defaults its connection to "S3_CONNECTION", so the AWS-S3 General/Cloud connection must be named exactly S3_CONNECTION. Synapse stages to Azure Blob, so its connection must be named BLOB_CONNECTION ($OBJECT_STORAGE=BLOB_CONNECTION).
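The naming rule reduces to a lookup from integrator to expected connection name, which an operator preflight could check. A minimal sketch, assuming the match is case-sensitive (the text says "exactly") and that the configured name is obtained from the WPF connection list by means not shown; CloudConnectionNames is hypothetical.

```java
import java.util.Map;
import java.util.Optional;

/** Hypothetical preflight for the cloud-connection naming standard. */
public final class CloudConnectionNames {

    /** Integrator → the connection name its $OBJECT_STORAGE points at. */
    public static final Map<String, String> EXPECTED = Map.of(
            "redshift", "S3_CONNECTION",
            "sparksql", "S3_CONNECTION",
            "snowflake", "S3_CONNECTION",
            "synapse", "BLOB_CONNECTION");

    private CloudConnectionNames() {}

    /** Empty when the name matches; otherwise a message naming the expected connection. */
    public static Optional<String> check(String integrator, String configuredName) {
        String expected = EXPECTED.get(integrator);
        if (expected == null) {
            // e.g. Firebolt, which is out of scope for this standard.
            return Optional.of("unknown integrator: " + integrator);
        }
        if (!expected.equals(configuredName)) {  // case-sensitive by assumption
            return Optional.of(integrator + ": cloud connection must be named exactly "
                    + expected + " (found " + configuredName + ")");
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(check("synapse", "BLOB_CONNECTION").isPresent());  // false
        System.out.println(check("redshift", "s3_connection").orElse("ok"));
    }
}
```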

Not affected

  • Meta-service (root-engine-service) — zero references to any of the renamed/removed variables.

4. Rollout checklist — complete

  • [x] engine-core: $OBJECT_STORAGE resolver BaseStep.getObjectStorageConnectionName() with $CORELLI_CONNECTION fallback; all call sites updated (e011b0611).
  • [x] WPF: same fallback in SparkGroupLoader.cs, JdbcTargetRedShiftWindow.xaml; RSFileLoaderWindow no longer hard-requires $AWS_CREDENTIALS_FILE_OUTPUT_COMMAND.
  • [x] install: $OBJECT_STORAGE rolled into all four integrators' system.cfg (61fbc7805 / 68251f540 / f940cc04a). (base64 decode is native — no UDF asset.)
  • [x] build/deploy: engine-core jar hotswapped (#294/#295); WPF rebuilt via build-portable. Meta-service unaffected.
  • [ ] operator (per environment): create the S3_CONNECTION (or BLOB_CONNECTION) cloud connection, the integrator schema (create/drop temp-table rights), and push the live system.cfg via the WPF Integrator Config admin tab. (The live redshift cfg push is still operator-driven — system.cfg has no hotswap path.)