Administration & Setup

Part 2 of the ELTMaestro User Guide. This is the administrator's first-time setup / initialization guide. After signing in, the first thing to do is create your connections: the databases, SSH hosts, and cloud/object-storage endpoints your workflows read from and write to. Everything you create here later appears as a selectable option in the Job editor.

Opening the Admin Interface

Choose Administration ▸ Configure Maestro Server on the main window's menu bar. The menu item is enabled only for admin-role users; if it's greyed out, your account isn't an administrator.

It opens a tabbed window:

| Tab | Purpose |
|---|---|
| System | Server-level system settings (read-only grid + Edit). |
| Database Connections | JDBC connections (see below). |
| SSH Connections | SSH/SFTP hosts (see below). |
| General/Cloud Config | Cloud / object-storage (S3, …) connections (see below). |
| Users · User-Permissions | Accounts and access (covered later in this guide). |
| Maestro Machine Learning | Feature engineering + model config. |
| Integrator Config | Live per-integrator config editor. |
| Vendor tabs (Netezza, …) | Platform-specific settings. |

Connections — create these first

Three connection types, one tab each. All follow the same Create / Edit / Delete pattern; once saved, a connection is selectable by name in the Job editor.

Database connections (JDBC)

Tab: Database Connections. Toolbar: Create, Edit, Duplicate, Delete.

Click Create to open the JDBC connection editor:

| Field | Notes |
|---|---|
| Connection name | Your label for the connection. |
| Driver class | Pick the platform's JDBC driver class, e.g. com.amazon.redshift.Driver, net.snowflake.client.jdbc.SnowflakeDriver, org.postgresql.Driver, cc.blynk.clickhouse.ClickHouseDriver, com.firebolt.FireboltDriver, plus Oracle / SQL Server / MySQL / MariaDB / DB2 / Hive / Spark. |
| Driver jar | The installed driver jar to use. For MPP / integrator platforms you must pick the MPP_Driver_* jar (see the warning below). |
| Connection string | A JDBC URL template with $HOST / $PORT / $DATABASE placeholders, e.g. jdbc:redshift://$HOST:$PORT/$DATABASE (a TLS variant is offered for Redshift). |
| User / Password | DB credentials (password stored encrypted). |
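
To make the placeholder style concrete, here is a minimal sketch that fills a connection-string template using Python's string.Template, which happens to share the $NAME syntax. How ELTMaestro itself substitutes the placeholders is not described in this guide; the function name and the sample host, port, and database are hypothetical.

```python
from string import Template


def expand_jdbc_url(template: str, host: str, port: int, database: str) -> str:
    """Fill the $HOST / $PORT / $DATABASE placeholders of a JDBC URL template."""
    # substitute() raises KeyError if the template contains a placeholder
    # that is not supplied here, which catches typos such as $DATABSE.
    return Template(template).substitute(HOST=host, PORT=port, DATABASE=database)


# Hypothetical values, for illustration only.
url = expand_jdbc_url(
    "jdbc:redshift://$HOST:$PORT/$DATABASE",
    host="example-cluster.example.com",
    port=5439,
    database="dev",
)
print(url)
```

Using substitute() rather than safe_substitute() is a deliberate choice in this sketch: an unknown placeholder fails loudly instead of being left in the URL.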

⚠️ Pick the MPP_ driver jar for MPP / integrator platforms

For an MPP target — Redshift, ClickHouse, Synapse, Snowflake, Exasol, Firebolt, Netezza, Greenplum, Databricks, DashDB, Yellowbrick, XtremeData — you must select the matching MPP_Driver_<Platform>_*.jar, not the plain base driver jar. The MPP_ jar is what tells the engine to treat the connection as an integrator/MPP connection (it wires the default database/schema from the integrator's system.cfg). If you pick the plain jar instead, the connection can browse metadata but you cannot create a job against it — only MPP_Driver_* connections are valid job targets.

| Platform | Driver jar to select |
|---|---|
| Redshift | MPP_Driver_Redshift_RSJDBC.jar |
| ClickHouse | MPP_Driver_Clickhouse_JDBC.jar |
| Synapse | MPP_Driver_Synapse_JDBC.jar |
| Snowflake | MPP_Driver_SnowFlake_JDBC.jar |
| Exasol | MPP_Driver_Exasol_JDBC.jar |
| Firebolt | MPP_Driver_Firebolt_JDBC.jar |
| Netezza | MPP_Driver_Netezza_NZJDBC.jar |
| Greenplum | MPP_Driver_Greenplum_PGJDBC.jar |
| DashDB (DB2) | MPP_Driver_DashDB_DB2JDBC.jar |
| Databricks | MPP_Driver_Databricks.jar |
| Yellowbrick | MPP_Driver_Yellowbrick_PGJDBC.jar |
| XtremeData | MPP_Driver_XtremeData_PGJDBC.jar |
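
The jar-naming rule above can be sketched as a simple prefix check, for example when reviewing a list of configured connections. This is only an illustration of the naming convention, not the engine's own validation; the helper name and the plain jar file name in the example are hypothetical.

```python
MPP_JAR_PREFIX = "MPP_Driver_"


def is_job_target_jar(jar_name: str) -> bool:
    """Return True if the jar name follows the MPP_Driver_* convention."""
    return jar_name.startswith(MPP_JAR_PREFIX) and jar_name.endswith(".jar")


# Jar names from the table above, plus one hypothetical plain driver jar.
candidates = [
    "MPP_Driver_Redshift_RSJDBC.jar",
    "MPP_Driver_Databricks.jar",
    "plain-redshift-driver.jar",  # hypothetical name for a base driver jar
]
for jar in candidates:
    print(jar, "->", "job target" if is_job_target_jar(jar) else "metadata browsing only")
```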

The connection's name is embedded in each job's XML — jobs reference the connection by name, so renaming or deleting a connection that jobs use will break those jobs.

Saved to the audit DB (t_jdbc).

  • Duplicate copies the selected connection (handy for dev/prod variants).

Platform-specific setup (drivers, quirks, prerequisites) lives in the per-platform guides — e.g. REDSHIFT-SETUP.md, CLICKHOUSE-AUTH.md.

SSH connections

Tab: SSH Connections. Toolbar: Create, Edit, Delete. Used for SFTP file sources/landing and remote shell commands.

Create opens the SSH editor:

| Field | Notes |
|---|---|
| Connection name | Label (normalized to a proper name). |
| Host / Port | SSH host and port. |
| User / Password | Login user; password stored encrypted. |
| Use PEM key | Authenticate with a key file instead of a password. |
| Use SFTP | Use SFTP for file transfer (vs plain SSH). |
| PEM path | Path to the key file when Use PEM key is on. |

(The key/SFTP choices are stored as a compact key|raw : sftp|ssh : <pem-path> setting.)
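
As a rough illustration of that compact setting, the sketch below parses a string of the form key|raw : sftp|ssh : <pem-path> into its three parts. The exact separator and whitespace handling used by ELTMaestro are assumptions here, and the class name, function name, and sample path are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SshAuthSetting:
    use_pem_key: bool  # True for "key", False for "raw" (password login)
    use_sftp: bool     # True for "sftp", False for "ssh"
    pem_path: str      # may be empty when no key file is used


def parse_ssh_setting(raw: str) -> SshAuthSetting:
    """Parse 'key|raw : sftp|ssh : <pem-path>' (assumed ':'-separated)."""
    # Split at most twice so a ':' inside the path stays part of the path.
    parts = [part.strip() for part in raw.split(":", 2)]
    if len(parts) != 3:
        raise ValueError(f"expected three ':'-separated fields, got {raw!r}")
    auth, transport, pem_path = parts
    if auth not in ("key", "raw"):
        raise ValueError(f"unknown auth mode {auth!r}")
    if transport not in ("sftp", "ssh"):
        raise ValueError(f"unknown transport {transport!r}")
    return SshAuthSetting(auth == "key", transport == "sftp", pem_path)


# Hypothetical sample value.
print(parse_ssh_setting("key : sftp : /home/etl/.ssh/id_rsa.pem"))
```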

General / Cloud connections (S3)

Tab: General/Cloud Config. Toolbar: Create, Edit, Delete. This is where object-storage / cloud endpoints live — e.g. the S3 connection the Redshift file loader stages to.

Create opens the Cloud connection editor:

  1. Connection name (normalized to upper case).
  2. Connection type — pick the storage/provider type: Aws S3, Azure Blob, HDFS, LocalFS, or JDBC. The type drives the parameter rows below.
  3. Parameters — fill the type-specific name/value rows.

Setting up an Aws S3 connection

Go to Administration ▸ General/Cloud Config ▸ Create, then:

  1. Configuration Name — must be S3_CONNECTION (see the name note below).
  2. Configuration Type: Aws S3.
  3. Fill the parameters and Save:
| Parameter | Notes |
|---|---|
| bucket.name | The S3 bucket the loader stages files to / reads from. |
| access.key | AWS access key ID. |
| secret.key | AWS secret access key. |
| aws.region | Bucket region (default us-east-1). |
| iam.role | IAM role ARN used for Redshift COPY / Spectrum, e.g. arn:aws:iam::<acct>:role/<role>. |
| dir.data | Server-side staging directory (default /tmp/maestro.data). |

Figure: S3 connection editor with Configuration Name S3_CONNECTION, Type Aws S3, and the bucket.name / access.key / secret.key / dir.data / aws.region / iam.role parameters.

The saved connection then appears in the General/Cloud Config list (its values are stored base64-encoded):

Figure: Administration ▸ General/Cloud Config list showing the S3_CONNECTION entry of type AWS S3.

Saved to the audit DB (t_connection_general); the parameters become $cloud.S3_CONNECTION.<param> variables usable in jobs.
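
The naming scheme for those variables, and the base64 storage mentioned above, can be sketched as follows. This assumes standard base64 over UTF-8 text, which the guide does not spell out; the helper names and sample values are hypothetical.

```python
import base64


def cloud_variables(connection_name: str, params: dict) -> dict:
    """Map each parameter to its $cloud.<CONNECTION>.<param> variable name."""
    return {f"$cloud.{connection_name}.{key}": value for key, value in params.items()}


def encode_value(value: str) -> str:
    """Encode a parameter value as base64 text (assumed storage format)."""
    return base64.b64encode(value.encode("utf-8")).decode("ascii")


def decode_value(stored: str) -> str:
    """Reverse encode_value()."""
    return base64.b64decode(stored).decode("utf-8")


# Hypothetical parameter values.
params = {"bucket.name": "example-bucket", "aws.region": "us-east-1"}
for name, value in cloud_variables("S3_CONNECTION", params).items():
    print(name, "=", value, "| stored as", encode_value(value))
```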

Name it exactly S3_CONNECTION. The integrator's system.cfg $OBJECT_STORAGE variable must resolve to this connection's name; the repo ships $OBJECT_STORAGE=S3_CONNECTION (redshift + sparksql + snowflake). Synapse stages to Azure Blob, so name its connection BLOB_CONNECTION to match $OBJECT_STORAGE=BLOB_CONNECTION. The Redshift file-loader COPY stages through this connection.

Credentials for a Redshift load: the S3 connection above carries access.key / secret.key / iam.role; the COPY defaults to the connection's IAM role (iam_role '$iam.role'), with access-key/secret as the fallback. The engine host also needs the AWS CLI installed and configured (aws configure), since the loader runs aws s3 cp/rm to stage files. See REDSHIFT-SETUP.md.
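
The default-then-fallback choice described above can be sketched like this. The guide does not show the statement the loader actually generates, so treat this as an illustration of the decision only: the function name and sample values are hypothetical, while iam_role and access_key_id / secret_access_key are the standard Redshift COPY authorization parameters.

```python
def copy_credentials_clause(params: dict) -> str:
    """Pick the COPY authorization clause: IAM role first, access keys as fallback."""
    role = params.get("iam.role", "").strip()
    if role:
        return f"iam_role '{role}'"
    access = params.get("access.key", "").strip()
    secret = params.get("secret.key", "").strip()
    if access and secret:
        return f"access_key_id '{access}' secret_access_key '{secret}'"
    raise ValueError("connection has neither iam.role nor access.key/secret.key")


# Hypothetical connection parameters.
with_role = {"iam.role": "arn:aws:iam::123456789012:role/example-role"}
keys_only = {"access.key": "EXAMPLEKEYID", "secret.key": "example-secret"}
print(copy_credentials_clause(with_role))
print(copy_credentials_clause(keys_only))
```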

(Azure Blob, LocalFS, and JDBC types expose their own parameter sets — e.g. Azure Blob: account.name / account.key / container.name / sas.token.)

The Spark / HDFS connection (ELTM_SPARK_DRIVER)

SparkSQL / Spark jobs require an HDFS cloud connection named exactly ELTM_SPARK_DRIVER — it must be present before you can build or run a Spark job. (A SparkSQL job's Platform Connection dropdown is populated from HDFS connections, and ELTM_SPARK_DRIVER is the name the Spark service itself runs under — see admin.commands/mstart-spark-service.)

Create it via Administration ▸ General/Cloud Config ▸ Create, Configuration Type HDFS:

| Parameter | Notes |
|---|---|
| fs.prefix | HDFS namenode URI, e.g. hdfs://localhost:9000/. |
| file.hdfs.site.xml.path | Actual path to hdfs-site.xml, e.g. /opt/iserver/tools/hadoop/etc/hadoop/hdfs-site.xml. |
| file.core.site.xml.path | Actual path to core-site.xml. |
| dir.data | Staging directory (default /tmp/maestro.data). |
| thrift.hive2.db / thrift.hive2.schema / thrift.hive2.connection.name | Hive (thrift) catalog target, e.g. dwh / iext / HIVE_EMBEDDED. All three are overridable per-integrator via the system.cfg variables $HIVE_DATABASE / $HIVE_SCHEMA / $HIVE_CONNECTION (see note below); when a variable is unset the corresponding arg here is the fallback. |
| fs.s3a.access.key / fs.s3a.secret.key / fs.s3a.endpoint | S3A credentials + endpoint (s3.amazonaws.com) for Spark to read/write S3. |

$HIVE_CONNECTION / $HIVE_DATABASE / $HIVE_SCHEMA (system.cfg overrides). SparkSQL onstage jobs register their staged tables in a Hive/Thrift external catalog. The engine resolves the catalog connection, database, and schema from these system.cfg variables first, each falling back to the corresponding thrift.hive2.* arg on this HDFS connection (defaults HIVE_EMBEDDED / dwh / iext) when unset. The repo ships all three in sparksql/system.cfg only (the SparkSQL integrator is the sole consumer); change them per environment via the Integrator Config tab. See SPARKSQL-SETUP.md.
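
The resolution order in this note (system.cfg variable first, then the thrift.hive2.* arg on the HDFS connection, then the default) can be sketched as below. Treating an empty value as "unset" is an assumption of this sketch, and the function and dictionary names are hypothetical.

```python
# For each target field: (system.cfg variable, HDFS connection arg, default).
HIVE_FIELDS = {
    "connection": ("$HIVE_CONNECTION", "thrift.hive2.connection.name", "HIVE_EMBEDDED"),
    "database": ("$HIVE_DATABASE", "thrift.hive2.db", "dwh"),
    "schema": ("$HIVE_SCHEMA", "thrift.hive2.schema", "iext"),
}


def resolve_hive_target(system_cfg: dict, hdfs_params: dict) -> dict:
    """Resolve the Hive catalog connection, database, and schema."""
    resolved = {}
    for field, (cfg_var, hdfs_arg, default) in HIVE_FIELDS.items():
        # A missing or empty value falls through to the next source.
        resolved[field] = system_cfg.get(cfg_var) or hdfs_params.get(hdfs_arg) or default
    return resolved


# Hypothetical inputs: system.cfg overrides only the database.
print(resolve_hive_target(
    {"$HIVE_DATABASE": "analytics"},
    {"thrift.hive2.db": "dwh", "thrift.hive2.schema": "staging"},
))
```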

Enter the actual filesystem paths for the Hadoop XMLs (not the $MAESTRO_ENGINE_HOME variable — this field is taken literally). On the standard image that's /opt/iserver/tools/hadoop/etc/hadoop/….

Figure: General Configuration editor showing the HDFS connection ELTM_SPARK_DRIVER with fs.prefix, actual hdfs/core-site paths, Hive thrift db/schema/connection, and S3A keys.

The Hive catalog connection. thrift.hive2.connection.name (e.g. HIVE_EMBEDDED) refers to a JDBC connection to HiveServer2. Create it like any database connection: driver class org.apache.hive.jdbc.HiveDriver, driver jar hive-jdbc-3.1.3-standalone.jar (the Apache standalone uber jar; alias MPP_Driver_Hive_JDBC.jar, shipped via the deployment's jdbc_list.json), and connection string jdbc:hive2://<host>:10000/<db> — e.g. jdbc:hive2://<server_ip>:10000/default (the same URL DBeaver uses).


How connections feed workflows

What you create here is exactly what appears in the Job editor:

| Connection type | Used by |
|---|---|
| Database (JDBC) | The source/target of SQL and load steps. |
| SSH | SFTP file-source / file-landing steps and remote shell. |
| General/Cloud (S3) | File-loader staging (e.g. flat file → S3 → Redshift COPY). |

Connection recipes — SparkSQL & Redshift jobs

This section lists the exact set of connections each platform needs. It is also what you must recreate after a fresh/destructive deploy, which wipes them. Create them in the order below.

SparkSQL jobs

A SparkSQL job stages source data into HDFS as Parquet and registers it as Hive external tables, so it needs three connections (a fourth if you stage via S3). Full walk-through: SPARKSQL-SETUP.md.

| # | Connection | Tab / Type | Key fields |
|---|---|---|---|
| 1 | HDFS / Spark (e.g. ELTM_SPARK_DRIVER) | General/Cloud Config → HDFS | fs.prefix (hdfs://<host>:9000/), file.hdfs.site.xml.path + file.core.site.xml.path (actual paths on the engine host), dir.data, thrift.hive2.db=dwh, thrift.hive2.schema=iext, thrift.hive2.connection.name=HIVE_EMBEDDED, optional fs.s3a.*. Create exactly one; the engine auto-discovers it by type (first HDFS row). |
| 2 | HIVE_EMBEDDED (Hive2 JDBC) | Database Connections (JDBC) | Driver class org.apache.hive.jdbc.HiveDriver, jar hive-jdbc-3.1.3-standalone.jar, string jdbc:hive2://<host>:10000/dwh. Name must match thrift.hive2.connection.name / $HIVE_CONNECTION (default HIVE_EMBEDDED). Used to register external tables. |
| 3 | Source(s) | Database Connections (JDBC) | One per source you ingest from (SQL Server, ClickHouse, Postgres, …); each is a normal JDBC connection. |
| 4 | S3_CONNECTION (only if staging via S3) | General/Cloud Config → Aws S3 | See the S3 recipe; resolves $OBJECT_STORAGE. |

$HIVE_CONNECTION / $HIVE_DATABASE / $HIVE_SCHEMA in sparksql/system.cfg override the connection's thrift.hive2.* args (see the note on these variables in the Spark / HDFS connection section above); leave them at the shipped HIVE_EMBEDDED / dwh / iext unless your environment differs.
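
After a fresh deploy, the SparkSQL recipe above can be turned into a quick checklist. The sketch below works on plain Python data that you would fill in by hand from the Admin Interface; it does not call any ELTMaestro API, and the function name and input shapes are hypothetical.

```python
def check_sparksql_connections(cloud_connections: dict, jdbc_names: set,
                               hive_connection_name: str = "HIVE_EMBEDDED") -> list:
    """Return a list of problems; an empty list means the recipe is satisfied.

    cloud_connections maps General/Cloud connection name -> connection type.
    jdbc_names is the set of Database (JDBC) connection names.
    """
    problems = []
    hdfs = [name for name, ctype in cloud_connections.items() if ctype == "HDFS"]
    if len(hdfs) != 1:
        problems.append(f"expected exactly one HDFS connection, found {len(hdfs)}")
    if hive_connection_name not in jdbc_names:
        problems.append(f"missing JDBC connection named {hive_connection_name}")
    return problems


# Hypothetical state after a deploy: the Hive JDBC connection was not recreated yet.
print(check_sparksql_connections({"ELTM_SPARK_DRIVER": "HDFS"}, {"SRC_POSTGRES"}))
```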

Redshift jobs

A Redshift job loads flat file → S3 → COPY, so it needs a JDBC connection plus the S3 connection. Full walk-through: REDSHIFT-SETUP.md.

| # | Connection | Tab / Type | Key fields |
|---|---|---|---|
| 1 | Redshift JDBC | Database Connections (JDBC) | Pick the MPP_Driver_Redshift_RSJDBC.jar (not the plain jar, otherwise you can't create a job against it); driver class com.amazon.redshift.Driver; string jdbc:redshift://$HOST:$PORT/$DATABASE (TLS variant offered); user (e.g. rsadmin). Redshift Serverless uses the same driver/string: point $HOST at the workgroup endpoint (REDSHIFT-SETUP ▸ Running on Redshift Serverless). |
| 2 | S3_CONNECTION | General/Cloud Config → Aws S3 | Name exactly S3_CONNECTION (resolves $OBJECT_STORAGE); bucket.name / access.key / secret.key / aws.region / iam.role / dir.data. The file-loader COPY defaults to the connection's IAM role. See the S3 recipe. |

Other non-connection notes for Redshift (one-time per environment, not wiped by a deploy):

  • AWS CLI configured on the engine host (aws configure); the loader runs aws s3 cp/rm there.
  • base64 decode is native: $B64EXPRESSION uses Redshift FROM_VARBYTE(TO_VARBYTE(…,'base64'),'utf-8'); no UDF to install (REDSHIFT-SETUP §3).

Users & access

Users

Tab: Users. Toolbar: Create, Edit, Delete. (Backed by t_user.)

Delete is not supported from the UI — the Delete button returns "Not Supported." Restrict an account by lowering its role or revoking component permissions instead.

Create / Edit opens the user editor:

| Field | Notes |
|---|---|
| User ID | Username; min 4 characters; stored upper-cased. |
| Password | Min 8 characters; stored as an MD5 hash. On Edit, leave it as-is to keep the current password, or type a new one to reset it. |
| Role | Access level 0–3 (higher = more privileged). An admin-level role is what unlocks Administration ▸ Configure Maestro Server (this Admin Interface). |
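
The field rules above can be sketched as a small validation step. This mirrors only what the table states; the text encoding and the hex form of the MD5 digest are assumptions, and the function name and sample account are hypothetical.

```python
import hashlib


def normalize_user(user_id: str, password: str, role: int) -> dict:
    """Apply the editor's stated rules and return the values as they would be stored."""
    if len(user_id) < 4:
        raise ValueError("User ID must be at least 4 characters")
    if len(password) < 8:
        raise ValueError("Password must be at least 8 characters")
    if role not in (0, 1, 2, 3):
        raise ValueError("Role must be between 0 and 3")
    return {
        "user_id": user_id.upper(),  # stored upper-cased
        # Assumed: MD5 over the UTF-8 bytes, kept as a hex string.
        "password_md5": hashlib.md5(password.encode("utf-8")).hexdigest(),
        "role": role,
    }


# Hypothetical account.
record = normalize_user("etladmin", "correct-horse", 3)
print(record["user_id"], record["role"], len(record["password_md5"]))
```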

⚠️ Restart required: after you create or edit a user, the dialog warns that the ELTMaestro messaging (meta-)service must be restarted before the new/changed account takes effect — see OPERATIONS.md.

User-Permissions

Tab: User-Permissions. Fine-grained, per-component access layered on top of the role.

  1. Pick a user from the drop-down.
  2. The grid lists each tracked component — Component Type, Component Name, and Can Access.
  3. Select a row and click Toggle Access to grant or revoke that user's access to that component.

(Backed by t_component_role_permission, surfaced via v_component_role_permission.)

System, config & models

System

Tab: System. Server-level settings (key/value) from t_system_settings. It's a read-only grid — select a row and click Edit to change a value (no create/delete).

Integrator Config

Tab: Integrator Config — a live editor for the per-integrator config files on the server (system.cfg, the get_* introspection templates, static_* lists, etc.). This is how you change integrator configuration without a redeploy.

  • Reload — fetch the live config from the server into a per-file list. Use the filter box to find a file.
  • Expand a file row and edit its contents inline.
  • Show Edits — list just the files you've changed.
  • Save — write your edits back to the live server config (a minimal, surgical save of only what changed).
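
The idea behind Show Edits and the minimal save can be sketched as a comparison between the config as loaded and the config as edited, keeping only files whose contents differ. This is an illustration of the concept, not the editor's implementation; the function name and sample file contents are hypothetical.

```python
def changed_files(loaded: dict, edited: dict) -> dict:
    """Return only the files whose edited text differs from the loaded text."""
    return {name: text for name, text in edited.items() if loaded.get(name) != text}


# Hypothetical config snapshot: file name -> file contents.
loaded = {
    "redshift/system.cfg": "$OBJECT_STORAGE=S3_CONNECTION\n",
    "sparksql/system.cfg": "$HIVE_DATABASE=dwh\n",
}
edited = dict(loaded)
edited["sparksql/system.cfg"] = "$HIVE_DATABASE=analytics\n"

# Only the SparkSQL file would be written back.
print(sorted(changed_files(loaded, edited)))
```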

This is the tab where the MPP system.cfg adoption edits are applied.

Maestro Machine Learning

Tab: Maestro Machine Learning, with two sub-tabs:

  • Feature Engineering — Create / Modify / Delete feature transformations (StringIndexer, VectorIndexer, QuantileDiscretizer, Bucketizer, …).
  • Classification/Regression/Clustering Models — Create / Modify / Delete ML models.

These define the reusable feature transforms + models that ML steps in workflows consume.

Vendor tabs (Netezza, …)

Platform-specific settings tabs (e.g. Netezza → t_netezza_settings). Same pattern as System: a read-only key/value grid with Edit to change a value.


That completes the Administration & Setup guide. Next: Menus in depth · Workflows.