Skip to content

Redshift integrator setup

Operator setup for an Amazon Redshift target in ELTMaestro — JDBC connection, the S3 + AWS-credentials wiring the file loader needs, native base64 decode, the per-integrator system.cfg, and a verification checklist. Companion to CLICKHOUSE-AUTH.md; same operator-doc shape.

Redshift is one of the catalog-backed cloud platforms (see FUNCTION-CATALOG-MIGRATION.md for the function-metadata side). Loads go flat file → S3 → Redshift COPY, so an S3 bucket and AWS credentials are part of the setup, not optional.

At a glance — what you need

Prerequisite Detail
Redshift cluster or Serverless workgroup Reachable host/port from the engine host; a database + schema for the integrator (defaults dev / integrator, see system.cfg). Both cluster types use the same job/config — see Running on Redshift Serverless.
Redshift JDBC driver redshift-jdbc42-2.1.0.33.jar shipped under $MAESTRO_ENGINE_HOME/library/JDBC/; driver class com.amazon.redshift.Driver.
S3 bucket A bucket the engine host can write to; referenced in the loader's upload/copy commands via $AWS_BUCKET.
S3 connection An ELTMaestro S3/cloud connection named S3_CONNECTION; system.cfg $OBJECT_STORAGE=S3_CONNECTION resolves to it.
AWS CLI Installed and configured on the engine host for the engine's OS user (aws configure → access key / secret / region). The loader's upload/copy/delete run as aws s3 cp / aws s3 rm shell commands on the server.
AWS credentials An ~/.aws/credentials file on the engine host (written by aws configure) — used by the aws s3 staging commands. Preferred for COPY is an IAM role (iam.role on the S3_CONNECTION); access-key/secret are the fallback.
IAM permissions The credentials/role must allow s3:PutObject/GetObject/DeleteObject on the bucket and Redshift COPY from S3.
base64 decode Native — no UDF to install. $B64EXPRESSION uses Redshift's FROM_VARBYTE(TO_VARBYTE(…,'base64'),'utf-8'). See base64 decode.

1. Create the Redshift JDBC connection

In the WPF client, navigate Administration ▸ Configure Maestro Server to open the Admin Interface (admin-role users only), select the Database Connections tab, then click Create (or select a row and Edit) to open the JDBC connection editor (SysJdbcSetting). Fill in the fields below — including the SSL connection string from the gotcha. The connection is persisted to the audit DB table t_jdbc (connection_name, connection_string, jdbc_driver_name, user_name, password64 (base64), jdbc_driver_jar).

Field Value
Driver class com.amazon.redshift.Driver
Driver jar redshift-jdbc42-2.1.0.33.jar (from the getJdbcJarFileNames list)
Connection string Recommended: jdbc:redshift://$HOST:$PORT/$DATABASE?ssl=true&sslfactory=com.amazon.redshift.ssl.NonValidatingFactory. The bare jdbc:redshift://$HOST:$PORT/$DATABASE only works where SSL isn't enforced.
User / Password Redshift DB user + password (password stored base64).

Gotcha — driver class: use com.amazon.redshift.Driver (the single class the modern 2.x driver registers). The legacy …jdbc41.Driver / …jdbc42.Driver class names only exist in pre-1.2 jars and throw ClassNotFoundException against the shipped redshift-jdbc42-2.1.0.33.jar.

Gotcha — use NonValidatingFactory: Redshift requires TLS, so when creating the JDBC job append ?ssl=true&sslfactory=com.amazon.redshift.ssl.NonValidatingFactory to the connection string. NonValidatingFactory skips server-certificate-chain validation, which avoids SSL handshake / "unable to find valid certification path to requested target" failures without having to import the Redshift CA into the engine host's Java truststore.

A new JDBC driver jar (if you swap versions) must be added to root-engine-install/jdbc_list.json and the image rebuilt / tarball republished — see CONVENTIONS.md.

2. Create the S3 connection + AWS credentials

The Redshift loader stages flat files to S3 before COPY, using the aws s3 CLI on the engine host. Three pieces:

  • AWS CLI on the engine host — install the AWS CLI and run aws configure for the OS user the engine runs as (access key, secret key, default region). The loader's upload/copy/delete (aws s3 cp / aws s3 rm) run as shell commands on the server, so they authenticate through this. aws configure also writes the ~/.aws/credentials file the loader parses below.

Running aws configure on the engine host — entering AWS Access Key ID and Secret Access Key

Verify the CLI can reach S3 with aws s3 ls — it should list your buckets:

aws s3 ls output listing S3 buckets, confirming the CLI is configured

  • S3 connection — create the cloud/S3 connection referenced by system.cfg $OBJECT_STORAGE (S3_CONNECTION), pointing it at your bucket. Step-by-step (the Aws S3 connection type and its bucket.name / access.key / secret.key / aws.region / iam.role / dir.data parameters) is in the user guide: Setting up an Aws S3 connection.
  • AWS credentials on the engine host — place an ~/.aws/credentials file for the OS user the engine runs as so the aws s3 staging commands authenticate. (The RedshiftStage/group-loader COPY defaults to IAM roleiam_role '$iam.role' from the S3_CONNECTION — with access-key/secret as the isIAM=false fallback.)

Linking the IAM role: cluster ↔ S3 connection ↔ JDBC user

The iam.role ARN you set on S3_CONNECTION is the role Redshift assumes to read the staged file during COPY. For COPY to accept it, that one ARN has to be wired to both the cluster and the database user on the Redshift JDBC connection:

  1. Owned by the cluster's AWS account + attached to the cluster. The role's account ID must match the cluster's account, and the role must be associated with the cluster (console Properties ▸ Associated IAM roles, or aws redshift modify-cluster-iam-roles). List the valid ARNs with:
    aws redshift describe-clusters --region <region> --query "Clusters[].IamRoles"
    
    Use one returned with ApplyStatus: in-sync. A placeholder or foreign-account ARN fails with Invalid IAM Role ARN … Role needs to be owned by the same account with cluster — and note the shipped default arn:aws:iam::123456789100:role/spectrum_role is a placeholder (account 123456789100 is a dummy), so it must be replaced.
  2. Granted for COPY to the JDBC connection's database user. The Redshift DB user that the JDBC connection logs in as must be allowed to assume that role for loads:
    GRANT ASSUMEROLE ON 'arn:aws:iam::<account>:role/<role>' TO "<db_user>" FOR COPY;
    
    Without this grant the same (correctly attached) role still fails with not authorized to assume role. Superusers can assume any attached role implicitly; regular users need the explicit ASSUMEROLE … FOR COPY grant.
  3. S3 read on the role. The role's IAM policy must allow s3:GetObject / s3:ListBucket on the staging bucket.

In short: the ARN on S3_CONNECTION must be a role the cluster owns and has attached, and one the JDBC connection's database user is granted ASSUMEROLE … FOR COPY. Miss the first → Invalid IAM Role ARN / wrong account; miss the second → not authorized to assume role.

Running on Redshift Serverless

The same jobs, system.cfg, S3 connection, COPY command, base64 decode, and JDBC driver run unchanged on Redshift Serverless — ELTMaestro makes no cluster control-plane calls, and the COPY … IAM_ROLE 'arn' syntax is identical. Only three pieces of operator wiring differ:

Aspect Provisioned cluster Redshift Serverless
JDBC endpoint ($HOST) <cluster>.<id>.<region>.redshift.amazonaws.com:5439 <workgroup>.<account-id>.<region>.redshift-serverless.amazonaws.com:5439 (port stays 5439)
IAM role discovery aws redshift describe-clusters --query "Clusters[].IamRoles" aws redshift-serverless get-namespace --namespace-name <ns> --query "namespace.iamRoles"
IAM role attachment attach to the cluster (console Properties ▸ Associated IAM roles, or aws redshift modify-cluster-iam-roles) associate to the namespace (console Namespaces ▸ [ns] ▸ Security and encryption ▸ IAM roles, or aws redshift-serverless update-namespace --namespace-name <ns> --iam-roles <arn> [<arn> …])

Everything else is identical to provisioned — no separate config or job: - JDBC driver (MPP_Driver_Redshift_RSJDBC.jar, com.amazon.redshift.Driver) and connection-string shape — just point $HOST at the workgroup endpoint. - The S3_CONNECTION + its iam.role, and the COPY … iam_role '$iam.role' command. - The GRANT ASSUMEROLE ON '<arn>' TO "<db_user>" FOR COPY; grant (Linking the IAM role) — same on Serverless. - base64 decode is native — and this is load-bearing for Serverless: Serverless has no Python (plpythonu) UDF support, so the native FROM_VARBYTE(TO_VARBYTE(…)) form (§3 below) is the only way base64 works there — the legacy UDF form would fail outright. - $QUERY_VACUUM is a no-op select (Serverless forbids manual VACUUM; auto-vacuum handles maintenance on both cluster types).

Known caveat (schema browse): get_column_names_from_table reads pg_table_def for distkey/sortkey flags. Serverless has no user-visible distribution/sort keys; if that catalog view is unavailable there, Load Known Structure may fall back to the information_schema-only column path (distkey/sortkey/identity reported as FALSE, which is correct for Serverless). Columns still resolve via information_schema regardless. Verify against your workgroup.

3. base64 decode (native, no UDF)

base64-stored columns are decoded with Redshift's native VARBYTE functions — there is nothing to install on the cluster. system.cfg sets:

$B64EXPRESSION=FROM_VARBYTE(TO_VARBYTE($COLUMN, 'base64'), 'utf-8')::$DATATYPE

TO_VARBYTE($COLUMN,'base64') decodes the base64 string to bytes, FROM_VARBYTE(…,'utf-8') turns those bytes back into UTF-8 text, and ::$DATATYPE casts to the column's type. Verify on the cluster (returns hello):

SELECT FROM_VARBYTE(TO_VARBYTE('aGVsbG8=','base64'),'utf-8');

DBeaver running SELECT FROM_VARBYTE(TO_VARBYTE('aGVsbG8=','base64'),'utf-8') against Redshift, returning a single from_varbyte column with the value "hello"

Why native: nothing to install or maintain on the cluster, and it avoids the end-of-support of Amazon Redshift Python UDFs after 2026-06-30 (new ones blocked from Patch 198). The previous integrator.b64_decode($COLUMN)::$DATATYPE form is kept commented in system.cfg for rollback only.

Note: FROM_VARBYTE returns VARCHAR (≤ 65,535 bytes) — the same practical limit as the old UDF, so no regression for decoded values within that size.

Integrator config (system.cfg)

root-engine-install/db/metadata/integrators/redshift/system.cfg is the per-platform template. Edit the Settings section for your environment; leave the "do not modify" SQL templates alone.

Variable Purpose
$SYSTEM_DEFAULT_DATABASE / $SYSTEM_DEFAULT_SCHEMA Default db (dev) / schema (integrator).
$SYSTEM_TOKEN_ENCLOSER Identifier quote char (").
$MAX_CACHE_SIZE, $SCHEMALOADERPARTSIZEBYTES, $SYSTEM_LARGE_OBJECT_ROWS, $SYSTEM_VIEW_MATERIALIZATION_THRESHOLD, $SYSTEM_STATISTICS_THRESHOLD Cache / partition / materialization / statistics tuning.
$HASH_TEMPLATE / $HASH_RETURNS / $HASH_EXPRESSION SCD-step hashing (NVL(MD5($ARG::TEXT),'')CHAR(32)).
$QUERY_* (do not modify) Operation templates: ANALYZE/VACUUM/TRUNCATE, CDC delete+insert, CTAS, view create, drop table/view, preview, cache builder.
$QUERY_ALTER_COLUMN / $QUERY_ADD_COLUMN DDL for data-type changes.
$B64EXPRESSION Native base64-decode expression FROM_VARBYTE(TO_VARBYTE($COLUMN,'base64'),'utf-8')::$DATATYPE — no UDF, see base64 decode.
$OBJECT_STORAGE The S3 connection name (S3_CONNECTION).

Metadata introspection

The WPF Schema Browser (e.g. the loader's Load Known Structure button) populates databases/schemas/tables/columns by running the get_* templates in the integrator folder against your Redshift connection (via MaestroMeta SOAP ops):

Template Query
get_database_names select current_database();
get_schema_names / get_table_names / get_view_names information_schema lookups scoped by $DATABASE / $SCHEMA.
get_column_names_from_table Rich column query — resolves type, and flags is_dist (distkey), is_sort (sortkey), is_identity, is_key (pkey) via pg_table_def + catalog joins.
get_preview_query SELECT * FROM "$SCHEMA"."$TABLE" LIMIT $LIMIT

File-loader flow (RSFileLoader)

The Redshift file loader stages a flat file to S3 and COPYs it into the target table. Source can be Local, SFTP, or Onstage (auto-selected from the upstream step). Three commands, each validated in the dialog before save:

Command What it does Must reference
Upload (uploadCommand) Local/staged file → S3. Default from $AWS_DEFAULT_UPLOAD_COMMAND. $AWS_BUCKET, $GUID, $FILENAME, $LOCALFILE
Copy (copyCommand) S3 → Redshift COPY into the target table. Default from copyCommandTemplate. $AWS_BUCKET, $GUID, $FILENAME, $TARGET
Delete (deleteCommand) Cleans the staged S3 object after load. Default from $AWS_DEFAULT_DELETE_COMMAND. $AWS_BUCKET, $GUID

$GUID scopes the staged object to the run; $FILENAME/$LOCALFILE are the target/local file names; $TARGET is the destination table. The dialog blocks save until each command references its required variables.

Verify

  1. Connection — open the Redshift connection; the editor's test should connect with com.amazon.redshift.Driver.
  2. Schema browseLoad Known Structure lists tables and resolves columns (distkey/sortkey/identity/key flags) → confirms the get_* templates run.
  3. base64 decodeSELECT FROM_VARBYTE(TO_VARBYTE('aGVsbG8=','base64'),'utf-8') returns hello.
  4. Loader job — run a file-loader job end to end; confirm the object lands in S3, COPY populates the table, and the delete command cleans up.

Common failures

Symptom Cause / fix
ClassNotFoundException on connect Legacy driver class — use com.amazon.redshift.Driver.
Invalid IAM Role ARN … Role needs to be owned by the same account with cluster The iam.role on S3_CONNECTION is a placeholder or from another AWS account (the shipped default …123456789100…/spectrum_role is a placeholder). Set it to an in-sync ARN from aws redshift describe-clusters … IamRoles, owned by the cluster's account — see Linking the IAM role.
COPY "not authorized to assume role" (ARN is correct + attached) The Redshift DB user on the JDBC connection lacks the ASSUMEROLE grant: GRANT ASSUMEROLE ON '<arn>' TO "<db_user>" FOR COPY; — see Linking the IAM role.
COPY "access denied" / "not authorized to assume role" The iam.role on the S3_CONNECTION isn't attached to the cluster or lacks S3 read (IAM mode); or the access-key/secret are wrong (keys mode).
Dialog won't save the loader A command is missing a required $-var ($AWS_BUCKET/$GUID/$FILENAME/$TARGET) — see the table.
COPY access denied IAM/credentials lack S3 read or Redshift COPY-from-S3 rights.
base64 column not decoding / FROM_VARBYTE/TO_VARBYTE error $B64EXPRESSION should be the native form FROM_VARBYTE(TO_VARBYTE($COLUMN,'base64'),'utf-8')::$DATATYPE. The legacy integrator.b64_decode Python UDF is no longer required — see base64 decode.