In-place upgrade

Upgrade an existing /data/iserver/ install with a newer eltm-portable-<VERSION>.tar.gz while keeping the audit DB, deployed workflows, keystore, and operator tuning intact.

Run as the same unprivileged user that owns the install (maestro in the INSTALL.md convention). Do not run as root.

What is preserved

update-server.sh keeps these untouched across the upgrade:

| Path | What's in it |
|---|---|
| server/pgsql/data/ | The audit DB (sqlmaestro). Migrations bundled in the new tarball are applied on top of it. |
| data/workflow/ | Deployed jobs. |
| data/log/ | Historical logs. |
| transient/ | PID files, staging area. |
| server/keystore.jks | TLS certificate (generated with this host's SAN at first install). |
| tools/{spark,hadoop}/ | Apache binaries downloaded at first install. Rarely change between releases. |
| ~/.env_integrator | Operator-tuned environment file. |
| ~/.env_integrator.paths | Distro-specific JAVA_HOME / PGBIN set at first install. |

What is replaced

Overwritten by the new tarball:

| Path | What's in it |
|---|---|
| server/engine/{*.jar, lib/} | Spring Boot meta-service jar + engine CLI jar + their lib dependencies. |
| server/webserver/admin/ | Contents of the source-tree admin.commands/ dir (mstart-*, mstop-*, ...); publish.sh copies admin.commands/* into this deployed location. |
| server/utils/ | SOAP ADMIN_* utilities invoked by MaestroMeta. |
| server/pgsql.factory/ | initdb.sql, metadata, migrations, hive-metastore schema. |
| bin/ | install.sh, update-server.sh, lib modules. |
| library/JDBC/ | All bundled JDBC drivers. |

After the swap, any newly bundled migrations/*.sql files are applied against the live audit DB. The run is idempotent: the schema_version tracking table records which migrations have already been applied (same pattern as the Docker entrypoint).
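The idempotent pattern can be sketched as "bundled files minus already-applied files". This is an illustration only, not the logic inside update-server.sh: pending_migrations is a hypothetical helper, and it takes the applied list as a plain text file because the column layout of schema_version is not documented on this page.

```shell
# Sketch only: print the migration files that are not yet recorded as applied.
# pending_migrations is a hypothetical helper, not part of the tarball.
#   $1 = directory holding the bundled *.sql migrations
#   $2 = text file with one already-applied file name per line
pending_migrations() {
    local dir="$1" applied="$2" tmp
    tmp=$(mktemp -d) || return 1
    # comm needs both inputs sorted the same way, so sort each list first.
    find "$dir" -maxdepth 1 -name '*.sql' -exec basename {} \; | sort > "$tmp/bundled"
    sort -u "$applied" > "$tmp/applied"
    comm -23 "$tmp/bundled" "$tmp/applied"   # lines only in the bundled list
    rm -rf "$tmp"
}
```

Running it a second time after every listed file has been recorded prints nothing, which is the property that makes re-running an upgrade safe.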

Before you upgrade

Take a one-off backup of the host so you have a clean rollback target. See BACKUP.md § One-off manual backup; vzdump --mode stop is the canonical recipe for the host VM. update-server.sh already preserves the audit DB, workflows, and keystore (see "What is preserved" above), so if those are all you care about, the full VM snapshot is a belt-and-braces measure rather than a strict requirement.

Procedure

su - maestro
. ~/.env_integrator
/data/iserver/bin/update-server.sh /path/to/eltm-portable-<NEW-VERSION>.tar.gz

The script will:

  1. Source ~/.env_integrator and refuse to run if it's missing (you must have an existing install)
  2. Stop all running services via mstop-all (Postgres, meta-service, engine, ...)
  3. Wait briefly (3 s) for processes to settle
  4. Extract the new tarball into a mktemp -d staging area
  5. rsync-style swap of the "replaced" paths from the staging tree into /data/iserver/
  6. Restart Postgres alone
  7. Apply any newly-bundled migrations from server/pgsql.factory/migrations/
  8. Start the rest of the stack via mstart-all
  9. Clean up the staging dir

Approximate downtime: 30 s - 2 min, depending on how many migrations are pending and how fast the host's disk is.
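A few of the conditions stated on this page (not root, an existing ~/.env_integrator, a readable tarball) can be checked before any service is stopped. The following is a minimal sketch under those assumptions. preflight_upgrade is a hypothetical helper, and its optional second and third arguments exist only so the checks can be exercised without a real install.

```shell
# Sketch only: pre-flight checks to run before calling update-server.sh.
# preflight_upgrade is a hypothetical helper, not shipped in bin/.
#   $1 = tarball path, $2 = env file (default ~/.env_integrator), $3 = uid (default: current)
preflight_upgrade() {
    local tarball="$1"
    local env_file="${2:-$HOME/.env_integrator}"
    local uid="${3:-$(id -u)}"

    if [ "$uid" -eq 0 ]; then
        echo "refusing to run as root; use the user that owns the install" >&2
        return 1
    fi
    if [ ! -r "$env_file" ]; then
        echo "missing $env_file: no existing install to upgrade" >&2
        return 1
    fi
    # tar -tzf lists the archive without extracting it, which catches truncated downloads.
    if ! tar -tzf "$tarball" >/dev/null 2>&1; then
        echo "cannot read $tarball as a gzip tarball" >&2
        return 1
    fi
    echo "pre-flight OK"
}
```

A typical call would chain it in front of the real script, so that update-server.sh only runs when all three checks pass.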

Rollback

There is no automatic rollback. If a release is bad:

  1. mstop-all
  2. Re-run update-server.sh with the prior tarball — the audit DB is preserved across runs, so any data created since the bad upgrade stays. If the bad release added migrations that the old jars don't understand, you may need to revert those rows in schema_version first.

Recommended practice: keep the last 2-3 tarballs in a shared location (e.g. /data/releases/) so you can downgrade quickly without re-downloading from Jenkins.
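Keeping only the last few tarballs can be automated. The sketch below assumes the release directory holds files named eltm-portable-<VERSION>.tar.gz and that "last" means most recently modified. prune_releases is a hypothetical helper, not part of the product.

```shell
# Sketch only: keep the newest N tarballs in a release directory, delete the rest.
# prune_releases is a hypothetical helper.
#   $1 = release directory (e.g. /data/releases), $2 = how many to keep (default 3)
prune_releases() {
    local dir="$1" keep="${2:-3}"
    # ls -1t sorts newest first; tail skips the first $keep entries.
    # Assumes tarball names contain no newlines.
    ls -1t "$dir"/eltm-portable-*.tar.gz 2>/dev/null |
        tail -n +"$((keep + 1))" |
        while IFS= read -r old; do
            rm -f -- "$old"
        done
}
```

Because the ordering is by modification time, copying an old tarball into the directory makes it look new. Preserve timestamps when copying (cp -p) if that matters.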

Verifying the upgrade succeeded

. ~/.env_integrator
show-mprocess
cat /data/iserver/VERSION                     # tarball metadata: VERSION, BUILT_AT, GIT_SHA
curl -k 'https://localhost:8181/MaestroMetaDataProviderService/MaestroMetaService?wsdl'   # quoted so the shell does not treat ? as a glob

The VERSION file is replaced on every upgrade; the GIT_SHA line is the canonical "what's running here" answer.
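To compare the running build before and after an upgrade, a single field can be pulled out of the VERSION file. The exact file layout is not documented on this page. The sketch below assumes KEY=VALUE lines, and version_field is a hypothetical helper.

```shell
# Sketch only: read one field from the VERSION file.
# ASSUMPTION: the file holds KEY=VALUE lines (for example GIT_SHA=...).
# Adjust the separator if the real layout differs.
version_field() {
    local file="$1" key="$2"
    # Match on the text before the first "=", then strip "KEY=" and print the rest.
    awk -F= -v k="$key" '$1 == k { sub(/^[^=]*=/, ""); print; exit }' "$file"
}
```

Recording the GIT_SHA value before running update-server.sh and again afterwards gives a quick check that the swap actually happened: the two values should differ.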

Major upgrade from a legacy install

update-server.sh is the right tool when the legacy and modern installs share the same host and the same /data/iserver/ tree. It is not the right tool when you're cutting over to a fresh host — typical major-version scenarios:

  • legacy install on a GlassFish-era VM (Java 8, asadmin / domain1) → new host on the Spring Boot stack (Java 21 + embedded Jetty 12)
  • legacy install on an older OS (RHEL 7, Ubuntu 18.04) past distro EOL → new host on a supported OS
  • legacy install in a customer datacenter → new install in cloud (or vice versa)

In those cases the audit DB on the new host is freshly seeded by initdb.sql — it has the modern schema but no operator content. To carry forward the operator's investment (jobs, schedules, connections, watermarks) you copy the job-meta rows by SQL from old to new.

What to copy

The t_job_meta schema is stable across versions (see SHARED.md for the migration-stability policy), so this is a data-only copy. If a legacy column was renamed/dropped between versions, transform during INSERT.

Tables to copy, in dependency order (referenced data first, referring data last):

| # | Table | What's in it | Notes |
|---|---|---|---|
| 1 | t_jdbc | JDBC connection definitions (URL, driver class, user, base64 password) | Passwords are base64-encoded in password64; carry as-is. Jobs reference these by connection_name. |
| 2 | t_connection_general | Cloud / SSH / SMTP connection params | Same handling as t_jdbc (compound base64 in connection_parameters). |
| 3 | t_job_meta | Live job definitions | The core artifact. |
| 4 | t_job_meta_history | Version history per job | Optional but recommended for audit continuity. version_key is drawn from seq_job_version_id, so preserve the sequence high-water mark (see below). |
| 5 | t_job_alert_setting | Per-job alerting | Optional; copy only if alerts are configured. |
| 6 | t_job_permissions | Per-job RBAC | Required only if t_user rows are also migrated. |
| 7 | t_workflow_item | Workflow assembly (jobs grouped into batches) | Copy if workflows are used. |
| 8 | t_step_watermark | Incremental-load high-water marks | Critical for upsert / SCD2 jobs: losing watermarks restarts every incremental load from zero. |

Do not copy:

  • t_workflow_state_history, t_step_run_detail, t_step_status, t_service_audit — runtime / audit rows. The new install starts clean. If you need run history for compliance, snapshot them as CSV from the legacy host before retiring it.
  • The discovered-schema cache (t_jdbc_database, t_jdbc_schema, t_jdbc_table, t_jdbc_column) — repopulated by the first CONTROL_SYNC_CONNECTION against each t_jdbc row after cutover.
  • t_user.password_md5 — these are MD5 hashes (see mset-password-md5). Re-issue passwords on the new host rather than carrying hashes across a major upgrade.

Procedure

On the legacy host, dump the listed tables (one pg_dump --data-only --table=... per table, in dependency order):

# legacy host, as the postgres owner of the sqlmaestro DB
. ~/.env_integrator

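umask 077                                # the export holds base64 passwords; keep it owner-only
OUT=/tmp/job-meta-export-$(date +%Y%m%d).sql
: > "$OUT"                               # create or truncate the output file
for T in t_jdbc t_connection_general t_job_meta t_job_meta_history \
         t_job_alert_setting t_job_permissions t_workflow_item t_step_watermark; do
    # one data-only dump per table, appended in dependency order
    pg_dump --data-only --column-inserts --table="public.$T" sqlmaestro >> "$OUT"
done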

# also capture the sequence so version_key doesn't collide on the new host
psql sqlmaestro -tAc "SELECT 'SELECT setval(''public.seq_job_version_id'', '
                          || last_value || ', '
                          || is_called || ');'
                       FROM public.seq_job_version_id" >> "$OUT"

--column-inserts (rather than the default COPY) is verbose but tolerates minor column-order drift between legacy and modern schemas — INSERT names each column explicitly. For very large t_job_meta_history tables, fall back to COPY and accept the constraint that the schemas must match column-for-column.

Transfer $OUT to the new host (scp, S3, USB, whatever fits the threat model — the file contains base64 passwords).
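Whatever transport is used, it helps to restrict the file's permissions and to record a checksum that can be verified on arrival. A minimal sketch, assuming GNU coreutils (sha256sum) on both hosts. seal_export and verify_export are hypothetical helper names.

```shell
# Sketch only: lock down the export file and record a checksum before transfer.
# seal_export / verify_export are hypothetical helpers, not shipped tools.
seal_export() {
    local file="$1"
    chmod 600 "$file" || return 1    # owner-only: the dump holds base64 passwords
    # Run sha256sum from the file's own directory so the .sha256 records a bare
    # file name and still verifies after the pair is copied to another path.
    ( cd "$(dirname "$file")" &&
      sha256sum "$(basename "$file")" > "$(basename "$file").sha256" )
}

# Returns non-zero if the file next to <file>.sha256 no longer matches it.
verify_export() {
    local file="$1"
    ( cd "$(dirname "$file")" &&
      sha256sum -c --quiet "$(basename "$file").sha256" )
}
```

Run seal_export "$OUT" on the legacy host, copy both $OUT and $OUT.sha256, then run verify_export on the new host before loading the file.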

On the modern host, with the new install already running on a fresh DB:

# new host, as the same unprivileged user that owns the install
. ~/.env_integrator
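mstop-all                                # quiesce engine + meta-service
# The load below needs Postgres up. Step 2 of the in-place procedure lists Postgres
# among the services mstop-all stops; if it is down here, start Postgres alone first.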
psql sqlmaestro -f /path/to/job-meta-export-YYYYMMDD.sql
mstart-all

The meta-service rebuilds its in-memory caches on startup, so the freshly inserted jobs appear in the WPF after the next login. Trigger CONTROL_SYNC_CONNECTION once per JDBC connection (from the WPF Admin tab, or mpsql calling the SOAP op) to repopulate the discovered-schema cache.

Verifying the copy

. ~/.env_integrator

# row count parity
psql sqlmaestro -tAc "SELECT 't_job_meta', count(*) FROM public.t_job_meta
                      UNION ALL SELECT 't_jdbc', count(*) FROM public.t_jdbc
                      UNION ALL SELECT 't_step_watermark', count(*) FROM public.t_step_watermark"

# spot-check that watermarks weren't reset
psql sqlmaestro -tAc "SELECT job_name, step_name, watermark_value
                      FROM public.t_step_watermark
                      WHERE watermark_value <> ''
                      ORDER BY job_name LIMIT 20"

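# version_key sequence is past the highest historical row
# (one statement: older psql versions print only the last result of a multi-statement -c string)
psql sqlmaestro -tAc "SELECT (SELECT last_value FROM public.seq_job_version_id) AS seq_last_value,
                             (SELECT max(version_key) FROM public.t_job_meta_history) AS max_version_key"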

Counts on both hosts should match for every table you copied. If seq_job_version_id.last_value is lower than max(version_key) in t_job_meta_history, the next save from the WPF will fail on duplicate-PK; re-run the setval line from $OUT.
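Count parity is easier to confirm mechanically than by eye. The sketch below assumes the output of the row-count query has been saved to a file on each host, in the table|count form that psql -tA prints. compare_counts is a hypothetical helper.

```shell
# Sketch only: compare "table|count" lines captured on the two hosts.
# compare_counts is a hypothetical helper.
#   $1 = counts file from the legacy host, $2 = counts file from the modern host
# Prints one line per mismatch and returns non-zero if any were found.
compare_counts() {
    local legacy="$1" modern="$2"
    awk -F'|' '
        FILENAME == ARGV[1] { want[$1] = $2; next }    # first file: legacy counts
        {
            seen[$1] = 1
            if (!($1 in want)) { printf "%s: not in legacy file\n", $1; bad = 1 }
            else if (want[$1] != $2) { printf "%s: legacy=%s modern=%s\n", $1, want[$1], $2; bad = 1 }
        }
        END {
            for (t in want) if (!(t in seen)) { printf "%s: missing from modern file\n", t; bad = 1 }
            exit (bad ? 1 : 0)
        }' "$legacy" "$modern"
}
```

Silent output with a zero exit status means every table present in either file has the same count on both sides.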

Gotchas

  • Connection-test ordering. Test one JDBC connection from the WPF before running any job: the password round-trip exercises Crypt.decodeStr against the carried-over base64 (see SHARED.md on Crypt). A silent decode failure on the new host usually means the JVM is missing the unlimited-strength JCE jars or comes from a different Java vendor.
  • Watermarks must be present before the first run. If you start jobs with empty watermarks on the new host, incremental loads degrade to full reloads — and on large fact tables that's a costly mistake to discover after the fact.
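  • Legacy schema drift. If the legacy t_job_meta has columns the modern schema doesn't, pg_dump --column-inserts will produce INSERTs that reference missing columns, and the load fails on them. Before loading, edit $OUT so that each such column is removed from the column list and its matching value is removed from the VALUES list of every affected INSERT. A sed pass such as sed -E 's/, *legacy_column//; s/, *NULL//' can do this, but only when the patterns are written for the offending column and its value: the generic s/, *NULL// shown here removes the first NULL on each line, which is correct only if that NULL belongs to the dropped column. There is no general-purpose translator; this is one-off, per-legacy-version work.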
  • Custom Postgres functions (lock_job, clear_lock, set_job_run_status, etc., all in initdb.sql) are re-seeded by the new install's initdb.sql — do not dump them from legacy. Dumping legacy function bodies risks pinning the new install to an old definition.
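Schema drift is cheaper to find before the dump than during the load. The sketch below assumes a list of table.column names has been captured on each host, for example from information_schema.columns. legacy_only_columns is a hypothetical helper.

```shell
# Sketch only: list columns that exist on the legacy side but not on the modern one.
# legacy_only_columns is a hypothetical helper. Each input file holds one
# "table.column" per line, which could be captured on each host with:
#   psql sqlmaestro -tAc "SELECT table_name || '.' || column_name
#                         FROM information_schema.columns
#                         WHERE table_schema = 'public'"
legacy_only_columns() {
    local legacy="$1" modern="$2" tmp
    tmp=$(mktemp -d) || return 1
    sort -u "$legacy" > "$tmp/legacy"
    sort -u "$modern" > "$tmp/modern"
    comm -23 "$tmp/legacy" "$tmp/modern"    # lines only in the legacy list
    rm -rf "$tmp"
}
```

Any line it prints names a column whose INSERT fragments need to be edited out of $OUT. Empty output means the column lists are compatible for a --column-inserts load.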