In-place upgrade
Upgrade an existing /data/iserver/ install with a newer eltm-portable-<VERSION>.tar.gz while keeping the audit DB, deployed workflows, keystore, and operator tuning intact.
Run as the same unprivileged user that owns the install (maestro in the INSTALL.md convention). Do not run as root.
What is preserved
update-server.sh keeps these untouched across the upgrade:
| Path | What's in it |
|---|---|
| `server/pgsql/data/` | The audit DB (sqlmaestro). Migrations bundled in the new tarball are applied on top of it. |
| `data/workflow/` | Deployed jobs. |
| `data/log/` | Historical logs. |
| `transient/` | PID files, staging area. |
| `server/keystore.jks` | TLS certificate (generated with this host's SAN at first install). |
| `tools/{spark,hadoop}/` | Apache binaries downloaded at first install. Rarely change between releases. |
| `~/.env_integrator` | Operator-tuned environment file. |
| `~/.env_integrator.paths` | Distro-specific JAVA_HOME / PGBIN set at first install. |
What is replaced
Overwritten by the new tarball:
| Path | What's in it |
|---|---|
| `server/engine/{*.jar, lib/}` | Spring Boot meta-service jar + engine CLI jar + their lib dependencies. |
| `server/webserver/admin/` | Contents of the source-tree `admin.commands/` dir (`mstart-*`, `mstop-*`, ...); `publish.sh` copies `admin.commands/*` into this deployed location. |
| `server/utils/` | SOAP ADMIN_* utilities invoked by MaestroMeta. |
| `server/pgsql.factory/` | initdb.sql, metadata, migrations, hive-metastore schema. |
| `bin/` | install.sh, update-server.sh, lib modules. |
| `library/JDBC/` | All bundled JDBC drivers. |
After the swap, each newly bundled migrations/*.sql file is applied against the live audit DB. The schema_version tracking table makes this step idempotent, so re-running an upgrade does not re-apply a migration (same pattern as the Docker entrypoint).
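The idempotent pattern can be pictured as a set difference: the migration files on disk minus the ones already recorded. The following is a minimal sketch, not the logic of update-server.sh itself. It assumes schema_version identifies each applied migration by file name (the actual column layout is not shown here) and that those names have been exported to a text file, one per line. `pending_migrations` is a hypothetical helper name.

```shell
# Sketch only: list migration files that are not yet recorded as applied.
# Assumption: applied migrations are identified by file name, one per line.

pending_migrations() {
    local dir="$1" applied="$2" f name
    # Glob expansion is sorted, so migrations come out in name order.
    for f in "$dir"/*.sql; do
        [ -e "$f" ] || continue            # directory has no .sql files
        name=$(basename "$f")
        # -x: whole-line match, -F: literal string, -q: quiet
        grep -qxF -- "$name" "$applied" || printf '%s\n' "$name"
    done
}
```

Called as `pending_migrations /data/iserver/server/pgsql.factory/migrations applied.txt`, it prints only the files that a second upgrade run would still have to apply; an empty result is what an already-upgraded install should produce.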
Before you upgrade
Take a one-off backup of the host so you have a clean rollback target. See BACKUP.md § One-off manual backup — vzdump --mode stop is the canonical recipe for the host VM. If you only care about the audit DB + workflows + keystore, the in-place update-server.sh already preserves them (see "What is preserved" above), so a full VM snapshot is the belt-and-braces option.
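Alongside the backup, a few cheap checks catch the most common reasons an upgrade fails before it starts. This is a sketch under stated assumptions: `preflight` is a hypothetical helper, not something shipped in bin/, and the optional uid argument exists only so the root check can be exercised without switching users.

```shell
# Sketch only: pre-flight checks before calling update-server.sh.
# preflight is a hypothetical helper name.

preflight() {
    local tarball="$1" env_file="${2:-$HOME/.env_integrator}" uid="${3:-$(id -u)}"
    if [ "$uid" -eq 0 ]; then
        echo "refusing to run as root" >&2
        return 1
    fi
    if [ ! -r "$env_file" ]; then
        echo "missing $env_file: no existing install?" >&2
        return 1
    fi
    # tar -tzf lists the archive without extracting it; a failure means the
    # file is truncated or is not a gzip tarball.
    if ! tar -tzf "$tarball" >/dev/null 2>&1; then
        echo "unreadable tarball: $tarball" >&2
        return 1
    fi
    echo "ok"
}
```

Run it as the install owner, for example `preflight /path/to/eltm-portable-<NEW-VERSION>.tar.gz`, and proceed only if it prints `ok`.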
Procedure
su - maestro
. ~/.env_integrator
/data/iserver/bin/update-server.sh /path/to/eltm-portable-<NEW-VERSION>.tar.gz
The script will:
- Source `~/.env_integrator` and refuse to run if it's missing (you must have an existing install)
- Stop all running services via `mstop-all` (Postgres, meta-service, engine, ...)
- Wait briefly (3 s) for processes to settle
- Extract the new tarball into a `mktemp -d` staging area
- Do an rsync-style swap of the "replaced" paths from the staging tree into `/data/iserver/`
- Restart Postgres alone
- Apply any newly-bundled migrations from `server/pgsql.factory/migrations/`
- Start the rest of the stack via `mstart-all`
- Clean up the staging dir
Approximate downtime: 30 s - 2 min, depending on how many migrations are pending and how fast the host's disk is.
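The same staging idea is useful on its own for inspecting a tarball before committing to the downtime. The sketch below extracts a release into a throwaway directory, prints its VERSION metadata, and removes the directory again; nothing under /data/iserver/ is touched. `peek_version` is a hypothetical helper, and the location of VERSION inside the archive (top level or one directory down) is an assumption.

```shell
# Sketch only: show a release tarball's VERSION metadata via a staging dir.
# Assumption: VERSION sits at the top of the archive or one directory down;
# adjust -maxdepth if the layout differs.

peek_version() {
    local tarball="$1" staging version_file rc=0
    staging=$(mktemp -d) || return 1
    if tar -xzf "$tarball" -C "$staging"; then
        version_file=$(find "$staging" -maxdepth 2 -type f -name VERSION | head -n 1)
        if [ -n "$version_file" ]; then
            cat "$version_file"
        else
            echo "no VERSION file found in $tarball" >&2
            rc=1
        fi
    else
        rc=1
    fi
    rm -rf "$staging"      # always remove the staging dir
    return "$rc"
}
```

Comparing this output with `cat /data/iserver/VERSION` shows whether the tarball actually differs from what is installed.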
Rollback
There is no automatic rollback. If a release is bad:
- `mstop-all`
- Re-run `update-server.sh` with the prior tarball. The audit DB is preserved across runs, so any data created since the bad upgrade stays. If the bad release added migrations that the old jars don't understand, you may need to revert those rows in `schema_version` first.
Recommended practice: keep the last 2-3 tarballs in a shared location (e.g. /data/releases/) so you can downgrade quickly without re-downloading from Jenkins.
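Keeping only the newest few tarballs can be automated. This is a minimal sketch, assuming the release files match `eltm-portable-*.tar.gz`, have no whitespace in their names, and that "newest" means most recent modification time. `prune_releases` is a hypothetical helper name.

```shell
# Sketch only: keep the newest N release tarballs in a directory and delete
# the rest. Assumes names without whitespace and mtime as the age criterion.

prune_releases() {
    local dir="$1" keep="${2:-3}"
    # ls -t sorts newest first; tail -n +K starts printing at line K,
    # so everything after the first $keep entries is removed.
    ls -t "$dir"/eltm-portable-*.tar.gz 2>/dev/null \
        | tail -n +"$((keep + 1))" \
        | while IFS= read -r old; do
              rm -f -- "$old"
          done
}
```

For example, `prune_releases /data/releases 3` leaves the three most recently modified tarballs in place.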
Verifying the upgrade succeeded
. ~/.env_integrator
show-mprocess
cat /data/iserver/VERSION # tarball metadata: VERSION, BUILT_AT, GIT_SHA
curl -k 'https://localhost:8181/MaestroMetaDataProviderService/MaestroMetaService?wsdl' # quoted so the shell does not treat ? as a glob
The VERSION file is replaced on every upgrade; the GIT_SHA line is the canonical "what's running here" answer.
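To compare the running build before and after an upgrade, the GIT_SHA line can be pulled out on its own. The sketch below assumes the file holds one `KEY=VALUE` (or `KEY: VALUE`) pair per line; the real layout is not shown in this document, so adjust the pattern if it differs. `version_field` is a hypothetical helper name.

```shell
# Sketch only: read one field from the VERSION file.
# Assumption: one KEY=VALUE or "KEY: VALUE" pair per line.

version_field() {
    local key="$1" file="${2:-/data/iserver/VERSION}"
    # Strip the key and its separator, then print the first matching value.
    sed -n "s/^${key}[=:][[:space:]]*//p" "$file" | head -n 1
}
```

Capturing `version_field GIT_SHA` before the upgrade and again afterwards gives a one-line confirmation that the swap took effect.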
Major upgrade from a legacy install
update-server.sh is the right tool when the legacy and modern installs share the same host and the same /data/iserver/ tree. It is not the right tool when you're cutting over to a fresh host — typical major-version scenarios:
- legacy install on a GlassFish-era VM (Java 8, `asadmin` / `domain1`) → new host on the Spring Boot stack (Java 21 + embedded Jetty 12)
- legacy install on an older OS (RHEL 7, Ubuntu 18.04) past distro EOL → new host on a supported OS
- legacy install in a customer datacenter → new install in cloud (or vice versa)
In those cases the audit DB on the new host is freshly seeded by initdb.sql — it has the modern schema but no operator content. To carry forward the operator's investment (jobs, schedules, connections, watermarks) you copy the job-meta rows by SQL from old to new.
What to copy
The t_job_meta schema is stable across versions (see SHARED.md for the migration-stability policy), so this is a data-only copy. If a legacy column was renamed/dropped between versions, transform during INSERT.
Tables to copy, in dependency order (referenced data first, referring data last):
| # | Table | What's in it | Notes |
|---|---|---|---|
| 1 | `t_jdbc` | JDBC connection definitions (URL, driver class, user, base64 password) | Passwords are base64-encoded in password64; carry as-is. Jobs reference these by connection_name. |
| 2 | `t_connection_general` | Cloud / SSH / SMTP connection params | Same handling as t_jdbc (compound base64 in connection_parameters). |
| 3 | `t_job_meta` | Live job definitions | The core artifact. |
| 4 | `t_job_meta_history` | Version history per job | Optional but recommended for audit continuity. version_key is from seq_job_version_id; preserve the sequence high-water mark (see below). |
| 5 | `t_job_alert_setting` | Per-job alerting | Optional; copy only if alerts are configured. |
| 6 | `t_job_permissions` | Per-job RBAC | Required only if t_user rows are also migrated. |
| 7 | `t_workflow_item` | Workflow assembly (jobs grouped into batches) | Copy if workflows are used. |
| 8 | `t_step_watermark` | Incremental-load high-water marks | Critical for upsert / SCD2 jobs: losing watermarks restarts every incremental load from zero. |
Do not copy:
- `t_workflow_state_history`, `t_step_run_detail`, `t_step_status`, `t_service_audit`: runtime / audit rows. The new install starts clean. If you need run history for compliance, snapshot them as CSV from the legacy host before retiring it.
- The discovered-schema cache (`t_jdbc_database`, `t_jdbc_schema`, `t_jdbc_table`, `t_jdbc_column`): repopulated by the first `CONTROL_SYNC_CONNECTION` against each `t_jdbc` row after cutover.
- `t_user.password_md5`: these are MD5 hashes (see `mset-password-md5`). Re-issue passwords on the new host rather than carrying hashes across a major upgrade.
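The CSV snapshot of the runtime tables mentioned above could look like the following. It is a sketch, not a shipped tool: `snapshot_runtime_tables` is a hypothetical helper, and it relies on psql's client-side `\copy`, which writes the files on the machine where psql runs.

```shell
# Sketch only: snapshot the runtime/audit tables to CSV on the legacy host.
# snapshot_runtime_tables is a hypothetical helper name.

snapshot_runtime_tables() {
    local outdir="$1" db="${2:-sqlmaestro}" t
    mkdir -p "$outdir" || return 1
    for t in t_workflow_state_history t_step_run_detail t_step_status t_service_audit; do
        # One \copy per table; stop at the first failure.
        psql -c "\\copy public.$t TO '$outdir/$t.csv' WITH (FORMAT csv, HEADER)" "$db" || return 1
    done
}
```

Something like `snapshot_runtime_tables /tmp/run-history-$(date +%Y%m%d)` produces one CSV per table, which can be archived before the legacy host is retired.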
Procedure
On the legacy host, dump the listed tables (one pg_dump --data-only --table=... per table, in dependency order):
# legacy host, as the postgres owner of the sqlmaestro DB
. ~/.env_integrator
OUT=/tmp/job-meta-export-$(date +%Y%m%d).sql
: > "$OUT" && chmod 600 "$OUT" # start empty and owner-only: the export carries base64 passwords
for T in t_jdbc t_connection_general t_job_meta t_job_meta_history \
t_job_alert_setting t_job_permissions t_workflow_item t_step_watermark; do
pg_dump --data-only --column-inserts --table=public.$T sqlmaestro >> "$OUT"
done
# also capture the sequence so version_key doesn't collide on the new host
psql sqlmaestro -tAc "SELECT 'SELECT setval(''public.seq_job_version_id'', '
|| last_value || ', '
|| is_called || ');'
FROM public.seq_job_version_id" >> "$OUT"
--column-inserts (rather than the default COPY) is verbose but tolerates minor column-order drift between legacy and modern schemas — INSERT names each column explicitly. For very large t_job_meta_history tables, fall back to COPY and accept the constraint that the schemas must match column-for-column.
Transfer $OUT to the new host (scp, S3, USB, whatever fits the threat model — the file contains base64 passwords).
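Whatever transport is used, a checksum recorded on the legacy host makes truncation or corruption in transit easy to spot before the load. A minimal sketch using `sha256sum` from coreutils; `write_checksum` and `verify_checksum` are hypothetical helper names.

```shell
# Sketch only: record a SHA-256 next to the export, verify it after transfer.

write_checksum() {
    # Run on the legacy host; writes <file>.sha256 beside the export.
    ( cd "$(dirname "$1")" && sha256sum "$(basename "$1")" > "$(basename "$1").sha256" )
}

verify_checksum() {
    # Run on the new host after copying both files into the same directory.
    # Exit status is non-zero if the content no longer matches.
    ( cd "$(dirname "$1")" && sha256sum -c "$(basename "$1").sha256" >/dev/null )
}
```

Copy the `.sha256` file together with the export, then run `verify_checksum /path/to/job-meta-export-YYYYMMDD.sql` on the new host and continue only if it succeeds.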
On the modern host, with the new install already running on a fresh DB:
# new host, as the same unprivileged user that owns the install
. ~/.env_integrator
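mstop-all # quiesce engine + meta-service
# Postgres must be accepting connections for the load below. The in-place
# upgrade steps list Postgres among the services mstop-all stops, so if
# pg_isready reports no response, start Postgres again before continuing.
pg_isready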
psql sqlmaestro -f /path/to/job-meta-export-YYYYMMDD.sql
mstart-all
The meta-service rebuilds its in-memory caches on startup, so the freshly inserted jobs appear in the WPF after the next login. Trigger CONTROL_SYNC_CONNECTION once per JDBC connection (from the WPF Admin tab, or mpsql calling the SOAP op) to repopulate the discovered-schema cache.
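By default psql keeps going after a failed statement, which can leave a half-loaded DB if one INSERT hits a schema mismatch. A stricter variant of the load step is sketched below. `load_export` is a hypothetical wrapper; the two psql options it passes are standard.

```shell
# Sketch only: load the export so that any failed INSERT aborts the load.
#   -v ON_ERROR_STOP=1     stop at the first SQL error and exit non-zero
#   --single-transaction   wrap the file in BEGIN/COMMIT, so an error rolls
#                          back the rows already inserted by this run

load_export() {
    local file="$1" db="${2:-sqlmaestro}"
    if [ ! -r "$file" ]; then
        echo "cannot read $file" >&2
        return 1
    fi
    psql -v ON_ERROR_STOP=1 --single-transaction -f "$file" "$db"
}
```

With this variant a failed load leaves the tables as they were, so the export can be edited and loaded again. Sequence changes made by setval are not rolled back in PostgreSQL, which is harmless here because the same value is set again on the retry.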
Verifying the copy
. ~/.env_integrator
# row count parity
psql sqlmaestro -tAc "SELECT 't_job_meta', count(*) FROM public.t_job_meta
UNION ALL SELECT 't_jdbc', count(*) FROM public.t_jdbc
UNION ALL SELECT 't_step_watermark', count(*) FROM public.t_step_watermark"
# spot-check that watermarks weren't reset
psql sqlmaestro -tAc "SELECT job_name, step_name, watermark_value
FROM public.t_step_watermark
WHERE watermark_value <> ''
ORDER BY job_name LIMIT 20"
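# version_key sequence is past the highest historical row
# (one SELECT: psql releases before 15 print only the last statement's result with -c)
psql sqlmaestro -tAc "SELECT (SELECT last_value FROM public.seq_job_version_id) AS seq_last_value,
(SELECT max(version_key) FROM public.t_job_meta_history) AS max_version_key"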
Counts on both hosts should match for every table you copied. If seq_job_version_id.last_value is lower than max(version_key) in t_job_meta_history, the next save from the WPF will fail on duplicate-PK; re-run the setval line from $OUT.
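To check parity for every copied table rather than a sample, the count query can be generated from the same table list used for the export. A sketch; `count_query` and `COPIED_TABLES` are hypothetical names introduced here.

```shell
# Sketch only: build one row-count query covering every copied table, so the
# output can be saved on each host and compared with diff.
# COPIED_TABLES repeats the table list from the export loop.

COPIED_TABLES="t_jdbc t_connection_general t_job_meta t_job_meta_history
t_job_alert_setting t_job_permissions t_workflow_item t_step_watermark"

count_query() {
    local t sep=""
    for t in $COPIED_TABLES; do
        printf "%sSELECT '%s', count(*) FROM public.%s" "$sep" "$t" "$t"
        sep=" UNION ALL "
    done
    printf '\n'
}
```

On each host, `psql sqlmaestro -tAc "$(count_query)" | sort > counts-$(hostname).txt` writes one `table|count` line per table; `diff` on the two files should then print nothing.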
Gotchas
- Connection-test ordering. Test one JDBC connection from the WPF before running any job: the password round-trip exercises `Crypt.decodeStr` against the carried-over base64 (see SHARED.md on Crypt). A silent decode failure on the new host usually means the JVM is missing the unlimited-strength JCE jars or comes from a different Java vendor.
- Watermarks must be present before the first run. If you start jobs with empty watermarks on the new host, incremental loads degrade to full reloads, and on large fact tables that's a costly mistake to discover after the fact.
- Legacy schema drift. If the legacy `t_job_meta` has columns the modern schema doesn't, `pg_dump --column-inserts` will produce INSERTs that reference missing columns. Edit `$OUT` to drop those columns before loading, or run the dump through `sed -E 's/, *legacy_column//; s/, *NULL//'`-style patterns targeted at the offending columns and their values. Review the result before loading: a pattern such as `s/, *NULL//` removes the first match on each line, whichever column it belongs to. There is no general-purpose translator; this is one-off, per-legacy-version work.
- Custom Postgres functions (`lock_job`, `clear_lock`, `set_job_run_status`, etc., all in `initdb.sql`) are re-seeded by the new install's `initdb.sql`, so do not dump them from legacy. Dumping legacy function bodies risks pinning the new install to an old definition.
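Schema drift is easier to handle when the offending columns are known before the dump is edited. The sketch below compares two column lists, one name per line, and prints the names that exist only on the legacy side. `legacy_only_columns` is a hypothetical helper; the lists can be produced on each host with a query against `information_schema.columns` for the table in question.

```shell
# Sketch only: report columns present in the legacy list but not in the
# modern one. Each input file holds one column name per line.

legacy_only_columns() {
    local legacy="$1" modern="$2" a b
    a=$(mktemp) && b=$(mktemp) || return 1
    sort -u "$legacy" > "$a"
    sort -u "$modern" > "$b"
    # comm -23 keeps only the lines unique to the first (legacy) file.
    comm -23 "$a" "$b"
    rm -f "$a" "$b"
}
```

An empty result means every legacy column still exists on the modern side, so the `--column-inserts` dump should load without edits for that table.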