upgrade: don't blindly run permanent migrations #93071

This patch makes the "primordial" permanent upgrades not run in clusters being upgraded from 22.2 or lower.

Startup migrations were recently replaced with "permanent upgrades" in cockroachdb#91627. The new permanent upgrades were running on node startup regardless of whether the cluster was new or existing, i.e. regardless of whether the previous startupmigrations had also run, on the grounds that startupmigrations and upgrades are supposed to be idempotent anyway. But, alas, there's idempotency and then there's idempotency: startupmigrations/upgrades are supposed to be OK with running multiple times at the same BinaryVersion in a "new cluster", but not necessarily across BinaryVersions or in a cluster that has been in use. Several things can go wrong when one of these migrations runs multiple times under these more general conditions, because the migration might make implicit assumptions about the schema and data of the cluster it's running in:

1. A migration might assume that all the data in a system table was created before a CRDB version changed some semantics. For example, the migration deleted in cockroachdb#92597 assumed that the `CREATEROLE` role option had a certain meaning for all the users who have it, and backfilled across all of these users. Running this migration again at an arbitrary later point would affect new users it was never meant to touch.
2. A migration assumes the schema of the system tables it operates on. As this schema (which is part of the bootstrap image for new clusters) changes with new versions, the code of the upgrade changes too. This means the upgrade is no longer suitable for running against clusters that have not yet been upgraded to the latest schema. This is a real problem because we're adding a column to system.users, and the addRootUser upgrade can no longer run against 22.2 clusters.
This patch guards all upgrades with a check that the corresponding old startupmigration has not already run, thus enabling the system.users schema change mentioned above in 23.1.

Release note: None
Epic: None
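
A minimal Go sketch of the guard described above. It assumes a hypothetical `permanentUpgrade` type that records the name of the legacy startupmigration it replaced, and models the record of completed startupmigrations as a hypothetical in-memory set of names; how the real upgrade framework stores and reads that record is not shown here.

```go
package main

import "fmt"

// permanentUpgrade is a hypothetical stand-in for a permanent upgrade.
// startupMigrationName names the legacy startupmigration it replaced
// (empty if there was none).
type permanentUpgrade struct {
	name                 string
	startupMigrationName string
	run                  func() error
}

// migrationRecord is a hypothetical view of which legacy startupmigrations
// have already completed in this cluster.
type migrationRecord map[string]bool

// maybeRunPermanentUpgrade runs u only if the startupmigration it replaced
// has not already run. It reports whether u was actually run.
func maybeRunPermanentUpgrade(u permanentUpgrade, done migrationRecord) (bool, error) {
	if u.startupMigrationName != "" && done[u.startupMigrationName] {
		// The cluster was created by an older version that already ran the
		// equivalent startupmigration. Running the upgrade again could touch
		// data it was never meant for, or assume a schema the cluster lacks.
		return false, nil
	}
	if err := u.run(); err != nil {
		return false, fmt.Errorf("running upgrade %q: %w", u.name, err)
	}
	return true, nil
}

func main() {
	calls := 0
	addRootUser := permanentUpgrade{
		name:                 "addRootUser",
		startupMigrationName: "legacy add root user", // hypothetical name
		run:                  func() error { calls++; return nil },
	}

	// Cluster upgraded from 22.2: the legacy migration is recorded as done.
	ran, err := maybeRunPermanentUpgrade(addRootUser, migrationRecord{"legacy add root user": true})
	fmt.Println(ran, err, calls) // false <nil> 0

	// New cluster: nothing recorded, so the upgrade runs.
	ran, err = maybeRunPermanentUpgrade(addRootUser, migrationRecord{})
	fmt.Println(ran, err, calls) // true <nil> 1
}
```

Keying the guard on the legacy migration's completion record, rather than on the binary version alone, is what distinguishes a freshly bootstrapped cluster from one that has been in use.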

Before this patch, if a node encountered a job corresponding to an upgrade for a version that does not have a registered upgrade, it would declare the job successful. This is bad: we can't declare that an upgrade has run when, in fact, it hasn't. This patch turns this condition into a job error. The situation should not arise. If it does, it indicates an incompatibility between the binary versions: one node has an upgrade, another one doesn't, and yet both binaries are running at a cluster version above that upgrade's version.

Release note: None

Commits on Dec 8, 2022