Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

upgrade: don't blindly run permanent migrations #93071

Merged

Commits on Dec 8, 2022

  1. upgrade: don't blindly run permanent migrations

    This patch makes the "primordial" permanent upgrades not run in clusters
    being upgraded from 22.2 or lower.
    
    Startup migrations were recently replaced with "permanent upgrades", in cockroachdb#91627.
    The permanent new upgrades were running on node startup regardless of
    whether it was a "new cluster" or an existing one - i.e. regardless of
    whether the previous startupmigrations had also run, on the argument
    that startupmigrations and upgrades are supposed to be idempotent
    anyway. But, alas, there's idempotency and then there's idempotency:
    startupmigrations/upgrades are supposed to be OK with running multiple
    times at the same BinaryVersion in a "new cluster", but not necessarily
    across BinaryVersions and in a cluster that has been in use. There are
    different things that can go wrong when one of these migrations runs
    multiple times in these general conditions, because the migration
    might makes implicit assumptions about the schema and data of the
    cluster it's running in:
    1. A migration might assume that all the data in a system table was
       created before a CRDB version changed some semantics. For example,
       the migration deleted in cockroachdb#92597 was assuming that `CREATEROLE` role
       option had a certain meaning for all the users who have it and was doing
       a backfill across all these users. Running this migration again at an
       arbitrary later point would affect unintended new users.
    2. A migration assumes the schema of system tables it operates on. As
       this schema (which is part of the bootstrap image for new cluster)
       changes with new versions, the code of the upgrade changes also. This
       means that the upgrade is no longer suitable for running against
       clusters that don't have the upgraded schema - i.e. clusters that have
       not yet been upgraded to the latest schema. This is a real problem
       because we're adding a column to system.users, and the addRootUser
       upgrade can no longer run against 22.2 clusters.
    
    This patch guards all upgrades on checking that the old corresponding
    startupmigration has not run, thus enabling the mentioned system schema
    change in 23.1.
    
    Release note: None
    Epic: None
    andreimatei committed Dec 8, 2022
    Configuration menu
    Copy the full SHA
    3343113 View commit details
    Browse the repository at this point in the history
  2. upgrade: don't pretend to run unrecognized upgrades

    Before this patch, if a node encountered a job corresponding to an
    upgrade for a version that does not have a registered upgrade, it would
    declare the job to be successful. This is bad; we can't just declare
    that an upgrade has run when, in fact, it hasn't. This patch turns this
    condition into a job error. This situation should not arise; it
    indicates an incompatibility between the binary versions: one node has
    an upgrade, another one doesn't, and yet both binaries are running at
    a cluster version above the respective upgrade.
    
    Release note: None
    andreimatei committed Dec 8, 2022
    Configuration menu
    Copy the full SHA
    90dcb5c View commit details
    Browse the repository at this point in the history