storage/engine: improve cluster version handling #42653

Closed
petermattis opened this issue Nov 21, 2019 · 5 comments · Fixed by #97054
Assignees: RaduBerinde
Labels: A-storage Relating to our storage engine (Pebble) on-disk storage. · C-cleanup Tech debt, refactors, loose ends, etc. Solution not expected to significantly change behavior. · E-quick-win Likely to be a quick win for someone experienced. · T-storage Storage Team

Comments

petermattis (Collaborator) commented Nov 21, 2019

The cluster version is currently stored inside of RocksDB/Pebble by Store.WriteClusterVersion. At startup, a node will refuse to start if the cluster version is newer than what the node supports, which prevents accidental downgrades. Unfortunately, there is a hole in this strategy. Because the cluster version is stored in RocksDB, we have to open the RocksDB instance in order to read it. Opening the RocksDB instance causes the RocksDB WAL to be replayed, and replaying the WAL can itself call back into CRDB code that is cluster-version specific. It looks like this exact scenario caused problems for a user who upgraded to 19.1 and then restarted a node with 2.0.

There are two possibilities for improvement here. We could open the RocksDB instance read-only in order to check the cluster version, then open it again as writable. This should work, but is mildly unfortunate as we'd end up replaying the WAL twice.

Another option is to move the cluster version storage out of RocksDB. We already have a COCKROACHDB_VERSION file. We could extend this file, or add another file, to store the cluster version.
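For illustration only, here is a minimal sketch of the second option: keep the cluster version in a small marker file next to the engine so it can be checked before the engine (and its WAL replay) is ever opened. The file name, the "major.minor" layout, and the helper names are assumptions for the sketch, not the actual implementation:

```go
package storage

import (
	"fmt"
	"os"
	"path/filepath"
)

// clusterVersionFile is a hypothetical marker file stored in the store
// directory alongside COCKROACHDB_VERSION.
const clusterVersionFile = "CLUSTER_VERSION"

// writeClusterVersionFile persists "major.minor" using a
// write-to-temp-then-rename pattern so readers never see a torn write.
func writeClusterVersionFile(storeDir string, major, minor int) error {
	tmp := filepath.Join(storeDir, clusterVersionFile+".tmp")
	if err := os.WriteFile(tmp, []byte(fmt.Sprintf("%d.%d\n", major, minor)), 0644); err != nil {
		return err
	}
	return os.Rename(tmp, filepath.Join(storeDir, clusterVersionFile))
}

// readClusterVersionFile reads the marker before the engine is opened,
// so the version check happens without replaying the WAL.
func readClusterVersionFile(storeDir string) (major, minor int, err error) {
	data, err := os.ReadFile(filepath.Join(storeDir, clusterVersionFile))
	if err != nil {
		return 0, 0, err
	}
	_, err = fmt.Sscanf(string(data), "%d.%d", &major, &minor)
	return major, minor, err
}
```

Writing to a temporary file and renaming it into place keeps the marker readable even if the node crashes mid-write; a real implementation would also fsync the file and its directory for durability.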

Jira issue: CRDB-5336

github-actions bot commented Jun 4, 2021

We have marked this issue as stale because it has been inactive for
18 months. If this issue is still relevant, removing the stale label
or adding a comment will keep it active. Otherwise, we'll close it in
5 days to keep the issue queue tidy. Thank you for your contribution
to CockroachDB!

@petermattis petermattis added A-storage Relating to our storage engine (Pebble) on-disk storage. and removed no-issue-activity labels Jun 4, 2021
@jlinder jlinder added the T-storage Storage Team label Jun 16, 2021
jbowens (Collaborator) commented Dec 29, 2021

We ran into this issue when making backwards-incompatible encryption-at-rest changes, and began moving the cluster version out of Pebble. A new file, STORAGE_MIN_VERSION, was introduced in #68619. The STORAGE_MIN_VERSION file contains a roachpb.Version protobuf. In #69203, kvserver.WriteClusterVersion was updated to write both the key within Pebble and this new STORAGE_MIN_VERSION file. This was included in the 21.2 release. In 22.1, we can switch the remaining keys.StoreClusterVersionKey readers over to reading the cluster version through this file, and remove keys.StoreClusterVersionKey altogether.
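To make the startup flow concrete, here is a minimal sketch of the kind of check the STORAGE_MIN_VERSION file enables. The simplified version struct and the decode hook are stand-ins (the real file holds a roachpb.Version protobuf); this is not the actual CockroachDB code:

```go
package storage

import (
	"fmt"
	"os"
	"path/filepath"
)

// minVersionFilename matches the file introduced in #68619; the decoding
// below is simplified.
const minVersionFilename = "STORAGE_MIN_VERSION"

// version is a stand-in for roachpb.Version for this sketch.
type version struct {
	Major, Minor int32
}

func (v version) less(o version) bool {
	return v.Major < o.Major || (v.Major == o.Major && v.Minor < o.Minor)
}

// checkMinVersion refuses to use a store whose persisted minimum version is
// newer than what this binary supports, before the engine (and its WAL) is
// ever opened. decode is a placeholder for protobuf unmarshalling.
func checkMinVersion(storeDir string, binaryVersion version,
	decode func([]byte) (version, error)) error {
	data, err := os.ReadFile(filepath.Join(storeDir, minVersionFilename))
	if os.IsNotExist(err) {
		return nil // old store; fall back to the in-engine key
	} else if err != nil {
		return err
	}
	storeMin, err := decode(data)
	if err != nil {
		return err
	}
	if binaryVersion.less(storeMin) {
		return fmt.Errorf("store requires version %+v, binary supports %+v",
			storeMin, binaryVersion)
	}
	return nil
}
```

Because the file lives outside the engine, the comparison happens before any WAL replay, which closes the downgrade hole described at the top of this issue.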

nicktrav (Collaborator) commented:

Doing some backlog triage.

My understanding of this issue is that we have implemented the "out of DB" version handling, and all that remains is some cleanup:

In 22.1, we can switch remaining keys.StoreClusterVersionKey readers over to reading the cluster version through this file, and remove the keys.StoreClusterVersionKey altogether.

@nicktrav nicktrav added the C-cleanup Tech debt, refactors, loose ends, etc. Solution not expected to significantly change behavior. label Jul 11, 2022
@nicktrav nicktrav added the E-quick-win Likely to be a quick win for someone experienced. label Feb 10, 2023
nicktrav (Collaborator) commented:

@RaduBerinde - does any of your recent work in #96394 help here?

@RaduBerinde RaduBerinde self-assigned this Feb 10, 2023
RaduBerinde (Member) commented:

Yes, I am working on the cleanup mentioned above.
