-
Notifications
You must be signed in to change notification settings - Fork 25k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Prevent in-place downgrades and invalid upgrades #41731
Prevent in-place downgrades and invalid upgrades #41731
Conversation
Downgrading an Elasticsearch node to an earlier version is unsupported, because we do not make any attempt to guarantee that a node can read any of the on-disk data written by a future version. Yet today we do not actively prevent downgrades, and sometimes users will attempt to roll back a failed upgrade with an in-place downgrade and get into an unrecoverable state. This change adds the current version of the node to the node metadata file, and checks the version found in this file against the current version at startup. If the node cannot be sure of its ability to read the on-disk data then it refuses to start, preserving any on-disk data in its upgraded state. This change also adds a command-line tool to overwrite the node metadata file without performing any version checks, to unsafely bypass these checks and recover the historical and lenient behaviour.
Pinging @elastic/es-distributed |
Pinging @elastic/es-core-infra |
@elasticmachine please run elasticsearch-ci/1. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, @DaveCTurner!
In general looks good, I've left some comments.
return new NodeMetaData(nodeId); | ||
final Version nodeVersion; | ||
if (this.nodeVersion == null) { | ||
assert Version.CURRENT.major <= Version.V_7_0_0.major + 1 : "version is required in the node metadata from v9 onwards"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not to use V_8_0_0 instead of V_7_0_0+1 ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When we upgrade master
to version 9 the constant Version.V_8_0_0
will remain in existence, but Version.V_7_0_0
should be removed, giving a compile-time failure of this assertion.
|
||
public NodeMetaData upgradeToCurrentVersion() { | ||
if (nodeVersion.equals(Version.V_EMPTY)) { | ||
assert Version.CURRENT.major <= Version.V_7_0_0.major + 1 : "version is required in the node metadata from v9 onwards"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not to use V_8_0_0 instead of V_7_0_0+1 ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When we upgrade master
to version 9 the constant Version.V_8_0_0
will remain in existence, but Version.V_7_0_0
should be removed, giving a compile-time failure of this assertion.
@Override | ||
public Settings onNodeStopped(String nodeName) { | ||
try { | ||
for (Path dataPath : dataPaths) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This loop seems to be removing node metadata files. Why is it needed? You anyway re-write the files in the tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, this is a throwback to a version of this PR that failed the node if the node metadata file was removed but the data directory was nonempty. I think we should be stricter about starting a node on such a data directory, but that change has wider impact on the tests and is worthy of a separate discussion. Removed in 3320d5a.
|
||
public class NodeMetaDataTests extends ESTestCase { | ||
private Version randomVersion() { | ||
return rarely() ? Version.fromId(randomInt()) : VersionUtils.randomVersion(random()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we really want to use rarely part here, can we add a comment why Version from any int exists?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, comment added in d4888db.
|
||
final Path tempDir = createTempDir(); | ||
final Path stateDir = Files.createDirectory(tempDir.resolve(MetaDataStateFormat.STATE_DIR_NAME)); | ||
final InputStream resource = this.getClass().getResourceAsStream("testReadsFormatWithoutVersion.binary"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would rather avoid adding a binary file a resource. Can we have instead a method for creating NodeMetadData without version?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would rather do it this way, because the important behaviour is that we can correctly read versions of this file written by arbitrarily ancient versions of Elasticsearch of which the given resource is an example. If we create the metadata file in this test then we might change the format in a way that round-trips correctly but which doesn't reflect the behaviour of ancient versions of Elasticsearch any more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't this tested by our full / rolling cluster upgrade tests?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Today it is, but in future we won't be able to produce version-free metadata files any more and yet we'll want to have a test saying that we refuse to load them in place of the one here. However the comment I left is misleading and I adjusted it in 9cb083f.
final Version nodeVersion = NodeMetaDataTests.tooNewVersion(); | ||
NodeMetaData.FORMAT.writeAndCleanup(new NodeMetaData(nodeId, nodeVersion), nodePaths); | ||
final MockTerminal mockTerminal = new MockTerminal(); | ||
mockTerminal.addTextInput("n\n"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can do some randomization here as well
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, done in e805ec3.
} | ||
|
||
//package-private for testing | ||
void testExecute(Terminal terminal, OptionSet options, Environment env) throws Exception { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My IDE claims this method is not used
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So does mine. Removed in 4961fd1.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the review @andrershov, I've addressed or answered your comments.
|
||
public NodeMetaData upgradeToCurrentVersion() { | ||
if (nodeVersion.equals(Version.V_EMPTY)) { | ||
assert Version.CURRENT.major <= Version.V_7_0_0.major + 1 : "version is required in the node metadata from v9 onwards"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When we upgrade master
to version 9 the constant Version.V_8_0_0
will remain in existence, but Version.V_7_0_0
should be removed, giving a compile-time failure of this assertion.
return new NodeMetaData(nodeId); | ||
final Version nodeVersion; | ||
if (this.nodeVersion == null) { | ||
assert Version.CURRENT.major <= Version.V_7_0_0.major + 1 : "version is required in the node metadata from v9 onwards"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When we upgrade master
to version 9 the constant Version.V_8_0_0
will remain in existence, but Version.V_7_0_0
should be removed, giving a compile-time failure of this assertion.
|
||
public class NodeMetaDataTests extends ESTestCase { | ||
private Version randomVersion() { | ||
return rarely() ? Version.fromId(randomInt()) : VersionUtils.randomVersion(random()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, comment added in d4888db.
|
||
final Path tempDir = createTempDir(); | ||
final Path stateDir = Files.createDirectory(tempDir.resolve(MetaDataStateFormat.STATE_DIR_NAME)); | ||
final InputStream resource = this.getClass().getResourceAsStream("testReadsFormatWithoutVersion.binary"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would rather do it this way, because the important behaviour is that we can correctly read versions of this file written by arbitrarily ancient versions of Elasticsearch of which the given resource is an example. If we create the metadata file in this test then we might change the format in a way that round-trips correctly but which doesn't reflect the behaviour of ancient versions of Elasticsearch any more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left a few smaller comments, looking good o.w.
|
||
[float] | ||
=== Synopsis | ||
|
||
[source,shell] | ||
-------------------------------------------------- | ||
bin/elasticsearch-node repurpose|unsafe-bootstrap|detach-cluster | ||
bin/elasticsearch-node repurpose|unsafe-bootstrap|detach-cluster|overwrite-version |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how about override-minimum-version
instead of overwrite-version
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The version checks are not simply a minimum. Today they also include a maximum, and in future we may add more complex constraints. But I do prefer override
, thanks, I adjusted things in ba6ae23.
try { | ||
return new NodeEnvironment.NodePath(path); | ||
} catch (IOException e) { | ||
throw new ElasticsearchException("Unable to investigate path: " + path + ": " + e.getMessage()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why not pass e
as cause to the ElasticsearchException
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I blindly moved this method here from NodeRepurposeCommand
, but yes an inner exception is preferable, see e12ca62.
|
||
final Path tempDir = createTempDir(); | ||
final Path stateDir = Files.createDirectory(tempDir.resolve(MetaDataStateFormat.STATE_DIR_NAME)); | ||
final InputStream resource = this.getClass().getResourceAsStream("testReadsFormatWithoutVersion.binary"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't this tested by our full / rolling cluster upgrade tests?
Downgrading an Elasticsearch node to an earlier version is unsupported, because we do not make any attempt to guarantee that a node can read any of the on-disk data written by a future version. Yet today we do not actively prevent downgrades, and sometimes users will attempt to roll back a failed upgrade with an in-place downgrade and get into an unrecoverable state. This change adds the current version of the node to the node metadata file, and checks the version found in this file against the current version at startup. If the node cannot be sure of its ability to read the on-disk data then it refuses to start, preserving any on-disk data in its upgraded state. This change also adds a command-line tool to overwrite the node metadata file without performing any version checks, to unsafely bypass these checks and recover the historical and lenient behaviour.
…der-permit-primary-mode-only * elastic/master: Move the FIPS configuration back to the build plugin (elastic#41989) Remove stray back tick that's messing up table format (elastic#41705) Add missing comma in code section (elastic#41678) add 7.1.1 and 6.8.1 versions (elastic#42253) Use spearate testkit dir for each run (elastic#42013) Add experimental and warnings to vector functions (elastic#42205) Fix version in tests since elastic#41906 was merged Bump version in BWC check after backport Prevent in-place downgrades and invalid upgrades (elastic#41731) Mute date_histo interval bwc test
* master: (176 commits) Avoid unnecessary persistence of retention leases (elastic#42299) [ML][TEST] Fix limits in AutodetectMemoryLimitIT (elastic#42279) [ML Data Frame] Persist and restore checkpoint and position (elastic#41942) mute failing filerealm hash caching tests (elastic#42304) Safer Wait for Snapshot Success in ClusterPrivilegeTests (elastic#40943) Remove 7.0.2 (elastic#42282) Revert "Remove 7.0.2 (elastic#42282)" [DOCS] Copied note on slicing support to Slicing section. Closes 26114 (elastic#40426) Remove 7.0.2 (elastic#42282) Mute all ml_datafeed_crud rolling upgrade tests Move the FIPS configuration back to the build plugin (elastic#41989) Remove stray back tick that's messing up table format (elastic#41705) Add missing comma in code section (elastic#41678) add 7.1.1 and 6.8.1 versions (elastic#42253) Use spearate testkit dir for each run (elastic#42013) Add experimental and warnings to vector functions (elastic#42205) Fix version in tests since elastic#41906 was merged Bump version in BWC check after backport Prevent in-place downgrades and invalid upgrades (elastic#41731) Mute date_histo interval bwc test ...
Downgrading an Elasticsearch node to an earlier version is unsupported, because we do not make any attempt to guarantee that a node can read any of the on-disk data written by a future version. Yet today we do not actively prevent downgrades, and sometimes users will attempt to roll back a failed upgrade with an in-place downgrade and get into an unrecoverable state. This change adds the current version of the node to the node metadata file, and checks the version found in this file against the current version at startup. If the node cannot be sure of its ability to read the on-disk data then it refuses to start, preserving any on-disk data in its upgraded state. This change also adds a command-line tool to overwrite the node metadata file without performing any version checks, to unsafely bypass these checks and recover the historical and lenient behaviour.
Today we fail the node at startup if it contains an index that is too old to be compatible with the current version, unless that index is closed. If the index is closed then the node will start up and this puts us into a bad state: the index cannot be opened and must be reindexed using an earlier version, but we offer no way to get that index into a node running an earlier version so that it can be reindexed. Downgrading the node in-place is decidedly unsupported and cannot be expected to work since the node already started up and upgraded the rest of its metadata. Since elastic#41731 we actively reject downgrades to versions ≥ v7.2.0 too. This commit prevents the node from starting in the presence of any too-old indices (closed or not). In particular, it does not write any upgraded metadata in this situation, increasing the chances an in-place downgrade might be successful. We still actively reject the downgrade using elastic#41731, because we wrote the node metadata file before checking the index metadata, but at least there is a way to override this check. Relates elastic#21830, elastic#44230
Today we fail the node at startup if it contains an index that is too old to be compatible with the current version, unless that index is closed. If the index is closed then the node will start up and this puts us into a bad state: the index cannot be opened and must be reindexed using an earlier version, but we offer no way to get that index into a node running an earlier version so that it can be reindexed. Downgrading the node in-place is decidedly unsupported and cannot be expected to work since the node already started up and upgraded the rest of its metadata. Since #41731 we actively reject downgrades to versions ≥ v7.2.0 too. This commit prevents the node from starting in the presence of any too-old indices (closed or not). In particular, it does not write any upgraded metadata in this situation, increasing the chances an in-place downgrade might be successful. We still actively reject the downgrade using #41731, because we wrote the node metadata file before checking the index metadata, but at least there is a way to override this check. Relates #21830, #44230
Downgrading an Elasticsearch node to an earlier version is unsupported, because
we do not make any attempt to guarantee that a node can read any of the on-disk
data written by a future version. Yet today we do not actively prevent
downgrades, and sometimes users will attempt to roll back a failed upgrade with
an in-place downgrade and get into an unrecoverable state.
This change adds the current version of the node to the node metadata file, and
checks the version found in this file against the current version at startup.
If the node cannot be sure of its ability to read the on-disk data then it
refuses to start, preserving any on-disk data in its upgraded state.
This change also adds a command-line tool to overwrite the node metadata file
without performing any version checks, to unsafely bypass these checks and
recover the historical and lenient behaviour.