-
Notifications
You must be signed in to change notification settings - Fork 24.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
restrict node start-up when cluster name in data path #36519
Conversation
Pinging @elastic/es-core-infra |
cc/ @bleskes took me a bit of time to get to this only to realize what the tests show, we don't necessarily have permissions to check whether there is a cluster-name in the data path. I haven't taken another look, so I will do so tomorrow. |
Thx @talevy . I'm inclined to implement this as a |
A boostrap check makes sense to me (although they will only issue a warning when bound to loopback but I think it is ok?). I also think that we are safe with a bootstrap check because we setup the necessary permissions to access the data paths in |
thanks @danielmitterdorfer @bleskes, I will see how to make this a BootstrapCheck |
e38a3fe
to
48adce9
Compare
There are certain BootstrapCheck checks that may need access environment-specific values. Watcher's EncryptSensitiveDataBootstrapCheck passes in the node's environment via a constructor to bypass the shortcoming in BootstrapContext. This commit pulls in the node's environment into BootstrapContext. Another case is found in elastic#36519, where it is useful to check the state of the data-path. Since PathUtils.get and Paths.get are forbidden APIs, we rely on the environment to retrieve references to things like node data paths. This means that the BootstrapContext will have the same Settings used in the Environment, which currently differs from the Node's settings.
Update: I've learned more about BootstrapChecks and realized there are a few nice refactors to do that will make writing the PR that needs to be merged before continuing: #36573 |
48adce9
to
058cf32
Compare
…ontents for a node inside a directory named after the cluster name. This was changed in 5.x (elastic#18554) to remove the directory with the cluster name and move its contents up a level. A 5.x cluster will still read the 2.x structure correctly. In 6.x this backwards compatible behavior is removed (elastic#20433), and if a 6.x node is started with a data directory using the old 2.x structure, it will see it as if it was empty and ignore the existing data. This PR makes it so that a 6.x node refuses to start when there exists a data path with the cluster name in it. relates: elastic#32661 (comment)
058cf32
to
2735bbc
Compare
There are certain BootstrapCheck checks that may need access environment-specific values. Watcher's EncryptSensitiveDataBootstrapCheck passes in the node's environment via a constructor to bypass the shortcoming in BootstrapContext. This commit pulls in the node's environment into BootstrapContext. Another case is found in #36519, where it is useful to check the state of the data-path. Since PathUtils.get and Paths.get are forbidden APIs, we rely on the environment to retrieve references to things like node data paths. This means that the BootstrapContext will have the same Settings used in the Environment, which currently differs from the Node's settings.
There are certain BootstrapCheck checks that may need access environment-specific values. Watcher's EncryptSensitiveDataBootstrapCheck passes in the node's environment via a constructor to bypass the shortcoming in BootstrapContext. This commit pulls in the node's environment into BootstrapContext. Another case is found in elastic#36519, where it is useful to check the state of the data-path. Since PathUtils.get and Paths.get are forbidden APIs, we rely on the environment to retrieve references to things like node data paths. This means that the BootstrapContext will have the same Settings used in the Environment, which currently differs from the Node's settings.
Update: refactors to give access to Environment#pathFiles have made it in, so now this is ready! |
server/src/main/java/org/elasticsearch/bootstrap/BootstrapChecks.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks fine overall. I left a couple of suggestions / questions about the error message.
server/src/main/java/org/elasticsearch/bootstrap/BootstrapChecks.java
Outdated
Show resolved
Hide resolved
example run from a test cluster named
|
run the default distro tests |
After discussion offline with the team, the decision is to bring this check in-line into the Node initialization instead of a Bootstrap Check. The reasoning is that bootstrap checks are, primarily, intended to be checks that can be enabled/disabled depending on the strictness of the environment. This data integrity check is one we want to run always, so it is a candidate for hardcoding the exception into the code |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The production code looks good, yet I would structure it a little differently?
@@ -739,6 +739,20 @@ public Node start() throws NodeValidationException { | |||
} catch (IOException e) { | |||
throw new UncheckedIOException(e); | |||
} | |||
|
|||
final List<Path> existingPathsWithClusterName = Arrays.stream(environment.dataFiles()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this code should be embedded directly in the node start method. Can you factor this into a dedicated method (e.g., see how we handled something similar in 8033c57). Then it can be that you test this method directly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for sure, I was 50/50 on doing this, but decided against it due to fear of adding too much overhead. makes sense though, that method is large enough. I'll update
thanks for taking a look @jasontedor. I kept the node.start() test because I thought just testing the method would not be enough to check that the method is being called and used by the node's startup execution |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
thanks Jason! |
and thanks @danielmitterdorfer for initial review and suggestions! |
This PR was missing version and type labels. I've added them based on the commit that was merged. Please adapt if necessary. |
When a 2.x cluster is created, the structure of
path.data
has all contents for a node inside a directory named after the cluster name. This was changed in 5.x (#18554) to remove the directory with the cluster name and move its contents up a level. A 5.x cluster will still read the 2.x structure correctly. In 6.x this backwards compatible behavior is removed (#20433), and if a 6.x node is started with a data directory using the old 2.x structure, it will see it as if it was empty and ignore the existing data.This PR makes it so that a 6.x node refuses to start when there exists a data path
with the cluster name in it.
relates: #32661 (comment)