restrict node start-up when cluster name in data path #36519

talevy · 2018-12-12T00:06:34Z

When a 2.x cluster is created, path.data keeps all of a node's contents inside a directory named after the cluster. This was changed in 5.x (#18554) to remove the directory with the cluster name and move its contents up a level. A 5.x cluster will still read the 2.x structure correctly. In 6.x this backwards-compatible behavior is removed (#20433), so if a 6.x node is started with a data directory that uses the old 2.x structure, it treats the directory as empty and ignores the existing data.

This PR makes it so that a 6.x node refuses to start when there exists a data path
with the cluster name in it.
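To make the layout difference concrete, here is a minimal sketch of how the old structure could be recognized. It assumes that the 2.x layout shows up as a directory named after the cluster directly under a data path; the class and method names are hypothetical and are not taken from this PR.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class LegacyLayoutSketch {

    /**
     * Returns true when dataPath contains a directory named after the cluster,
     * which is how this sketch recognizes the old 2.x layout (an assumption).
     */
    public static boolean hasClusterNameSubdirectory(Path dataPath, String clusterName) {
        return Files.isDirectory(dataPath.resolve(clusterName));
    }

    public static void main(String[] args) throws IOException {
        Path dataPath = Files.createTempDirectory("data");
        // Newer layout: nothing named after the cluster under the data path.
        System.out.println(hasClusterNameSubdirectory(dataPath, "elasticsearch")); // false
        // Simulate the 2.x layout by creating <path.data>/<cluster name>.
        Files.createDirectory(dataPath.resolve("elasticsearch"));
        System.out.println(hasClusterNameSubdirectory(dataPath, "elasticsearch")); // true
    }
}
```

A node with several data paths would apply the same test to each of them.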

relates: #32661 (comment)

elasticmachine · 2018-12-12T00:06:35Z

Pinging @elastic/es-core-infra

talevy · 2018-12-12T05:48:08Z

cc/ @bleskes

It took me a while to get to this, only to realize what the tests show: we don't necessarily have permission to check whether there is a cluster name in the data path.

I haven't taken another look, so I will do so tomorrow.

bleskes · 2018-12-12T08:58:38Z

Thx @talevy . I'm inclined to implement this as a BootstrapCheck that's always enforced. That said, those are run after the security manager is installed as well. @danielmitterdorfer can you please advise?

danielmitterdorfer · 2018-12-12T13:15:44Z

A bootstrap check makes sense to me (although they will only issue a warning when bound to loopback, but I think that is OK?). I also think that we are safe with a bootstrap check because we set up the necessary permissions to access the data paths in Security.configure and then invoke the bootstrap checks. As the path in question is a subdirectory of the data path (to which we have full access), I think we should be fine w.r.t. access checks?

talevy · 2018-12-12T15:49:32Z

thanks @danielmitterdorfer @bleskes, I will see how to make this a BootstrapCheck

There are certain BootstrapCheck checks that may need access to environment-specific values. Watcher's EncryptSensitiveDataBootstrapCheck passes in the node's environment via a constructor to bypass the shortcoming in BootstrapContext. This commit pulls the node's environment into BootstrapContext. Another case is found in elastic#36519, where it is useful to check the state of the data path. Since PathUtils.get and Paths.get are forbidden APIs, we rely on the environment to retrieve references to things like node data paths. This means that the BootstrapContext will have the same Settings used in the Environment, which currently differs from the Node's settings.

talevy · 2018-12-12T22:47:55Z

Update: I've learned more about BootstrapChecks and realized there are a few nice refactors to do that will make writing the ClusterNameInDataPathCheck a lot cleaner. Long story short: This check needs access to the node's Environment to work right.

PR that needs to be merged before continuing: #36573
related PR that would be a nice to have: #36574

talevy · 2018-12-13T06:10:25Z

Update: refactors to give access to Environment#pathFiles have made it in, so now this is ready!

server/src/main/java/org/elasticsearch/bootstrap/BootstrapChecks.java

danielmitterdorfer

Looks fine overall. I left a couple of suggestions / questions about the error message.

server/src/main/java/org/elasticsearch/bootstrap/BootstrapChecks.java

talevy · 2018-12-17T18:02:13Z

example run from a test cluster named elasticsearch

[2018-12-17T10:00:23,572][INFO ][o.e.n.Node               ] [1Xo7d5f] initialized
[2018-12-17T10:00:23,573][INFO ][o.e.n.Node               ] [1Xo7d5f] starting ...
[2018-12-17T10:00:23,796][INFO ][o.e.t.TransportService   ] [1Xo7d5f] publish_address {127.0.0.1:9300}, bound_addresses {[::1]:9300}, {127.0.0.1:9300}

ERROR: [1] bootstrap checks failed

[1]: Cluster name [elasticsearch] subdirectory exists in data paths [/distribution/archives/tar/build/distributions/elasticsearch-6.6.0-SNAPSHOT/data/elasticsearch]. All data under these paths must be moved up one directory to paths [/distribution/archives/tar/build/distributions/elasticsearch-6.6.0-SNAPSHOT/data]

[2018-12-17T10:00:23,850][INFO ][o.e.n.Node               ] [1Xo7d5f] stopping ...
[2018-12-17T10:00:23,865][INFO ][o.e.n.Node               ] [1Xo7d5f] stopped
[2018-12-17T10:00:23,866][INFO ][o.e.n.Node               ] [1Xo7d5f] closing ...
[2018-12-17T10:00:23,924][INFO ][o.e.n.Node               ] [1Xo7d5f] closed
[2018-12-17T10:00:23,928][INFO ][o.e.x.m.p.NativeController] [1Xo7d5f] Native controller process has stopped - no new native processes can be started

talevy · 2018-12-17T21:36:31Z

run the default distro tests

talevy · 2018-12-17T23:27:29Z

After discussing offline with the team, the decision is to bring this check inline into Node initialization instead of making it a bootstrap check. The reasoning is that bootstrap checks are primarily intended to be checks that can be enabled or disabled depending on the strictness of the environment. This data integrity check is one we always want to run, so it is a candidate for hardcoding the exception into the code.
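The distinction drawn here can be illustrated with a toy model. This is only a sketch: `SimpleCheck`, `blockingFailures`, and the `enforceLimits` flag are hypothetical names used for illustration, not the Elasticsearch bootstrap check API. It shows that a failing check blocks start-up only when enforcement is on, unless the check is marked as always enforced.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class EnforcementSketch {

    /** Hypothetical, simplified check; not the Elasticsearch BootstrapCheck interface. */
    public static final class SimpleCheck {
        final String message;
        final boolean fails;
        final boolean alwaysEnforce;

        public SimpleCheck(String message, boolean fails, boolean alwaysEnforce) {
            this.message = message;
            this.fails = fails;
            this.alwaysEnforce = alwaysEnforce;
        }
    }

    /**
     * Returns the messages of failing checks that block start-up. Failing checks
     * that are not enforced are left out; a real system would log them as warnings.
     */
    public static List<String> blockingFailures(List<SimpleCheck> checks, boolean enforceLimits) {
        List<String> errors = new ArrayList<>();
        for (SimpleCheck check : checks) {
            if (check.fails && (enforceLimits || check.alwaysEnforce)) {
                errors.add(check.message);
            }
        }
        return errors;
    }

    public static void main(String[] args) {
        List<SimpleCheck> checks = Arrays.asList(
            new SimpleCheck("optional check failed", true, false),
            new SimpleCheck("cluster name found in data path", true, true));
        // Lenient start (for example, bound to loopback): only the always-enforced check blocks.
        System.out.println(blockingFailures(checks, false));
        // Strict start: every failing check blocks.
        System.out.println(blockingFailures(checks, true));
    }
}
```

A check that must run unconditionally does not benefit from this switch, which is consistent with the decision to throw directly during node initialization.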

jasontedor

The production code looks good, yet I would structure it a little differently?

jasontedor · 2018-12-20T16:59:38Z

server/src/main/java/org/elasticsearch/node/Node.java

@@ -739,6 +739,20 @@ public Node start() throws NodeValidationException {
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
+
+        final List<Path> existingPathsWithClusterName = Arrays.stream(environment.dataFiles())


I don't think this code should be embedded directly in the node start method. Can you factor this into a dedicated method (e.g., see how we handled something similar in 8033c57)? Then you can test this method directly.

talevy · 2018-12-20

For sure. I was 50/50 on doing this, but decided against it for fear of adding too much overhead. It makes sense though; that method is large enough. I'll update.
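A rough sketch of what such a dedicated method might look like follows. It is not the code that was merged: the class name, the method name, and the choice of IllegalStateException are assumptions, and the message is modeled on the example run shown earlier in this thread. Taking the cluster name and data paths as plain arguments is what lets a test call the method without starting a node.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class DataPathValidationSketch {

    /** Throws if any data path contains an entry named after the cluster (hypothetical helper). */
    public static void ensureNoClusterNameInDataPaths(String clusterName, Path[] dataFiles) {
        final List<Path> existingPathsWithClusterName = Arrays.stream(dataFiles)
            .map(path -> path.resolve(clusterName))
            .filter(Files::exists)
            .collect(Collectors.toList());
        if (existingPathsWithClusterName.isEmpty() == false) {
            throw new IllegalStateException(
                "Cluster name [" + clusterName + "] subdirectory exists in data paths "
                    + existingPathsWithClusterName
                    + ". All data under these paths must be moved up one directory to paths "
                    + Arrays.toString(dataFiles));
        }
    }

    public static void main(String[] args) throws IOException {
        Path dataPath = Files.createTempDirectory("data");
        // No cluster-name subdirectory yet, so this call returns normally.
        ensureNoClusterNameInDataPaths("elasticsearch", new Path[] { dataPath });
        // Simulate the old layout and observe the failure.
        Files.createDirectory(dataPath.resolve("elasticsearch"));
        try {
            ensureNoClusterNameInDataPaths("elasticsearch", new Path[] { dataPath });
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

A separate test that goes through node start-up is still useful to confirm that the method is actually invoked.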

talevy · 2018-12-20T18:27:18Z

thanks for taking a look @jasontedor. I kept the node.start() test because I thought just testing the method would not be enough to check that the method is being called and used by the node's startup execution

jasontedor

LGTM.

talevy · 2019-01-02T18:02:39Z

thanks Jason!

talevy · 2019-01-02T18:03:21Z

and thanks @danielmitterdorfer for initial review and suggestions!

ywelsch · 2019-01-04T14:00:44Z

This PR was missing version and type labels. I've added them based on the commit that was merged. Please adapt if necessary.

talevy added the WIP and :Core/Infra/Core labels Dec 12, 2018

talevy force-pushed the cluster-in-path branch 3 times, most recently from e38a3fe to 48adce9 Compare December 12, 2018 17:54

talevy mentioned this pull request Dec 12, 2018

[refactor] add Environment in BootstrapContext #36573

Merged

talevy force-pushed the cluster-in-path branch from 48adce9 to 058cf32 Compare December 13, 2018 01:21

talevy force-pushed the cluster-in-path branch from 058cf32 to 2735bbc Compare December 13, 2018 01:22

Merge remote-tracking branch 'upstream/6.x' into cluster-in-path

17958aa

talevy removed the WIP label Dec 13, 2018

generalize error message

5703909

danielmitterdorfer reviewed Dec 13, 2018

server/src/main/java/org/elasticsearch/bootstrap/BootstrapChecks.java

talevy added 2 commits December 13, 2018 10:17

Merge remote-tracking branch 'upstream/6.x' into cluster-in-path

3bb6834

always enforce

9948247

talevy requested a review from danielmitterdorfer December 13, 2018 18:43

Merge remote-tracking branch 'upstream/6.x' into cluster-in-path

184d033

danielmitterdorfer reviewed Dec 14, 2018

server/src/main/java/org/elasticsearch/bootstrap/BootstrapChecks.java

talevy added 2 commits December 14, 2018 13:29

Merge remote-tracking branch 'upstream/6.x' into cluster-in-path

9918841

verbose message

04ea871

Merge remote-tracking branch 'upstream/6.x' into cluster-in-path

acf4d20

add docs

990a869

Merge remote-tracking branch 'upstream/6.x' into cluster-in-path

b5e69e9

talevy added 2 commits December 18, 2018 16:20

move check to Node, beside bootstrap validation

1dc97d0

whitespace

2cacec2

jasontedor requested changes Dec 20, 2018

talevy added 2 commits December 20, 2018 09:26

Merge remote-tracking branch 'upstream/6.x' into cluster-in-path

671e729

split out method

bd1bb06

talevy requested a review from jasontedor December 20, 2018 18:26

jasontedor approved these changes Dec 23, 2018

talevy merged commit 8d36cf3 into elastic:6.x Jan 2, 2019

talevy deleted the cluster-in-path branch January 2, 2019 18:02

talevy mentioned this pull request Jan 2, 2019

handling changes in the structure of path.data when upgrading from 2.x to 5.x to 6.x #32661

Closed

ywelsch added >enhancement v6.7.0 labels Jan 4, 2019
