Use file-based discovery not MockUncasedHostsProvider #33554

DaveCTurner · 2018-09-10T06:37:44Z

Today we use a special unicast hosts provider, the MockUncasedHostsProvider,
in many integration tests, to deal with the dynamic nature of the allocation of
ports to nodes. However #33241 allows us to use file-based discovery to achieve
the same goal, so the special test-only MockUncasedHostsProvider is no longer
required.

This change removes MockUncasedHostProvider and replaces it with file-based
discovery in tests based on ESIntegTestCase.

elasticmachine · 2018-09-10T06:37:45Z

Pinging @elastic/es-distributed

DaveCTurner · 2018-09-10T06:38:42Z

server/src/test/java/org/elasticsearch/discovery/zen/SettingsBasedHostProviderIT.java

+import static org.elasticsearch.transport.TcpTransport.PORT;
+
+@ESIntegTestCase.ClusterScope(scope = ESIntegTestCase.Scope.TEST, numDataNodes = 0, numClientNodes = 0)
+public class SettingsBasedHostProviderIT extends ESIntegTestCase {


Until now all the other integ tests effectively exercised the settings-based host provider. After this change they no longer do, so I thought it prudent to add this dedicated test.

DaveCTurner · 2018-09-10T06:39:08Z

test/framework/src/main/java/org/elasticsearch/test/InternalTestCluster.java

+            logger.info("configuring discovery with {} at {}", discoveryFileContents, configPaths);
+            for (final Path configPath : configPaths) {
+                Files.createDirectories(configPath);
+                Files.write(configPath.resolve(UNICAST_HOSTS_FILE), discoveryFileContents); // TODO do we need to do this atomically?


NB open question: do we need to do this atomically?

don't think so? We never call this concurrently for the same destination file?
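For reference, a minimal sketch of what an atomic write could look like had it been needed: write to a temporary file in the same directory, then rename it over the target. The class and method names are hypothetical and not part of InternalTestCluster, and the value of UNICAST_HOSTS_FILE is assumed here. Since the file is never written concurrently for the same destination, the plain Files.write in the diff should be sufficient.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration only; not part of the test framework.
final class AtomicHostsFileWriter {

    // Assumed file name; the real constant lives in the file-based hosts provider.
    static final String UNICAST_HOSTS_FILE = "unicast_hosts.txt";

    static void writeAtomically(Path configPath, List<String> lines) throws IOException {
        Files.createDirectories(configPath);
        // Create the temp file in the target directory so the rename stays on one filesystem.
        Path tmp = Files.createTempFile(configPath, UNICAST_HOSTS_FILE, ".tmp");
        try {
            Files.write(tmp, lines);
            // Readers see either the old file or the new one, never a partial write.
            // Whether an existing target is replaced is implementation specific
            // (it is on POSIX filesystems).
            Files.move(tmp, configPath.resolve(UNICAST_HOSTS_FILE), StandardCopyOption.ATOMIC_MOVE);
        } finally {
            // No-op after a successful move; cleans up if the write or move failed.
            Files.deleteIfExists(tmp);
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("discovery-sketch");
        writeAtomically(dir, Arrays.asList("127.0.0.1:9300", "127.0.0.1:9301"));
        System.out.println(Files.readAllLines(dir.resolve(UNICAST_HOSTS_FILE)));
    }
}
```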

DaveCTurner · 2018-09-10T14:10:23Z

Marking this as WIP as it needs more work on BWC tests than I realised.

DaveCTurner · 2018-09-11T06:04:49Z

@elasticmachine retest this please
@dnhatn you mentioned you were looking at sporadic failures in GatewayIndexStateIT - this looks like another instance of this.

ywelsch · 2018-09-11T18:57:36Z

server/src/main/java/org/elasticsearch/node/Node.java

@@ -705,6 +705,8 @@ public Node start() throws NodeValidationException {
        assert localNodeFactory.getNode() != null;
        assert transportService.getLocalNode().equals(localNodeFactory.getNode())
            : "transportService has a different local node than the factory provided";
+        onTransportServiceStarted();


I think you can avoid these changes in Node (and MockNode). Instead, after initializing the node but before starting it, call node.injector().getInstance(TransportService.class) to get the TransportService, then register a lifecycle listener on it (addLifecycleListener) that implements afterStart.
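A rough, self-contained sketch of the listener pattern suggested here. The LifecycleListener and TransportService classes below are simplified stand-ins written for this example, not the Elasticsearch classes; only the names addLifecycleListener and afterStart are taken from the comment, and startAndCollect is a hypothetical driver.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

final class LifecycleListenerSketch {

    // Stand-in for a lifecycle listener: only the callback used here.
    abstract static class LifecycleListener {
        void afterStart() {}
    }

    // Stand-in for a transport service that notifies listeners once started.
    static final class TransportService {
        private final List<LifecycleListener> listeners = new CopyOnWriteArrayList<>();
        private final String address;

        TransportService(String address) {
            this.address = address;
        }

        void addLifecycleListener(LifecycleListener listener) {
            listeners.add(listener);
        }

        void start() {
            // A real service would bind its port here, before notifying listeners.
            for (LifecycleListener listener : listeners) {
                listener.afterStart();
            }
        }

        String boundAddress() {
            return address;
        }
    }

    // Registers a listener before start, so the caller learns each address
    // as soon as the corresponding service has started.
    static List<String> startAndCollect(String... addresses) {
        List<String> started = new ArrayList<>();
        for (String address : addresses) {
            TransportService transportService = new TransportService(address);
            transportService.addLifecycleListener(new LifecycleListener() {
                @Override
                void afterStart() {
                    started.add(transportService.boundAddress());
                }
            });
            transportService.start();
        }
        return started;
    }

    public static void main(String[] args) {
        System.out.println(startAndCollect("127.0.0.1:9300", "127.0.0.1:9301"));
    }
}
```

The point of the design is that the hook lives entirely in the test code that constructs the node, so no extra callback method needs to be added to the production class.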

jasontedor · 2018-09-11T20:07:46Z

@dnhatn I had such a failure earlier too, in my PR exposing CCR to the transport client. You can check the build history there.

dnhatn · 2018-09-11T20:30:32Z

@DaveCTurner and @jasontedor Thanks for the ping. I opened #33613 and muted the test.

DaveCTurner · 2018-09-12T07:00:27Z

BWC test failures were unrelated to this change and cleared up after merging a more recent master. @ywelsch thanks for the addLifecycleListener suggestion, I've made that change, and this is ready for a review.

ywelsch

A few nits. Looks good otherwise.

ywelsch · 2018-09-12T09:56:38Z

server/src/main/java/org/elasticsearch/node/Node.java

@@ -705,6 +705,7 @@ public Node start() throws NodeValidationException {
        assert localNodeFactory.getNode() != null;
        assert transportService.getLocalNode().equals(localNodeFactory.getNode())
            : "transportService has a different local node than the factory provided";
+


ywelsch · 2018-09-12T10:04:02Z

server/src/test/java/org/elasticsearch/discovery/zen/SettingsBasedHostProviderIT.java

+@ESIntegTestCase.ClusterScope(scope = ESIntegTestCase.Scope.TEST, numDataNodes = 0, numClientNodes = 0)
+public class SettingsBasedHostProviderIT extends ESIntegTestCase {
+
+    private Consumer<Builder> configureDiscovery;


this feels a little hacky to me. Can't you explicitly pass the extra settings to the nodes when you're starting them up?

This came about due to a need to remove a setting from the builder, but that's no longer necessary.

ywelsch · 2018-09-12T10:35:08Z

test/framework/src/main/java/org/elasticsearch/test/InternalTestCluster.java

+            logger.info("configuring discovery with {} at {}", discoveryFileContents, configPaths);
+            for (final Path configPath : configPaths) {
+                Files.createDirectories(configPath);
+                Files.write(configPath.resolve(UNICAST_HOSTS_FILE), discoveryFileContents); // TODO do we need to do this atomically?


don't think so? We never call this concurrently for the same destination file?

ywelsch · 2018-09-12T10:38:01Z

x-pack/plugin/security/src/test/java/org/elasticsearch/license/LicensingTests.java

@@ -291,7 +299,7 @@ public void testNodeJoinWithoutSecurityExplicitlyEnabled() throws Exception {
            .put("path.home", home)
            .put(TestZenDiscovery.USE_MOCK_PINGS.getKey(), false)
            .put(DiscoveryModule.DISCOVERY_TYPE_SETTING.getKey(), "test-zen")
-            .put(DiscoveryModule.DISCOVERY_HOSTS_PROVIDER_SETTING.getKey(), "test-zen")
+            .putList(DiscoveryModule.DISCOVERY_HOSTS_PROVIDER_SETTING.getKey(), "file")


you could also just use settings based discovery here (no need to write a file then)

But we like file-based discovery. Fixed :(

ywelsch · 2018-09-12T10:40:18Z

test/framework/src/main/java/org/elasticsearch/test/InternalTestCluster.java

+        }
+
+        void countDown() {
+            logger.info("transport service started: {} of {} remaining", countDownLatch.getCount() - 1, initialCount);


I wonder if INFO logging is too verbose here. If start up fails, we should already have failures. So what's the benefit of this?

This really came about from attempts to track down the need for this:

elasticsearch/test/framework/src/main/java/org/elasticsearch/test/InternalTestCluster.java

Line 599 in 31c1d7d

onTransportServiceStarted.run(); // reusing an existing node implies its transport service already started

We discussed and decided not to use a countdown at all - see 31c1d7d.

ywelsch

LGTM

ywelsch · 2018-09-12T12:36:10Z

test/framework/src/main/java/org/elasticsearch/test/InternalTestCluster.java

@@ -223,6 +221,8 @@
    private ServiceDisruptionScheme activeDisruptionScheme;
    private Function<Client, Client> clientWrapper;

+    private final Object discoveryFileMutex = new Object();


I would prefer to just move this next to the rebuildUnicastHostFiles method as it's only used there.
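As an illustration of this layout, a minimal sketch with the mutex declared directly beside the one method that uses it. Everything here is a simplified assumption rather than the actual InternalTestCluster code: the class name, the constructor, the String addresses, and the value of UNICAST_HOSTS_FILE are all hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.TreeSet;

// Hypothetical, simplified stand-in for the relevant part of a test cluster.
final class DiscoveryFileSketch {

    // Assumed file name for the unicast hosts file.
    static final String UNICAST_HOSTS_FILE = "unicast_hosts.txt";

    private final List<Path> configPaths;

    DiscoveryFileSketch(List<Path> configPaths) {
        this.configPaths = new ArrayList<>(configPaths);
    }

    // Kept next to its only user rather than with the other fields.
    private final Object discoveryFileMutex = new Object();

    // Rewrites the hosts file in every config directory with the given addresses.
    void rebuildUnicastHostFiles(Collection<String> startedAddresses) throws IOException {
        synchronized (discoveryFileMutex) {
            // Sort and de-duplicate so every file has the same stable contents.
            List<String> contents = new ArrayList<>(new TreeSet<>(startedAddresses));
            for (Path configPath : configPaths) {
                Files.createDirectories(configPath);
                Files.write(configPath.resolve(UNICAST_HOSTS_FILE), contents);
            }
        }
    }
}
```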

DaveCTurner · 2018-09-13T06:09:26Z

6.x is not passing CI so I opened a separate PR for the backport: #33658.

Today we use a special unicast hosts provider, the `MockUncasedHostsProvider`, in many integration tests, to deal with the dynamic nature of the allocation of ports to nodes. However #33241 allows us to use file-based discovery to achieve the same goal, so the special test-only `MockUncasedHostsProvider` is no longer required. This change removes `MockUncasedHostProvider` and replaces it with file-based discovery in tests based on `ESIntegTestCase`. Backport of #33554 to 6.x.

Today when ESIntegTestCase starts some nodes it writes out the unicast hosts files each time a node starts its transport service. This does mean that a number of nodes can start and perform their first pinging round without any unicast hosts which, if the timing is unlucky and a lot of nodes are all started at the same time, can lead to a split brain as in #35052. Prior to #33554 this was unlikely to happen since the MockUncasedHostsProvider would always have yielded the existing hosts, so the timing would have to have been implausibly unlucky. Since #33554, however, it's more likely because the race occurs between the start of the first round of pinging and the writing of the unicast hosts file. It is realistic that new nodes will be configured with the existing nodes from startup, so this change reinstates that behaviour. Closes #35052.

DaveCTurner added >non-issue v7.0.0 :Distributed Coordination/Cluster Coordination Cluster formation and cluster state publication, including cluster membership and fault detection. v6.5.0 labels Sep 10, 2018

DaveCTurner requested a review from ywelsch September 10, 2018 06:37

Imports

721a179

DaveCTurner removed the request for review from ywelsch September 10, 2018 14:09

DaveCTurner added the WIP label Sep 10, 2018

Merge branch 'master' into 2018-09-09-unmock-host-provider

ca8407a

ywelsch reviewed Sep 11, 2018

Configure file-based discovery in security test

924768f

DaveCTurner force-pushed the 2018-09-09-unmock-host-provider branch from 4b82985 to 924768f Compare September 11, 2018 19:39

Use LifecycleListener rather than onTransportServiceStarted

4adf965

DaveCTurner requested a review from ywelsch September 12, 2018 07:00

ywelsch suggested changes Sep 12, 2018

DaveCTurner added 5 commits September 12, 2018 13:28

Whitespace

82a5329

Just pass the settings directly to the nodes as they are started

edf1feb

TODO resolved (no we don't)

13b6046

No need for a file here

2947a7a

Rebuild the discovery file as each transport service restarts

31c1d7d

ywelsch approved these changes Sep 12, 2018

Move field

24a91aa

DaveCTurner merged commit 5a3fd8e into elastic:master Sep 13, 2018

DaveCTurner deleted the 2018-09-09-unmock-host-provider branch September 13, 2018 05:37

DaveCTurner added the backport pending label Sep 13, 2018

DaveCTurner removed the WIP label Sep 13, 2018

DaveCTurner mentioned this pull request Sep 13, 2018

Use file-based discovery not MockUncasedHostsProvider (backport of #33554) #33658

Merged

DaveCTurner removed the backport pending label Sep 13, 2018

DaveCTurner mentioned this pull request Sep 13, 2018

Fix port assignment and discovery in tests #33675

Closed

6 tasks

DaveCTurner mentioned this pull request Oct 31, 2018

Pre-populate unicast hosts files #35136

Merged

colings86 added v7.0.0-beta1 and removed v7.0.0 labels Feb 7, 2019
