-
Notifications
You must be signed in to change notification settings - Fork 24.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use file-based discovery not MockUncasedHostsProvider #33554
Use file-based discovery not MockUncasedHostsProvider #33554
Conversation
Today we use a special unicast hosts provider, the `MockUncasedHostsProvider`, in many integration tests, to deal with the dynamic nature of the allocation of ports to nodes. However elastic#33241 allows us to use file-based discovery to achieve the same goal, so the special test-only `MockUncasedHostsProvider` is no longer required. This change removes `MockUncasedHostProvider` and replaces it with file-based discovery in tests based on `EsIntegTestCase`.
Pinging @elastic/es-distributed |
import static org.elasticsearch.transport.TcpTransport.PORT; | ||
|
||
@ESIntegTestCase.ClusterScope(scope = ESIntegTestCase.Scope.TEST, numDataNodes = 0, numClientNodes = 0) | ||
public class SettingsBasedHostProviderIT extends ESIntegTestCase { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we were effectively testing the settings-based host provider via all the other integ tests, but not after this change, I thought it prudent to add this.
logger.info("configuring discovery with {} at {}", discoveryFileContents, configPaths); | ||
for (final Path configPath : configPaths) { | ||
Files.createDirectories(configPath); | ||
Files.write(configPath.resolve(UNICAST_HOSTS_FILE), discoveryFileContents); // TODO do we need to do this atomically? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NB open question: do we need to do this atomically?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
don't think so? We never call this concurrently for the same destination file?
Marking this as WIP as it needs more work on BWC tests than I realised. |
@elasticmachine retest this please |
@@ -705,6 +705,8 @@ public Node start() throws NodeValidationException { | |||
assert localNodeFactory.getNode() != null; | |||
assert transportService.getLocalNode().equals(localNodeFactory.getNode()) | |||
: "transportService has a different local node than the factory provided"; | |||
onTransportServiceStarted(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you can avoid adding these changes here in Node (and MockNode), but instead after initializing the node (but before starting it), call node.injector().getInstance(TransportService.class)
to get the TransportService
and then register a lifecycle listener on that (addLifecycleListener
) which implements afterStart
.
4b82985
to
924768f
Compare
@dnhatn I had such a failure earlier too, in my exposing CCR to the transport client PR. You can check the build history there. |
@DaveCTurner and @jasontedor Thanks for the ping. I opened #33613 and muted the test. |
BWC test failures were unrelated to this change and cleared up after merging a more recent |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few nits. Looks good otherwise.
@@ -705,6 +705,7 @@ public Node start() throws NodeValidationException { | |||
assert localNodeFactory.getNode() != null; | |||
assert transportService.getLocalNode().equals(localNodeFactory.getNode()) | |||
: "transportService has a different local node than the factory provided"; | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
revert
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
@ESIntegTestCase.ClusterScope(scope = ESIntegTestCase.Scope.TEST, numDataNodes = 0, numClientNodes = 0) | ||
public class SettingsBasedHostProviderIT extends ESIntegTestCase { | ||
|
||
private Consumer<Builder> configureDiscovery; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this feels a little hacky to me. Can't you explicitly pass the extra settings to the nodes when you're starting them up?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This came about due to a need to remove a setting from the builder, but that's no longer necessary.
logger.info("configuring discovery with {} at {}", discoveryFileContents, configPaths); | ||
for (final Path configPath : configPaths) { | ||
Files.createDirectories(configPath); | ||
Files.write(configPath.resolve(UNICAST_HOSTS_FILE), discoveryFileContents); // TODO do we need to do this atomically? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
don't think so? We never call this concurrently for the same destination file?
@@ -291,7 +299,7 @@ public void testNodeJoinWithoutSecurityExplicitlyEnabled() throws Exception { | |||
.put("path.home", home) | |||
.put(TestZenDiscovery.USE_MOCK_PINGS.getKey(), false) | |||
.put(DiscoveryModule.DISCOVERY_TYPE_SETTING.getKey(), "test-zen") | |||
.put(DiscoveryModule.DISCOVERY_HOSTS_PROVIDER_SETTING.getKey(), "test-zen") | |||
.putList(DiscoveryModule.DISCOVERY_HOSTS_PROVIDER_SETTING.getKey(), "file") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you could also just use settings based discovery here (no need to write a file then)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But we like file-based discovery. Fixed :(
} | ||
|
||
void countDown() { | ||
logger.info("transport service started: {} of {} remaining", countDownLatch.getCount() - 1, initialCount); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if INFO logging is too verbose here. If start up fails, we should already have failures. So what's the benefit of this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This really came about from attempts to track down the need for this:
elasticsearch/test/framework/src/main/java/org/elasticsearch/test/InternalTestCluster.java
Line 599 in 31c1d7d
onTransportServiceStarted.run(); // reusing an existing node implies its transport service already started |
We discussed and decided not to use a countdown at all - see 31c1d7d.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
@@ -223,6 +221,8 @@ | |||
private ServiceDisruptionScheme activeDisruptionScheme; | |||
private Function<Client, Client> clientWrapper; | |||
|
|||
private final Object discoveryFileMutex = new Object(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would prefer to just move this next to the rebuildUnicastHostFiles
method as it's only used there.
Today we use a special unicast hosts provider, the `MockUncasedHostsProvider`, in many integration tests, to deal with the dynamic nature of the allocation of ports to nodes. However elastic#33241 allows us to use file-based discovery to achieve the same goal, so the special test-only `MockUncasedHostsProvider` is no longer required. This change removes `MockUncasedHostProvider` and replaces it with file-based discovery in tests based on `EsIntegTestCase`.
6.x is not passing CI so I opened a separate PR for the backport: #33658. |
Today we use a special unicast hosts provider, the `MockUncasedHostsProvider`, in many integration tests, to deal with the dynamic nature of the allocation of ports to nodes. However #33241 allows us to use file-based discovery to achieve the same goal, so the special test-only `MockUncasedHostsProvider` is no longer required. This change removes `MockUncasedHostProvider` and replaces it with file-based discovery in tests based on `EsIntegTestCase`. Backport of #33554 to 6.x.
Today when ESIntegTestCase starts some nodes it writes out the unicast hosts files each time a node starts its transport service. This does mean that a number of nodes can start and perform their first pinging round without any unicast hosts which, if the timing is unlucky and a lot of nodes are all started at the same time, can lead to a split brain as in elastic#35052. Prior to elastic#33554 this was unlikely to happen since the MockUncasedHostsProvider would always have yielded the existing hosts, so the timing would have to have been implausibly unlucky. Since elastic#33554, however, it's more likely because the race occurs between the start of the first round of pinging and the writing of the unicast hosts file. It is realistic that new nodes will be configured with the existing nodes from startup, so this change reinstates that behaviour Closes elastic#35052.
Today when ESIntegTestCase starts some nodes it writes out the unicast hosts files each time a node starts its transport service. This does mean that a number of nodes can start and perform their first pinging round without any unicast hosts which, if the timing is unlucky and a lot of nodes are all started at the same time, can lead to a split brain as in #35052. Prior to #33554 this was unlikely to happen since the MockUncasedHostsProvider would always have yielded the existing hosts, so the timing would have to have been implausibly unlucky. Since #33554, however, it's more likely because the race occurs between the start of the first round of pinging and the writing of the unicast hosts file. It is realistic that new nodes will be configured with the existing nodes from startup, so this change reinstates that behaviour. Closes #35052.
Today when ESIntegTestCase starts some nodes it writes out the unicast hosts files each time a node starts its transport service. This does mean that a number of nodes can start and perform their first pinging round without any unicast hosts which, if the timing is unlucky and a lot of nodes are all started at the same time, can lead to a split brain as in #35052. Prior to #33554 this was unlikely to happen since the MockUncasedHostsProvider would always have yielded the existing hosts, so the timing would have to have been implausibly unlucky. Since #33554, however, it's more likely because the race occurs between the start of the first round of pinging and the writing of the unicast hosts file. It is realistic that new nodes will be configured with the existing nodes from startup, so this change reinstates that behaviour. Closes #35052.
Today when ESIntegTestCase starts some nodes it writes out the unicast hosts files each time a node starts its transport service. This does mean that a number of nodes can start and perform their first pinging round without any unicast hosts which, if the timing is unlucky and a lot of nodes are all started at the same time, can lead to a split brain as in #35052. Prior to #33554 this was unlikely to happen since the MockUncasedHostsProvider would always have yielded the existing hosts, so the timing would have to have been implausibly unlucky. Since #33554, however, it's more likely because the race occurs between the start of the first round of pinging and the writing of the unicast hosts file. It is realistic that new nodes will be configured with the existing nodes from startup, so this change reinstates that behaviour. Closes #35052.
Today we use a special unicast hosts provider, the
MockUncasedHostsProvider
,in many integration tests, to deal with the dynamic nature of the allocation of
ports to nodes. However #33241 allows us to use file-based discovery to achieve
the same goal, so the special test-only
MockUncasedHostsProvider
is no longerrequired.
This change removes
MockUncasedHostProvider
and replaces it with file-baseddiscovery in tests based on
EsIntegTestCase
.