Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

NETWORKING: Make RemoteClusterConn. Lazy Resolve DNS #32764

Merged
merged 16 commits into from
Aug 18, 2018

Conversation

original-brownbear
Copy link
Member

@original-brownbear original-brownbear commented Aug 10, 2018

  • Lazy resolve DNS (i.e. String to DiscoveryNode) to not run into indefinitely caching lookup issues (provided the JVM dns cache is configured correctly as explained in https://www.elastic.co/guide/en/elasticsearch/reference/6.3/networkaddress-cache-ttl.html)
    • Changed InetAddress type to String for that higher up the stack
    • Passed down Supplier<DiscoveryNode> instead of outright DiscoveryNode from RemoteClusterAware#buildRemoteClustersSeeds on to lazy resolve DNS when the DiscoveryNode is actually used (could've also passed down the value of clusterName = REMOTE_CLUSTERS_SEEDS.getNamespace(concreteSetting) together with the List<String> of hosts, but this route seemed to introduce less duplication and resulted in a significantly smaller changeset).
  • Closes Remote cluster DNS cached indefinitely #28858

@original-brownbear original-brownbear added >bug :Distributed Coordination/Network Http and internode communication implementations v7.0.0 v6.5.0 labels Aug 10, 2018
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-core-infra

@@ -48,8 +49,12 @@
/**
* A list of initial seed nodes to discover eligible nodes from the remote cluster
*/
public static final Setting.AffixSetting<List<InetSocketAddress>> REMOTE_CLUSTERS_SEEDS = Setting.affixKeySetting("search.remote.",
"seeds", (key) -> Setting.listSetting(key, Collections.emptyList(), RemoteClusterAware::parseSeedAddress,
public static final Setting.AffixSetting<List<String>> REMOTE_CLUSTERS_SEEDS = Setting.affixKeySetting("search.remote.",
Copy link
Contributor

@chernecov chernecov Aug 10, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am stranger here, so sorry if I am wrong, but it would be more readable:

public static final Setting.AffixSetting<List<String>> REMOTE_CLUSTERS_SEEDS = Setting.affixKeySetting(
    "search.remote.",
    "seeds",
    (key) -> Setting.listSetting(
        key,
        Collections.emptyList(),
        s -> {
            // validate seed address
            RemoteClusterAware.parseSeedAddress(s);
            return s;
        },
        Setting.Property.NodeScope,
        Setting.Property.Dynamic
    )
);

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No worries, I don't think there's much of a wrong or right here :) I tend to agree with your suggested style but if you look around the codebase you will find that my style is what's used pretty much everywhere so I went with the consistent approach here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Huh, I see now.
That's completely fine then.
Consistency is a very good thing, your're right.
Thanks.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@original-brownbear I use the style suggested by @chernecov. 😇

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hehe that's all I needed to hear :) Will reformat

Copy link
Member

@jasontedor jasontedor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I skimmed this and the general approach looks right. I think what is missing though, sorry if I overlooked, is a test that shows that we are now resolve addresses on demand? That is, that at least we (independent of what the JVM might be doing), no longer cache addresses.

@original-brownbear
Copy link
Member Author

@jasontedor thanks for taking a look :)

You didn't overlook anything, I was a bad one for not adding a test => added one now though :)
=> This should be good for another look.

@original-brownbear
Copy link
Member Author

hold off a few minutes with the review, seems the new test is unstable in some way -> fixing

@original-brownbear
Copy link
Member Author

@jasontedor should be stable/good to review now. Sorry for the noise :)

Copy link
Member

@jasontedor jasontedor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a concern about how we validate the seed node setting now.

key, Collections.emptyList(),
s -> {
// validate seed address
RemoteClusterAware.parseSeedAddress(s);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this seems problematic. It means, for example, if a host fails to resolve, we mark the setting as invalid. Yet, it could resolve for us later?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jasontedor yea, I guess we could leave it out. I mainly just added this because there was an explicit test for this behavior.
I guess we could just delete it and drop the validation. I could go either way here:
Yes this could resolve later, so maybe it's fine not to validate. On the other hand, you could make an annoying typo in a hostname and now you won't catch it right away. Probably a question of how often and if someone would ever configure a hostname that doesn't resolve yet. Also, if you don't have the DNS caching ttl configured, not having this check is kind of annoying because it will never resolve and you might rather want this to fail right away
=> You decide :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If a hostname fails to resolve at time of use (i.e., when we try to connect to the remote cluster), we would have a failure message in the logs indicating the problem? That is, the unknown host exception would be there?

I lean towards saying we leave the lookup out of validation, yet I think the remainder of the validation is good.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jasontedor sounds good will adjust accordingly in a bit :)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jasontedor done in a22994a, extracted the port parsing from the address parsing to get all validation except for the hostname lookup into a separate method :)

Copy link
Member

@jasontedor jasontedor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. I left a few comments, but no need for another round.

if (port <= 0) {
throw new IllegalArgumentException("port number must be > 0 but was: [" + port + "]");
}
return new InetSocketAddress(hostAddress, port);
return port;
} catch (NumberFormatException e) {
throw new IllegalArgumentException("port must be a number", e);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this error message can be misleading. For example, Integer.valueOf will fail to parse 123456789123456789 with a NumberFormatException yet 123456789123456789 is a number! I am okay with catching NumberFormatException and re-throwing since we can add context that NumberFormatException does not provide (For input string: "123456789123456789" is not helpful, specifying that this occurred while parsing a port is) but I think a more correct error message would be helpful. For example, failed to parse port.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, I do not see a test for this case.

transportAddress,
Version.CURRENT.minimumCompatibilityVersion());
nodes.add(node);
List<Supplier<DiscoveryNode>> nodes = new ArrayList<>();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can pre-size the list?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure will do :)

Copy link
Member

@jasontedor jasontedor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, one more nit.

parsePort(s);
return s;
},
Setting.Property.NodeScope, Setting.Property.Dynamic
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These could probably be on a separate line each, too. 😇

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure will do :)

@original-brownbear
Copy link
Member Author

@jasontedor thanks addressed all points => will merge once green :)

@original-brownbear
Copy link
Member Author

Jenkins test this

@original-brownbear original-brownbear merged commit f82bb64 into elastic:master Aug 18, 2018
jasontedor added a commit to jasontedor/elasticsearch that referenced this pull request Aug 18, 2018
* master:
  NETWORKING: Make RemoteClusterConn. Lazy Resolve DNS (elastic#32764)
  [DOCS] Splits the users API documentation into multiple pages (elastic#32825)
  [DOCS] Splits the token APIs into separate pages (elastic#32865)
  [DOCS] Creates redirects for role management APIs page
  Bypassing failing test PainlessDomainSplitIT#testHRDSplit (elastic#32966)
  TEST: Mute testRetentionPolicyChangeDuringRecovery
  [DOCS] Fixes more broken links to role management APIs
  [Docs] Tweaks and fixes to rollup docs
  [DOCS] Fixes links to role management APIs
  [ML][TEST] Fix BasicRenormalizationIT after adding multibucket feature
  [DOCS] Splits the roles API documentation into multiple pages (elastic#32794)
  [TEST]  Run pre 6.4 nodes in non-FIPS JVMs (elastic#32901)
  Make Geo Context Mapping Parsing More Strict (elastic#32821)
jasontedor added a commit that referenced this pull request Aug 18, 2018
* elastic/master: (46 commits)
  NETWORKING: Make RemoteClusterConn. Lazy Resolve DNS (#32764)
  [DOCS] Splits the users API documentation into multiple pages (#32825)
  [DOCS] Splits the token APIs into separate pages (#32865)
  [DOCS] Creates redirects for role management APIs page
  Bypassing failing test PainlessDomainSplitIT#testHRDSplit (#32966)
  TEST: Mute testRetentionPolicyChangeDuringRecovery
  [DOCS] Fixes more broken links to role management APIs
  [Docs] Tweaks and fixes to rollup docs
  [DOCS] Fixes links to role management APIs
  [ML][TEST] Fix BasicRenormalizationIT after adding multibucket feature
  [DOCS] Splits the roles API documentation into multiple pages (#32794)
  [TEST]  Run pre 6.4 nodes in non-FIPS JVMs (#32901)
  Make Geo Context Mapping Parsing More Strict (#32821)
  [ML] fix updating opened jobs scheduled events (#31651) (#32881)
  Scripted metric aggregations: add deprecation warning and system property to control legacy params (#31597)
  Tests: Fix timezone conversion in DateTimeUnitTests
  Enable FIPS140LicenseBootstrapCheck (#32903)
  Fix InternalAutoDateHistogram reproducible failure (#32723)
  Remove assertion in testDocStats on deletedDocs counter (#32914)
  HLRC: Move ML request converters into their own class (#32906)
  ...
original-brownbear added a commit to original-brownbear/elasticsearch that referenced this pull request Aug 19, 2018
* Lazy resolve DNS (i.e. `String` to `DiscoveryNode`) to not run into indefinitely caching lookup issues (provided the JVM dns cache is configured correctly as explained in https://www.elastic.co/guide/en/elasticsearch/reference/6.3/networkaddress-cache-ttl.html)
   * Changed `InetAddress` type to `String` for that higher up the stack
   * Passed down `Supplier<DiscoveryNode>` instead of outright `DiscoveryNode` from `RemoteClusterAware#buildRemoteClustersSeeds` on to lazy resolve DNS when the `DiscoveryNode` is actually used (could've also passed down the value of `clusterName = REMOTE_CLUSTERS_SEEDS.getNamespace(concreteSetting)` together with the `List<String>` of hosts, but this route seemed to introduce less duplication and resulted in a significantly smaller changeset).
* Closes elastic#28858
original-brownbear added a commit that referenced this pull request Aug 19, 2018
* Lazy resolve DNS (i.e. `String` to `DiscoveryNode`) to not run into indefinitely caching lookup issues (provided the JVM dns cache is configured correctly as explained in https://www.elastic.co/guide/en/elasticsearch/reference/6.3/networkaddress-cache-ttl.html)
   * Changed `InetAddress` type to `String` for that higher up the stack
   * Passed down `Supplier<DiscoveryNode>` instead of outright `DiscoveryNode` from `RemoteClusterAware#buildRemoteClustersSeeds` on to lazy resolve DNS when the `DiscoveryNode` is actually used (could've also passed down the value of `clusterName = REMOTE_CLUSTERS_SEEDS.getNamespace(concreteSetting)` together with the `List<String>` of hosts, but this route seemed to introduce less duplication and resulted in a significantly smaller changeset).
* Closes #28858
jasontedor pushed a commit that referenced this pull request Aug 21, 2018
* Lazy resolve DNS (i.e. `String` to `DiscoveryNode`) to not run into indefinitely caching lookup issues (provided the JVM dns cache is configured correctly as explained in https://www.elastic.co/guide/en/elasticsearch/reference/6.3/networkaddress-cache-ttl.html)
   * Changed `InetAddress` type to `String` for that higher up the stack
   * Passed down `Supplier<DiscoveryNode>` instead of outright `DiscoveryNode` from `RemoteClusterAware#buildRemoteClustersSeeds` on to lazy resolve DNS when the `DiscoveryNode` is actually used (could've also passed down the value of `clusterName = REMOTE_CLUSTERS_SEEDS.getNamespace(concreteSetting)` together with the `List<String>` of hosts, but this route seemed to introduce less duplication and resulted in a significantly smaller changeset).
* Closes #28858
jasontedor added a commit to jasontedor/elasticsearch that referenced this pull request Aug 24, 2018
* commit '9088d811ce9cff922e6ef1befbeb0f1e0c27016a': (23 commits)
  Generalize remote license checker (elastic#32971)
  Trim translog when safe commit advanced (elastic#32967)
  Fix an inaccuracy in the dynamic templates documentation. (elastic#32890)
  HLREST: AwaitsFix ML Test
  Make Geo Context Mapping Parsing More Strict (elastic#32862)
  Add mzn and dz to unsupported locales (elastic#32957)
  Use settings from the context in BootstrapChecks (elastic#32908)
  Update docs for node specifications (elastic#30468)
  HLRC: Forbid all Elasticsearch logging infra (elastic#32784)
  Fix use of deprecated apis
  Only configure publishing if it's applied externally (elastic#32351)
  Protect ScriptedMetricIT test cases against failures on 0-doc shards (elastic#32959) (elastic#32968)
  Scripted metric aggregations: add deprecation warning and system (elastic#32944)
  Watcher: Properly find next valid date in cron expressions (elastic#32734)
  Fix some small issues in the getting started docs (elastic#30346)
  Set forbidden APIs target compatibility to compiler java version   (elastic#32935)
  [TEST] Add "ne" as an unsupported SimpleKdc locale (elastic#32700)
  MINOR: Remove `IndexTemplateFilter` (elastic#32841) (elastic#32974)
  INGEST: Create Index Before Pipeline Execute (elastic#32786) (elastic#32975)
  NETWORKING: Make RemoteClusterConn. Lazy Resolve DNS (elastic#32764) (elastic#32976)
  ...
@jimczi jimczi added v7.0.0-beta1 and removed v7.0.0 labels Feb 7, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>bug :Distributed Coordination/Network Http and internode communication implementations v6.5.0 v7.0.0-beta1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Remote cluster DNS cached indefinitely
5 participants