Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

client: fix omitted datacenter of a server discovered via Consul #4997

Closed
wants to merge 1 commit into from

Conversation

a-zagaevskiy
Copy link
Contributor

No description provided.

@tantra35
Copy link
Contributor

tantra35 commented Dec 13, 2018

@AlexanderZagaevskiy sorry but dc in this PR doesn't correspond to nomad datacenters. This is a consul datacenter which doesn't necessarily match to nomad. The real issue is that Status.Peers RPC doens't return nomad datacenter, only host.

@a-zagaevskiy
Copy link
Contributor Author

@tantra35 You are right. But maybe it's worth not to check nomad server's dcs in a method func (s *Server) Equal(o *Server) bool ? What do you think ?

Otherwise a servers list discovered via Consul never equals to the one come in a RPC response of a cluster leader (see a method func (c *Client) updateNodeStatus() error). That causes rebalancing and shuffling of the known servers list which (ATTENTION) already has been cycled in c.servers.NotifyFailedServer(server) and as a result we could lost another 10 seconds pinging the same failed server and miss heartbeat for that client.

@tantra35
Copy link
Contributor

tantra35 commented Dec 17, 2018

@AlexanderZagaevskiy I think that this is a bad idea. Because Equal calls happens enough often. Problem that you describe really have place, but for us with patch we got enough solution

As workaround you may combine two methods, to implement third

https://github.com/hashicorp/nomad/blob/master/nomad/status_endpoint.go#L61
to find local servers

and
https://github.com/hashicorp/nomad/blob/master/nomad/status_endpoint.go#L82
this is pesponce on nomad server members cli call

to get datacenter info. Then you may produce third method for example Status.Peers2 with signature as Status.Peers for simplicity

@tantra35
Copy link
Contributor

tantra35 commented Dec 18, 2018

Also when PR #4688 is used, consul discovery called very few times(at initial state and when quorum of servers are broken), so this can't by very big problem. If not that means that you have very big problems with you nomad servers(at least they count must be 3-5), for example they can disappear from each other due very heavy busy or network latency

@nickethier
Copy link
Member

Hey @AlexanderZagaevskiy we believe #4666 to be resolved by #5654

I agree with @tantra35 that while this may fix part of the problem for your case it's not a generalized solution and will break for folks where the consul DC doesn't align with Nomad DC. If you feel theres still an unresolved problem please open an issue so we can discuss. Thanks!

@nickethier nickethier closed this May 14, 2019
@a-zagaevskiy
Copy link
Contributor Author

Thanks! But I have a proposal to exclude DC off the comparison of servers at all. Servers discovered via Consul are always have omitted DC and this becomes a reason for servers list's replacement and for some extra checks|pings while a client's finding an appropriate responding server. So the client's heartbeat check would likely be failed, the client would be considered as failed and it would make currently running task be restarted. Ain't I right?

@schmichael
Copy link
Member

schmichael commented May 20, 2019

@AlexanderZagaevskiy You're right! We're dropping the unused DC field in code for 0.9.2 (see #5731), and I filed #5730 to fix it in the future.

@github-actions
Copy link

github-actions bot commented Feb 9, 2023

I'm going to lock this pull request because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active contributions.
If you have found a problem that seems related to this change, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Feb 9, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants