
multiple network interfaces preventing consensus #11981

Closed
RosieBaish opened this issue Feb 2, 2022 · 2 comments

RosieBaish commented Feb 2, 2022

Nomad version

Nomad v1.2.5 (06d912a)

Operating system and Environment details

Ubuntu 20.04.3 LTS

Issue

A single Nomad instance with bootstrap_expect = 1 is unable to reach consensus.
I believe the issue is that I started Nomad once with my VPN turned off (and thus an IP in the 192.168 range), then turned my VPN on and restarted it. Nomad then picked up the VPN tunnel address (10.10 range) but could still reach the IP in the 192.168 range. So it believed there were two servers rather than one, and kept transferring leadership between them and giving it up in the process.
Deleting Nomad's data directory fixed the issue.
N.B. I'm aware that this setup is not suitable for production; it's just the one I have spun up on my laptop for local testing.
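One way to confirm the duplicated-membership state described above (my own suggestion, not part of the original report) is to ask the server for its Raft peer set:

```shell
# List the Raft peers this server currently knows about.
# In the situation described above, the same host would appear
# twice, once per IP address (192.168.x.x and the 10.10.x.x VPN one).
nomad operator raft list-peers
```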

Nomad config file

anonymised_nomad_hcl.txt

Reproduction steps

  1. Start Consul from the command line with an empty data directory
  2. Turn off the VPN
  3. Start Nomad from the command line with an empty data directory
  4. Wait for Nomad to be running smoothly; I waited for `http: request complete: method=GET path=/v1/agent/health?type=server duration=1.29886ms` to start appearing in the logs
  5. Stop Nomad via Ctrl-C on the command line
  6. Turn on the VPN and wait for the connection
  7. Start Nomad from the command line without changing the data directory
  8. In a different terminal, run `nomad status`
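The steps above can be sketched as shell commands; the config file name and data-directory paths here are examples, not the reporter's actual ones:

```shell
# 1-3. VPN off: start Consul and Nomad with fresh state
#      (-dev keeps Consul's state in memory, i.e. effectively empty)
consul agent -dev
nomad agent -config=nomad.hcl -data-dir=/tmp/nomad-data

# 5-7. Ctrl-C Nomad, enable the VPN, then restart it with the
#      SAME (now non-empty) data directory
nomad agent -config=nomad.hcl -data-dir=/tmp/nomad-data

# 8. In another terminal
nomad status
```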

Expected Result

```
$ nomad status
No running jobs
```

Actual Result

```
$ nomad status
Error querying jobs: Unexpected response code: 500 (No cluster leader)
```

Nomad logs

anonymised_nomad_logs.txt

lgfa29 (Contributor) commented Feb 2, 2022

Hi @RosieBaish 👋

This is, unfortunately, how Raft protocol v2 works: it uses the node's IP address as its identifier, so if your IP changes the node is treated as a new member of the pool. Switching raft_protocol to 3 avoids this issue because v3 generates a unique ID for each node, so even if the IP changes, the ID remains the same.

v3 will also become the default soon, so if you are starting a new cluster it may be better to use it from the beginning 🙂

Give it a try and see if you still hit this problem. I will close the issue for now, but feel free to add more comments and we can re-open it.
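The suggested change is a one-line addition to the server stanza of the Nomad config; the surrounding values here are examples matching a single-server dev setup, not the reporter's actual file:

```hcl
server {
  enabled          = true
  bootstrap_expect = 1

  # Raft protocol v3 gives each server a stable unique ID,
  # so a changed IP address no longer looks like a new node.
  raft_protocol = 3
}
```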

@github-actions

I'm going to lock this issue because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Oct 12, 2022