
multiple network interfaces preventing consensus #11981

Closed
RosieBaish opened this issue Feb 2, 2022 · 2 comments

RosieBaish commented Feb 2, 2022

Nomad version

Nomad v1.2.5 (06d912a)

Operating system and Environment details

Ubuntu 20.04.3 LTS

Issue

A single Nomad instance with bootstrap_expect = 1 is unable to reach consensus.
I believe the issue is that I started Nomad once with my VPN turned off (and thus an IP in the 192.168 range), then turned my VPN on and restarted it. Nomad then picked up the VPN tunnel address (10.10 range) but could still reach the IP in the 192.168 range. So it believed there were two servers rather than one, and kept transferring leadership between them and giving it up in the process.
Deleting Nomad's data directory fixed the issue.
N.B. I'm aware that this setup is not suitable for production; it's just the one I have spun up on my laptop for local testing.
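One way to confirm the duplicated-membership state described above (my own suggestion, not part of the original report) is to ask the server for its Raft peer set:

```shell
# List the Raft peers this server currently knows about.
# In the situation described above, the same host would appear
# twice, once per IP address (192.168.x.x and the 10.10.x.x VPN one).
nomad operator raft list-peers
```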

Nomad config file

anonymised_nomad_hcl.txt

Reproduction steps

  1. Start Consul from the command line with an empty data directory
  2. Turn off the VPN
  3. Start Nomad from the command line with an empty data directory
  4. Wait for Nomad to be running smoothly; I waited for `http: request complete: method=GET path=/v1/agent/health?type=server duration=1.29886ms` to start appearing in the logs
  5. Stop Nomad via Ctrl-C on the command line
  6. Turn on the VPN and wait for the connection
  7. Start Nomad from the command line without changing the data directory
  8. In a different terminal, run `nomad status`
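The steps above can be sketched as shell commands; the config file name and data-directory paths here are examples, not the reporter's actual ones:

```shell
# 1-3. VPN off: start Consul and Nomad with fresh state
#      (-dev keeps Consul's state in memory, i.e. effectively empty)
consul agent -dev
nomad agent -config=nomad.hcl -data-dir=/tmp/nomad-data

# 5-7. Ctrl-C Nomad, enable the VPN, then restart it with the
#      SAME (now non-empty) data directory
nomad agent -config=nomad.hcl -data-dir=/tmp/nomad-data

# 8. In another terminal
nomad status
```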

Expected Result

```
$ nomad status
No running jobs
```

Actual Result

```
$ nomad status
Error querying jobs: Unexpected response code: 500 (No cluster leader)
```

Nomad logs

anonymised_nomad_logs.txt

lgfa29 (Contributor) commented Feb 2, 2022

Hi @RosieBaish 👋

This is, unfortunately, how Raft protocol v2 works: it uses the node's IP address as its identifier, so if your IP changes the node is treated as a new member of the pool. Switching raft_protocol to 3 avoids this issue because v3 generates a unique ID for each node, so even if the IP changes, the ID remains the same.

v3 will also become the default soon, so if you are starting a new cluster it may be better to use it from the beginning 🙂

Give it a try and see if you still hit this problem. I will close the issue for now, but feel free to add more comments and we can re-open it.
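The suggested change is a one-line addition to the server stanza of the Nomad config; the surrounding values here are examples matching a single-server dev setup, not the reporter's actual file:

```hcl
server {
  enabled          = true
  bootstrap_expect = 1

  # Raft protocol v3 gives each server a stable unique ID,
  # so a changed IP address no longer looks like a new node.
  raft_protocol = 3
}
```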

@github-actions

I'm going to lock this issue because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Oct 12, 2022