
Port conflict on client with high network utilization #1728

Closed
jshaw86 opened this issue Sep 21, 2016 · 35 comments

Comments


jshaw86 commented Sep 21, 2016

Nomad version

Nomad 0.4.1

Operating system and Environment details

Ubuntu 14.04

Issue

At a certain allocation/network request load we get a failure to bind on the specified NOMAD_ADDR port.

Reproduction steps

reserved_ports_issue.zip
@dadgar I've attached a minimal example. When I hit 60 allocations on my vagrant I hit the issue. The archive contains the .nomad file and http_server.js. NOTE: you may need to tweak the region and datacenters to match your vagrant.

  1. Bring up vagrant with at least 4 GB of memory and 2 CPUs
  2. mv http_server.js /usr/lib/reserved_ports
  3. install node
  4. nomad run reserved_ports.nomad

As a side note, I did 3 separate runs, increasing the Count each time without tearing down the old allocations.

First run: Count = 25
Second run: Count = 40
Third Run: Count = 60

Nomad Client logs (if appropriate)

Below are the results from nomad status reservedports, nomad fs, and nomad alloc-status.

ID          = reservedports
Name        = reservedports
Type        = service
Priority    = 50
Datacenters = dev-vagrant-blu-1
Status      = running
Periodic    = false

Summary
Task Group     Queued  Starting  Running  Failed  Complete  Lost
reservedports  0       0         59       1       0         0

Allocations
ID        Eval ID   Node ID   Task Group     Desired  Status   Created At
42455534  8c48311c  7f3f9ab9  reservedports  run      running  09/20/16 17:52:12 PDT
c9ad80f5  8c48311c  7f3f9ab9  reservedports  run      running  09/20/16 17:52:12 PDT
bebab092  8c48311c  7f3f9ab9  reservedports  run      running  09/20/16 17:52:12 PDT
92acd6d8  8c48311c  7f3f9ab9  reservedports  run      running  09/20/16 17:52:12 PDT
82e7f17e  8c48311c  7f3f9ab9  reservedports  run      running  09/20/16 17:52:12 PDT
72c54c3a  8c48311c  7f3f9ab9  reservedports  run      failed   09/20/16 17:52:12 PDT
239e94ac  8c48311c  7f3f9ab9  reservedports  run      running  09/20/16 17:52:12 PDT
0525a315  8c48311c  7f3f9ab9  reservedports  run      running  09/20/16 17:52:12 PDT
2b450d21  8c48311c  7f3f9ab9  reservedports  run      running  09/20/16 17:52:12 PDT
41157bc5  8c48311c  7f3f9ab9  reservedports  run      running  09/20/16 17:52:12 PDT
1b325f6e  8c48311c  7f3f9ab9  reservedports  run      running  09/20/16 17:50:52 PDT
f92b367c  8c48311c  7f3f9ab9  reservedports  run      running  09/20/16 17:50:52 PDT
33724e53  8c48311c  7f3f9ab9  reservedports  run      running  09/20/16 17:50:52 PDT
d1c2e5c2  8c48311c  7f3f9ab9  reservedports  run      running  09/20/16 17:50:52 PDT
c51794ec  8c48311c  7f3f9ab9  reservedports  run      running  09/20/16 17:50:52 PDT
117dfe76  8c48311c  7f3f9ab9  reservedports  run      running  09/20/16 17:50:52 PDT
2e9a6744  8c48311c  7f3f9ab9  reservedports  run      running  09/20/16 17:50:52 PDT
aedff0fd  8c48311c  7f3f9ab9  reservedports  run      running  09/20/16 17:50:52 PDT
8768e29e  8c48311c  7f3f9ab9  reservedports  run      running  09/20/16 17:50:52 PDT
5d2df93a  8c48311c  7f3f9ab9  reservedports  run      running  09/20/16 17:50:52 PDT
42899d96  8c48311c  7f3f9ab9  reservedports  run      running  09/20/16 17:50:01 PDT
6323b9e5  8c48311c  7f3f9ab9  reservedports  run      running  09/20/16 17:50:01 PDT
6c072f74  8c48311c  7f3f9ab9  reservedports  run      running  09/20/16 17:50:01 PDT
21bb8e79  8c48311c  7f3f9ab9  reservedports  run      running  09/20/16 17:50:01 PDT
62659cb0  8c48311c  7f3f9ab9  reservedports  run      running  09/20/16 17:50:01 PDT
5a19abe4  8c48311c  7f3f9ab9  reservedports  run      running  09/20/16 17:50:01 PDT
895109a7  8c48311c  7f3f9ab9  reservedports  run      running  09/20/16 17:50:01 PDT
8c14bfe1  8c48311c  7f3f9ab9  reservedports  run      running  09/20/16 17:50:01 PDT
91b4e157  8c48311c  7f3f9ab9  reservedports  run      running  09/20/16 17:50:01 PDT
...
vagrant@blocks1:/usr/lib/reserved_ports$ nomad fs 72c54c3a alloc/logs/reservedports.stderr.0
events.js:141
      throw er; // Unhandled 'error' event
      ^

Error: listen EADDRINUSE :::57967
    at Object.exports._errnoException (util.js:873:11)
    at exports._exceptionWithHostPort (util.js:896:20)
    at Server._listen2 (net.js:1250:14)
    at listen (net.js:1286:10)
    at Server.listen (net.js:1382:5)
    at Object.<anonymous> (/usr/lib/reserved_ports/http_server.js:66:8)
    at Module._compile (module.js:409:26)
    at Object.Module._extensions..js (module.js:416:10)
    at Module.load (module.js:343:32)
    at Function.Module._load (module.js:300:12)
vagrant@blocks1:/usr/lib/reserved_ports$ nomad alloc-status 72c54c3a
ID            = 72c54c3a
Eval ID       = 8c48311c
Name          = reservedports.reservedports[58]
Node ID       = 7f3f9ab9
Job ID        = reservedports
Client Status = failed

Task "reservedports" is "dead"
Task Resources
CPU     Memory  Disk     IOPS  Addresses
20 MHz  30 MiB  300 MiB  0     http: 127.0.0.1:57967

Recent Events:
Time                   Type            Description
09/20/16 17:52:39 PDT  Not Restarting  Error was unrecoverable
09/20/16 17:52:39 PDT  Driver Failure  failed to start task 'reservedports' for alloc '72c54c3a-7362-4a6e-5a9e-2ebbfc28307b': Couldn't create destination file /opt/nomad/data/alloc/72c54c3a-7362-4a6e-5a9e-2ebbfc28307b/reservedports/usr/bin/node: open /opt/nomad/data/alloc/72c54c3a-7362-4a6e-5a9e-2ebbfc28307b/reservedports/usr/bin/node: text file busy
09/20/16 17:52:21 PDT  Restarting      Task restarting in 18.027863591s
09/20/16 17:52:21 PDT  Terminated      Exit Code: 1
09/20/16 17:52:21 PDT  Started         Task started by client
09/20/16 17:52:13 PDT  Received        Task received by client

Job file (if appropriate)

see attached


jshaw86 commented Sep 27, 2016

Hey @dadgar, I was wondering how this one was going. Let me know if you need anything more from me.


dadgar commented Sep 27, 2016

Sorry, I have not yet had a chance to dig into it; hopefully soon.

dadgar added this to the v0.5.0 milestone Oct 3, 2016

dadgar commented Oct 3, 2016

Hey @jshaw86,

Just an update. I tried to reproduce this on both 0.4.1 and master with what you gave me. I went from count = 1 to count = 100, back and forth about 8 times and never hit this.


jshaw86 commented Oct 3, 2016

@dadgar Hmm, and you used the example I gave? You have to be doing some outbound TCP/HTTP calls to trigger it.

I will try to spin up the nomad vagrant and reproduce, since it's been a week. I'm not sure what to tell you other than that we hit it a lot and it's very reproducible.


dadgar commented Oct 3, 2016

@jshaw86 Yeah, the steps I did were:

  1. cd /nomad
  2. vagrant up
  3. Download, unzip your attached archive
  4. Install nodejs 6.7
  5. move the http_server.js to /usr/lib/reserved_ports

Then I tried with exec as you had it, but it was a bit slow to bring the allocations up and down, so I also tried with raw_exec.


jshaw86 commented Oct 3, 2016

@dadgar another engineer just hit it in our staging:

ubuntu@nomad-sched2:~$ nomad alloc-status 5014d47f
ID            = 5014d47f
Eval ID       = 72157fe3
Name          = block-4818.event_handler_4903_1_pdx3[0]
Node ID       = 0c5a6638
Job ID        = block-4818
Client Status = failed

Task "event_handler_4903_1_pdx3" is "dead"
Task Resources
CPU     Memory   Disk     IOPS  Addresses
30 MHz  100 MiB  100 MiB  0     http: 172.16.102.60:41354

Recent Events:
Time                   Type            Description
10/03/16 23:47:57 UTC  Not Restarting  Error was unrecoverable
10/03/16 23:47:57 UTC  Driver Failure  failed to start task 'event_handler_4903_1_pdx3' for alloc '5014d47f-b1b1-6c8a-9d7c-58f5ba743c6b': Couldn't create destination file /opt/nomad/mnt/5014d47f-b1b1-6c8a-9d7c-58f5ba743c6b/event_handler_4903_1_pdx3/usr/bin/blocks-usage: open /opt/nomad/mnt/5014d47f-b1b1-6c8a-9d7c-58f5ba743c6b/event_handler_4903_1_pdx3/usr/bin/blocks-usage: text file busy
10/03/16 23:47:51 UTC  Restarting      Task restarting in 6.087348763s
10/03/16 23:47:51 UTC  Terminated      Exit Code: 1
10/03/16 23:47:50 UTC  Started         Task started by client
10/03/16 23:47:49 UTC  Received        Task received by client
ubuntu@nomad-sched2:~$ nomad fs 5014d47f alloc/logs
Mode        Size   Modified Time          Name
-rw-r--r--  650 B  10/03/16 23:47:51 UTC  event_handler_4903_1_pdx3.stderr.0
-rw-r--r--  0 B    10/03/16 23:47:50 UTC  event_handler_4903_1_pdx3.stdout.0
ubuntu@nomad-sched2:~$ nomad fs 5014d47f alloc/logs/event_handler_4903_1_pdx3.stderr.0
events.js:141
      throw er; // Unhandled 'error' event
      ^

Error: listen EADDRINUSE :::41354
    at Object.exports._errnoException (util.js:907:11)
    at exports._exceptionWithHostPort (util.js:930:20)
    at Server._listen2 (net.js:1250:14)
    at Server._listen2 (/usr/lib/pn-blocks/src/node_modules/continuation-local-storage/node_modules/async-listener/index.js:51:23)
    at listen (net.js:1286:10)
    at Server.listen (net.js:1382:5)
    at Object.<anonymous> (/usr/lib/pn-blocks/src/main.js:346:8)
    at Module._compile (module.js:409:26)
    at Object.Module._extensions..js (module.js:416:10)
    at Module.load (module.js:343:32)


jshaw86 commented Oct 3, 2016

@dadgar did you get up to instance count = 60 with exec before switching to raw_exec?


dadgar commented Oct 3, 2016

@jshaw86 Yeah, I was up to 100. If that has just happened, can you use lsof to see what is attached to that port?

If not, maybe a dump of all the allocations. I want to know if Nomad double-assigned that address.
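
(For reference, two quick ways to check at the moment of failure; 57967 is just the example port from the logs above. Note that an outbound connection can hold a port as its source port, not only a listener:)

sudo lsof -nP -iTCP:57967    # any process bound to or connected on that port
ss -tanp | grep 57967        # includes established outbound connections using it as a source port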


jshaw86 commented Oct 4, 2016

@dadgar Nothing listed in lsof for that port, but we were about 5 minutes late, so whatever had it went away.


dadgar commented Oct 4, 2016

@jshaw86 Can you send me this?

curl http://127.0.0.1:4646/v1/allocations | jq .[].ID | xargs -n1 -I {} curl http://127.0.0.1:4646/v1/allocation/{} > dump

If it is sensitive, you have my email
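
(A rough way to check that dump for double-assigned dynamic ports; the .Resources.Networks/.DynamicPorts field path is an assumption and may differ between Nomad versions:)

jq -r '.Resources.Networks[]?.DynamicPorts[]?.Value' dump | sort -n | uniq -d
# and to see which allocations were handed a given port, e.g. 57967:
jq -r 'select([.Resources.Networks[]?.DynamicPorts[]?.Value] | index(57967)) | .ID' dump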


jshaw86 commented Oct 4, 2016

@dadgar Yeah, I'm trying to repro on the nomad vagrant for you; will try to get an lsof and that dump for you.


dadgar commented Oct 4, 2016

Awesome! Thanks! Want to resolve this for you guys so you can just cherry-pick the commit in the meantime.


jshaw86 commented Oct 4, 2016

@dadgar Hit it pretty easily with Count = 60; lsof was empty for port 43409. As an aside, some of the containers were having issues resolving httpbin and failed with ENOTFOUND, which shouldn't be related and is maybe just something with our office network right now. If you see it, ignore it, or just do a dig on httpbin and replace the hostname with the IP in the code. I'll send a new zip in a minute, but it's basically the same as before.

I've attached the curl to allocations here:
dump.txt

Here's my alloc status and fs

vagrant@nomad-server01:/opt/gopath/src/github.com/hashicorp/nomad$ nomad alloc-status 02001faf
ID            = 02001faf
Eval ID       = b3951268
Name          = reservedports.reservedports[47]
Node ID       = ee231055
Job ID        = reservedports
Client Status = failed

Task "reservedports" is "dead"
Task Resources
CPU     Memory  Disk     IOPS  Addresses
20 MHz  30 MiB  300 MiB  0     http: 127.0.0.1:43409

Recent Events:
Time                   Type            Description
10/04/16 00:54:33 UTC  Not Restarting  Error was unrecoverable
10/04/16 00:54:33 UTC  Driver Failure  failed to start task 'reservedports' for alloc '02001faf-eb76-64ca-2afa-6049dd09034b': Couldn't create destination file /opt/nomad/data/alloc/02001faf-eb76-64ca-2afa-6049dd09034b/reservedports/usr/bin/node: open /opt/nomad/data/alloc/02001faf-eb76-64ca-2afa-6049dd09034b/reservedports/usr/bin/node: text file busy
10/04/16 00:54:10 UTC  Restarting      Task restarting in 16.831422678s
10/04/16 00:54:10 UTC  Terminated      Exit Code: 1
10/04/16 00:54:10 UTC  Started         Task started by client
10/04/16 00:53:32 UTC  Received        Task received by client
vagrant@nomad-server01:/opt/gopath/src/github.com/hashicorp/nomad$ nomad fs 02001faf alloc/logs/reservedports.stderr.0
events.js:141
      throw er; // Unhandled 'error' event
      ^

Error: listen EADDRINUSE :::43409
    at Object.exports._errnoException (util.js:907:11)
    at exports._exceptionWithHostPort (util.js:930:20)
    at Server._listen2 (net.js:1250:14)
    at listen (net.js:1286:10)
    at Server.listen (net.js:1382:5)
    at Object.<anonymous> (/usr/lib/reserved_ports/http_server.js:66:8)
    at Module._compile (module.js:409:26)
    at Object.Module._extensions..js (module.js:416:10)
    at Module.load (module.js:343:32)
    at Function.Module._load (module.js:300:12)


jshaw86 commented Oct 4, 2016

@dadgar Another one: two failures across 60 allocations: dump2.txt

reserved_ports_issue.zip

Here is the zip. I've included an install.sh and a nomad_sched.json so you can have the exact config that I'm using on the nomad vagrant. Just run the install.sh, then execute the 3 echo'ed commands, and you should be good to go.


jshaw86 commented Oct 4, 2016

@dadgar Make sure you up your memory to 4 GB so you can run 60 allocations with the CPU/memory requirements of the job.


jshaw86 commented Oct 4, 2016

@dadgar did this do it for you?


schmichael commented Oct 4, 2016

@jshaw86 I'm attempting to reproduce this myself, but I noticed all of your port conflicts have been in the default ephemeral port range (32768-60999 on my Ubuntu 16.04 box; see cat /proc/sys/net/ipv4/ip_local_port_range).

Do you mind adding those as reserved ports on your client and seeing if that prevents collisions?

If it does, the fix might be for nomad to automatically add that range to reserved ports.
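
Something along these lines on the client, for example (a sketch; the config path and restart method are assumptions, and the range should be whatever your box actually reports):

cat /proc/sys/net/ipv4/ip_local_port_range    # e.g. "32768  60999"
cat <<'EOF' | sudo tee /etc/nomad.d/reserve-ephemeral-ports.hcl
client {
  reserved {
    reserved_ports = "32768-60999"
  }
}
EOF
sudo service nomad restart    # 14.04 is upstart; systemd boxes would use systemctl restart nomad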


dadgar commented Oct 4, 2016

You may need to try on 0.4.0, since there is that other reserved-port bug introduced in 0.4.1.


jshaw86 commented Oct 4, 2016

@schmichael @dadgar I'm on 14.04 rather than 16.04, if it makes a difference. One of our SREs, Mckenzie, is going to reply to your suggestion about the ephemeral port range, as he did try that a while back and ran into some issues.

I think, though, there is some difference in ideology on how this should work. Currently Nomad's implementation is to blacklist ports rather than whitelist them, which is what's causing the issue. I think the way to avoid this is to give Nomad a whitelist of ports and then move the kernel's ephemeral port range outside of what we give Nomad. Mckenzie can provide more details, as he's more experienced in this area.

I think Mckenzie and @diptanu had a chat along these lines at HashiConf as well.


dadgar commented Oct 4, 2016

@jshaw86 They both should accomplish the same thing.

@mac-abdon

@schmichael The problem we are faced with is that with a nomad agent port blacklist (reserved_ports), you can only tell nomad what ports not to use. This means that the nomad agent still uses the ephemeral port range to dynamically provision ports. This needs to be a whitelist.

Any application deployed onto the nomad agent that uses the ephemeral port range has the potential to consume a port that was meant to be provisioned by the nomad agent. This is where we get the conflict.

We have two buckets of ports, ip_local_reserved_ports (we'll call this local) and ip_local_port_range (we'll call this ephemeral).

The local bucket holds a list of ports that we have determined should not be dynamically assigned. These ports are used in configuration files to start applications outside of any nomad process. This is how we prevent any containers from getting these ports.

All of the containers deployed by nomad use the ephemeral bucket. Any application that is started in the container also uses the ephemeral port range to make outbound connections. Again, our conflict. This is where we are seeing port collision.

The nomad agent needs to be able to use a range of ports reserved from the local bucket, for its dynamic port allocation. Then any process in a container that needs to get a port, uses the ephemeral port range.
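
(To see both buckets on a client, something like:)

sysctl net.ipv4.ip_local_port_range net.ipv4.ip_local_reserved_ports    # reserved_ports is empty by default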


mac-abdon commented Oct 4, 2016

@schmichael and @dadgar I think we have some more issues, or at least some confusion. From the docs (https://www.nomadproject.io/docs/jobspec/networking.html), the dynamic port range is 20000-60000. In testing, even after setting

net.ipv4.ip_local_port_range = 32768    61000

Nomad does not respect the ephemeral port range and schedules at random from the hard-coded range. Here is a list of ports from our test:
PORT 55289
PORT 23032
PORT 30699
PORT 24497
PORT 39010
PORT 26186
PORT 28465
PORT 29137
PORT 25844
PORT 47243
PORT 30238
PORT 27880
PORT 57696
PORT 41776
PORT 28093
PORT 55455

If you can confirm that nomad does not use the ephemeral range and in fact uses a hard-coded (whitelist) range of 20000-60000 (from the kernel docs: "If possible, it is better these numbers have different parity."), then all we have to do is set

net.ipv4.ip_local_reserved_ports = 20000-60000

to solve this problem. At this time, this would leave us about 10k ports for outbound connections. I think we can get by with this, but we might want to consider making the hard-coded range a bit smaller.

source: https://github.com/hashicorp/nomad/blob/master/nomad/structs/network.go
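
(To confirm from a checkout of hashicorp/nomad; the constant names below are from a quick read of that file and may differ by version:)

grep -n "DynamicPort" nomad/structs/network.go
# expect something like MinDynamicPort = 20000 and MaxDynamicPort = 60000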


dadgar commented Oct 4, 2016

@softslug What you are saying is correct. Additionally, if you need more ports, you can add the port range 50000-60000 to the port blacklist, and now you have 20,000 ports for ephemeral usage (to whatever extreme you want). Nomad can't control what ephemeral ports applications launched under it get assigned; that is done by the kernel.

So you need to set net.ipv4.ip_local_reserved_ports.


dadgar commented Oct 5, 2016

@jshaw86 @mac-abdon How did it go?


jshaw86 commented Oct 5, 2016

@dadgar We're building nomad off this: https://github.com/pubnub/nomad/commits/v0.4.2. It includes @diptanu's chroot changes, and we changed the port range that nomad can assign so we can better tune the kernel and get around the existing 0.4.1 reserved-port bug on AWS. We will hopefully have this deployed with the kernel changes by EOB today. Will close if everything checks out. Thx!


dadgar commented Oct 10, 2016

Any update?


jshaw86 commented Oct 10, 2016

Yep, we deployed to both our staging servers and it looks like it's resolved. Going to wait to close till we're in production, which will be later this week.

We didn't realize the nomad servers did the port assignments, so it took us a bit longer to deploy in staging; then we wanted to practice the roll before we did it in production to make sure we didn't break anything, which is why it's taken longer. Thx!


jshaw86 commented Oct 12, 2016

We've successfully rolled this into most of our production infrastructure, so I'm going to close this. If anyone comes across this in the future: all we did was change the nomad port range to be the top end of the ephemeral port range (pubnub@7db9c44). Note that 65536 is non-inclusive.

Then we tuned our sysctl ephemeral port range to be outside of that:

net.ipv4.ip_local_port_range = 10001    40000
net.ipv4.ip_local_reserved_ports = 40001-65535

Once #1617 is resolved, you should be able to just specify reserved ports and change the sysctl ephemeral port range, rather than having to do the code change.
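
(For anyone replicating this, one way to persist the sysctl side of that split; the drop-in filename is arbitrary:)

printf 'net.ipv4.ip_local_port_range = 10001 40000\nnet.ipv4.ip_local_reserved_ports = 40001-65535\n' \
  | sudo tee /etc/sysctl.d/90-nomad-ports.conf
sudo sysctl -p /etc/sysctl.d/90-nomad-ports.conf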

jshaw86 closed this as completed Oct 12, 2016

dadgar commented Oct 12, 2016

So I think the real outcome of this is that we should make that range configurable and add some docs on how to avoid this issue.


jshaw86 commented Oct 13, 2016

@dadgar Do you want another ticket for those, or are you guys tracking this internally?


dadgar commented Oct 13, 2016

Can you open another ticket?


jippi commented Mar 27, 2017

@dadgar was this change, making the min/max port configurable, ever made? :)


dadgar commented Mar 27, 2017

@jippi Not yet. Working with a user to test a hypothesis, and then we will make appropriate changes in Nomad to avoid this problem. It will more or less break down to determining the kernel's ephemeral port range and trying to avoid an overlap, and also shrinking Nomad's own ephemeral port range.

@shuoli84
Copy link

@dadgar Ran into the same problem here. Any update on the change?


github-actions bot commented Dec 3, 2022

I'm going to lock this issue because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

github-actions bot locked as resolved and limited conversation to collaborators Dec 3, 2022