
Max open file limit issue with Elastic Search and Docker #418

Closed · hopper-signifyd opened this issue Feb 17, 2020 · 6 comments
Labels
enhancement (New feature or request) · good first issue (Good for newcomers) · need-code (Implementations welcome!)

Comments

@hopper-signifyd commented Feb 17, 2020

What happened:
I'm running into ulimit / max-open-file limit issues when trying to run Elasticsearch on Docker using this AMI. This is the same problem as issue #278; that issue is marked as closed, but as its last comment states, the problem persists.

How to reproduce it (as minimally and precisely as possible):
I launched the newest version of the image I could find, amazon-eks-gpu-node-1.14-v20191213 (ami-0730212bffaa1732a), and found the following in /etc/sysconfig/docker:

# The max number of open files for the daemon itself, and all
# running containers.  The default value of 1048576 mirrors the value
# used by the systemd service unit.
DAEMON_MAXFILES=1048576

# Additional startup options for the Docker daemon, for example:
# OPTIONS="--ip-forward=true --iptables=true"
# By default we limit the number of open files per container
OPTIONS="--bridge=none --default-ulimit nofile=2048:8192 --log-driver=json-file --log-opt max-size=10m --log-opt max-file=10 --live-restore=true --max-concurrent-downloads=10"

# How many seconds the sysvinit script waits for the pidfile to appear
# when starting the daemon.
DAEMON_PIDFILE_TIMEOUT=10

For my purposes, I've written a command into my CloudFormation UserData to replace nofile=2048:8192 with nofile=65536:65536. It looks something like this:

mv /etc/sysconfig/docker /etc/sysconfig/docker.bak
sed 's/nofile=2048:8192/nofile=65536:65536/g' /etc/sysconfig/docker.bak > /etc/sysconfig/docker
systemctl restart docker
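
A quick way to sanity-check the new default (not part of my UserData; it assumes the node can pull busybox) is to read the soft limit from inside a fresh container:

# Should print 65536 once the daemon has restarted with the new default
docker run --rm busybox sh -c 'ulimit -n'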

Environment:

  • AWS Region: US East 1
  • Instance Type(s): any
  • AMI Version: amazon-eks-gpu-node-1.14-v20191213 (ami-0730212bffaa1732a)
mogren added the good first issue and help wanted labels on Mar 27, 2020
@mogren commented Mar 27, 2020

@hopper-signifyd Thanks for the detailed issue, and sorry for the late answer; things have been busy lately.

The issue here is a mismatch between the default Docker settings and Elasticsearch, which wants a much higher limit. The defaults are like that for a reason, but I agree we could make it easier to raise them for workloads like Elasticsearch. I wonder whether improved documentation around the --docker-config-json flag in bootstrap.sh would be enough, or whether we should add a section about customizing for Elasticsearch.
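
For example, something along these lines might work (an untested sketch; the cluster name is a placeholder, and I believe this JSON becomes the daemon configuration, so the other defaults would need to be carried along):

# Untested sketch: raise the per-container nofile default via daemon.json.
# "my-cluster" is a placeholder cluster name.
/etc/eks/bootstrap.sh my-cluster --docker-config-json '{
  "bridge": "none",
  "log-driver": "json-file",
  "log-opts": {"max-size": "10m", "max-file": "10"},
  "live-restore": true,
  "max-concurrent-downloads": 10,
  "default-ulimits": {
    "nofile": {"Name": "nofile", "Soft": 65536, "Hard": 65536}
  }
}'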

Another option would be to add a custom flag in bootstrap.sh, something like --configure-docker-nofile-limit <limit>, similar to the --enable-docker-bridge flag, that would let you set that limit explicitly.
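
Usage could then look something like this (hypothetical; the flag doesn't exist today):

# Hypothetical invocation of the proposed flag; not implemented
/etc/eks/bootstrap.sh my-cluster --configure-docker-nofile-limit 65536:65536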

mogren added the enhancement label on Mar 27, 2020
@hopper-signifyd (Author)

Hi @mogren,
Thanks for looking into this. If this can be fixed via --docker-config-json, that would be sufficient; an example in the docs would be helpful. If the setting can already be covered by a general mechanism, I think adding one-off flags like --configure-docker-nofile-limit would just lead people to ask why there's a dedicated option for this config parameter but not for others.

@ajcann commented Jul 7, 2020

@mogren I believe this may be a regression, as I understand the intended ulimit to be 65535, not 8192. See #278 and #233 for context; there it was determined that the change to 8192 was accidental, and it was fixed in subsequent releases.

@Jeffwan (Contributor) commented Jul 9, 2020

@ajcann
I think there are two separate issues.

First, I can confirm that --default-ulimit nofile=2048:8192 is still in the accelerator AMI. I can either change it to 65535 or remove the option; either way, this should ship with CUDA 10.2 and the latest NVIDIA driver pretty soon.

Second, nofile is the only ulimit we set in the Docker configs. I think @mogren's point is more about how we can better support these cases in general; some other products need memlock and similar limits.
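
For example (illustrative values only, not what we'd ship), the daemon accepts repeated --default-ulimit flags, so an OPTIONS line in /etc/sysconfig/docker covering both could look like:

# Illustrative: raise the nofile default and lift the memlock cap (-1 = unlimited)
OPTIONS="--bridge=none --default-ulimit nofile=65535:65535 --default-ulimit memlock=-1:-1 --log-driver=json-file --log-opt max-size=10m --log-opt max-file=10 --live-restore=true --max-concurrent-downloads=10"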

@ajcann commented Jul 9, 2020

> @ajcann
> I think there are two separate issues.
>
> First, I can confirm that --default-ulimit nofile=2048:8192 is still in the accelerator AMI. I will make a change to 65535; this should ship with CUDA 10.2 and the latest NVIDIA driver pretty soon.
>
> Second, nofile is the only ulimit we set in the Docker configs. I think @mogren's point is more about how we can better support these cases in general; some other products need memlock and similar limits.

@Jeffwan ah yes I see that you are correct. Thank you for addressing the docker options!

cartermckinnon added the need-code label and removed the help wanted label on Jan 24, 2022
@cartermckinnon (Member)

My understanding is that this has been addressed; please let us know if it remains an issue on the latest AMIs.
