question - resource allocation in docker #2082

Closed
OferE opened this issue Dec 12, 2016 · 23 comments

OferE commented Dec 12, 2016

Hi,
I see that there is a strict memory allocation policy in Nomad.
Is there a way to work around it? I would like containers launched by Nomad to run as regular Docker containers without limitations (just like in Docker Swarm, for example).

Also, swap can act as a safety net (especially in containers running things like Spark, which has complicated memory allocation).
Is there a way to allow swap usage?
Enforcing resource allocation is a great feature, but it would be nice if it could be disabled for certain types of containers.


jippi commented Dec 12, 2016

AFAIK oversubscription is planned for a later release - since Nomad does bin packing, oversubscription is incredibly hard to mix in with that :)

Today there is no workaround for this behavior, and AFAIK it's a release or two (at least) away at this time.


OferE commented Dec 12, 2016

So, does this mean that I cannot use swap in my infrastructure if I choose to work with Docker and Nomad?


jippi commented Dec 12, 2016

For now, yes, I think so. I'll leave any additional comments and clarification to someone from HashiCorp :)


OferE commented Dec 12, 2016

A.
If I understand correctly, oversubscription will not solve this issue - it is more like AWS spot instances than a real solution like Docker Swarm.

B.
I see that Mesos has the same behavior - it looks like everyone copied from Google's Borg...
This design is great when you have a cluster of physical machines, but it has some disadvantages when working in the cloud.
The scheduling should be much simpler when we are in the cloud and have dedicated machines for each service. Let the user choose which machine type to use for which service, and let the containers just run there at the Docker and OS level - there is no need for Nomad to allocate resources, since the user has already decided on this. QoS is a nice feature only in data centers, not in the cloud - there it is just a limitation.

For example:
I have a Spark cluster, a Kafka cluster, etc.
For each cluster I chose different instance types up front, and I just run one container on each machine. Why should I specify resources in Nomad? Why can't I use swap for Spark, where it keeps my app stable at peak moments? It doesn't make any sense...

Docker Swarm, for example, is much simpler and solves this out of the box - there is no hardware allocation, just constraints and affinities.
This approach is better for most cloud usages.


OferE commented Dec 12, 2016

I think that adding a flag to remove all cgroup limitations would solve all the problems.
This flag should generate a warning regarding QoS and that's it - in this mode the user is in charge of QoS, not Nomad.


dadgar commented Dec 12, 2016

Hey @OferE,

Your use case is slightly different if you are doing static partitioning of nodes to types of jobs, and that is not the design goal of Nomad. Nomad is designed to be run in as resource-agnostic a way as possible. Jobs should declare what they need, and the decision of where to place them should be made by the scheduler. In order to guarantee both that there are enough resources on the chosen machine and that the placed jobs get the runtime performance they need, we do resource isolation and disable swap.

If the machine is swapping, the performance loss is significant, and in a system designed to be multi-tenant with bin-packed machines, that is unacceptable.

If you would like finer-grained control, we provide the raw_exec driver, which allows you to make these decisions. In the future there will also be pluggable drivers, so you could build your own which is less restrictive. However, for the built-in drivers we won't be making that concession.

Thanks,
Alex
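
For context on what "resource isolation and disable swap" means at the plain Docker level: a hard memory cap with no swap corresponds roughly to giving --memory and --memory-swap the same value, since --memory-swap is the combined memory+swap total. This is only an illustration with arbitrary values and an arbitrary image, not the exact flags the Nomad docker driver passes:

# Illustrative only: a 256 MB hard cap with swap effectively disabled,
# because the combined memory+swap limit equals the memory limit.
docker run -d \
  --memory 256m \
  --memory-swap 256m \
  --cpu-shares 500 \
  redis:3.2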

dadgar closed this as completed Dec 12, 2016

OferE commented Dec 12, 2016

Thanks for the info - I'll try raw_exec instead of the docker driver.


OferE commented Dec 12, 2016

I urge you to rethink the "static partitioning" use case. Dynamically allocating resources across the cluster is not suitable for all use cases. Kafka and Spark are great examples - it just won't work there.
You need dedicated machines for them, and there is no point in limiting their processes. You want to get the full efficiency of the cloud machine without worrying that you specified a resource incorrectly.


dadgar commented Dec 12, 2016

@OferE I am not sure why you think those require dedicated machines? They require dedicated resources. In those cases I would specify large resource requirements such that they are guaranteed enough resources, and as such in most cases they won't be multi-tenant.

I agree that there are use cases that require whole machines (databases, for example). To support that case we will add a resource option in the future to reserve the whole node. But for most applications this is not the case.


OferE commented Dec 12, 2016

Spark has internal logic and defaults, for example using all the cores of the machine.
Running in a cgroup environment will confuse it.

I understand your vision, and I also think that someday we will get to the point where many major products understand containerized environments and align their internal logic accordingly.

Tuning Spark's (or worse, PySpark's) memory allocation is not a trivial thing. Making it work under cgroups is too much at this time.

Also, reserving the entire instance for just one container is not good for all cases - there is always another container/process that needs to run on the instance, agents for log collection and monitoring for example. In fact, it would be very nice to limit these agents and let the main container/process run wild :-)

One more thing I would like to point out: there is also the matter of dev vs. production.
Dev environments are significantly weaker than production ones, since you want to save money.
Writing two versions of the Nomad files (to allow two different resource isolation settings) for each of them is too limiting.
For development it would also be nice to not specify resources, since developers change the instance type all the time.

Nomad is a great project - I like it much more than Mesos/Swarm/Kubernetes.


OferE commented Dec 12, 2016

If I found a magic fish that would grant me a wish, I'd ask for the following types of resource isolation:

Minimal resource allocation - make sure my container runs in a strong environment.
Maximal resource allocation - limit infra containers (monitoring/log collection).
Minimal + maximal - replicate my logic according to your design.
None - dev + static partitioning...

I would also have declarative dev and production resource isolation.

This is what a perfect world looks like :-)


OferE commented Dec 13, 2016

The raw_exec driver is not working correctly for me: stopping the job doesn't kill the container, so it is not a good workaround.


jippi commented Dec 13, 2016

Can you please share your job file and how the script is executed, if you use any shell to wrap it? It's hard to help debug without any information :)

raw_exec does work just fine, so it's probably that you need to trap a signal to make sure Docker stops the container :)


OferE commented Dec 13, 2016

Thanks - I just realized that. I didn't trap it, lol.


OferE commented Dec 13, 2016

BTW - which signal should I trap?


jippi commented Dec 13, 2016


OferE commented Dec 13, 2016

Thank you so much for your help on this!


jippi commented Dec 13, 2016

I just did a test and I got SIGTERM, though - better test for yourself :)
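
One rough way to test this for yourself (a sketch; the log file path is arbitrary) is a throwaway raw_exec script that just records every common termination signal it receives:

#!/bin/bash
# Debugging aid: append the name of each trapped signal to a log file.
for sig in SIGHUP SIGINT SIGQUIT SIGTERM; do
   trap "echo got $sig >> /tmp/nomad-signal-test.log" "$sig"
done
# Sleep in the background and wait on it, so trapped signals are handled
# as soon as they arrive instead of after the sleep finishes.
while true; do
   sleep 1 &
   wait $!
done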


OferE commented Dec 14, 2016

It seems like it doesn't work. I trapped SIGTERM and SIGINT and my script never got them.
When I send the signal myself, the script is stopped.

This is my script - the trap never gets any signal from Nomad :-(

#!/bin/bash
# Handler for the signals sent from Nomad to stop the container
my_exit()
{
   echo "killing $CID"
   docker stop --time=5 "$CID"   # try to stop it gracefully
   docker rm -f "$CID"           # remove the stopped container
}

trap 'my_exit; exit' SIGHUP SIGTERM SIGINT

# Build the docker run command from the script arguments
CMD="docker run -d"
for a in "$@"; do
   CMD="$CMD $a"
done

echo "docker wrapper: the docker command that will run is: $CMD"
echo "from here on it is the docker output:"
echo
# Actually run the command and capture the container ID
CID=$($CMD)

# docker logs is printed in the background
docker logs -f "$CID" &

# Wake up every 3 seconds so the script can react to signals
while :
   do
      sleep 3
   done


OferE commented Dec 14, 2016

I found the problem :-)
The problem is the combination of my sleep and the grace period.
I have sleep 3, and the kill_timeout default is 5 seconds - this causes my script to be killed before it can handle the signal.
Changing the sleep in my script to 1 solved the issue.

I think I will stay with sleep 3 and explicitly set kill_timeout to 45 seconds.
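
For what it's worth, the delay can also be removed entirely: bash only runs a trap handler after the current foreground command returns, so a foreground sleep 3 can postpone the handler by up to 3 seconds, while a backgrounded sleep plus wait is interrupted by the signal immediately. A sketch of an alternative wait loop for the wrapper script above (just an option alongside lowering the sleep or raising kill_timeout):

# Alternative wait loop: `wait` returns as soon as a trapped signal arrives,
# so the handler runs without waiting out the sleep interval.
while true; do
   sleep 3 &
   wait $!
done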


drscre commented Dec 16, 2016

If you don't mind building Nomad from source, there is a trivial patch for Nomad 0.5.1.

It adds a "memory_mb" docker driver option which, if set to non-zero, overrides the memory limit specified in the task resources.

https://gist.github.com/drscre/4b40668bb96081763f079085617e6056

You can allow swap in a similar way.
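
For reference, "allowing swap" at the plain Docker level is a matter of the --memory-swap flag, which sets the combined memory+swap limit; the patch above is what controls what the driver actually passes. An illustration with arbitrary values:

# Illustrative values: 256 MB of memory plus up to 256 MB of swap
# (--memory-swap is the combined memory + swap total).
docker run -d --memory 256m --memory-swap 512m redis:3.2
# --memory-swap -1 would allow unlimited swap on top of the memory limit.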


OferE commented Dec 16, 2016 via email
