GPU resources #406

Closed
F21 opened this issue Nov 10, 2015 · 10 comments

Comments

@F21

F21 commented Nov 10, 2015

Mesos recently announced some work they have been doing with nvidia to treat GPU resources like CPU and memory resources: https://mesosphere.com/blog/2015/11/10/mesos-nvidia-gpus/

It would be interesting to see if this is something nomad can support too.

@diptanu
Contributor

diptanu commented Nov 11, 2015

Hi Francis,

I am interested to hear some use cases and applications that would run on Nomad and use GPUs. 

At the end of the day it's up to applications to exploit heterogeneous processor architectures on a host. A cluster manager can help applications that want to use GPUs by placing them on machines that have GPUs, accounting for their usage, and acting as the resource allocator that schedules the different jobs competing for the GPU resources available in the cluster.

We could potentially fingerprint a machine for GPU resources and expose them to users via the Go bindings for CUDA. But before we do anything with GPUs, it would be nice to understand some use cases and applications that people are already running in production.


Diptanu

@F21
Author

F21 commented Nov 11, 2015

Some possible use-cases:

@cbednarski
Contributor

@F21 I think we can support this via fingerprinting, and in AWS this can probably already be detected via instance type.
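To make the fingerprinting idea concrete, here is a minimal sketch in Python (not Nomad's actual fingerprinter, which would be written in Go). It assumes two detection paths: a small lookup table keyed by AWS instance type, and a fallback that counts `/dev/nvidiaN` device nodes created by the NVIDIA driver on Linux. The names `GPU_COUNT_BY_INSTANCE_TYPE`, `count_gpu_device_nodes`, `fingerprint_gpu`, and the `gpu.count` / `gpu.source` attribute keys are hypothetical and chosen only for illustration.

```python
import os
import re

# Illustrative, non-exhaustive table (assumption: the caller supplies the
# instance type, e.g. from the cloud metadata service).
GPU_COUNT_BY_INSTANCE_TYPE = {
    "g2.2xlarge": 1,
    "g2.8xlarge": 4,
}

# Matches per-GPU device nodes such as nvidia0, nvidia1; skips nvidiactl etc.
_GPU_NODE = re.compile(r"^nvidia\d+$")


def count_gpu_device_nodes(dev_dir="/dev"):
    """Count NVIDIA per-GPU device nodes in dev_dir (0 if unreadable)."""
    try:
        names = os.listdir(dev_dir)
    except OSError:
        return 0
    return sum(1 for name in names if _GPU_NODE.match(name))


def fingerprint_gpu(instance_type=None, dev_dir="/dev"):
    """Return hypothetical node attributes describing GPU capacity."""
    if instance_type in GPU_COUNT_BY_INSTANCE_TYPE:
        count = GPU_COUNT_BY_INSTANCE_TYPE[instance_type]
        source = "instance_type"
    else:
        count = count_gpu_device_nodes(dev_dir)
        source = "device_nodes"
    return {"gpu.count": count, "gpu.source": source}
```

Note that this only reports how many GPUs a node has. It says nothing about how much of a GPU a given process is using, which is the monitoring gap raised in the rest of this comment.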

I'm not sure what kernel support exists for this (I suspect none without custom drivers), but unless there is some way for Nomad to monitor how many GPU resources a process is using, I don't think it makes much sense in Nomad core. Deeper integration, e.g. with the CUDA API or similar, will require custom builds with a C toolchain.

I think this is a good candidate for a feature that can be implemented as a plugin once we have a plugin architecture in place. In that case we'd want support for Nomad to schedule based on plugin-defined resources.

@diptanu
Contributor

diptanu commented Nov 11, 2015

@cbednarski I think once we have a plugin architecture in place, we should be able to have fingerprints outside our main binary to enable such use cases.

My main question was around how do we expose the GPU capacity available on a node and if there are any ways to bound resources across multiple Tasks using the available GPU on a node.

@cbednarski
Contributor

> My main question was around how do we expose the GPU capacity available on a node and if there are any ways to bound resources across multiple Tasks using the available GPU on a node.

@diptanu Yeah that's the part where I think this requires integration with the CUDA API, or use of a customized kernel. There's a lot of research in this area but I'm not aware of any mainline features to support this.

Assuming there is a way, though, I think it should be possible to support some kind of arbitrary resources in the scheduler. Likely we will want this to track other constrained or provisioned resources like block storage, ENI, SDN, etc.
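As a rough illustration of what "arbitrary resources in the scheduler" could mean, here is a small Python sketch. It is not Nomad's scheduler: it assumes each node advertises a free-form map of resource name to integer capacity, each task asks for a similar map, and placement is simple first-fit. `Node`, `fits`, `allocate`, and `place` are hypothetical names used only for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    capacity: dict                      # resource name -> total units
    allocated: dict = field(default_factory=dict)

    def available(self, resource):
        """Units of a resource still free; unknown resources count as 0."""
        return self.capacity.get(resource, 0) - self.allocated.get(resource, 0)

    def fits(self, ask):
        """True if every requested resource is available in full."""
        return all(self.available(r) >= amount for r, amount in ask.items())

    def allocate(self, ask):
        """Reserve the requested units, or raise if they do not fit."""
        if not self.fits(ask):
            raise ValueError(f"{self.name} cannot satisfy {ask}")
        for r, amount in ask.items():
            self.allocated[r] = self.allocated.get(r, 0) + amount


def place(nodes, ask):
    """First-fit placement: allocate on the first feasible node, else None."""
    for node in nodes:
        if node.fits(ask):
            node.allocate(ask)
            return node
    return None
```

Because the scheduler only compares names and counts, the same code path would handle `gpu`, block storage volumes, ENIs, or any other plugin-defined resource. Bounding actual usage at runtime is a separate problem that this sketch does not address.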

@sheerun
Contributor

sheerun commented Feb 5, 2016

Yes, custom resources would be hugely useful

@sheerun
Contributor

sheerun commented Apr 12, 2016

There is no need to implement GPU exposure specifically. Let's just allow custom resources to be defined and let custom drivers handle them. This is related to #1061.

@sisp

sisp commented Sep 25, 2017

+1 for GPU resources

@endocrimes
Contributor

As of Nomad 0.9, which is currently available as a beta, we support scheduling jobs that require access to arbitrary devices, and we ship a default plugin that provides access to NVIDIA GPUs. You can find the documentation for the NVIDIA device plugin here: https://www.nomadproject.io/docs/devices/nvidia.html

@github-actions

I'm going to lock this issue because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active issues.
If you have found a problem that seems similar to this, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Nov 26, 2022