[feature] Defining and consuming custom resources #1081

sheerun · 2016-04-12T18:58:31Z

This is one possible solution for #406. The idea is to implement custom resources in a way similar to Mesos:

https://mesos.apache.org/documentation/attributes-resources/

When starting a client, specify the resources it exposes:

client {
  enabled = true

  resource "gpu" {
    type = "range"
    begin = 1
    end = 4
  }

  resource "ip" {
    type = "ip-range"
    begin = "192.168.1.0"
    end = "192.168.1.255"
  }

  resource "github_token" {
    type = "set"
    items = ["1234", "abcd"]
  }

  resource "network" {
    type = "enum"
    items {
      private = "eth2"
      overlay = "overlay1"
    }
  }
}

Allow tasks to request those resources in the task specification:

task {
  resources {
    gpu = 2
    ip = 1
    github_token = 1
    network = "private"
  }
}

Expose the allocated resources to tasks (and custom drivers) via environment variables:

NOMAD_RESOURCE_gpu = '1,2'
NOMAD_RESOURCE_ip = '192.168.1.5'
NOMAD_RESOURCE_github_token = '1234'
NOMAD_RESOURCE_network = 'eth2'
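
A minimal sketch of how the three steps above could fit together, not Nomad's implementation. It assumes that `range` and `ip-range` pools are inclusive at both ends, that `enum` resources are lookups that are not consumed, and that the first free items are handed out (so the IP picked here differs from the illustrative `192.168.1.5` above). The names `CLIENT`, `expand_pool`, and `allocate` are hypothetical.

```python
import ipaddress

# Hypothetical in-memory form of the client block from the proposal.
CLIENT = {
    "gpu": {"type": "range", "begin": 1, "end": 4},
    "ip": {"type": "ip-range", "begin": "192.168.1.0", "end": "192.168.1.255"},
    "github_token": {"type": "set", "items": ["1234", "abcd"]},
    "network": {"type": "enum", "items": {"private": "eth2", "overlay": "overlay1"}},
}


def expand_pool(spec):
    """List every allocatable item of a countable pool (bounds assumed inclusive)."""
    kind = spec["type"]
    if kind == "range":
        return list(range(spec["begin"], spec["end"] + 1))
    if kind == "ip-range":
        first = int(ipaddress.IPv4Address(spec["begin"]))
        last = int(ipaddress.IPv4Address(spec["end"]))
        return [str(ipaddress.IPv4Address(n)) for n in range(first, last + 1)]
    if kind == "set":
        return list(spec["items"])
    raise ValueError(f"unsupported pool type: {kind}")


def allocate(client, request, in_use=None):
    """Pick items for one task and return its NOMAD_RESOURCE_* variables.

    `in_use` maps resource name -> set of items already handed out; it is
    updated only if every requested resource can be satisfied.
    """
    if in_use is None:
        in_use = {}
    env, pending = {}, {}
    for name, wanted in request.items():
        spec = client[name]
        if spec["type"] == "enum":
            # Assumption: an enum maps a requested key to a value and is never exhausted.
            env[f"NOMAD_RESOURCE_{name}"] = spec["items"][wanted]
            continue
        taken = in_use.get(name, set())
        free = [item for item in expand_pool(spec) if item not in taken]
        if len(free) < wanted:
            raise RuntimeError(f"not enough {name}: want {wanted}, have {len(free)}")
        pending[name] = free[:wanted]
        env[f"NOMAD_RESOURCE_{name}"] = ",".join(str(item) for item in pending[name])
    # Commit only after all resources were found, so a failed request consumes nothing.
    for name, picked in pending.items():
        in_use.setdefault(name, set()).update(picked)
    return env


if __name__ == "__main__":
    used = {}
    request = {"gpu": 2, "ip": 1, "github_token": 1, "network": "private"}
    print(allocate(CLIENT, request, used))
```

Passing the same `in_use` dictionary to a second call hands out the next free items, and a third call for two GPUs fails because the range of four is exhausted.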

tolmanam · 2018-05-07T12:46:48Z

This seems like a pretty useful feature. Is it on the roadmap or did something better come along?

jbott · 2018-07-03T22:58:42Z

Are there any plans to implement this or something similar? I have a use case where I am connecting hardware resources to individual nodes, and I would need to expose that info at the scheduler level to correctly place tasks.

schmichael · 2018-07-09T23:07:08Z

Device Plugins are planned but still in an internal design phase, so I don't want to make any promises regarding the timeline. Posting detailed use cases and implementation ideas here is definitely welcome and will be taken into account!

andrewchambers · 2018-10-30T01:22:11Z

I have a use case for jobs that mount cloud-provisioned disks. However, each Google Compute node supports only 4 disks, so there is a resource limit of 4 per node.

tgross · 2021-01-19T16:32:22Z

Doing a bit of issue cleanup here. Since we last checked in on this issue, Nomad has shipped the devices, task driver, and storage plugin interfaces. I'm going to close this as resolved, and if we have interest for new kinds of plugins we can discuss that in a new issue.

schmichael · 2021-01-19T18:07:12Z

Let's keep this open as generic resources have yet to be implemented, but they're still something we're considering. While we have pretty good plugin coverage these days as @tgross noted, we lack the declarative custom resource definition approach outlined in the original issue. In the past people would even hijack the mostly-meaningless mbits resource to use as a custom resource, but now we've deprecated that as well!

If devices supported multi-tenancy they would be a solution to custom resources, albeit a lot more effort than the declarative format proposed above. However as of 1.0, device attributes are only used for constraints and a single device can only be used by a single allocation. This means you can't create custom devices just to have custom resources.

andrewchambers · 2021-01-19T20:07:41Z

Definitely not resolved. I feel like Nomad has implemented everything BUT a simple way to define custom resources. It kind of feels like some of these other features are needlessly specific and complex compared to simple custom resources.

mgdunn2 · 2022-03-03T20:40:01Z

We would love the ability to constrain allocations by available VRAM using the nvidia/cuda plugin with multi-tenancy, but the proposal above would allow us to schedule around a generic resource that emulates that behavior.

lattwood · 2022-10-21T19:22:46Z

@schmichael what's the possibility of committing to not removing mbits from Nomad until there's generic resource support?

benbuzbee · 2022-12-01T00:01:23Z

We have 25 Gbps NICs and jobs that each want 10 Gbps of that bandwidth, so we need Nomad to know it can't schedule more than 2 of those jobs on a single machine.
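
The arithmetic behind this use case can be sketched as a capacity check: with 25 Gbps per NIC and 10 Gbps per job, a third job would need 30 Gbps in total, so at most two fit on one machine. This is only an illustration of the accounting a generic resource would need; `Node` and `try_place` are hypothetical names, not part of Nomad.

```python
class Node:
    """A machine with one divisible, countable resource (here: NIC bandwidth in Gbps)."""

    def __init__(self, capacity_gbps):
        self.capacity_gbps = capacity_gbps
        self.reserved_gbps = 0

    def try_place(self, request_gbps):
        """Reserve bandwidth for a job if it still fits; return whether it was placed."""
        if request_gbps <= 0:
            raise ValueError("request must be positive")
        if self.reserved_gbps + request_gbps > self.capacity_gbps:
            return False
        self.reserved_gbps += request_gbps
        return True


def max_jobs(capacity_gbps, request_gbps):
    """How many identical jobs fit: integer division of capacity by request."""
    return capacity_gbps // request_gbps


if __name__ == "__main__":
    node = Node(capacity_gbps=25)
    results = [node.try_place(10) for _ in range(3)]
    print(results)           # the third placement is refused
    print(max_jobs(25, 10))  # 2
```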

edigaryev · 2022-12-02T22:36:41Z

> However as of 1.0, device attributes are only used for constraints and a single device can only be used by a single allocation. This means you can't create custom devices just to have custom resources.

Assuming that:

- a single allocation consumes the device for the allocation's runtime duration
- a custom device plugin can programmatically create/fake an arbitrary number of virtual non-existent devices

Isn't that the same as having custom integer-based resources? In a hacky way, of course.
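
A rough sketch of why the two assumptions above add up to an integer resource: if each fake device is held by exactly one allocation for its lifetime, then N fake devices behave like a counter with capacity N. This models only the bookkeeping and does not use Nomad's device plugin interface; `FakeDevicePool`, `claim`, and `release` are hypothetical names, and the capacity of 4 is borrowed from the disk-per-node example earlier in the thread.

```python
class FakeDevicePool:
    """N virtual devices, each usable by one allocation at a time."""

    def __init__(self, name, count):
        self.free = [f"{name}-{i}" for i in range(count)]
        self.owner = {}  # device id -> allocation id

    def claim(self, alloc_id, count):
        """Give `count` whole devices to an allocation, or None if too few are free."""
        if count > len(self.free):
            return None
        devices = [self.free.pop(0) for _ in range(count)]
        for device in devices:
            self.owner[device] = alloc_id
        return devices

    def release(self, alloc_id):
        """Return every device held by a finished allocation; report how many."""
        returned = sorted(d for d, a in self.owner.items() if a == alloc_id)
        for device in returned:
            del self.owner[device]
        self.free.extend(returned)
        return len(returned)


if __name__ == "__main__":
    disks = FakeDevicePool("disk", 4)
    print(disks.claim("job-a", 3))  # three of the four slots
    print(disks.claim("job-b", 2))  # None: only one slot left
    disks.release("job-a")
    print(disks.claim("job-b", 2))  # fits once job-a has finished
```

The limitation noted in the quoted comment still applies: because a device cannot be shared, this only emulates whole-unit counts, not fractional or multi-tenant use of one device.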

edigaryev · 2022-12-02T22:43:56Z

> Isn't that the same as having custom integer-based resources? In a hacky way, of course.

After a quick search through GitHub, I found the nomad-generic-device-plugin.

It seems to achieve exactly that, without an automatic way to generate N devices in a specific (vendor, type, model) tuple, though.

diptanu added type/enhancement theme/scheduling labels Apr 14, 2016

tgross closed this as completed Jan 19, 2021

schmichael reopened this Jan 19, 2021

schmichael mentioned this issue May 7, 2021: Remote Task Drivers and Resources #10549 (Open)

DerekStrickland self-assigned this Apr 12, 2022

tgross unassigned DerekStrickland Feb 13, 2023

Kamilcuk mentioned this issue Apr 13, 2023: update reserved resources on client dynamically #16864 (Open)
