docs: add job-specification docs for numa #18864
cc @schmichael who had opinions on the wording around fragmentation / suggesting preemption (which I left out)
Co-authored-by: Tim Gross <tgross@hashicorp.com>
LGTM. Left some comments, but not blockers... not even sure they're good ideas. 😅
This is just complex stuff! Not sure if we should keep the docs minimal because presumably folks who go looking for NUMA features already have a strong sense of how this stuff works, or if we should steal more from the RFC that helped educate NUMA-newbies like me.
- `none` - Nomad is free to allocate CPU cores using any strategy. Nomad uses
  this freedom to allocate cores in such a way that minimizes the amount of
  fragmentation of core availability per NUMA node.
to allocate cores in such a way that minimizes the amount of fragmentation of core availability per NUMA node.
Can we steal more from the RFC's wording? It's not as succinct, but this took me a minute to parse.
The images make it really immediately obvious. I wonder if we should just copy/paste the RFC in 🤷
The `require` affinity option should be used sparingly due to
the implied fragmentation caused by reserving CPU cores based on the NUMA node
they are associated with. Use it for workloads known to be highly sensitive
to memory latencies.
...used sparingly... ...implied fragmentation...
I know people often want more prescriptive guidance from us, but I would rather frame this in terms of the tradeoff being presented than discourage the use of a feature that could reduce overall resource consumption dramatically. (The assumption being that if you're avoiding 300% performance penalties due to cross-node latencies, you can run fewer instances of a service to serve the same number of requests.)
So perhaps something like:
The `require` affinity option may cause workload fragmentation by reserving CPU cores based on the NUMA node they are associated with. Use it for workloads known to be highly sensitive to memory latencies.
Might even be worth defining workload fragmentation somewhere as something like:
A jobspec constraint that prevents optimal binpacking of Clients. This can waste cluster resources by leaving some Client resources free but unusable. For example, when `numa.affinity = "require"`, workloads cannot be scheduled on Clients which may have ample free compute resources unless those compute resources happen to be colocated on a single NUMA node.
idk where an appropriate place for that would be though.
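To make the example in that definition concrete, here is a minimal Go sketch. It is not Nomad's scheduler code: `canPlace`, its parameters, and the per-node free-core counts are hypothetical names chosen for illustration, and it assumes a placement only needs to compare the requested core count against the free cores on each NUMA node.

```go
package main

import "fmt"

// canPlace reports whether a request for `cores` CPU cores fits on a client,
// given the number of free cores on each of its NUMA nodes.
// Hypothetical model for illustration only; not Nomad's actual implementation.
func canPlace(freePerNode []int, cores int, affinity string) bool {
	total := 0
	for _, free := range freePerNode {
		// With "require", all requested cores must come from one NUMA node.
		if affinity == "require" && free >= cores {
			return true
		}
		total += free
	}
	if affinity == "require" {
		return false
	}
	// Otherwise cores may be drawn from any combination of nodes.
	return total >= cores
}

func main() {
	free := []int{3, 3} // 6 free cores, split evenly across two NUMA nodes

	fmt.Println(canPlace(free, 4, "none"))    // fits: 6 free cores in total
	fmt.Println(canPlace(free, 4, "require")) // does not fit: no single node has 4 free
}
```

In this model the client has six free cores, yet a four-core request with `require` is rejected because no single node can hold it. Those free-but-unusable cores are the fragmentation the definition describes.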
I'm working on a Concepts/CPU doc that will cover "everything" to do with how Nomad interacts with your processor. I'm thinking we should keep the jobspec doc fairly minimal and then link to the concepts doc for further reference (once it exists).
Yeah, I love this approach, and it's what I plan on doing for workload identity. Leave the reference docs concise so folks can just find what they're looking for quickly. Concept docs provide context. Tutorials provide walkthroughs/howtos. 👍
* docs: add job-specification docs for numa * docs: take suggestions Co-authored-by: Tim Gross <tgross@hashicorp.com> * docs: more cr suggestions --------- Co-authored-by: Tim Gross <tgross@hashicorp.com>
Separate from a more comprehensive "CPU Resources" concepts docs we spoke of