Rebalance Nomad Scheduled Allocations #10039
Hi @idrennanvmware! There are really two parts to this issue: why the allocations are getting packed, and a feature request for a rebalance. If you're using spread scheduling, I would only expect to see allocations getting packed if the node drains and reboots are happening too close together: on small clusters you'll want to wait for each node to come back up, register, and become eligible before draining the next (on large clusters there's more headroom to have a few draining at once). If you take a look at …

As far as the feature request goes, that seems reasonable, and I think we already have an open issue for that which needs some roadmapping: #8368
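A rough sketch of what that drain pacing can look like with the stock CLI (the node ID is a placeholder, and the exact eligibility handling may vary by Nomad version):

```sh
# Drain one node at a time; NODE_ID is a placeholder for each node in turn.
nomad node drain -enable -yes $NODE_ID    # migrate allocs off the node
# ...reboot/maintain the node and wait for it to re-register...
nomad node drain -disable -yes $NODE_ID
nomad node eligibility -enable $NODE_ID   # make it schedulable again
nomad node status $NODE_ID                # confirm "ready" before draining the next one
```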
@tgross - here are the allocs from one of the services, all packed on the same node
Sorry it took me a while to get back to this one, @idrennanvmware. I do want to dig into the placement score metrics there in more detail; I don't think we have as good test coverage as we'd like on how the spread scheduling config interacts with other placement metrics. But I'm also noticing 64 nodes excluded because the computed class is ineligible. Is that expected for your environment?
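For reference, those per-node placement scores can be pulled for a given allocation with the CLI (the alloc ID is a placeholder):

```sh
# Verbose status includes the placement metrics the scheduler recorded
# for this allocation (per-node scores such as binpack and node-affinity).
nomad alloc status -verbose $ALLOC_ID
```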
@tgross - apologies here, too - been buried in a million things :) To answer your question: yes, there could be a large number of excluded nodes, given we have a mix of Windows and Linux, and Windows makes up the bulk of our "clients" even though they make very limited use of Nomad at this point (we do rely on it for some critical services like client-side load balancers, logging, and telemetry binaries).
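One way to sanity-check what feeds into a node's computed class, assuming a local agent on the default port and jq installed (the node ID is a placeholder):

```sh
# The node API exposes the computed class alongside the attributes
# (e.g. kernel.name) that the scheduler derives it from.
curl -s http://127.0.0.1:4646/v1/node/$NODE_ID \
  | jq '{ComputedClass, NodeClass, OS: .Attributes["kernel.name"]}'
```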
@tgross: Any updates on this, please?
+1 this would be a really great feature
+1, seeing this with a small cluster during rolling restarts (and similar scenarios) as mentioned in the OP. Many of our jobs are long-running/static, so the overburdened clients will remain overburdened without intervention. We can …
+1
We are also experiencing unexplained, skewed allocation distribution. A function to prevent that and a separate function to rebalance are both essential!
We're currently running into the same scenario. Our cluster setup:
How does it get unbalanced?
If one starts from a clean slate, the allocs get placed as intended and are equally distributed. In our case, the first unbalanced state occurs while updating the Nomad/Consul cluster. After the updates on all machines are done, the topology usually looks like this:
This behavior is expected, but leads to follow-up problems.
If the workers cannot provide 2.5x the normal resource need, allocs are then moved to worker2, which again ends up with duplicate allocs as a result. To prevent this scenario, we either have to scale our servers to handle 2.5x the normal load instead of just 2x, or we have to wipe the cluster and start from a clean slate (which isn't an option). I hope this gives some insight into why rebalancing would be nice: it would keep users from having to provision additional (unneeded) resources.
Nomad version
Nomad 1.0.3
Operating system and Environment details
PhotonOS 3
Issue
Over time we have noticed more and more allocations getting placed on the same node (we use the spread algorithm as the default in the scheduler config). This has produced some interesting scenarios where we have system jobs that are unable to run because dimensions are exhausted on the saturated node, even though there are other nodes with room for those allocations.
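For reference, the spread default mentioned above is the cluster-wide scheduler configuration; a sketch of reading and setting it over the HTTP API, assuming a local agent on the default port (note the PUT body replaces the whole scheduler configuration, so include any preemption settings you rely on):

```sh
# Read the current cluster-wide scheduler algorithm.
curl -s http://127.0.0.1:4646/v1/operator/scheduler/configuration \
  | jq .SchedulerConfig.SchedulerAlgorithm

# Switch the cluster to spread-based scoring.
curl -s -X PUT -d '{"SchedulerAlgorithm": "spread"}' \
  http://127.0.0.1:4646/v1/operator/scheduler/configuration
```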
Right now we manually restart the allocation group, which results in the allocations starting to spread as expected, but we are a little surprised by this behavior - and in a few instances an application and all its instances end up on the same node.
EDIT: We have a strong suspicion the above is the culprit. We recently had a sizing operation done on some clusters, and ALL of the clusters affected by this sizing (and reboot) are displaying this imbalance. We will run tests to see if we can verify the behavior. More concerning is that applications seem to "stack up" like the screenshot below.
It may be coincidental, but we have also recently migrated all these services to Service Mesh - not sure that's relevant, but I also don't want to exclude any significant changes that coincide with this observed behavior.
In addition, we were wondering if there's a way we can issue a rebalance command across the cluster where the scheduler can move allocations. It would be really helpful in scenarios where we do a rolling restart through a cluster (of the actual nodes themselves): when each node drains, its allocations start on the remaining nodes as expected, but at the end of all this we end up with 1 node (the last one to restart) significantly unbalanced relative to the rest. In these scenarios we would really like to be able to trigger a rebalance to ensure the saturation described above does not happen.
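Until a first-class rebalance exists, one hedged workaround with the standard CLI (the alloc ID is a placeholder): stop an allocation on the hot node and let the scheduler re-place it, which with spread scoring should favor emptier nodes.

```sh
# Stopping an alloc triggers rescheduling; with the spread algorithm the
# replacement should land on a less loaded node.
nomad alloc stop -detach $ALLOC_ID
```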
Thanks!
Ian