Track plan rejection history and automatically mark clients as ineligible #13421
Commits on Jun 18, 2022
Commit febcf30
core: track and act on node plan rejections
Plan rejections occur when the scheduler worker and the leader plan applier disagree on the feasibility of a plan. This may happen for valid reasons: since Nomad schedules in parallel, different workers are expected to compute placements from different views of the cluster state. By the time the final plan reaches the leader plan applier, it may no longer be valid because a concurrent scheduling operation has taken up the intended resources. In these situations the plan applier notifies the worker that the plan was rejected and that it should refresh its state before trying again.

In some rare and unexpected circumstances, workers have been observed to repeatedly submit the same plan even though it is always rejected. While the root cause is still unknown, this mitigation has been put in place. The plan applier now tracks the history of plan rejections per client and, if the number of rejections in a given time window crosses a certain threshold, includes in the plan result a list of node IDs that should be set as ineligible. The window size and threshold value can be adjusted in the server configuration.
Commit e343ca6
Commit 5beb6a1
Commit 2db5edc
Commits on Jul 7, 2022
Commit ff15de9
allow enabling the node rejection tracker and limit the rate nodes are marked ineligible
Commit 67f42b8
Commit 98ad0d7
Commit 620b413
Commits on Jul 8, 2022
Commit 27e767d
core: refactor plan rejection tracker
Simplify the interface for `BadNodeTracker` by merging the methods `Add` and `IsBad`, since they are always called in tandem, and reduce the number and level of log messages generated. Also clean up expired records to avoid infinite growth if a cache entry never expires. Take an explicit timestamp to make tests faster and more reliable.
Commit ff8f670
Commit bd833f9
Commits on Jul 12, 2022
core: use stable time FSM operation
Set the timestamp for a plan apply operation at request time to avoid non-deterministic operations in the FSM.
Commit fb2e761
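A minimal sketch of why the timestamp is captured at request time: the leader reads the clock once when building the log entry, and the FSM apply function uses only the value stored in that entry, so every server replaying the log reaches the same state. `applyPlanRequest`, `nodeState`, `newRequest`, and `apply` are hypothetical names, not Nomad's types.

```go
package main

import (
	"fmt"
	"time"
)

// applyPlanRequest is a hypothetical log entry. The leader stamps UpdatedAt
// once, before the entry is replicated, so every server sees the same value.
type applyPlanRequest struct {
	NodeIDs   []string // nodes to mark ineligible
	UpdatedAt int64    // Unix nanoseconds, set at request time
}

// nodeState is a hypothetical piece of FSM state.
type nodeState struct {
	Eligible   bool
	ModifiedAt int64
}

// newRequest runs on the leader only; it is the single place the clock is read.
func newRequest(ids []string) applyPlanRequest {
	return applyPlanRequest{NodeIDs: ids, UpdatedAt: time.Now().UnixNano()}
}

// apply is deterministic: it depends only on the prior state and the entry,
// never on the wall clock of the server replaying the log.
func apply(state map[string]nodeState, req applyPlanRequest) {
	for _, id := range req.NodeIDs {
		state[id] = nodeState{Eligible: false, ModifiedAt: req.UpdatedAt}
	}
}

func main() {
	req := newRequest([]string{"node-a"})
	s1, s2 := map[string]nodeState{}, map[string]nodeState{}
	apply(s1, req) // e.g. the leader applying the entry
	apply(s2, req) // e.g. a follower replaying the same entry later
	fmt.Println(s1["node-a"] == s2["node-a"])
}
```

Had `apply` called `time.Now()` itself, each server would record a slightly different `ModifiedAt`, and the replicated state would diverge.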
config: use pointer for plan_rejection_tracker.enabled
Using a pointer allows us to differentiate between an unset value and an explicit `false`, in case we decide to use `true` by default.
Commit f5936f0
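The distinction between an unset value and an explicit `false` can be sketched in Go with a `*bool` field: a `nil` pointer falls back to the default, while a pointer to `false` overrides it even if the default later becomes `true`. A plain `bool` could not tell these apart, because its zero value is already `false`. The `planRejectionTrackerConfig` struct and its `merge` and `enabled` helpers are hypothetical illustrations, not Nomad's configuration code.

```go
package main

import "fmt"

// planRejectionTrackerConfig is a hypothetical mirror of the
// plan_rejection_tracker block; only the enabled field is shown.
type planRejectionTrackerConfig struct {
	Enabled *bool // nil means "not set by the operator"
}

// merge overlays b on a, overriding only the fields that b explicitly sets.
func (a planRejectionTrackerConfig) merge(b planRejectionTrackerConfig) planRejectionTrackerConfig {
	result := a
	if b.Enabled != nil {
		v := *b.Enabled // copy so the result does not alias b
		result.Enabled = &v
	}
	return result
}

// enabled resolves the effective value, falling back to a default when unset.
func (a planRejectionTrackerConfig) enabled(defaultValue bool) bool {
	if a.Enabled == nil {
		return defaultValue
	}
	return *a.Enabled
}

func boolPtr(b bool) *bool { return &b }

func main() {
	defaults := planRejectionTrackerConfig{Enabled: boolPtr(true)} // hypothetical future default
	unset := planRejectionTrackerConfig{}
	explicitOff := planRejectionTrackerConfig{Enabled: boolPtr(false)}
	fmt.Println(defaults.merge(unset).enabled(false), defaults.merge(explicitOff).enabled(false))
}
```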
Commit b243d7d