Slightly Smarter Task Shuffling #2057
Conversation
Just a few smaller comments. Let's get some unit tests in there to prove that the prioritization is working as intended then we can test in staging 👍
System.currentTimeMillis(),
task.getTaskId(),
message,
Optional.of(UUID.randomUUID().toString()),
Small nit, it would be good for the actionId here and in the SingularityPendingRequest to match. For this particular case it doesn't have any lasting effect on the actions themselves, but makes it easier to trace through the data for debugging if they are equal
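For illustration, a minimal sketch of that idea, assuming the actionId is simply minted once and passed to both places (the class and method names below are placeholders, not Singularity's actual API):

```java
import java.util.Optional;
import java.util.UUID;

// Hypothetical helper: mint the actionId once so the task cleanup and the
// SingularityPendingRequest share the same value and can be correlated later.
class ActionIdExample {
  static Optional<String> newSharedActionId() {
    String actionId = UUID.randomUUID().toString();
    // In the real change, this same actionId would replace the inline
    // UUID.randomUUID() above and also be passed to the SingularityPendingRequest.
    return Optional.of(actionId);
  }
}
```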
double memoryUtilization = task.getUsage().getMemoryTotalBytes() / task.getRequestedResources().getMemoryMb();
double cpuUtilization = task.getUsage().getCpusUsed() / task.getRequestedResources().getCpus();

return (memoryUtilization + cpuUtilization) / 2;
Still thinking more on the metric for this. You were correct to point out that, by shuffling more/smaller tasks, we may very well hit the shuffle limits more often. I'm not certain it will happen often enough to be of consequence given the size of our cluster, but there are two things I could see us doing to compensate for it:
- Possibly make the shuffle limits a percentage of the overall task count (i.e. so we don't have to worry about them becoming more restrictive as the cluster grows larger)
- Possibly make a slightly more complex scoring mechanism. I could see a case where there is a cpu threshold (something simple like a 50-60% cutoff) for a memory shuffle. If tasks are below that, they are given a higher priority for high memory usage (the inverse of now); if they are above it, they are scored as they are now. This could serve to put a few higher-memory, but likely less busy, tasks at the top of the queue, resulting in fewer overall shuffles; see the sketch below.
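As a rough illustration of that second idea (the 50% cutoff and the class/method names below are assumptions for the sketch, not anything in this PR):

```java
// Sketch of a threshold-based score for memory shuffles, as described above.
class MemoryShuffleScoreSketch {
  private static final double CPU_UTILIZATION_CUTOFF = 0.5; // assumed cutoff

  static double score(double memoryUtilization, double cpuUtilization) {
    if (cpuUtilization < CPU_UTILIZATION_CUTOFF) {
      // Likely a less busy task: rank primarily by memory usage so a few
      // large-but-idle tasks rise to the top of the shuffle queue.
      return memoryUtilization;
    }
    // Busy task: fall back to the current combined score.
    return (memoryUtilization + cpuUtilization) / 2;
  }
}
```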
Unit tests are on the right track. I left some comments pointing at shortcuts we have already written for some of these pieces so they don't have to be duplicated here.
}
}

protected void scheduleTask(String rqid, double requiredCpus, double requiredMemoryMb) {
We have a few shortcuts available already that should make the stuff in this method easier/shorter, though I'm impressed at all of the pieces you uncovered here :)
- saveRequest will take care of the requestManager.activate step you have below.
- initAndFinishDeployWithResources will do most of the first section here.
- launchTask can take the request/deploy, instance, and a state you want to get it to, as well as possible hostname args, and could be a possible shortcut for getting these tasks to an active state. A normal flow for a task is similar to what you had: a pending request is created, drainPendingQueue turns that into a pending task, and the pending task is then launched with resource offers.
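A rough sketch of how a helper-based version of scheduleTask might read; the parameter lists below are guessed from this comment, not taken from the actual test base class:

```java
// Hypothetical rewrite using the existing test helpers mentioned above.
// All signatures here are assumptions based on the review comment.
protected void scheduleTask(String rqid, double requiredCpus, double requiredMemoryMb) {
  SingularityRequest request = new SingularityRequestBuilder(rqid, RequestType.WORKER).build();
  saveRequest(request);                                             // covers the requestManager.activate step
  initAndFinishDeployWithResources(requiredCpus, requiredMemoryMb); // covers most of the first section
  // launchTask(request, deploy, instanceNo, TaskState.TASK_RUNNING) could then bring a task
  // for this request to an active state without replaying the full pending-request flow.
}
```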
I really should've checked for review comments before I kept going; will change to use these methods.
The initAndFinishDeployWithResources method seems to be bound to a single class-scoped request id, so I don't think I'll be able to use it directly to support multiple requests per test.
Since launchTask bypasses the normal task flow, it breaks/complicates testing of relocation of bounced tasks. I'm sure there's a way around this that also makes these tests less horribly messy, but it might not be worth the trouble.
scheduler.drainPendingQueue();
}

protected Map<String, Map<String, SingularityTaskId>> getTaskIdMapByHostByRequest() {
I don't know that we really need to build a map of this here. In each case it seems you are just using getActiveTaskIds and then looping through all of the tasks to get them into an active/starting state that the usage poller can then see. We can probably just do a for loop over taskManager.getActiveTaskIds(), after maybe validating the size of the list.
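Roughly, that flat loop might look like the following (only getActiveTaskIds comes from this comment; the size check and the per-task state transition are illustrative placeholders):

```java
// Sketch of iterating active task ids directly instead of building a nested map.
List<SingularityTaskId> activeTaskIds = taskManager.getActiveTaskIds();
Assert.assertEquals(expectedTaskCount, activeTaskIds.size()); // optionally validate the list size first

for (SingularityTaskId taskId : activeTaskIds) {
  // move each task into a starting/running state here so the usage poller can see it
  // (e.g. via whatever status-update helper the test base class provides)
}
```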
I feel the map is more intuitive than looping through the task ids and matching host/request names, especially when working with multiple hosts, but it's definitely not something that needs to be built. I'd prefer to leave it, but will remove it if you prefer a simpler structure.
🚢
The current task shuffling/eviction logic used by Singularity prioritizes shuffling tasks with the highest resource utilization (memory or CPU, depending on which is most overused), on each overcommitted slave. Though this is a good way to ensure the shuffle succeeds at reducing load to acceptable levels, it is also much more likely to shuffle tasks performing large amounts of work or holding large amounts of in-memory state. This makes sense for high CPU tasks, but is undesirable for high memory ones - they'll likely return to their previous memory consumption and possibly trigger another shuffle on their new host.
To address these issues, this PR attempts to:
- Move the shuffle logic out of SingularityUsagePoller into a new SingularityTaskShuffler class, while preserving existing functionality/signatures.

Because of these changes, it is very likely that tasks that were not shuffled before will now be shuffled, which in turn has a fair chance of causing issues with those tasks in QA/prod. I don't think there's a good way to avoid this while still actually changing the shuffling logic, though a self-service shuffle opt-out may help mitigate things.
TODO: