Speed up task_list when beyond limit #2239
Merged
Description
In task_list, when the number of tasks in the requested status exceeds the max_shown_tasks limit, return just that count without iterating over the tasks to check upstream status. As a result, counts for upstream statuses are unavailable in this case, and the visualizer shows "unknown" for them instead.
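A minimal sketch of the short-circuit described above, for illustration only. It is not the scheduler's actual code: the function name `task_list_sketch`, the `tasks_by_status` mapping, the `upstream_status_of` callback, and the `{"num_tasks": ...}` return shape are all assumptions made for this example.

```python
from typing import Callable, Dict, Optional


def task_list_sketch(
    tasks_by_status: Dict[str, Dict[str, dict]],
    status: str,
    max_shown_tasks: int,
    upstream_status_of: Optional[Callable[[str], str]] = None,
) -> dict:
    """Hypothetical stand-in for task_list; all names here are illustrative."""
    tasks = tasks_by_status.get(status, {})

    # Over the limit: return only the count and skip the per-task loop,
    # so no upstream status is computed for any task.
    if len(tasks) > max_shown_tasks:
        return {"num_tasks": len(tasks)}

    # Under the limit: build the full listing, including upstream status
    # when a lookup callback is supplied.
    result = {}
    for task_id, task in tasks.items():
        entry = dict(task)
        if upstream_status_of is not None:
            entry["upstream_status"] = upstream_status_of(task_id)
        result[task_id] = entry
    return result
```

The point of the early return is that the cost of the over-limit path no longer depends on walking each task's dependencies, which is where the time goes when there are millions of tasks.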
Motivation and Context
When the number of tasks gets into the millions, even refreshing the visualizer can take a minute or more, causing havoc in the pipeline. Since all we really want in these situations is the counts, we can skip the more expensive computation and just return the sizes. This rules out upstream checks but saves a lot of time.
We may want to add a second, higher threshold so that upstream numbers are still computed when the count is only a little above the limit for returning all tasks.
Have you tested this? If so, how?
Added unit tests. I have also been running this for about a day, and it works at scale, reducing visualizer refreshes from minutes to seconds.