Description
Currently, repos become eligible for recollection once they are more than a day old. This causes problems in datasets where one user owns the majority of the repos and many other users own a small number each. For example, consider a dataset where one user has 20000 repos and 19 other users together have 2000 repos. In this case the user with many repos has only a 25% chance of being selected by the scheduling algorithm (in fact, every user has a 25% chance). The problem is that the other 19 users have enough repos that some of them are always more than 1 day old and can therefore be recollected. As a result, the 20000 repos from the single user are considered only 25% of the time, even if some of them are 3 months old while the repos of the other 19 users are 1 day old.

What makes this difficult is that this is the expected behavior: we don't want a user who adds a lot of repos to steal all the bandwidth.

To solve this, I changed the recollection requirement to 7 days for core, 10 days for secondary, 7 days for facade, and 10 days for ml. The repos for the 19 users will likely be processed within a day or so, and for the remaining 6 days the older repos from the user with 20000 repos will be selected for collection every time, since that user is the only one left with repos eligible for collection.
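As a rough illustration of the eligibility rule described above (not the actual code in this PR), the per-phase thresholds could be sketched like this. The names `RECOLLECTION_DAYS`, `is_eligible`, and `eligible_by_user` are hypothetical, and treating a repo as eligible once its age is at least the threshold is an assumption; only the day counts come from the description.

```python
from datetime import datetime, timedelta

# Hypothetical mapping; the day counts are the ones stated in this PR.
RECOLLECTION_DAYS = {"core": 7, "secondary": 10, "facade": 7, "ml": 10}


def is_eligible(last_collected, phase, now):
    """Return True if a repo may be recollected for the given phase."""
    if last_collected is None:
        # Assumption: a repo that was never collected is always eligible.
        return True
    return now - last_collected >= timedelta(days=RECOLLECTION_DAYS[phase])


def eligible_by_user(repos, phase, now):
    """Count eligible repos per user; users with none are omitted."""
    counts = {}
    for user, last_collected in repos:
        if is_eligible(last_collected, phase, now):
            counts[user] = counts.get(user, 0) + 1
    return counts


now = datetime(2023, 1, 10)
repos = [
    ("big_user", now - timedelta(days=90)),   # stale repo from the large user
    ("small_user", now - timedelta(days=2)),  # recently collected repo
]
core_counts = eligible_by_user(repos, "core", now)
```

With a 7 day threshold for core, only `big_user` has an eligible repo in this example, so a scheduler choosing among users with eligible repos would have no one else to pick until the smaller users' repos age past the threshold.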
This PR fixes #
Signed commits