Fix marking GitHub-sourced experiment crates as complete #756
When processing GitHub repositories, our list usually (always?) contains the repository without any form of commit hash. Crater agents check out that repository and, as part of building it, record the commit hash they used. The agent then submits that hash to the db for storage.
When storing the hash, we replace the "crate name" (owner.reponame) with a more specific ID (e.g., owner.reponame.$cratehash). From our perspective, this means the set of crates we tested has effectively changed at this point. Next, we store the results (previously under the original name, now under the new name) and also update the experiment_crates record to mark it complete (previously the record with the old ID, now the one with the new ID).
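A minimal sketch of why the "mark complete" update can miss, modeling experiment_crates as an in-memory map keyed by crate ID. The types `Status` and `ExperimentCrates` and the helpers `mark_complete` and `qualified_id` are hypothetical illustrations, not Crater's actual code or schema; the sketch assumes the queued row is stored under the original owner.reponame ID.

```rust
use std::collections::HashMap;

// Hypothetical status of a row in an experiment_crates-like table.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Status {
    Queued,
    Complete,
}

// Hypothetical in-memory stand-in for experiment_crates, keyed by crate ID.
struct ExperimentCrates {
    rows: HashMap<String, Status>,
}

impl ExperimentCrates {
    // Marks the row with this exact ID complete; returns false if no row matched.
    fn mark_complete(&mut self, id: &str) -> bool {
        match self.rows.get_mut(id) {
            Some(status) => {
                *status = Status::Complete;
                true
            }
            None => false,
        }
    }
}

// Hypothetical helper building the hash-qualified ID (owner.reponame.$cratehash).
fn qualified_id(original: &str, sha: &str) -> String {
    format!("{original}.{sha}")
}

fn main() {
    let original = "owner.reponame";
    let mut table = ExperimentCrates {
        rows: HashMap::from([(original.to_string(), Status::Queued)]),
    };

    // Updating by the rewritten ID matches no row, so the original stays queued
    // and would be handed out to an agent again.
    let new_id = qualified_id(original, "abc123");
    assert!(!table.mark_complete(&new_id));
    assert_eq!(table.rows[original], Status::Queued);

    // Updating by the original ID leaves nothing queued for this repository.
    assert!(table.mark_complete(original));
    assert_eq!(table.rows[original], Status::Complete);
    println!("ok");
}
```

In this model the fix is simply to key the completion update on the ID that was originally queued rather than the rewritten one; how the actual patch does this is not shown here.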
The net effect is that prior to this commit (and likely since ~Aug 31), every GitHub repository has been repeatedly tested by Crater until we eventually hit count(results) >= count(experiment_crates). That is essentially an arbitrary point in time, though: AFAICT there's no relationship between the set of crates we wanted to test and the set of results we have. One saving grace is that there is some amount of fixed point: if the GitHub repository we test doesn't receive any new commits between attempts to run it, we re-test the same code, the old and new IDs match, and we can mark it as complete. But this is at best a minor improvement; it's not actually a mitigating factor.
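To illustrate why a count-based stopping point says little about coverage, here is a small hypothetical scenario: one repository gets new commits between attempts and is tested twice under different hash-qualified IDs, while another has no result yet. The functions `finished_by_count`, `has_result_for`, and `to_set` and the repository names are illustrative assumptions, not Crater code.

```rust
use std::collections::HashSet;

// Hypothetical model of the pre-fix finish check: it compares only the
// number of results with the number of experiment_crates rows.
fn finished_by_count(results: &HashSet<String>, experiment_crates: &HashSet<String>) -> bool {
    results.len() >= experiment_crates.len()
}

// Hypothetical helper: is any stored result attributable to this original ID,
// either exactly or as a hash-qualified "<original_id>.<hash>" ID?
fn has_result_for(results: &HashSet<String>, original_id: &str) -> bool {
    let prefix = format!("{original_id}.");
    results.iter().any(|r| r == original_id || r.starts_with(&prefix))
}

fn to_set(ids: &[&str]) -> HashSet<String> {
    ids.iter().map(|s| s.to_string()).collect()
}

fn main() {
    // Two repositories we wanted to test (original, hash-less IDs).
    let wanted = to_set(&["alice.repo", "bob.repo"]);
    // alice.repo was tested at two different commits; bob.repo has no result yet.
    let results = to_set(&["alice.repo.1111", "alice.repo.2222"]);

    // The count check already reports "finished"...
    assert!(finished_by_count(&results, &wanted));
    // ...even though nothing was recorded for bob.repo.
    assert!(!has_result_for(&results, "bob.repo"));
    println!("count check says finished, but bob.repo has no result");
}
```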
As a future TODO, we should probably change the "finish condition" from comparing the counts of results and experiment_crates to something like "are there any experiment_crates with a status of queued?", which makes much more sense.
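A minimal sketch of the proposed finish condition over an in-memory list of experiment_crates rows. The `ExperimentCrate` struct, the two-variant `Status` enum, and `is_finished` are hypothetical; Crater's real schema and status values may differ.

```rust
// Hypothetical per-crate status; only the "queued" state matters for the check.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Status {
    Queued,
    Complete,
}

// Hypothetical in-memory row of experiment_crates.
struct ExperimentCrate {
    id: String,
    status: Status,
}

// Proposed check: the experiment is finished once no crate is still queued.
fn is_finished(crates: &[ExperimentCrate]) -> bool {
    !crates.iter().any(|c| c.status == Status::Queued)
}

fn main() {
    let mut crates = vec![
        ExperimentCrate { id: "alice.repo".to_string(), status: Status::Complete },
        ExperimentCrate { id: "bob.repo".to_string(), status: Status::Queued },
    ];
    // One crate is still queued, so the experiment is not finished.
    assert!(!is_finished(&crates));

    crates[1].status = Status::Complete;
    assert!(is_finished(&crates));

    for c in &crates {
        println!("{}: {:?}", c.id, c.status);
    }
}
```

Unlike the count comparison, this check looks only at experiment_crates, so it no longer depends on results and experiment_crates agreeing on crate IDs.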