Sometimes approves the wrong commit #174
Comments
Related Zulip:

So homu exists because of GitHub REST API rate limits, if I understood it correctly. bors-ng is stateless, whereas homu is stateful and relies entirely on incoming webhooks for state synchronisation, which seems to have failed at least a few times.

I am unsure whether there is some type of retry mechanism for the incoming GitHub webhook calls, and if so whether homu honors the retries, or whether this is something that could be configured on the GitHub side or supported in Homu. That could be a low-hanging-fruit fix to increase stability, since webhooks are supposed to get a status back until they are delivered.

1 - Figure out the impact first
I am going to see if I can use one or more GraphQL queries to scrape all the PR r+ comments and commits, and correlate them to figure out how many times, and exactly where, homu picked up a commit other than the latest commit on the given PR. That analysis will show which and how many PRs are affected, and therefore how much of an issue this is.

2 - Potentially fix something
Then to potentially fix this in Homu it would mean either - depending on the reliability / real impact factors above
Re: GitHub delivery tracking API

It seems GitHub has introduced an API for exactly what we might need re 2.2):
https://github.blog/changelog/2021-06-30-webhook-deliveries-api/
https://docs.github.com/en/rest/reference/repos#webhooks

Can someone check what the situation looks like for the last 30 days?

It may also be that a simple "healthcheck" polling monitor in homu, which triggers retries via this new API, would greatly improve webhook deliverability if required. And if the "healthcheck" monitor simply reports to Homu that there are undelivered webhooks, Homu could hold off before acting on things.
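As a rough sketch of the "healthcheck" idea above (not existing homu code): given the list returned by the deliveries endpoint (`GET /repos/{owner}/{repo}/hooks/{hook_id}/deliveries`), pick out deliveries that never got a 2xx response, so they could be retried via `POST /repos/{owner}/{repo}/hooks/{hook_id}/deliveries/{delivery_id}/attempts`. The function name is hypothetical, and the sketch assumes each entry carries `id`, `guid` and `status_code`, and that a redelivery shares the `guid` of the original attempt.

```python
from typing import Any, Dict, List


def undelivered_ids(deliveries: List[Dict[str, Any]]) -> List[int]:
    """Return one delivery id per event that has no successful attempt.

    Assumption: redeliveries reuse the original delivery's guid, so an
    event counts as delivered once any attempt with that guid got a 2xx.
    """
    def ok(d: Dict[str, Any]) -> bool:
        # A missing or null status_code is treated as a failure.
        return 200 <= (d.get("status_code") or 0) < 300

    delivered_guids = {d["guid"] for d in deliveries if ok(d)}

    pending: Dict[str, int] = {}
    for d in deliveries:
        if d["guid"] not in delivered_guids:
            # Keep the first id seen for each undelivered event.
            pending.setdefault(d["guid"], d["id"])
    return sorted(pending.values())
```

A monitor could poll this periodically and have Homu hold off on acting while the returned list is non-empty.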
I think rather than focusing on approval time, we should probably have a check at (bors) merge commit generation time that the approved source commit is equivalent to the PR head commit at that time. That also helps us avoid issues with further pushes being missed. I don't think focusing on missed webhooks is going to work well; GitHub has just not delivered things in the past for some time, we shouldn't depend on that for reliability here.
That would require querying GitHub once per merge commit generation, but I guess there aren't so many of those that it would hit the rate limit, which was the reason for Homu to exist.
We would post an error message in that case. There's no real issue with rate limits -- we're only generating new merge commits probably roughly <15 times/day, so checking at that time is very cheap. I think there's more work that can be done to improve homu's synchronization with real state, but that's a separate issue from this one and doesn't need to be coupled.
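A minimal sketch of the merge-time check discussed above, not homu's actual implementation. All names here are hypothetical; `fetch_head_sha` stands in for whatever queries GitHub for the PR's current head (for example the `head.sha` field of `GET /repos/{owner}/{repo}/pulls/{pull_number}`).

```python
from typing import Callable, Optional


def check_approved_head(
    approved_sha: str,
    fetch_head_sha: Callable[[], str],
) -> Optional[str]:
    """Compare the approved commit with the PR head right before merging.

    Returns None when they match, otherwise an error message that the
    bot could post on the PR instead of generating a merge commit.
    """
    current = fetch_head_sha()  # one API call per merge commit generated
    if current == approved_sha:
        return None
    return (
        f"Approved commit {approved_sha} is not the current PR head "
        f"{current}; not merging."
    )
```

Because the lookup happens only when a merge commit is generated, the cost is one extra request per merge attempt.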
I'll also try to finish the impact analysis during the weekend out of curiosity - it just needs some regular expressions on bors messages, comparing them to the commit history. I just got some quick GraphQL running against the GitHub API to analyse what has been going on:
After that I can push a PR for Homu to do that merge commit sync check.
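The impact analysis described above could look roughly like the sketch below. It is an illustration only: it assumes bors approval comments contain text of the form "Commit <sha> has been approved", and that the PR timeline has already been fetched and flattened into a chronologically ordered list of hypothetical `("push", sha)` and `("comment", body)` tuples.

```python
import re
from typing import List, Optional, Tuple

# Assumed wording of the bors approval comment.
APPROVAL_RE = re.compile(r"Commit ([0-9a-f]{7,40}) has been approved")


def stale_approvals(events: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Return (approved_sha, head_sha) pairs where bors approved a commit
    that was not the PR head at the time of the approval comment."""
    head: Optional[str] = None
    stale: List[Tuple[str, str]] = []
    for kind, payload in events:
        if kind == "push":
            head = payload  # latest pushed commit so far
        elif kind == "comment":
            match = APPROVAL_RE.search(payload)
            # Prefix match so abbreviated SHAs in comments still compare.
            if match and head is not None and not head.startswith(match.group(1)):
                stale.append((match.group(1), head))
    return stale
```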
Unfortunately, no. There are three borses, not two.
1520: fix(batcher): verify the PR commit before merging r=notriddle a=notriddle Fixes #1519 CC rust-lang/homu#174 (comment) Co-authored-by: Michael Howell <michael@notriddle.com>
Raised #178 for the suspected general webhook delivery instability, which addressing the commit mixups alone (this ticket) would not fix.
For reference, here is another case where this (or something similar) happened: rust-lang/rust#119748 (comment)
There have been two circumstances where a normal @bors r+ approved an old commit:

My hunch is that somehow bors missed the push notification, and the state.head_sha doesn't get updated to the latest version.

I think Homu should never approve a commit that is not the latest commit. I imagine it should check what the latest commit is when approving instead of assuming that the database is in sync. There are definitely risks of race conditions here that may not be solvable, but some extra effort might make it more resilient.
What's scary is that this may be happening and nobody notices. These two instances only got noticed because the old commit failed in CI.