update-check in Manage Integration Check workflow sometimes fails #8937

Closed
mhofman opened this issue Feb 16, 2024 · 3 comments
Assignees
Labels
bug Something isn't working test tooling repo-wide infrastructure

Comments

@mhofman
Member

mhofman commented Feb 16, 2024

Describe the bug

The job sometimes fails, and I'm not sure why. I suspect a race in which a previous create-check has not completed yet.

Arguably, update-check should create the check if it doesn't exist, or retry.

The consequence is pretty bad: the integration-test-result check never completes, and the PR gets stuck.

To Reproduce

Expected behavior

The completion result of the integration test workflow is always fully reflected in the check result.

Platform Environment

CI

Context

#8731

Screenshots

update-check
No integration-test-result check found for commit 19ee116 https://github.com/Agoric/agoric-sdk/actions/runs/7935214947

@mhofman mhofman added bug Something isn't working tooling repo-wide infrastructure test labels Feb 16, 2024
@mhofman mhofman self-assigned this Feb 16, 2024
@0xpatrickdev
Member

https://github.com/Agoric/agoric-sdk/actions/runs/8621724430

Is this failed run something to be concerned with?

@mhofman
Member Author

mhofman commented Apr 9, 2024

Yeah, it's the race I've seen before. Re-running usually fixes it.

@frazarshad frazarshad self-assigned this Jul 26, 2024
mergify bot added a commit that referenced this issue Aug 2, 2024
…low (#9787)

refs: #8937 

## Description

Addresses the flakiness in the `manage-integration-check.yml` workflow. The flake is caused by a race condition: the workflow is executed 3 times in quick succession, and its logic assumes those runs complete in the order they were triggered.

The "Integration tests" workflow triggers `manage-integration-check.yml` 3 times: once each for the `requested`, `in_progress`, and `completed` statuses of the calling workflow.
When the `completed` status arrives at nearly the same time as the other two (such as when the "Integration tests" workflow is skipped), the 3 runs of `manage-integration-check.yml` might execute simultaneously and race with each other.

The race condition occurs because the final run of `manage-integration-check.yml` (the one triggered by the `completed` status of "Integration tests") expects a GitHub check run to already exist. When the runs race, that check run might not exist yet, so the job fails.
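
The failure mode described above can be illustrated with a small in-memory model. This is only a sketch under stated assumptions: `handle_event_strict`, `MissingCheckError`, and the placeholder commit `abc1234` are hypothetical names, the dictionary stands in for the GitHub Checks API, and the pre-fix behaviour of the `requested` and `in_progress` handlers is simplified to "mark the check as `in_progress`".

```python
from itertools import permutations

EVENTS = ("requested", "in_progress", "completed")
SHA = "abc1234"  # placeholder commit, not a real one


class MissingCheckError(Exception):
    """Raised when the completed handler finds no check run to update."""


def handle_event_strict(checks: dict, sha: str, event: str) -> None:
    """Simplified model of the pre-fix handlers (an assumption, not the real code)."""
    if event == "completed":
        if sha not in checks:
            # The final run expects an earlier run to have created the check.
            raise MissingCheckError(f"no check found for commit {sha}")
        checks[sha] = "completed"
    else:
        # requested / in_progress leave the check pending.
        checks[sha] = "in_progress"


def outcome(order):
    """Run the three handlers in the given order; return (final status, any run failed)."""
    checks = {}
    failed = False
    for event in order:
        try:
            handle_event_strict(checks, SHA, event)
        except MissingCheckError:
            failed = True  # that workflow run fails, the other runs still execute
    return checks.get(SHA), failed


results = {order: outcome(order) for order in permutations(EVENTS)}
for order, (status, failed) in results.items():
    print(" -> ".join(order), "=>", status, "(a run failed)" if failed else "")
```

In this model, only the orderings that end with `completed` leave the check in a `completed` state. Every other ordering leaves it stuck at `in_progress`, which matches the stuck PR described in the issue.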

To fix this, two things have been done:
- Instead of crashing, the job now creates a new GitHub check run with a `completed` status.
- Since the previous two jobs might run after the final job, they could accidentally overwrite the GitHub check run that has the `completed` status and accidentally create a check run with `in_progress` status. Their filter check has been removed entirely so that they consider `completed` check runs as well.
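
The two changes listed above can be sketched as an order-independent update rule. This is a minimal model, not the workflow's actual implementation: `handle_event`, `final_status`, and the placeholder commit `abc1234` are hypothetical, the dictionary stands in for the GitHub Checks API, and the rule that the earlier handlers leave an existing `completed` check untouched is an assumption about what considering `completed` check runs achieves.

```python
from itertools import permutations

EVENTS = ("requested", "in_progress", "completed")
SHA = "abc1234"  # placeholder commit, not a real one


def handle_event(checks: dict, sha: str, event: str) -> None:
    """Order-independent handler (hypothetical sketch of the fix)."""
    if event == "completed":
        # Change 1: create the check if it is missing instead of failing.
        checks[sha] = "completed"
    elif checks.get(sha) != "completed":
        # Change 2 (assumed behaviour): the earlier handlers also see
        # completed checks, so they never downgrade one to in_progress.
        checks[sha] = "in_progress"


def final_status(order) -> str:
    """Apply the three events in the given order and return the final status."""
    checks = {}
    for event in order:
        handle_event(checks, SHA, event)
    return checks[SHA]


final = {order: final_status(order) for order in permutations(EVENTS)}
for order, status in final.items():
    print(" -> ".join(order), "=>", status)
```

Under these assumptions, all six orderings of the three events end with a `completed` check, so the result no longer depends on which run finishes first.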


### Security Considerations


### Scaling Considerations


### Documentation Considerations


### Testing Considerations


### Upgrade Considerations
@toliaqat
Contributor

@mhofman do you think this was resolved by #9787, so that we can close it?

@mhofman mhofman closed this as completed Aug 19, 2024