Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Race condition in restoring Test Proxy assets in CI #32805

Closed
HarshaNalluru opened this issue Jan 30, 2025 · 0 comments · Fixed by #32810
Closed

Race condition in restoring Test Proxy assets in CI #32805

HarshaNalluru opened this issue Jan 30, 2025 · 0 comments · Fixed by #32810
Assignees
Labels
EngSys This issue is impacting the engineering system.

Comments

@HarshaNalluru HarshaNalluru self-assigned this Jan 30, 2025
@github-actions github-actions bot added the needs-triage Workflow: This is a new issue that needs to be triaged to the appropriate team. label Jan 30, 2025
HarshaNalluru added a commit that referenced this issue Jan 30, 2025
### Description

This PR reverts the recent changes from [PR
32668](#32668) and [PR
32752](#32752).

These changes did not fully resolve the race condition issues, as we
observed continued failures, albeit with lesser frequency, even after
increasing the timeouts.

Increasing the timeouts further does not seem like a clean solution.
Therefore, reverting these changes in this PR.

### Changes

- Revert the changes introduced in PR 32668 and PR 32752.
- [TO DO] A new approach to restore assets sequentially in a separate
job will be implemented in a follow-up PR.

### Related Issue

Issues observed at the current state of restoring assets: [Issue
32805](#32805)
@xirzec xirzec added EngSys This issue is impacting the engineering system. and removed needs-triage Workflow: This is a new issue that needs to be triaged to the appropriate team. labels Feb 3, 2025
HarshaNalluru added a commit that referenced this issue Feb 3, 2025
## Issue
Fixes #32805
Related PR: #32806

The original issue was that "restoring recordings when the first test is
triggered" takes time, leading to flakily failing unit-test pipelines.
There is a need to have the recordings pre-loaded before the tests are
run in the CI pipeline.

#32806 - Earlier attempts of pre-restoring recordings in CI for playback
tests with retries and increased timeouts to avoid race conditions did
not fully solve the problem. This PR is a follow up of that.

## Description
Employs a new approach to restore assets sequentially using the
test-proxy tool, which is already downloaded by the CI pipeline.
The assets restoration process is applied only to the unit-test pipeline
in playback mode.

Updates to the `rush-runner` tool:

1. Enhanced `rushRunAllWithDirection` to handle test-proxy restore for
packages in CI pipeline.
2. Introduced new helper function: `runTestProxyRestore`.
3. Added a new helper function `spawnNodeWithOutput` to spawn NodeJS
programs and return the output.
4. Introduced `--ci` flag to help with debugging in local; calls `--ci`
flag in the yml scripts for the unit-test commands.

## Example Output
[Pipeline:
https://dev.azure.com/azure-sdk/public/_build/results?buildId=4523710&view=logs&j=45d25fdd-5540-5f10-8daf-622e80647be7&t=a7f7761c-0312-5a29-1d0c-9736ead6db1f&l=60](https://dev.azure.com/azure-sdk/public/_build/results?buildId=4529992&view=results)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
EngSys This issue is impacting the engineering system.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants