[TEST][FLAKY] test_arm_compute_lib #8417
Perhaps this is a flaky CI machine causing the occasional failures, not too dissimilar to how certain other tests fail in ci-i386?
I am not too sure about the cause, but in this case the failures seem to be more frequent than the i386 ones.
Note that the CI for AArch64 is using images built from source. A docker image update for ci_arm that uses pre-built binaries has been due for many weeks now. I haven't seen any of these failures in the many times I've run this with the latest Dockerfile.ci_arm image that I baked locally. Perhaps we can move that update ahead and see how it runs in CI, and whether we see similar flakiness?
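For anyone wanting to try the same locally, a minimal sketch of baking the ci_arm image and running the Arm Compute Library test script inside it is below. The script paths and image tag are assumptions based on the TVM docker helpers at the time of this issue and may differ in newer revisions.

```bash
# Sketch, assuming an AArch64 host with docker installed and a TVM checkout.
# Script names (docker/build.sh, docker/bash.sh, the ACL test script) and the
# local image tag are assumed from the repo layout at the time of this issue.
cd tvm

# Bake the ci_arm image locally from docker/Dockerfile.ci_arm.
./docker/build.sh ci_arm

# Run the Arm Compute Library integration tests inside the freshly baked image.
./docker/bash.sh tvm.ci_arm ./tests/scripts/task_python_arm_compute_library.sh
```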
Yes, let us do that. However, the CI Jenkinsfile only takes effect in main, so perhaps a good first step is to clean up the compute_lib.sh script so we unblock others, then upgrade the image. @leandron should be able to push to the ci-docker-staging branch to test out the new image.
I think that's already been done by @areusch and @mbrookhart as part of their work updating the CI images in #8177.
I've updated #8400 as suggested until we figure this out. Is there a way to access the CI machine to debug interactively what is going on? I am unable to reproduce the issue at all at my end, having tried it on quite a few AArch64 Linux boxes I have control of.
@u99127 do you have either a) a Packer build flow for the CI ARM machine or b) a suggested AMI or recipe for building the CI machine? My understanding is that the ARM machines use an image we built in-house, and it would be great to document the build process.
On the machines I have access to, I'm using bog-standard Ubuntu 18.04 plus the docker image baked from the Docker scripts. On your query about the machine, perhaps @zhiics might be able to help? Ramana
OK, I've tried this quite a few times this evening after acquiring access to an m6g.4xlarge instance (not sure if this is the same as CI):
for i in {1..25} ; do ./tests/scripts/task_python_arm_compute_library.sh ; done
(obviously I had re-enabled the local testing in my tree). Now that the ci-arm image has been updated, I think we should try to re-enable this testing and see how it goes. Thoughts, @areusch ? Ramana
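For reference, a slightly more verbose sketch of the reproduction loop above that also counts failures. It assumes it is run from the TVM repository root with local ACL testing re-enabled; the log file names are illustrative.

```bash
#!/usr/bin/env bash
# Sketch: repeatedly run the Arm Compute Library test script to look for
# flakiness. Assumes the TVM repo root as the working directory and that
# local ACL testing has been re-enabled in the tree.
set -u

failures=0
for i in {1..25}; do
    if ! ./tests/scripts/task_python_arm_compute_library.sh > "run_${i}.log" 2>&1; then
        failures=$((failures + 1))
        echo "Run ${i} failed (see run_${i}.log)"
    fi
done
echo "Failures: ${failures}/25"
```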
I agree, let us re-enable the tests and see how they do in CI with v0.06.
CI image v0.06 does not appear to show the flakiness seen with CI image v0.05. What changed between the two remains unclear and needs further debugging, but for now re-enable this test to see how it fares in CI. Fixes apache#8417
https://ci.tlcpack.ai/job/tvm/job/main/1262/execution/node/352/log/