GitHub Actions Limitations #4876
To add some more ideas from previous discussions:
We are using the macOS environment for the element-ios builds, so just turning that on (with the hardware acceleration that comes with it) seems reasonable; it's more expensive, but if it works we just need to pay for it. How was it not working? Unreliable tests, timeouts, or similar? (Do we have an example of a GHA-based test that is failing at the moment? A lot seem to be green in the Actions tab.)

We're also using Buildkite as our main non-GHA CI tool; we could see if running on its Linux instances might be better than the GHA ones. But they'll still (I believe) not have hardware acceleration available to them, which means they will likely still run slowly, though we might be able to run the tests for an extended period.

The last option is to use a local farm (or a single machine) of real machines with hardware acceleration available, and run these actions on them as custom runners, but that's a bit of an investment.
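As a rough sketch of that first option, the emulator job could be pointed at a macOS runner, where hardware acceleration is available. The job name, action versions, API level, and Gradle task below are illustrative assumptions, not values taken from the actual workflow:

```yaml
# Sketch only: run the emulator job on a hardware-accelerated macOS runner.
jobs:
  sanity-test:
    runs-on: macos-latest          # macOS runners support HAXM acceleration
    steps:
      - uses: actions/checkout@v2
      - uses: reactivecircus/android-emulator-runner@v2
        with:
          api-level: 29                      # placeholder API level
          script: ./gradlew connectedCheck   # placeholder test task
```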
Mainly unreliable: https://github.com/vector-im/element-android/actions/workflows/sanity_test.yml. These same tests pass consistently locally without issue. We're also using the OSX runner for the nightly UI test suite; the Android emulator is notoriously picky when running headless without a GPU. My recommendation would be to avoid using VMs altogether and use a dedicated service like Firebase Test Lab, but that would require an externally accessible synapse instance.
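For reference, the Firebase Test Lab route would look roughly like the step below. The APK paths and device spec are placeholders, and it assumes GCP authentication is already configured and that the synapse instance is reachable from Test Lab's devices:

```yaml
# Hypothetical sketch of running the instrumentation tests on Firebase Test Lab.
- name: Run tests on Firebase Test Lab
  run: |
    gcloud firebase test android run \
      --type instrumentation \
      --app app/build/outputs/apk/debug/app-debug.apk \
      --test app/build/outputs/apk/androidTest/debug/app-debug-androidTest.apk \
      --device model=Pixel2,version=29
```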
Yeah; I was going to see how easy it would be to move the synapse outside of the build process first, in case the synapse itself is causing some overhead. If that works, then moving further onto Firebase or elsewhere would be fairly easy.
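A minimal sketch of what "moving the synapse outside of the build" could look like, assuming the official matrixdotorg/synapse Docker image and a pre-generated homeserver config in ./synapse-data (both are assumptions, not the project's actual setup):

```yaml
# Sketch: start synapse as a standalone container before the test job runs.
- name: Start standalone synapse
  run: |
    docker run -d --name synapse -p 8008:8008 \
      -v "$PWD/synapse-data:/data" \
      matrixdotorg/synapse:latest
```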
The main problem is something like this:
This is caused mainly by missing hardware acceleration. I believe that with macOS slaves it will work much better; can you verify whether we can also use macOS slaves for the Android builds? If that's not the case, maybe we should look at your other suggested solutions.
https://github.com/vector-im/element-android/pull/5193/files

So I tried this, which was to take the settings from the integration test (which seems to reliably start the emulator) and move them across to the sanity test (I also forced the sanity test to run each time we push to my branch, so don't merge). It seems to be OK: other than some flaky tests, I haven't seen an emulator start error here. Perhaps the problem was Android API level 29 or the exact version of the Pixel profile, etc. Is that something we explicitly wanted to test with, or is it independent?
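For illustration, the kind of settings being moved across are the android-emulator-runner inputs shown below; the concrete values live in the linked PR, so everything here is a placeholder rather than a copy of it:

```yaml
# Placeholder emulator-runner configuration; real values are in the PR above.
- uses: reactivecircus/android-emulator-runner@v2
  with:
    api-level: 29                      # placeholder; API level is discussed below
    arch: x86
    profile: pixel                     # placeholder device profile
    emulator-options: -no-window -no-snapshot -noaudio -no-boot-anim
    script: ./gradlew connectedCheck   # placeholder test task
```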
Thanks for your update, Michael; nice changes! Well, the main issue is that the errors are not even consistent, so we can't rely on the results. For example, in your branch here there are 3 failures (I guess that is after your changes). The emulator error, for example, happens to me in about 20% of runs with the previous settings. The Android API level is not that important; I tried a lot of different settings, API levels, and emulator builds before concluding that the settings we have produced the fewest errors. But still, I am not sure GHA is made for this kind of run; maybe using macOS slaves with hardware acceleration will help. Maybe we can apply your changes and check for improvements in our everyday builds.
Yeah, this fixes the "it reliably starts an emulator and runs" part. We need more changes to make it actually fail the build on a test failure (for instance, the integration tests also don't fail the build). I'll tidy the branch up into a real PR and offer it for review.
https://github.com/vector-im/element-android/actions/workflows/sanity_test.yml is what I was actually trying to get working, btw. I think the integration tests might fail because the synapse that
Received this in a sanity test run:
Adding a loop around |
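The comment above is cut off, but a retry wrapper of the kind it seems to describe might look like this; the wrapped command and attempt count are pure assumptions for illustration:

```yaml
# Hypothetical retry loop; the command being retried is an assumption.
- name: Run sanity tests with retries
  run: |
    for attempt in 1 2 3; do
      if ./gradlew connectedCheck; then exit 0; fi
      echo "Attempt $attempt failed, retrying..."
    done
    exit 1   # fail the build if every attempt failed
```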
So there are various manifestations of the unreliability on the runners recently:
The emulator dying is possibly due to a C++-level failure in (e.g.) the Realm code, which causes a signal 9, which in turn causes the emulator to stop responding mid-run; this has become visible now that the logcat logs are captured. So it's possible that a bunch of the errors we thought were the emulator failing mid-run are actually the tests doing the right thing and highlighting a real code failure.
Hmm, interesting; I wonder why this is not happening locally.
Following the "Trying to fix integration tests" PR, and after the fixes: the tests can run most of the time, and the results are published manually as a PR comment. Splitting the tests into smaller chunks also helped (see the sketch below).
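One plausible shape for that chunking is a job matrix that runs one instrumentation package per emulator; the package names below are made up for illustration, and the real split lives in the linked PR:

```yaml
# Illustrative sharding: one matrix job per test package.
strategy:
  fail-fast: false
  matrix:
    test-package: [org.matrix.android.sdk.session, org.matrix.android.sdk.account]
steps:
  - uses: reactivecircus/android-emulator-runner@v2
    with:
      api-level: 29   # placeholder
      script: ./gradlew connectedCheck -Pandroid.testInstrumentationRunnerArguments.package=${{ matrix.test-package }}
```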
Running an emulator from within a Linux GitHub Actions server has a lot of problems and limitations (a mac slave is much better):
ReactiveCircus/android-emulator-runner#62
This is a solution, but not a stable one. It would be good if we could upgrade the slaves' hardware.
Other than that, we could use another CI/CD tool like Jenkins, specifically for the integration tests, so the flow would be as follows:
GitHub Actions triggers Jenkins -> Jenkins runs the integration tests and posts the results back to GitHub
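A rough sketch of that hand-off; the Jenkins host, job name, and credentials are hypothetical, and Jenkins would report back via the GitHub commit-status or checks API:

```yaml
# Hypothetical trigger step; Jenkins host, job, and credentials are assumptions.
- name: Trigger Jenkins integration-test job
  run: |
    curl -fsS -X POST \
      "https://jenkins.example.com/job/integration-tests/buildWithParameters" \
      --user "$JENKINS_USER:$JENKINS_TOKEN" \
      --data "BRANCH=${GITHUB_REF#refs/heads/}"
```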