Run testbed for Android on Linux #2230

Merged
merged 2 commits into from
Nov 16, 2023

Conversation

rmartin16
Member

@rmartin16 rmartin16 commented Nov 15, 2023

Changes

  • Runs the Android testbed in Linux instead of macOS
  • The emulator runs about 2x faster on Linux....at least now that GitHub has sanctioned access to KVM

Notes

  • In preparing to add the ability to run the helloworld apps in app-build-verify.yml, I got this working and figured I'd share here...especially since Toga's CI has been serving as inspiration
  • No worries if you've got other reasons to want to stay with macOS testing

PR Checklist:

  • All new features have been tested
  • All new features have been documented
  • I have read the CONTRIBUTING.md file
  • I will abide by the code of conduct

@rmartin16 rmartin16 force-pushed the testbed-android-on-linux branch from 55b05ea to 68b52ad Compare November 15, 2023 00:32
@rmartin16
Member Author

I see their blog post says "larger hosted runners"....but I haven't seen any problems with the testing I've done today....

@rmartin16 rmartin16 force-pushed the testbed-android-on-linux branch 3 times, most recently from 329bc77 to a542570 Compare November 15, 2023 01:42
@rmartin16
Member Author

rmartin16 commented Nov 15, 2023

CI is so much fun 🙃 looks like KVM isn't installed on all runners...so, installing it first...

@freakboy3742
Member

Oh HELL YES. Plug this directly into my VEINS.

It looks like it's passed CI; is there some other concern you had for this to still be a draft?

@rmartin16
Member Author

rmartin16 commented Nov 15, 2023

haha

It looks like it's passed CI; is there some other concern you had for this to still be a draft?

yeah....I've been re-running the Android testbed job every so often tonight....and sometimes it fails because /dev/kvm doesn't exist. I'm guessing that the randomly assigned runners are on physically different hardware and nested virtualization just isn't available there.....but I'm not sure yet...

[edit] I also don't think installing qemu-kvm is doing anything useful....since whether /dev/kvm exists doesn't seem to depend on qemu-kvm being installed.

example failures:
https://github.com/beeware/toga/actions/runs/6878772555/job/18712129702
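
The failure mode described above (a runner with no /dev/kvm) can be detected before the emulator starts. The following is a minimal sketch of such a preflight check, not the code used in this PR; the function name `kvm_status` and its status strings are hypothetical and chosen only for illustration.

```python
import os
import stat


def kvm_status(path="/dev/kvm"):
    """Classify the KVM device node at `path` (hypothetical helper).

    Returns one of: "missing", "not-a-device", "no-permission", "usable".
    """
    # The node is absent when the runner does not expose nested virtualization.
    if not os.path.exists(path):
        return "missing"
    # /dev/kvm should be a character device; anything else is unexpected.
    if not stat.S_ISCHR(os.stat(path).st_mode):
        return "not-a-device"
    # The emulator needs read/write access to the device node.
    if not os.access(path, os.R_OK | os.W_OK):
        return "no-permission"
    return "usable"


if __name__ == "__main__":
    print(f"/dev/kvm status: {kvm_status()}")
```

Running a check like this as an early step would make a KVM-less runner fail fast with a clear message, rather than partway through emulator startup.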

@rmartin16 rmartin16 force-pushed the testbed-android-on-linux branch 8 times, most recently from 93c5ace to 2cea015 Compare November 15, 2023 20:46
@rmartin16
Member Author

So, I've been experimenting with this throughout the day. Ultimately, it does seem that, a small minority of the time, a runner is assigned that does not expose /dev/kvm to the testing environment. When this occurs, simply restarting the job has always been enough to get a runner that does. Notably, though, non-members do not have the privilege of restarting jobs.

So...I guess the question is do we want to introduce this volatility into CI for this small gain?

We'll also need to answer this when we add the ability to run the helloworld apps to app-build-verify.yml. I've subscribed myself to ReactiveCircus/android-emulator-runner#46 since this team has also been assessing this issue.

@freakboy3742
Member

So...I guess the question is do we want to introduce this volatility into CI for this small gain?

How small is the "small minority"? Looking at the history of runs on this PR, you've done 25 passes, doing 40 builds each, and only 1 of those passes has failed; it's not clear how many of those builds actually failed, and how many were killed because of another failure - but it looks to be no more than 3. If the failure rate is 1 in 1000... or even 1 in 300... I think we can live with that. Dropping the testing time by a factor of 5 is a much bigger win than the minor inconvenience of a 1 in 300 failure.

As for restarting runs - we already have intermittent failures, and while it's inconvenient, it's not that inconvenient; this is one more possible intermittent failure. It's worth adding a note in the dev guide that this happens sometimes.

It's also worth noting that the faster run means the Android logs have a lot less noise complaining that the tests are running slowly, which will likely resolve some of the other intermittent failures we're seeing.
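
The arithmetic behind the estimate above can be sketched as follows, using only the figures quoted in the comment (25 passes, 40 builds each, at most 3 failed builds). The function names are hypothetical, and the rerun calculation assumes failures are independent between runs, which may not hold in practice.

```python
def observed_failure_rate(failures, passes, builds_per_pass):
    """Fraction of builds that failed, given the totals quoted in the thread."""
    total_builds = passes * builds_per_pass
    return failures / total_builds


def chance_reruns_also_fail(rate, reruns=1):
    """Probability that the first run AND every rerun fail.

    Assumption: each run fails independently with the same probability.
    """
    return rate ** (reruns + 1)


if __name__ == "__main__":
    # Upper bound from the comment: no more than 3 failures in 25 * 40 builds.
    rate = observed_failure_rate(failures=3, passes=25, builds_per_pass=40)
    print(f"observed rate: {rate:.4f} (about 1 in {round(1 / rate)})")
    print(f"chance one rerun also fails: {chance_reruns_also_fail(rate):.6f}")
```

With those figures, the upper bound is 3 failures in 1000 builds, or roughly 1 in 333, which sits inside the "1 in 1000 to 1 in 300" range the comment considers acceptable.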

@rmartin16
Member Author

rmartin16 commented Nov 16, 2023

How small is the "small minority"?

Hard to say without a better understanding of the distribution of failures. If the distribution is uniform, then today's testing suggests they are quite rare.....but I doubt their distribution is uniform. I saw more failures just messing around last night than I did all day today....so, some confluence of unknown factors likely impacts the likelihood of a failure and may demonstrate a distribution with sustained higher likelihoods of failures over certain periods of time.

At the end of the day, though, this change is trivial to revert.....so, I'm game to see how it goes if you are.

@rmartin16 rmartin16 force-pushed the testbed-android-on-linux branch 2 times, most recently from 8336202 to 54bde24 Compare November 16, 2023 00:18
@rmartin16 rmartin16 force-pushed the testbed-android-on-linux branch from 54bde24 to c715416 Compare November 16, 2023 00:19
@rmartin16 rmartin16 marked this pull request as ready for review November 16, 2023 01:17
@rmartin16
Member Author

I left in a small piece of debug code to drop some breadcrumbs if/when this fails in the future.

My hypothesis is KVM is always available on GitHub's 4-core runners....but not on their 2-core runners. Furthermore, I think that's what's driving the "minority of runners" situation here....I think they may be phasing out 2-core runners...or something...
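
One way to gather evidence for or against the 2-core versus 4-core hypothesis is to record the core count alongside whether /dev/kvm is present on each run. The sketch below is illustrative only and is not the debug code left in this PR; `runner_breadcrumbs` and its keys are hypothetical names.

```python
import os
import platform


def runner_breadcrumbs(kvm_path="/dev/kvm"):
    """Collect a few facts about the current runner (hypothetical helper)."""
    return {
        # Number of logical CPUs visible to this process's host.
        "cpu_count": os.cpu_count(),
        # Whether the KVM device node is exposed to the job.
        "kvm_present": os.path.exists(kvm_path),
        # CPU architecture string, e.g. "x86_64".
        "machine": platform.machine(),
    }


if __name__ == "__main__":
    for key, value in runner_breadcrumbs().items():
        print(f"{key}={value}")
```

If failed runs consistently logged a lower core count together with a missing /dev/kvm, that would support the hypothesis; any other pattern would count against it.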

Member

@freakboy3742 freakboy3742 left a comment

Personally, I'm happy to try this as an experiment. I'll wait to see if @mhsmith has any concerns, but otherwise, I'm happy to merge this.

@freakboy3742 freakboy3742 requested a review from mhsmith November 16, 2023 03:31
.github/workflows/ci.yml Outdated
Member

@mhsmith mhsmith left a comment

Thanks very much; this has been the biggest problem with Toga CI for a long time.

@mhsmith mhsmith merged commit 001bddd into beeware:main Nov 16, 2023
13 of 14 checks passed
@rmartin16 rmartin16 deleted the testbed-android-on-linux branch November 16, 2023 13:57
@rmartin16
Member Author

for posterity:

From now on, any Linux or Windows workflow triggered from a public repository, using GitHub’s default labels, will run on our faster, more powerful 4-vCPU runners.

https://github.blog/2024-01-17-github-hosted-runners-double-the-power-for-open-source/

fwiw, I felt that GitHub had instrumentation in place detecting repos using KVM and would quietly promote them to the larger runners....if only because, at first, there were failures but they all of a sudden dissipated.
