Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Timeout starts counting before apk and instrumentation has been installed #1194

Closed
msand opened this issue Mar 5, 2019 · 19 comments
Closed

Comments

@msand
Copy link
Contributor

msand commented Mar 5, 2019

Description

Timeout starts counting before apk (and instrumentation) has been installed, causing the first attempt at testing e2e android on travis to fail (I haven't got bitwise to work properly with android e2e a single time yet). Running a second time allows the tests to finish on time, at least sometimes, it seems relatively flaky.

Ran into something like 20 different error messages on the way, most of them having a closed issue here on the repo with the conversations locked (not very helpful). In the end I managed to find a way to make it build, by using the debug build of the private travis-ci.com, ssh into the machine, logcat android in one half of the screen and running detox test repeatedly after giving the emulator enough time to start and stabilize. After this I had to find suitable timeouts, which weren't too big for travis to think it had frozen, and allow the second run to finish.

Also, a lot of different errors were avoidable by adding a 60 second sleep after the android-wait-for-emulator command, all kinds of service and permission errors show up even if the wait command finishes successfully.

A history of the 100ish attempts to make this work can be found from the public travis: https://travis-ci.org/msand/react-native-svg-e2e/builds
And another 150ish attempts in bitrise: https://app.bitrise.io/app/1a150b039e9b8fbd#/builds
Would probably be useful to collect the various error messages and resolutions into documentation somewhere.

Testing on iOS works good (available in the primary bitrise workflow), I even have it taking screenshots on bitwise. On travis, iOS works fine with the example project, creating a new project, installing react-native-svg (the system under test), and building (haven't tried making screenshots on travis yet). And another new project stage, using cocoapods, installing, and, building. This issue is specifically for the android workflow in bitrise, and the android "build test for example" on travis.

Also, building, testing and taking screenshots locally works fine with both Android and iOS.

Ideally, detox would wait for adb to install the packages before starting the timeout, or even have a separate timeout for this. And, something which ensures output at least once per 10 minutes while waiting (not to make travis believe it has crashed).

Steps to Reproduce

Fork https://github.com/msand/react-native-svg-e2e
Add it to travis
Add it to bitrise
Trigger build(s)

Detox, Node, Device, Xcode and macOS Versions

  • Detox: 10.0.12
  • React Native: 0.58.6 (most recent stable release)
  • Node: 11.10.1 (most recent node from nvm)
  • Device: android emulator canary https://dl.google.com/android/repository/emulator-linux-5329922.zip (latest stable behaves similar, but canary should be faster/more efficient)
    • Travis:
      • Android linux image with oraclejdk8, system-images;android-21;google_apis;armeabi-v7a
    • Bitrise
      • Android Ubuntu 16.04, system-images;android-28;google_apis;x86

Device and verbose Detox logs

https://travis-ci.org/msand/react-native-svg-e2e/jobs/501950974
https://app.bitrise.io/build/42e2db33ffa6485c

@msand
Copy link
Contributor Author

msand commented Mar 5, 2019

I haven't looked much at the code in detox yet. But I'm thinking, perhaps this would be cleanest to solve by splitting the logic for installing the apk and instrumentation into a separate step. Such that when it finishes, one can safely call detox with the --reuse flag. And perhaps a separate command to check that the app and the instrumentation has started correctly and warmed up for testing.

@msand
Copy link
Contributor Author

msand commented Mar 5, 2019

Is there any existing example of detox e2e testing working with android on bitrise?

@msand
Copy link
Contributor Author

msand commented Mar 6, 2019

Managed to get x86 android on bitrise now as well: https://app.bitrise.io/build/c31c1a40907cd137
Using system-images;android-28;google_apis;x86
Latest android emulator canary with using emulator-headless -no-accel
Using default "Hybrid (Xamarin, Ionic, ...) on macOS, includes Visual Studio for Mac, Stable channel"

@msand
Copy link
Contributor Author

msand commented Mar 11, 2019

I'm still having some flakyness with android on travis: https://s3.console.aws.amazon.com/s3/buckets/react-native-svg-e2e/msand/react-native-svg-e2e/117/117.2/android.emu.release.2019-03-11%252015-59-50Z/?region=us-east-1&tab=overview
Relatively common to get "Unfortunately, Google Play Games has stopped." in a modal with an "OK" button. I found I can select the OK button using adb shell input keyevent 61 and click it using adb shell input keyevent 66, so I've added two steps before running detox:

        - adb shell dumpsys window windows | grep -E 'Not Responding' && adb shell input keyevent 61  && adb shell input keyevent 61  && adb shell input keyevent 66 || true
        - adb shell dumpsys window windows | grep -E 'has stopped' && adb shell input keyevent 61 && adb shell input keyevent 66 || true

I've also changed to using travis_wait for the long running install of the apk, and give --reuse to detox:

        - adb push android/app/build/outputs/apk/release/app-release.apk /data/local/tmp
        - travis_wait adb shell pm install /data/local/tmp/app-release.apk
        - adb shell rm /data/local/tmp/app-release.apk
        - adb push android/app/build/outputs/apk/androidTest/release/app-release-androidTest.apk /data/local/tmp
        - adb shell pm install /data/local/tmp/app-release-androidTest.apk
        - adb shell rm /data/local/tmp/app-release-androidTest.apk

But sometimes I still get this in some tests in the middle of the run:

image

So, I've patched the screenshot plugin to run this on start:

        const content = await adb.shell(deviceId, 'dumpsys window windows');
        if (content && content.indexOf('Application Error: com.google.android.play.games') !== -1) {
          await adb._sendKeyEvent(deviceId, 61);
          await adb._sendKeyEvent(deviceId, 66);
        }

msand/detox-patch-build@8a9c562
Update: msand/detox-patch-build@e764d3c

And hoping it would resolve the issue for good. @d4vidi Would you be interested in a PR for this?

@msand
Copy link
Contributor Author

msand commented Mar 11, 2019

I guess the assertions are failing because mCurrentFocus is in the Application error window saying that Google Play Games has stopped, thus it thinks that the elements aren't visible. Perhaps there should be some better logic for detecting if system services have failed, and are causing false negatives because of showing modals. Such as querying mCurrentFocus and making sure it's in the expected window before making assertions about visibility.

@msand
Copy link
Contributor Author

msand commented Mar 11, 2019

Seems I hadn't tested the logic properly, new patch looks for Application Error: com.google.android.play.games instead of has stopped

@msand
Copy link
Contributor Author

msand commented Mar 11, 2019

This seems to have resolved the issue: https://travis-ci.org/msand/react-native-svg-e2e/jobs/504814138
You can see from the logs that the modal was present (and got dismissed), and that the tests passed correctly.

@d4vidi
Copy link
Collaborator

d4vidi commented Mar 12, 2019

@msand first thing's first, thanks for this highly detailed report!
Second, it troubles me to see that the integration process hasn't been smooth for you. I really want this to be better, and so I'd be happy if you better explain which exact timeout you're referring to.

In essence, detox test should take of bringing up the device and installing the app for you, and it should feel smooth. As a side note, I'd say that on CI it's recommended that you'd have a saved snapshot of the emulator set up so it wouldn't have to cold start each time.

@msand
Copy link
Contributor Author

msand commented Mar 16, 2019

Well I'm happy I got it working at least. It was quite a ride in debugging CI land 😉 learned a lot in the process. Detox seems very nice otherwise, thanks for your work on it!
Seems we could improve it quite a lot by detecting the case that focus is outside the app, and even more if it could detect various os related errors and report them, or even deal with them.

There are two timeouts: One from the testing system, jest in this case, which times out when installing the apk takes too long. Timeout set here: https://github.com/msand/react-native-svg-e2e/blob/master/e2e/init.js#L12

The other one is from the CI system itself, which was mostly an issue in travis, even when just installing the app apk to the emulator as I do now to avoid the jest timeout, before calling detox with --reuse. My current workaround is the travis_wait command, which is intended for long-running steps which produce no output, but still isn't a sign of hanging. Travis has a default timeout of 10 minutes, the travis_wait makes it 20 by default and can be changed with an optional parameter. This might be good to add to documentation, for using detox with android on ci.

Also, the additional sleep after the wait_for_emulator command, I suspect it's a flaw with the wait command, it should look for something else to decide if the emulator is ready to get commands / have apks installed to it / e2e run on it, or detox could retry if it fails. Alternatively, could perhaps add to the documentation a suggestion to add a sleep command, but doesn't feel completely clean. Or as you suggested now, to use a saved snapshot of the emulator which has had any initial setup needed already done and given enough time to stabilize. And, perhaps even install some trivial/minimal apk to it, to verify that it works in isolation. But, correct detection of emulator readiness would be ideal.

@msand
Copy link
Contributor Author

msand commented Mar 17, 2019

I'm thinking something like this might be a better wait_for_emulator command:

adb wait-for-device shell getprop
adb shell '
if [[ $(getprop | grep dev.bootcomplete) ]]
 then while [[ -z $(getprop dev.bootcomplete) ]] do
  echo dev.bootcomplete: $(getprop dev.bootcomplete);
  sleep 1;
 done;
 echo dev.bootcomplete: $(getprop dev.bootcomplete);
else
 echo no dev.bootcomplete property;
fi

if [[ $(getprop | grep sys.boot_completed) ]]
 then while [[ -z $(getprop sys.boot_completed) ]] do
  echo sys.boot_completed: $(getprop sys.boot_completed);
  sleep 1;
 done;
 echo sys.boot_completed: $(getprop sys.boot_completed);
else
 echo no sys.boot_completed property;
fi

if [[ $(getprop | grep service.bootanim.exit) ]]
 then while [[ -z $(getprop service.bootanim.exit) ]] do
  echo service.bootanim.exit: $(getprop service.bootanim.exit);
  sleep 1;
 done;
 echo service.bootanim.exit: $(getprop service.bootanim.exit);
else
 echo no service.bootanim.exit property;
fi

if [[ $(getprop | grep init.svc.bootanim) ]]
 then while [[ $(getprop init.svc.bootanim) != 'stopped' ]] do
  echo init.svc.bootanim: $(getprop init.svc.bootanim);
  sleep 1;
 done;
 echo init.svc.bootanim: $(getprop init.svc.bootanim);
else
 echo no init.svc.bootanim property;
fi
'

This should be sufficient to resolve the issues from the fact that install requires the Android package manager, which is available only after the system is fully booted. I think this should allow removing or at least decreasing the extra sleep I'm currently using after the wait command. Similar commands seem to be used in chromiumos and fastlane
https://chromium-review.googlesource.com/c/chromiumos/third_party/autotest/+/350770/2/server/hosts/adb_host.py#519
https://github.com/AzimoLabs/fastlane-plugin-automated-test-emulator-run/blob/master/lib/fastlane/plugin/automated_test_emulator_run/actions/automated_test_emulator_run_action.rb

Also, it seems it might be more reliable not to use -no-boot-anim for the getprop service.bootanim.exit / getprop init.svc.bootanim to be a good indicator of it having started properly:
https://issues.jenkins-ci.org/browse/JENKINS-27702

@d4vidi d4vidi self-assigned this Mar 18, 2019
@d4vidi
Copy link
Collaborator

d4vidi commented Mar 18, 2019

To try to narrow it down, as far as Detox is concerned, it would be better if the jest/mocha timeout would not include the time it takes for the emulator to start and for the app installation to complete. Is that the problem at its core?

@msand
Copy link
Contributor Author

msand commented Mar 18, 2019

Well, essentially it's three separate issues I've had to work around. One to get the emulator to a state where it's safe to install the apk. One to handle that the install takes too long. And one to handle that the google services crash and takes the focus away from the app.

@noomorph
Copy link
Collaborator

Good observations. I'll make the issue pinned till we decide what to do about the things you've noticed.

@LeoNatan LeoNatan changed the title Timeout starts counting before apk and instrumentation has been installed [android] Timeout starts counting before apk and instrumentation has been installed Jun 30, 2019
@noomorph
Copy link
Collaborator

noomorph commented Dec 17, 2019

We might add adapter.setTimeout(120000) solution for Jest, in which the timeout will be increased by time delta it took to run the entire detox.init() function, so at the end it is going to be jest.setTimeout(120000 + 87384), where 87384 is timestampEnd - timestampStart.

To restore the original 120000 timeout in that example, we might use afterEach internal hook in the adapter implementation.

@d4vidi
Copy link
Collaborator

d4vidi commented Dec 17, 2019

@noomorph adapter.setTimeout() sounds great, but the timeout can expire before detox.init() ends (it's actually a quite reasonable scenario). I can suggest to (alongside an adapter.setTimeout()) increase the timeout dynamically in case init() finds that a device boot is required. This will be done via a joint wrapper (used both by the adapter and the device abstraction). Seeing that typically this would be done only in the runs of up to N initial tests (N = maxWorkers), it would be reasonable to avoid timeout restoration altogether IMO.

...Note that sooner than later, however, we may find ourselves in the need of introducing an optional deviceBootTimeout property in the global configuration.

@noomorph
Copy link
Collaborator

Well, the idea with deviceBootTimeout seems reasonable.
Do you think it could look like this or you have another idea?

adapter.setTimeout({
  testRun: 120000,
  deviceBoot: 360000,
});

@d4vidi
Copy link
Collaborator

d4vidi commented Dec 19, 2019

LGTM!

@d4vidi
Copy link
Collaborator

d4vidi commented Dec 31, 2019

Well I think the time has finally come for me to be able to address all 3 issues/concerns mentioned here. Here goes.

Separate install/boot timeouts

I've done some work (and research) as we discussed, aiming at solving this. The code is currently in a branch. Unfortunately, the approach we agreed upon failed, simply because it hits Jest's limitation of that a timeout cannot be set dynamically from inside an async handler in the context of beforeAll() -- Jest simply ignores it (sigh). We should consider reporting an issue. I've reported this in jestjs/jest#9359.

Generally speaking, after running quite a few tests I can conclude that the effect timeout-setup has can be hard to determine with Jest. It sometimes has a global effect, sometime a scoped effect, and - as explained, sometimes no effect at all. The very laconic documentation of Jest.setTimeout() doesn't help, either.
Hence the bottom line is that we cannot provide the perfect solution here, namely, have the user define a base timeout and an additional increment for when a device is booted, and apply the latter only when specifically necessary.

However, a proximate solution is nevertheless very easy to implement! What it takes in order to apply in one's project is:

  1. Moving to Jest (that's a given, because Mocha's timeout is static).
  2. Updating the e2e init script, typically init.js, this way:
jest.setTimeout(90000);
beforeAll(async () => {
  await detox.init(config);
  // other initializations
}, 300000);

With this init code, the test-scope timeout (unless specified differently) will be 1.5min, and a longer, 5 min timeout will only be applied for the init phase (which includes instrumentation init and potentially app install and emulator/simulator boot).

Currently, this is the best I can offer. The main downside here of course is that the 5min timeout is always applied, even when device boot isn't invoked.

Window focus / ANR's on Android

We've been struggling with similar issues ourselves, over time, in the context of Detox CI. The best I can suggest is to use a clean emulator, with no google-api's installed. For example, instead of using this bundle when creating the AVD: system-images;android-28;google_apis;x86_64, use: system-images;android-28;default;x86_64.

Google API's, alongside the gboard keyboard, seem to introduce numerous instabilities, which are solved otherwise. ANR's (and the resulting app defocusing) appear to be one of them.

CI providers (Travis-CI, Bitrise, Circle-CI)

Here too, we've had our fair share of giving those a try. We've put it serious efforts into making them work for the Detox CI suite -- all of which have failed. Among other things, we've tackled ANR dismissals, as mentioned in this issue. Eventually, we gave up, so we really can't give a good advice on the matter from where we're standing. One additional direction to explore, however, is Github Actions.

@lock lock bot locked as resolved and limited conversation to collaborators Jan 4, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

3 participants