Flaky tests on Linux-aarch64 build #368

Closed
jacobperron opened this issue Dec 21, 2018 · 18 comments
Labels
bug Something isn't working

Comments

@jacobperron
Member

E.g. https://ci.ros2.org/job/ci_linux-aarch64/2467/

Seems that repeating tests is causing something to crash and not report test results.

Looks like this issue was introduced by #297.

jacobperron added the bug label Dec 21, 2018
@wjwwood
Member

wjwwood commented Dec 21, 2018

Dang, I guess the CI was old, but I figured that nothing else had really changed on rviz... I'll revert it and look into the issue.

@wjwwood
Member

wjwwood commented Dec 21, 2018

Reverted in #369

@jacobperron
Member Author

jacobperron commented Dec 21, 2018

> Dang, I guess the CI was old, but I figured that nothing else had really changed on rviz... I'll revert it and look into the issue.

@wjwwood To clarify, I believe CI passes with the build flag --retest-until-pass 10. But our nightly jobs run --retest-until-fail 10, which is where the failures were originally noticed. So our regular CI wouldn't have caught this.
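For context, a rough sketch of how the two repetition modes differ when run locally, assuming colcon is the test runner behind these CI options and using rviz_rendering only as an example package:

# Retest-until-pass: a flaky failure is retried up to 10 times and can still end up green.
colcon test --packages-select rviz_rendering --retest-until-pass 10

# Retest-until-fail (what the nightly repeated job uses): a passing test is re-run up to
# 10 times, so an intermittent crash is much more likely to surface.
colcon test --packages-select rviz_rendering --retest-until-fail 10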

@cottsay
Member

cottsay commented Dec 27, 2018

Is there anything else to be done here, or did #369 solve the issue?

@wjwwood
Member

wjwwood commented Jan 2, 2019

I was leaving this open until the revert could be undone, but we could close it if desired, since this issue only covers fixing the CI, not also fixing the originating change and getting it re-merged.

@Martin-Idel
Contributor

The revert will definitely not have changed anything. The commit only introduced changes in rviz_default_plugins, while the test failures start with tests in rviz_rendering (e.g. point_cloud_renderable_test_target), which is built before rviz_default_plugins.

I recall that we had the same test failures in the nightlies when we enabled the display tests on Linux systems back in July, and I don't remember them ever having been resolved. I tried to debug the issue once but couldn't reproduce it; it seemed to be Ogre not coming up successfully...

@jacobperron
Member Author

There has been a slew of test failures in the nightlies over the past few days for rviz_default_plugins and rviz_rendering: https://ci.ros2.org/view/nightly/job/nightly_linux-aarch64_repeated/655/
The symptom is missing test results.
@Martin-Idel Does this sound like the same issue you experienced in the past?

@wjwwood
Member

wjwwood commented Jan 7, 2019

I was working under the assumption that @jacobperron saw these as new flaky tests, which would mean this was true:

> Looks like this issue was introduced by #297.

If the flakiness was preexisting, then we could undo the revert and address the flaky tests separately. I didn't investigate deeply; I was just trying to keep CI clean leading up to the release.

@andreasholzner

I am sure that the flaky aarch64 tests are not related to #297, as we have also seen them in the past. We had no direct access to aarch64 machines and thus could not debug them, since these problems do not show up on amd64.

@clalancette
Contributor

@andreasholzner If we can get you access to an aarch64 machine (probably in AWS), would you or @Martin-Idel have time to look into this?

@wjwwood
Member

wjwwood commented Jan 14, 2019

So it seems like the changes in #297 were not the cause of this? Does it just make it more flaky? I'm about to do the release for Crystal patch 1 and need to resolve this...

@jacobperron
Member Author

> So it seems like the changes in #297 were not the cause of this? Does it just make it more flaky? I'm about to do the release for Crystal patch 1 and need to resolve this

It does seem that they are not the cause. I wasn't aware of the previous failures when I originally reported this. I can't say that it makes the tests more flaky. I think I opened this issue after seeing the failures for two or three days. I'd say it would be okay to add the change back.

@jacobperron
Member Author

Here's the (original?) issue I missed: ros2/build_farmer#144

@andreasholzner

I could (almost) reproduce the test failures; however, the console output is a little different.

Output seen while trying to reproduce

Running main() from gmock_main.cc
[==========] Running 3 tests from 1 test case.
[----------] Global test environment set-up.
[----------] 3 tests from PointCloudRenderableTestFixture
[rviz_rendering:debug] Available Renderers(1): OpenGL Rendering Subsystem, at /home/ubuntu/rviz_ws/src/rviz/rviz_rendering/src/rviz_rendering/render_system.cpp:273

The line starting with [rviz_rendering:debug] is not present in the failing CI builds. In both cases the tests fail with a segfault.

I could trace the segfault to the call to Ogre::Root::createRenderWindow.

window = ogre_root_->createRenderWindow(name, width, height, false, params);

The call stack is not helpful.

(gdb) bt
#0 0x0000ffffbc77e670 in ?? () from /usr/lib/aarch64-linux-gnu/libGLX.so.0
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

I am out of ideas. Maybe an Ogre upgrade could help, but that would require some work. In a quick attempt to use Ogre 1.11.5, the rviz_ogre_vendor package did not compile.
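For anyone retracing these steps, a hedged sketch of one way to reproduce the crash under gdb; the build layout and the test binary path below are assumptions based on the target name mentioned earlier, not taken from the CI configuration:

# Build with debug info so the backtrace shows more than "?? ()" inside libGLX.
colcon build --packages-up-to rviz_rendering --cmake-args -DCMAKE_BUILD_TYPE=RelWithDebInfo

# Run the rendering test binary under gdb (exact path assumed) and capture a backtrace.
gdb --args ./build/rviz_rendering/point_cloud_renderable_test_target
# (gdb) run
# (gdb) bt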

@dirk-thomas
Member

Is there any chance that these test failures will be addressed in the near future? If not, I would propose excluding the affected packages (rviz_common, rviz_default_plugins, rviz_rendering, rviz_rendering_tests) from the nightly aarch64 repeated job.
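As one illustration only (the actual nightly job configuration is not shown in this thread), excluding those packages from a repeated colcon test run could look like this:

# Hypothetical exclusion of the affected packages from a retest-until-fail run;
# the real nightly job may pass an equivalent option through its own job parameters.
colcon test --retest-until-fail 10 --packages-skip rviz_common rviz_default_plugins rviz_rendering rviz_rendering_tests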

@andreasholzner

I don't see how, so I am in favor of excluding the packages from the nightly job.
After an Ogre upgrade they could be enabled again.

@Martin-Idel
Contributor

This should have been solved by #394 (at least that's what my testing there showed). Are the tests re-enabled? Could somebody try it again?

@clalancette
Contributor

I just ran a repeated job with these tests re-enabled: https://ci.ros2.org/job/ci_linux-aarch64/9301/. While there are test failures, none of them are in rviz-related components, so this is probably fixed. I'm going to close this out and open a PR to re-enable these tests on the nightlies.
