Intermittent test failures in CI #679

asomers · 2017-07-16T19:57:48Z

The following tests have been observed to fail occasionally in CI. If you find a fix for one of them, check it off:

sys::test_aio::test_aio_cancel_all times out on powerpc-unknown-linux-gnu (https://travis-ci.org/nix-rust/nix/jobs/252267360 https://travis-ci.org/nix-rust/nix/jobs/251497766 https://travis-ci.org/nix-rust/nix/jobs/253133267)
errno::test::test_errno_values times out on powerpc-unknown-linux-gnu (https://travis-ci.org/nix-rust/nix/jobs/252157518 https://travis-ci.org/nix-rust/nix/jobs/249711100)
test_unistd::test_wait times out on x86_64-unknown-linux-gnu (https://travis-ci.org/nix-rust/nix/jobs/251440414; PR #638, "Make aio, chdir, and wait tests thread safe", might fix this one)
sys::test_aio::test_aio_cancel_all panics in thread_info.rs on powerpc-unknown-linux-gnu (https://travis-ci.org/nix-rust/nix/jobs/249502626)
test_double_close times out on powerpc-unknown-linux-gnu (https://travis-ci.org/nix-rust/nix/jobs/254249145 https://travis-ci.org/nix-rust/nix/jobs/255050223)
sys::test_epoll::test_epoll_ctl panics when unwrapping Error::Sys(UnknownErrno) on x86_64-unknown-linux-gnu (https://travis-ci.org/nix-rust/nix/jobs/254999411)
sys::test_epoll::test_epoll_errno panics when unwrapping Error::Sys(UnknownErrno) on x86_64-unknown-linux-gnu (https://travis-ci.org/nix-rust/nix/jobs/254999411)
error: component download failed for rust-std-armv7-unknown-linux-gnueabihf on armv7-unknown-linux-gnueabihf (https://travis-ci.org/nix-rust/nix/jobs/254708971)


Susurrus · 2017-07-19T03:16:31Z

I'm going to work on the epoll ones. I think I know how to solve them: you can create a signalfd to then call epoll on, and we should be able to get consistent results.

Susurrus · 2017-07-19T03:41:49Z

See #689 for a potential fix to the epoll errors.

asomers · 2017-07-20T03:53:28Z

I checked off everything that was disabled by #689. I want to wait a few more days to see if test_wait rears its ugly head again. And the component download failed thing actually sounds like a server availability issue.

Susurrus · 2017-07-20T05:33:01Z

Yeah, but we probably want to add an automatic retry mechanism to cross, rustup, or cargo that tries up to 3 times and only then bails out. It's annoying that we need to deal with those kinds of failures when those tools should handle them gracefully.
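
For illustration, here is a minimal sketch of the "try it 3 times, then bail" idea in Rust. It is not code from cross, rustup, or cargo; the `retry` helper, its parameters, and the simulated flaky download are all hypothetical names made up for this example.

```rust
use std::thread;
use std::time::Duration;

/// Hypothetical helper: runs `op` up to `max_attempts` times and returns the
/// first success, or the last error if every attempt fails.
/// `op` receives the 1-based attempt number.
fn retry<T, E>(
    max_attempts: u32,
    delay: Duration,
    mut op: impl FnMut(u32) -> Result<T, E>,
) -> Result<T, E> {
    assert!(max_attempts > 0, "need at least one attempt");
    let mut attempt = 1;
    loop {
        match op(attempt) {
            Ok(value) => return Ok(value),
            // Out of attempts: bail out with the most recent error.
            Err(err) if attempt == max_attempts => return Err(err),
            // Otherwise wait a little and try again.
            Err(_) => {
                thread::sleep(delay);
                attempt += 1;
            }
        }
    }
}

fn main() {
    // Stand-in for a flaky component download: fails twice, then succeeds.
    let result: Result<&str, String> = retry(3, Duration::from_millis(10), |attempt| {
        if attempt < 3 {
            Err(format!("attempt {attempt}: server unavailable"))
        } else {
            Ok("component downloaded")
        }
    });
    assert_eq!(result, Ok("component downloaded"));
}
```

A fixed delay keeps the sketch short; a real tool would probably want backoff and would only retry errors that look transient.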

jonas-schievink · 2017-07-20T21:41:18Z

test_select just failed for me on x86-64 Linux after seemingly waiting on a timeout:

thread 'sys::test_select::test_select' panicked at 'assertion failed: `(left == right)` (left: `1`, right: `0`)', test/sys/test_select.rs:52

AFAIK, there's no guarantee that file descriptors are allocated monotonically. Possibly the r2 + 1 is at fault and should be replaced by cmp::max(r1, r2) + 1?

Susurrus · 2017-07-20T22:10:51Z

Yeah, that seems right to me. Additionally that value should not even be provided by the user to our version of select. We should provide an API that calculates that value for you so the user doesn't make a mistake like this.
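
A rough sketch of what such an API could look like, assuming a descriptor set that can report its highest member. The `FdSet` type and `effective_nfds` function below are hypothetical stand-ins written for this example, not nix's actual types or signatures.

```rust
use std::os::unix::io::RawFd;

/// Hypothetical stand-in for a descriptor set; not nix's actual `FdSet`.
#[derive(Default)]
struct FdSet {
    bits: u64, // one bit per descriptor, so this sketch only tracks fds 0..64
}

impl FdSet {
    fn insert(&mut self, fd: RawFd) {
        assert!((0..64).contains(&fd), "sketch only tracks fds 0..64");
        self.bits |= 1u64 << fd;
    }

    /// Largest descriptor in the set, or `None` if the set is empty.
    fn highest(&self) -> Option<RawFd> {
        if self.bits == 0 {
            None
        } else {
            Some(63 - self.bits.leading_zeros() as RawFd)
        }
    }
}

/// Returns the caller's `nfds` if one was given; otherwise computes it as
/// one more than the highest descriptor found in any of the sets.
fn effective_nfds(nfds: Option<RawFd>, sets: &[&FdSet]) -> RawFd {
    nfds.unwrap_or_else(|| {
        sets.iter()
            .filter_map(|set| set.highest())
            .max()
            .map_or(0, |fd| fd + 1)
    })
}

fn main() {
    let mut readfds = FdSet::default();
    readfds.insert(4);
    readfds.insert(3);
    assert_eq!(readfds.highest(), Some(4));
    // The caller passes `None` and never has to compute nfds by hand.
    assert_eq!(effective_nfds(None, &[&readfds]), 5);
}
```

Keeping the explicit `Some(n)` path lets callers who already know the right value skip the scan, while the default path removes the chance of getting it wrong.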

asomers · 2017-07-21T00:02:48Z

Actually, there is a guarantee about how file descriptors are allocated: POSIX requires the lowest-numbered available descriptor to be used (http://pubs.opengroup.org/onlinepubs/9699919799/functions/V2_chap02.html#tag_15_14). The problem is that this one particular test isn't thread safe. It can fail like this:

1. Some other thread opens a file and gets file descriptor 3.
2. test_select calls the first pipe and gets file descriptors 4 and 5.
3. The first thread closes file descriptor 3.
4. test_select calls the second pipe and gets file descriptors 3 and 6.
5. select times out, because we calculated nfds wrong.

The simplest fix would just be to pass max(r1, r2) + 1 to select.
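
The interleaving above can be checked with plain arithmetic. The sketch below hard-codes the read ends from that scenario (r1 = 4, r2 = 3) and compares the two ways of computing nfds. The function names are made up for this example, and it does not call select itself; it only relies on the fact that select examines descriptors 0 through nfds - 1.

```rust
use std::cmp;

/// The calculation discussed in the thread: assumes the second pipe's
/// read end is the highest descriptor.
fn nfds_assuming_order(_r1: i32, r2: i32) -> i32 {
    r2 + 1
}

/// The proposed fix: take the larger of the two read ends.
fn nfds_from_max(r1: i32, r2: i32) -> i32 {
    cmp::max(r1, r2) + 1
}

/// select only examines descriptors in the range 0..nfds.
fn is_examined(fd: i32, nfds: i32) -> bool {
    fd < nfds
}

fn main() {
    // Read ends from the interleaving described above: the first pipe got
    // (4, 5), the second pipe reused the freed descriptor 3 and got (3, 6).
    let (r1, r2) = (4, 3);

    let wrong = nfds_assuming_order(r1, r2);
    let fixed = nfds_from_max(r1, r2);

    // With r2 + 1, descriptor 4 falls outside the examined range, so data
    // written to the first pipe is never noticed and select times out.
    assert!(!is_examined(r1, wrong));

    // With max(r1, r2) + 1, both read ends are examined.
    assert!(is_examined(r1, fixed));
    assert!(is_examined(r2, fixed));
}
```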

701: Calculate `nfds` parameter for `select` r=asomers Doing this behind the scenes makes the API less error-prone and easier to use. It should also fix my issue in #679 (comment)

asomers · 2017-11-05T17:22:42Z

I haven't seen any intermittent failures for a few months now, so I'm going to close the issue. If any new intermittent failures pop up in the future, open a new issue.

Susurrus added the A-testing label Jul 16, 2017

Susurrus mentioned this issue Jul 20, 2017

test_select sporadically fails on x86-64 Linux #698

Closed

jonas-schievink mentioned this issue Jul 20, 2017

Calculate nfds parameter for select #701

Merged

bors bot added a commit that referenced this issue Aug 11, 2017

Merge #701

7324888


asomers closed this as completed Nov 5, 2017
