Intermittent test failures in CI #679

Closed
6 of 8 tasks
asomers opened this issue Jul 16, 2017 · 8 comments

Comments

@asomers
Member

asomers commented Jul 16, 2017

The following tests have been observed to fail occasionally in CI. If you find a fix for one of them, check it off:

@Susurrus
Contributor

I'm going to work on the epoll ones. I think I know how to solve them: you can create a signalfd, call epoll on it, and we should be able to get consistent results.

@Susurrus
Contributor

See #689 for a potential fix to the epoll errors.

@asomers
Member Author

asomers commented Jul 20, 2017

I checked off everything that was disabled by #689. I want to wait a few more days to see if test_wait rears its ugly head again. And the component download failed thing actually sounds like a server availability issue.

@Susurrus
Contributor

Yeah, but we probably want to add an automatic retry mechanism to cross, rustup, or cargo so that it tries up to 3 times and only bails out after that. It's annoying that we need to deal with those kinds of failures when those tools should handle them gracefully.
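
For illustration, the retry idea could look roughly like the sketch below. This is not code from cross, rustup, or cargo; the `retry` helper and its signature are hypothetical, and a real tool would probably also wait between attempts.

```rust
/// Hypothetical helper: runs `op` until it succeeds or `max_attempts`
/// attempts have been made. `op` receives the 1-based attempt number.
/// Returns the first success, or the error from the final attempt.
/// Note that `op` always runs at least once, even if `max_attempts` is 0.
fn retry<T, E, F>(max_attempts: u32, mut op: F) -> Result<T, E>
where
    F: FnMut(u32) -> Result<T, E>,
{
    let mut attempt = 1;
    loop {
        match op(attempt) {
            Ok(value) => return Ok(value),
            // Out of attempts: bail out with the last error we saw.
            Err(err) if attempt >= max_attempts => return Err(err),
            // Otherwise swallow the error and try again.
            Err(_) => attempt += 1,
        }
    }
}
```

Calling `retry(3, |_| download())` with some fallible `download` function would then only report a failure after the third failed attempt.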

@jonas-schievink
Contributor

test_select just failed for me on x86-64 Linux after seemingly waiting on a timeout:

thread 'sys::test_select::test_select' panicked at 'assertion failed: `(left == right)` (left: `1`, right: `0`)', test/sys/test_select.rs:52

AFAIK, there's no guarantee that file descriptors are allocated monotonically. Possibly the r2 + 1 is at fault and should be replaced by cmp::max(r1, r2) + 1?

@Susurrus
Contributor

Yeah, that seems right to me. Additionally, that value should not even have to be provided by the user to our version of select. We should provide an API that calculates it for you so the user can't make a mistake like this.
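
A rough sketch of what "calculates that value for you" could mean is below. `SketchFdSet` and `compute_nfds` are made-up names for illustration and are not the crate's actual types; the only assumption is that each set can report its highest member.

```rust
use std::collections::BTreeSet;

/// Hypothetical stand-in for a descriptor set (not the crate's real type).
#[derive(Default)]
struct SketchFdSet {
    fds: BTreeSet<i32>,
}

impl SketchFdSet {
    fn insert(&mut self, fd: i32) {
        self.fds.insert(fd);
    }

    /// Largest descriptor in the set, or None if the set is empty.
    fn highest(&self) -> Option<i32> {
        self.fds.iter().next_back().copied()
    }
}

/// Derives nfds from every set the caller passes in: one more than the
/// highest descriptor in any of them, or 0 if all sets are empty.
fn compute_nfds(sets: &[&SketchFdSet]) -> i32 {
    sets.iter()
        .filter_map(|set| set.highest())
        .max()
        .map_or(0, |max_fd| max_fd + 1)
}
```

With something like this inside the wrapper, the caller only hands over the sets and never computes nfds by hand.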

@asomers
Member Author

asomers commented Jul 21, 2017

Actually, there is a guarantee about the order in which file descriptors are allocated: each new descriptor is the lowest-numbered one not currently open. See http://pubs.opengroup.org/onlinepubs/9699919799/functions/V2_chap02.html#tag_15_14 . The problem is that this one particular test isn't thread safe. It can fail like this:

  1. Some other thread opens a file and gets file descriptor 3
  2. test_select calls the first pipe and gets file descriptors 4 and 5
  3. the first thread closes file descriptor 3
  4. test_select calls the second pipe and gets file descriptors 3 and 6
  5. select times out, because we calculated nfds wrong.

The simplest fix would just be to pass max(r1, r2) + 1 to select.
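
The interleaving above can be replayed with a toy model of the descriptor table. This is only a simulation under stated assumptions: `FdTable`, `replay_race`, and `covered` are hypothetical names, no real descriptors are opened, and the read end of each pipe is assumed to be allocated before the write end.

```rust
use std::collections::BTreeSet;

/// Toy model of a process's descriptor table; not real OS state.
struct FdTable {
    open: BTreeSet<i32>,
}

impl FdTable {
    /// Start with stdin, stdout, and stderr (0, 1, 2) already open.
    fn new() -> Self {
        FdTable { open: (0..3).collect() }
    }

    /// Hand out the lowest-numbered descriptor not currently open.
    fn open(&mut self) -> i32 {
        let mut fd = 0;
        while self.open.contains(&fd) {
            fd += 1;
        }
        self.open.insert(fd);
        fd
    }

    fn close(&mut self, fd: i32) {
        self.open.remove(&fd);
    }

    /// Allocates two descriptors (assumed order: read end, then write end).
    fn pipe(&mut self) -> (i32, i32) {
        let read_end = self.open();
        let write_end = self.open();
        (read_end, write_end)
    }
}

/// Replays steps 1 to 4 and returns the two read ends (r1, r2).
fn replay_race() -> (i32, i32) {
    let mut table = FdTable::new();
    let other = table.open(); // step 1: another thread opens a file
    let (r1, _w1) = table.pipe(); // step 2: first pipe
    table.close(other); // step 3: the other thread closes its file
    let (r2, _w2) = table.pipe(); // step 4: second pipe reuses the freed slot
    (r1, r2)
}

/// select only examines descriptors in the range 0..nfds.
fn covered(fd: i32, nfds: i32) -> bool {
    fd < nfds
}
```

In this replay r1 is 4 and r2 is 3, so `covered(r1, r2 + 1)` is false while `covered(r1, std::cmp::max(r1, r2) + 1)` is true, which matches the timeout in step 5.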

bors bot added a commit that referenced this issue Aug 11, 2017
701: Calculate `nfds` parameter for `select` r=asomers

Doing this behind the scenes makes the API less error-prone and easier
to use. It should also fix my issue in #679 (comment)

@asomers
Member Author

asomers commented Nov 5, 2017

I haven't seen any intermittent failures for a few months now, so I'm going to close the issue. If any new intermittent failures pop up in the future, open a new issue.

@asomers asomers closed this as completed Nov 5, 2017