Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

AIOs that fail to enqueue should not be marked as in_progress #714

Closed
chrisvest opened this issue Jul 30, 2017 · 10 comments
Closed

AIOs that fail to enqueue should not be marked as in_progress #714

chrisvest opened this issue Jul 30, 2017 · 10 comments
Assignees
Labels

Comments

@chrisvest
Copy link

When enqueuing an AIO, such as read, write, and fsync, the AioCb is marked as in_progress before the system call is made.

However, if the system call returns an error, such as EAGAIN or ENOSYS, etc. then the AIO is not actually in progress, and the subsequent drop will panic in its attempt at doing an orderly termination of the AIO:

WARNING: dropped an in-progress AioCbthread 'writes_must_be_observable_after_completion' panicked at 'Inconsistent AioCb.in_progress value', /Users/chris/.cargo/registry/src/git.luolix.top-1ecc6299db9ec823/nix-0.8.1/src/sys/aio.rs:305
stack backtrace:
   0:        0x10fb43ca3 - std::sys::imp::backtrace::tracing::imp::unwind_backtrace::h884a721e113c3303
   1:        0x10fb45b33 - std::panicking::default_hook::{{closure}}::h7a7d734b2824d103
   2:        0x10fb4577c - std::panicking::default_hook::h3eb11bd6cbfdc331
   3:        0x10fb47ca7 - std::panicking::rust_panic_with_hook::h8b9b25777425677b
   4:        0x10fb01125 - std::panicking::begin_panic::he221b30de772fec8
   5:        0x10fb03bf5 - <nix::sys::aio::AioCb<'a> as core::ops::Drop>::drop::hfbac5799b913ac7f
   6:        0x10fafee34 - core::ptr::drop_in_place::hd03fe87c6acd00d7
   7:        0x10fafedd7 - core::ptr::drop_in_place::hc23ed067f74bc487

Also note that the WARNING printed above is missing a line-break, and thus messes up the output.

The above was produced on MacOS Sierra, but I think it's the same on any relevant platform.

@asomers asomers self-assigned this Jul 30, 2017
@asomers asomers added the A-bug label Jul 30, 2017
@asomers
Copy link
Member

asomers commented Jul 30, 2017

Good find. I'll fix it.
What do you think about those "dropped an in-progress AioCb" warnings anyway? I've been thinking about turning them into a panic, because stderr won't always be available or appropriate.

@chrisvest
Copy link
Author

In that case, do the panic from within a function that has a name that explains why the panic happens, like the warning currently does. That way, the reason for the panic will be plainly visible from the backtrace.

asomers added a commit to asomers/nix that referenced this issue Jul 31, 2017
…ite`

Previously, the `AioCb`'s `in_progress` field would erroneously be set
to `true`, even if the syscall had an error

Fixes nix-rust#714
asomers added a commit to asomers/nix that referenced this issue Jul 31, 2017
…ite`

Previously, the `AioCb`'s `in_progress` field would erroneously be set
to `true`, even if the syscall had an error

Fixes nix-rust#714
asomers added a commit to asomers/nix that referenced this issue Aug 2, 2017
…ite`

Previously, the `AioCb`'s `in_progress` field would erroneously be set
to `true`, even if the syscall had an error

Fixes nix-rust#714
@asomers
Copy link
Member

asomers commented Aug 2, 2017

@chrisvest I have a solution coded up, but it's failing on OSX in Travis. I don't have an OSX machine, so it's pretty hard for me to debug. Since you do, would you mind checking out the issue_714 branch and running the tests? The error in Travis is that test_fsync times out because aio_fsync continually returns EAGAIN.

@chrisvest
Copy link
Author

@asomers I ran a cargo test of your #715 PR locally, and it didn't hang.

test sys::test_aio::test_cancel ... ok
test sys::test_aio::test_aio_suspend ... ok
test sys::test_aio::test_drop ... ok
…
test sys::test_aio::test_aio_cancel_all ... ok
…
test sys::test_aio::test_read_immutable_buffer ... ok
…
test sys::test_aio::test_read ... ok
test sys::test_aio::test_fsync ... ok
test sys::test_aio::test_write ... ok
test sys::test_aio::test_write_sigev_signal ... ok
…
test sys::test_aio::test_read_into_mut_slice ... ok
$ uname -a
Darwin shipilev.lan 16.7.0 Darwin Kernel Version 16.7.0: Thu Jun 15 17:36:27 PDT 2017; root:xnu-3789.70.16~2/RELEASE_X86_64 x86_64

Maybe travis is using an older MacOS version that behaves differently.

In another code base I do however have an EAGAIN loop on aio_read, so I'll be examining how the nix test suite manages to avoid that.

@chrisvest
Copy link
Author

By the way, did you tell Musl about the behavioural difference observed?

@chrisvest
Copy link
Author

@asomers I found out that my EAGAIN loop from aio_read was because the file had not been opened for reading. I would have expected to see EBADF in that situation, but apparently not.

I don't see any connection between this and aio_fsync, though. I notice that my system doesn't have a man-page for aio_fsync, but the header file does define it.

asomers added a commit to asomers/nix that referenced this issue Sep 2, 2017
…ite`

Previously, the `AioCb`'s `in_progress` field would erroneously be set
to `true`, even if the syscall had an error

Fixes nix-rust#714
@asomers
Copy link
Member

asomers commented Sep 3, 2017

Ok, I've finally got it. The problem with OSX is that test_drop, which panics, somehow screws up OSX's AIO system, and causes subsequent tests to fail. I fixed it by moving that test into its own process. @chrisvest does this all look good to you?

@chrisvest
Copy link
Author

@asomers Yes, this looks good!

asomers added a commit to asomers/nix that referenced this issue Sep 3, 2017
Previously, the `AioCb`'s `in_progress` field would erroneously be set
to `true`, even if the syscall had an error

Fixes nix-rust#714
asomers added a commit to asomers/nix that referenced this issue Sep 3, 2017
Previously, the `AioCb`'s `in_progress` field would erroneously be set
to `true`, even if the syscall had an error

Fixes nix-rust#714
bors bot added a commit that referenced this issue Sep 3, 2017
715: Fix multiple issues with POSIX AIO r=asomers a=asomers

Fixed error handling in `AioCb::fsync`, `AioCb::read`, and `AioCb::write`
    
    Previously, the `AioCb`'s `in_progress` field would erroneously be set
    to `true`, even if the syscall had an error
    
    Fixes #714

AioCb::Drop will now panic for in-progress AioCb

    Printing a warning message to stderr isn't really appropriate, because
    there's no way to guarantee that stderr is even valid.  Nor is
    aio_suspend necessarily an appropriate action to take.
@Susurrus
Copy link
Contributor

Susurrus commented Sep 5, 2017

@asomers @chrisvest Is this bug still unresolved even after the merging of #715?

@chrisvest
Copy link
Author

Wasn't sure about the process (close on merge, on release; closed by opener, by fixer), but yes my problem is solved with a git-dependency on nix.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants