Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

std::io: retry write operation on ErrorKind::WouldBlock #100594

Closed
wants to merge 1 commit into from

Conversation

uarif1
Copy link
Contributor

@uarif1 uarif1 commented Aug 15, 2022

If writing to tty when O_NONBLOCK bit is set and the lock to
tty is held (e.g. multiple threads writing to console), the Linux kernel
returns -EAGAIN.
(https://elixir.bootlin.com/linux/v5.19/source/drivers/tty/tty_io.c#L952)

Not accounting for -EAGAIN (i.e. ErrorKind::WouldBlock), results in
a panic. Its better to retry the write as the other thread could have
release the lock by then.

Signed-off-by: Usama Arif usama.arif@bytedance.com

@rustbot rustbot added the T-libs Relevant to the library team, which will review and decide on the PR/issue. label Aug 15, 2022
@rustbot
Copy link
Collaborator

rustbot commented Aug 15, 2022

Hey! It looks like you've submitted a new PR for the library teams!

If this PR contains changes to any rust-lang/rust public library APIs then please comment with @rustbot label +T-libs-api -T-libs to tag it appropriately. If this PR contains changes to any unstable APIs please edit the PR description to add a link to the relevant API Change Proposal or create one if you haven't already. If you're unsure where your change falls no worries, just leave it as is and the reviewer will take a look and make a decision to forward on if necessary.

Examples of T-libs-api changes:

  • Stabilizing library features
  • Introducing insta-stable changes such as new implementations of existing stable traits on existing stable types
  • Introducing new or changing existing unstable library APIs (excluding permanently unstable features / features without a tracking issue)
  • Changing public documentation in ways that create new stability guarantees
  • Changing observable runtime behavior of library APIs

@rust-highfive
Copy link
Collaborator

Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @Mark-Simulacrum (or someone else) soon.

Please see the contribution instructions for more information.

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Aug 15, 2022
If writing to tty when O_NONBLOCK bit is set and the lock to
tty is held (e.g. multiple threads writing to console), the Linux kernel
returns -EAGAIN.
(https://elixir.bootlin.com/linux/v5.19/source/drivers/tty/tty_io.c#L952)

Not accounting for -EAGAIN (i.e. ErrorKind::WouldBlock), results in
a panic. Its better to retry the write as the other thread could have
release the lock by then.

Signed-off-by: Usama Arif <usama.arif@bytedance.com>
Reviewed-by: Fam Zheng <fam.zheng@bytedance.com>
@uarif1 uarif1 force-pushed the retry_write_WouldBlock branch from 601ca6d to eb44672 Compare August 15, 2022 18:01
@uarif1
Copy link
Contributor Author

uarif1 commented Aug 15, 2022

Retry on write for EAGAIN is done in other repositories as well, like https://github.com/qemu/qemu/blob/d102b8162a1e5fe8288d4d5c01801ce6536ac2d1/chardev/char.c#L122

@the8472
Copy link
Member

the8472 commented Aug 15, 2022

This is a bad policy in generic code that can talk to any kind of file descriptor because the file (which may also be a socket or pipe) may have been opened with O_NONBLOCK which would then result in a busy loop burning CPU cycles. The correct solution for non-blocking file descriptors is to use polling (e.g. with mio's Poll) or to switch them back into blocking mode. std::io only supports blocking IO.

The qemu code seems to be about special files so it can probably make additional assumptions that generic code can't.

@uarif1
Copy link
Contributor Author

uarif1 commented Aug 16, 2022

Hi,

Thanks for the review.

Agreed, it makes sense not to make this change in generic code as it will be used by other files as well and not just stdout/stderr.

I tried a simple test code in C++ and rust, to just change the stdout to O_NONBLOCK and then create 10 threads to print to stdout, C++ is able to print and finish to completion, but rust panics and crashes, so I think glibc probably has similar support for O_NONBLOCK for stdout.

Just for context, in terms of a real life usecase, qemu with nested virtualization sets stdout to NONBLOCK and if there is another thread with rust running in background that prints it causes the thread to crash if the lock to tty was already held (C++ program wont crash).

The commit in this PR is modifying generic code so it isnt correct, but maybe there should be something done specifically for stdout/stderr?

@the8472
Copy link
Member

the8472 commented Aug 16, 2022

I think the solution would be to set up polling as fallback in the platformspecific file-descriptor wrapping types when a wouldblock is encountered. But I think it requires a bit of discussion whether we want to support this at all.

Can you open an issue describing your problem?

@uarif1
Copy link
Contributor Author

uarif1 commented Aug 17, 2022

Thanks, I have opened the issue #100673

@famz
Copy link

famz commented Aug 17, 2022

This is a bad policy in generic code that can talk to any kind of file descriptor because the file (which may also be a socket or pipe) may have been opened with O_NONBLOCK which would then result in a busy loop burning CPU cycles. The correct solution for non-blocking file descriptors is to use polling (e.g. with mio's Poll) or to switch them back into blocking mode. std::io only supports blocking IO.

The fact that println! can be working with any kind of fd, including O_NONBLOCK stands too. It may be passed from parent process with a O_NONBLOCK mode, or, in @uarif1 's case, a child process inherited our stdout fd and set it to O_NONBLOCK any moment later. So fixing the mode doesn't solve the problem, as it's out of control; or it could be that O_NONBLOCK is intended by the developer.

For example if the main project is designed to set and use stdout in NON_BLOCKING for reasons, any linked library crate that calls println! could crash, which IMO is not ideal situation, and is quite hard to fix.

To avoid unnecessary busy looping, maybe we can embed a small ppoll() in write_all() if O_NONBLOCK is set (can be tested with fcntl())? That way this function can block until the fd is ready.

@Mark-Simulacrum
Copy link
Member

I'm going to go ahead and close this PR per #100673 (comment) - we're happy to add some documentation about not supporting this, but that's probably best in a separate PR.

I don't think it makes sense for std to have ppoll or similar in write_all -- we're still papering over a fundamental difference, and having nested event loops (e.g., if you're setting up nonblocking I/O so that you can use stdin/stdout in a context with a more general event loop on poll) is also a bad idea.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-libs Relevant to the library team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants