Amend RFC 517: Revisions to reader/writer, core::io and std::io #576

Merged
merged 1 commit into from
Feb 3, 2015


aturon
Member

@aturon aturon commented Jan 13, 2015

The IO reform RFC is being split into several semi-independent pieces, posted as PRs like this one.

This RFC amendment adds the sections on Reader/Writer revisions, as well as core::io and std::io (which are closely related).

Rendered

type NonatomicResult<S, T, Err> = Result<S, PartialResult<T, Err>>;

// Ergonomically throw out the partial result
impl<T, Err> FromError<PartialResult<T, Err> for Err { ... }
Member

Missing a >

```rust
trait Writer {
    type Err;
    fn write(&mut self, buf: &[u8]) -> Result<uint, Err>;
```
Member

To help avoid tons of subtle breakage when this change is made, can we rename the current write method to write_all at some point before the complete overhaul happens? It'll force everyone to switch their code to the thing that will be semantically the same before this lands.

Member

That sounds like a good idea!

@tailhook

I believe I should repeat my question about EINTR here. Is there any agreement on how EINTR is handled by write_all and read_to_end and friends?

The problem is that if EINTR is returned as an error by those methods, then, for example, running strace against a process doing write_all (along with many other harmless uses of signals) would make it return a NonatomicResult. This means that to work smoothly write_all must be wrapped in a loop anyway, which defeats the whole idea of write_all (compared to plain write).

However, if EINTR is consumed by the internal loop of write_all, we will sometimes get processes that are stuck on Ctrl+C forever (or even on SIGTERM or any other signal).

`ReaderExt`). These iterators will be changed to yield
`NonatomicResult` values.

The `BufferedReader`, `BufferedWriter` and `BufferedStream` types stay
Member

The names for the trait and the type still conflict.

Member

FWIW, I still like

  • Reader -> Read
  • Writer -> Write
  • Buffer -> ReadBuffer

Contributor

+1

On Jan 13, 2015, at 1:03 PM, "Steven Fackler" wrote:

FWIW, I still like

Reader -> Read
Writer -> Write
Buffer -> ReadBuffer

@nodakai

nodakai commented Jan 13, 2015

@tailhook I think APIs returning NonatomicResult such as write_all() should consume EINTR internally. If some people find that unacceptable, then bare write() is there for them. As for your concern about those APIs becoming unresponsive, the decision on how to handle EINTR doesn't make a substantial difference, because all blocking I/O APIs risk getting helplessly stuck in the so-called "D" state (known as TASK_UNINTERRUPTIBLE on Linux); see, for example,

`Writer` (and various adapters built on top of them) from moving to
`libcore` -- `IoError` currently requires the `String` type.

With associated types, there is essentially no downside in making
Contributor

Hm. It seems like if we move this into core, it couldn't be bounded by std::error::Error, right? This could be a problem when interoperating with other pieces of infrastructure that are explicitly bound by Error.

Member Author

Any additional bounds on the error type you want to apply can be done when you're bounding over a Reader or Writer.
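
A minimal sketch of that point, assuming a simplified `Writer` trait with an unbounded associated `Err` type. The `Bounded` writer, the `Full` error, and `write_or_describe` are hypothetical names used only for illustration; the `std::error::Error` bound appears on the generic function, not on the trait itself.

```rust
use std::error::Error;
use std::fmt;

// Hypothetical, simplified version of the trait under discussion:
// the error type is an associated type with no bounds of its own.
trait Writer {
    type Err;
    fn write(&mut self, buf: &[u8]) -> Result<usize, Self::Err>;
}

// A hypothetical error type for the in-memory writer below.
#[derive(Debug)]
struct Full;

impl fmt::Display for Full {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "buffer is full")
    }
}

impl Error for Full {}

// A writer that accepts at most `cap` bytes in total.
struct Bounded {
    data: Vec<u8>,
    cap: usize,
}

impl Writer for Bounded {
    type Err = Full;
    fn write(&mut self, buf: &[u8]) -> Result<usize, Full> {
        let room = self.cap - self.data.len();
        if room == 0 && !buf.is_empty() {
            return Err(Full);
        }
        let n = room.min(buf.len());
        self.data.extend_from_slice(&buf[..n]);
        Ok(n)
    }
}

// The extra `Error` bound is stated here, at the use site, not in the trait.
fn write_or_describe<W>(w: &mut W, buf: &[u8]) -> Result<usize, String>
where
    W: Writer,
    W::Err: Error,
{
    w.write(buf).map_err(|e| e.to_string())
}
```

Code that does not need the bound can stay generic over any `Writer`, which is what would let the trait itself live without `std`.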

@alexcrichton
Member

@tailhook

I believe I should repeat my question about EINTR here. Is there any agreement on how EINTR is handled by write_all and read_to_end and friends?

I'd first recommend taking a look at this article about EINTR (just did so myself for a refresher!), it's got some great information about EINTR and what it can be used for. The gist of the message is that EINTR cannot be used to reliably catch a "please interrupt" signal to the program in the case of syscalls like write and read (due to the last race that was mentioned). Another very important note about the article is that EINTR does not exist on Windows.

Due to our commitment to cross-platform compatibility, as well as these objects being somewhat high-level primitives, I think we will continue to swallow EINTR at the write level for the upper primitives.

As @nodakai mentions, however, it may be useful to handle this case (such as if you're manually using pselect, for example). For these cases I think that we may expose EINTR from the lower-level primitives rather than the higher level primitives.

For example, std::net::TcpSocket would always swallow EINTR, but if you used the AsRawFd trait to get out a &unix::FileDesc (a type not spec'd/planned currently), then write would literally be a call to the write syscall, returning all errors in their full glory.
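
A minimal sketch of the split between a convenience that swallows interruptions and a bare `write` that reports them. It assumes today's `std::io` names (`Write`, `ErrorKind::Interrupted`) rather than the 2015 API under discussion; `write_all_retrying` and the `Flaky` test writer are hypothetical.

```rust
use std::io::{self, ErrorKind, Write};

// Hypothetical helper: keep calling `write` until `buf` is fully written,
// retrying when a call is interrupted and failing on any other error.
fn write_all_retrying<W: Write>(w: &mut W, mut buf: &[u8]) -> io::Result<()> {
    while !buf.is_empty() {
        match w.write(buf) {
            Ok(0) => return Err(io::Error::new(ErrorKind::WriteZero, "wrote zero bytes")),
            Ok(n) => buf = &buf[n..],
            Err(ref e) if e.kind() == ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        }
    }
    Ok(())
}

// Test double: fails the first call with `Interrupted`, then accepts
// at most two bytes per call, like a short write.
struct Flaky {
    interrupted_once: bool,
    out: Vec<u8>,
}

impl Write for Flaky {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        if !self.interrupted_once {
            self.interrupted_once = true;
            return Err(io::Error::new(ErrorKind::Interrupted, "EINTR"));
        }
        let n = buf.len().min(2);
        self.out.extend_from_slice(&buf[..n]);
        Ok(n)
    }

    fn flush(&mut self) -> io::Result<()> {
        Ok(())
    }
}
```

Calling `write` directly on a fresh `Flaky` returns the interruption to the caller, while `write_all_retrying` hides both the interruption and the short writes.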

@tailhook

@nodakai, the issue with the "D" state is both rare enough and impossible to do anything about, so we should not care.

In many other cases Ctrl+C works. Not being able to use it is very annoying. There are precedents for bad SIGINT handling: for example, SVN versions 1.1-1.4 (IIRC) did not handle the signal quickly (it could linger for minutes), and that was deemed a bug. And you know, you usually can press Ctrl+/ or just kill the process externally, but that does not execute destructors. And I don't want another bag of utilities that do the same.

@alexcrichton, I can live with write_all and read_to_end swallowing EINTR (which should be clearly stated in the documentation), but I believe that not surfacing it from the write function is a mistake. It means that every small command-line utility that does networking (e.g. a curl-like thing) or reads/writes many files (e.g. like a cp) is not interruptible. And every library that's written in the recommended and straightforward way (i.e. not using any kind of asynchronous I/O loop) is not designed to handle interrupts.

Sure, complex programs like network servers, which will use the low-level API, non-blocking calls, an asynchronous loop like "mio", and signalfd (or a polyfill) anyway, will not suffer from this problem.

But then what is the high-level cross-platform interface designed for? Is it for command-line tools that either hang or die without cleanup? Is this only because they can't do anything better on Windows? Then I guess it's better to turn EINTR into an error, or even panic on it, in which case, if the error is unhandled, at least all the destructors will run.

I think we may return some OperationInterrupted error, but not always when EINTR happens, only when the signal handler has specified to do so. The error is just expected to propagate up to main() and exit with some non-zero exit code. The fact that OperationInterrupted is never returned on Windows doesn't make the API less cross-platform. It's just a random thought; I'm not sure if this is a good idea.

@alexcrichton
Member

It means that every small command-line utility that does networking (e.g. a curl-like thing) or reads/writes many files (e.g. like a cp) is not interruptible.

As mentioned in the article that I linked to, using EINTR to handle an interrupt for your program in some form is racy (and generally not correct) if all you're using is the write syscall (which is what the write function would correspond to).

I definitely agree that we'll want a nice way to interrupt small programs performing various operations, but I don't think that EINTR is the right way to do so. Doing this, however, would require deeper modifications to the I/O interfaces to enable something along the lines of:

  • Running arbitrary code when a signal is received. This would likely be on a separate thread which would then signal that an I/O operation elsewhere should be canceled.
  • Returning control over to the main program. This likely means that you weren't using write in the first place!

These are both somewhat involved, which is definitely where some high-level abstractions would help out! These may not exist in the standard library immediately, but I'm sure they'll start popping up in Cargo soon though.

To reiterate, handling an interruption via EINTR in write is racy, and as a result it shouldn't be the impetus for returning EINTR from functions at the high level.

I think we may return some OperationInterrupted error, but not always when EINTR happens, only when the signal handler has specified to do so.

A crucial part of the stabilization of IoError will be ensuring its future extensibility, so although we may not provide this error quite yet, we could probably do so in the future!

@tailhook

@alexcrichton, I agree with most of what you say, but:

I definitely agree that we'll want a nice way to interrupt small programs.
[ .. snip .. ]
These are both somewhat involved, which is definitely where some
high-level abstractions would help out! These may not exist in the
standard library immediately, but I'm sure they'll start popping up in
Cargo soon though.

That sounds like "never use the abstractions in the standard library". So why are they there in the first place?

Well, in fact, the more I think about it, the more I like the idea of panic!() on SIGINT/SIGTERM. Of course it should be opt-out (maybe just a sigmask is enough). The panic!() may just interrupt the program at any point (not only in I/O) and execute destructors. Is this possible at all?

@Binero

Binero commented Jan 13, 2015

I think this probably belongs here: it would be nice to have a "SearchableReader" trait, which is like a Reader but lets you change the position inside of it.


```rust
trait Reader {
    type Err; // new associated error type
```


Is it really worth saving 2 keystrokes here? Please just name it Error, we already have far too many unnecessary abbreviations in the Rust std library and newcomers are rightly complaining about it.

Contributor

This would shadow anything named Error in the surrounding scope for impls, which could make it a little awkward when there are types named Error due to RFC 356. That said, we used Result for RFC 439, so we probably should spell out Error and just live with the need for type aliases in some cases.

@quantheory
Contributor

@tailhook

Well, in fact, the more I think about it, the more I like the idea of panic!() on SIGINT/SIGTERM. Of course it should be opt-out (maybe just a sigmask is enough). The panic!() may just interrupt the program at any point (not only in I/O) and execute destructors. Is this possible at all?

I doubt it. Unlike a panic! written in by a user, a signal can arrive at any time, and there is no guarantee that at any given moment the stack will be in a valid state to be unwound in the way that panic! typically does.

Ericson2314 pushed a commit to QuiltOS/rust that referenced this pull request Feb 1, 2015
This commit is an implementation of [RFC 576][rfc] which focuses on some of the
more core aspects of `std::io`. It re-introduces a new `std::io` module with the
base set of traits, helpers, and adaptors. The module is not currently ready for
prime time as it is not integrated into the rest of the system, but some
proof-of-concept implementations and methods are now available.

[rfc]: rust-lang/rfcs#576

This module will grow quickly over the next few weeks as more I/O RFCs land and
as the `old_io` module is slowly deprecated and phased out. At this time it is
not recommended for use, but it is helpful to land in-tree to start developing
more implementations.
@alexcrichton
Member

Some concerns from @mahkoh on the implementation:


rust-lang/rust#21835 (comment)

The flush method does not indicate how many bytes were actually flushed, which raises a question about retries and atomicity with BufWriter, for example. The internal buffer may not be entirely written on a flush(), and it hasn't been spelled out, but what should happen? I would think we could do:

  1. Return an error if the single call to write did not write the entire buffer.
  2. Use .write_all to write the entire buffer. Note that this will ignore Interrupted and friends.

Just a point to clarify.


rust-lang/rust#21835 (comment)

Should flush return Ok by default? We may want to make an explicit decision about the default here (there hasn't been much discussion on the flushing topic)


rust-lang/rust#21835 (comment)

A suggestion that write_all should return Result<u64> instead of Result<()>


rust-lang/rust#21835 (comment)

A suggestion to use i64 in SeekFrom::Start instead of u64. Note that this would also raise a question as to why it takes a payload i64 instead of a second argument.


@mahkoh if I misrepresented anything, please let me know! I'll try to copy over future worries to this RFC as well.

@Tobba

Tobba commented Feb 1, 2015

It seems the issues with the associated error types mentioned by @alexcrichton are quite easily solved and don't lead to any major ergonomic issues.
(I went about this the wrong way initially, but it really isn't that bad)

  • A FromError<EndOfFile> bound on the Err type of write_all is definitely the way to go. FromError<Void> can be blanket impl'd over all types, as Void is uninstantiable, which sorts out the Void-related problems.
  • For the latter two points: Write, Read and Seek returning different error types is a good thing for both flexibility and sanity; and so is awareness of that. Writing code that can deal with this does involve some usage of map_err, but it doesn't lead to any major ergonomic issues.

See this branch: https://github.com/Tobba/rust/tree/core-io which ports librbml to an (incomplete) implementation of a proposal for the I/O design.

@alexcrichton
Member

@Tobba

See this branch: https://github.com/Tobba/rust/tree/core-io which ports librbml to an (incomplete) implementation of a proposal for the I/O design.

One of the key points I personally found was that porting librustc to use librbml was incredibly painful; I never actually got it to compile!

The rbml implementation does not necessarily leverage try! to its fullest extent. I noticed that everything which used to be try!(foo) is now try!(foo.map_err(|err| SomeError(err))) which definitely seems like an ergonomic regression to me personally.

I would encourage you to write some tests for the various primitives as well because it ended up leading to tests looking like this versus looking like this

@Ericson2314
Contributor

@alexcrichton

  • Regarding the map_err, if one does not wish to disambiguate errors, they can enforce that the various errors are the same, or use FromError to convert to a common "super type". Then that map_err won't be necessary.
  • In libcoretest, making the method call .tee<EndOfFile> should be all that changes with respect to the nice version.

I would demonstrate these things myself, but I cannot build the standard library ATM.
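
A minimal sketch of the "common super type" idea, written with today's `?` operator and `From` trait, the modern counterparts of the `try!` macro and `FromError` discussed in this thread. `ReadError`, `WriteError`, `CopyError`, and the helper functions are hypothetical.

```rust
// Two hypothetical, unrelated error types, standing in for the distinct
// error types of a reader and a writer.
#[derive(Debug, PartialEq)]
struct ReadError;

#[derive(Debug, PartialEq)]
struct WriteError;

// A common "super type" that both can convert into.
#[derive(Debug, PartialEq)]
enum CopyError {
    Read(ReadError),
    Write(WriteError),
}

impl From<ReadError> for CopyError {
    fn from(e: ReadError) -> Self {
        CopyError::Read(e)
    }
}

impl From<WriteError> for CopyError {
    fn from(e: WriteError) -> Self {
        CopyError::Write(e)
    }
}

// Reads one byte at `pos`, failing past the end of `src`.
fn read_byte(src: &[u8], pos: usize) -> Result<u8, ReadError> {
    src.get(pos).copied().ok_or(ReadError)
}

// Appends one byte, failing once `dst` already holds `cap` bytes.
fn write_byte(dst: &mut Vec<u8>, cap: usize, b: u8) -> Result<(), WriteError> {
    if dst.len() >= cap {
        return Err(WriteError);
    }
    dst.push(b);
    Ok(())
}

// `?` applies the `From` impls, so no `map_err` is needed at each call.
fn copy_two(src: &[u8], dst: &mut Vec<u8>, cap: usize) -> Result<(), CopyError> {
    for pos in 0..2 {
        let b = read_byte(src, pos)?;
        write_byte(dst, cap, b)?;
    }
    Ok(())
}
```

The cost moves from every call site to one `From` impl per error type, and the caller can still tell the two failure sources apart by matching on `CopyError`.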

@aturon
Member Author

aturon commented Feb 2, 2015

Some replies to @mahkoh's concerns:


rust-lang/rust#21835 (comment)

The flush method does not indicate how many bytes were actually flushed, which raises a question about retries and atomicity with BufWriter, for example. The internal buffer may not be entirely written on a flush(), and it hasn't been spelled out, but what should happen? I would think we could do:

  1. Return an error if the single call to write did not write the entire buffer.
  2. Use .write_all to write the entire buffer. Note that this will ignore Interrupted and friends.

I think we should have flush on a buffered object call write in a loop, propagating the first error it sees. It can keep track of the unwritten data, so that no data is lost and it's possible to separately query -- or even get access to -- the remaining data.
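
A minimal sketch of that flushing strategy, assuming today's `std::io::Write` trait. `SketchBufWriter` is a hypothetical type, not the standard library's `BufWriter`, and `Limited` is a test double: the flush loop stops at the first error, and whatever was not written stays in the buffer where it can be inspected.

```rust
use std::io::{self, ErrorKind, Write};

// Hypothetical buffered writer; not the standard library's `BufWriter`.
struct SketchBufWriter<W: Write> {
    inner: W,
    buf: Vec<u8>,
}

impl<W: Write> SketchBufWriter<W> {
    fn new(inner: W) -> Self {
        SketchBufWriter { inner, buf: Vec::new() }
    }

    // Queue bytes without touching the underlying writer.
    fn push(&mut self, data: &[u8]) {
        self.buf.extend_from_slice(data);
    }

    // Bytes accepted by `push` but not yet written out.
    fn pending(&self) -> &[u8] {
        &self.buf
    }

    // Call `write` in a loop and stop at the first error. Whatever was
    // written is removed from the buffer; the rest stays queryable.
    fn flush_buf(&mut self) -> io::Result<()> {
        let mut written = 0;
        let mut result = Ok(());
        while written < self.buf.len() {
            match self.inner.write(&self.buf[written..]) {
                Ok(0) => {
                    result = Err(io::Error::new(ErrorKind::WriteZero, "wrote zero bytes"));
                    break;
                }
                Ok(n) => written += n,
                Err(e) => {
                    result = Err(e);
                    break;
                }
            }
        }
        self.buf.drain(..written);
        result
    }
}

// Test double: accepts `budget` bytes in total, then reports an error.
struct Limited {
    budget: usize,
    out: Vec<u8>,
}

impl Write for Limited {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        if self.budget == 0 {
            return Err(io::Error::new(ErrorKind::Other, "device full"));
        }
        let n = buf.len().min(self.budget);
        self.budget -= n;
        self.out.extend_from_slice(&buf[..n]);
        Ok(n)
    }

    fn flush(&mut self) -> io::Result<()> {
        Ok(())
    }
}
```

After a failed `flush_buf`, `pending()` returns exactly the bytes that never reached the inner writer, so a caller can retry later without losing or duplicating data.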


rust-lang/rust#21835 (comment)

Should flush return Ok by default? We may want to make an explicit decision about the default here (there hasn't been much discussion on the flushing topic)

I think @mahkoh is raising a good point here, and flush should be a required method (no default impl) to help ensure it has a sensible implementation.


rust-lang/rust#21835 (comment)

A suggestion that write_all should return Result<u64> instead of Result<()>

The proposal in the comment, in particular, says that write_all should report Ok on "EOF" for the underlying writer, but no motivation was given. Keep in mind that the point of write_all is just as a convenience that avoids any looping and gives up on the first non-EINTR error. Yielding Ok on EOF would require extra complexity on the client side, and it's not obvious why you'd want it for this convenience method.


rust-lang/rust#21835 (comment)

A suggestion to use i64 in SeekFrom::Start instead of u64. Note that this would also raise a question as to why it takes a payload i64 instead of a second argument.

This connects to broader API guidelines, of course, but in general in Rust APIs we use types to convey semantic content/limitations when possible.
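
As a small illustration, here is how the variant payload types read in today's `std::io::SeekFrom`, where `Start` carries a `u64` and `Current` and `End` carry an `i64`. The `seek_demo` function is a hypothetical example, not part of the RFC.

```rust
use std::io::{self, Cursor, Seek, SeekFrom};

// Returns the positions reached by three seeks over a 10-byte buffer.
fn seek_demo() -> io::Result<(u64, u64, u64)> {
    let mut cur = Cursor::new(vec![0u8; 10]);
    // An absolute offset from the start cannot be negative, hence `u64`.
    let a = cur.seek(SeekFrom::Start(4))?;
    // Offsets relative to the current position or the end may be negative.
    let b = cur.seek(SeekFrom::Current(-1))?;
    let c = cur.seek(SeekFrom::End(-2))?;
    Ok((a, b, c))
}
```

With this shape, a negative absolute offset is rejected by the compiler rather than at run time.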

Updates from implementation and feedback:
@kjpgit

kjpgit commented Feb 3, 2015

re: EINTR handling

I think at a minimum you should LOUDLY document any function that can
return EINTR, considering how many people see that as a bug, not a
feature, and will just use the shorter method name (write is shorter than
write_all).

Also, I think it would be more conservative to make read/write retry EINTR
by default, and if people complain, there could be a backwards compatible
set_retry_interrupt(false) method added to certain objects they care about,
depending on their actual use case. If instead you make no retry the
default, and that turns out to be a usability mistake, you can't change
that default globally without potentially breaking somebody, and forcing
devs to set an opt-out flag on all io objects in their program, and their
libraries, and those libraries' libraries, etc. will be impossible, just
like the FD_CLOEXEC debacle in posix (that should have been opt in not opt
out)

On Fri, Jan 30, 2015 at 1:25 PM, Aaron Turon notifications@github.com
wrote:

I want to try to summarize the current state here, to help move things
forward.

It seems like there's reasonable consensus around most of the basic
structure here, in terms of slimming down the traits, sticking closer to
single syscall semantics, EOF handling, and so on. I think the few remaining
items should be quickly resolved or pulled out to separate discussions so
we can make progress.

The items still being debated are:

EINTR handling

It's not completely clear when this should just cause a retry. However,
this RFC is mainly covering the core traits for reading and writing, and
not specific APIs. I think our best way forward here is to have the core
read/write operations map as directly to the underlying system APIs as
possible, producing an error on EINTR, so that they can serve as a lossless
building block for other abstractions.

Conveniences like write_all or read_to_end, on the other hand, should
retry on EINTR. This is partly under the assumption that in practice, EINTR
will most often arise when interfacing with other code that changes a
signal handler. Due to the global nature of these interactions, such a
change can suddenly cause your own code to get an error irrelevant to it,
and the code should probably just retry in those cases. In the case where
you are using EINTR explicitly, read and write will be available to
handle it.

read_until's behavior

Does it include the delimiter in the strings it produces (as today's API
does)? I think the right approach is for read_line/read_until to retain
the delimiter, while lines consumes it. This keeps consistency with the
split iterator, and also existing precedent (and useful coding patterns)
for read_line; it seems like the pragmatic solution.
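
A short sketch of that distinction using today's `std::io::BufRead`, where `read_line` keeps the trailing newline and `lines` strips it. The `first_and_rest` helper is hypothetical.

```rust
use std::io::{self, BufRead, Cursor};

// Reads the first line with `read_line`, then the rest with `lines`.
fn first_and_rest(input: &str) -> io::Result<(String, Vec<String>)> {
    let mut reader = Cursor::new(input.as_bytes());
    let mut first = String::new();
    // `read_line` appends the line including its trailing '\n'.
    reader.read_line(&mut first)?;
    // `lines` yields each remaining line with the terminator removed.
    let rest = reader.lines().collect::<io::Result<Vec<String>>>()?;
    Ok((first, rest))
}
```

For the input `"a\nb\nc\n"`, the first value is `"a\n"` and the remaining lines are `"b"` and `"c"`.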

close

The discussion about close is probably more relevant on RFCs for specific
IO APIs.

That said, I believe that close is backwards compatible to add later, and
am personally not yet convinced that it's needed given the existence of
flush. There's now a separate RFC
#770 on this topic, and I think
the discussion should move there (but shouldn't block this orthogonal RFC).

stdin/stdout/stderr:

I think this part of the RFC is underdeveloped, and I'd like to have a
separate discussion about it. I propose to remove it from this RFC for now.

Taking &mut Vec

I made a suggestion
#576 (comment) about
an API change that hasn't yet been incorporated into the RFC. I'll plan to
update the RFC with it, but please give feedback on this idea if you have
any.

Various names:

There are a few open bikesheds here that will need to be decided, but
nothing too major I think.



@alexcrichton
Member

At this time it looks like there's broad enough support for this amendment that I'm going to merge it.

There are still some concerns about using a concrete io::Error instead of an associated type, but there are avenues in the future to use a default associated type if necessary and we also have a window of time (albeit small) in the near future to tweak this aspect of the design.

There are also some lingering concerns about EINTR and how it is handled via various primitives. We're taking a relatively conservative stance by propagating the "error" on calls to read and write but auto-retrying on calls such as fsync (for files). We also have some leeway to tweak this in the near future, and it should in theory be a largely backwards compatible extension to start handling EINTR if deemed necessary (as @kjpgit pointed out with Python's pep).

There have also been a number of proposals for backwards-compatible extensions. Due to this compatibility this RFC is going to go ahead and land ahead of them and future RFCs can be used to add new traits and/or methods where the design can be hashed out.

I'd like to again emphasize that none of this functionality will land initially as #[stable] and we still have some time to tweak the design in various places. The overall direction this is headed, however, definitely seems to have broad support.

I'd like to thank everyone again who participated in this thread (and #517!), all the comments have been quite helpful and this is all definitely moving in a great direction. Thank you!

@alexcrichton alexcrichton merged commit f702d55 into rust-lang:master Feb 3, 2015
alexcrichton added a commit to alexcrichton/rust that referenced this pull request Feb 3, 2015
This commit is an implementation of [RFC 576][rfc] which adds back the `std::io`
module to the standard library. No functionality in `std::old_io` has been
deprecated just yet, and the new `std::io` module is behind the same `io`
feature gate.

[rfc]: rust-lang/rfcs#576

A good bit of functionality was copied over from `std::old_io`, but many tweaks
were required for the new method signatures. Behavior such as precisely when
buffered objects call to the underlying object may have been tweaked slightly in
the transition. All implementations were audited to use composition wherever
possible. For example the custom `pos` and `cap` cursors in `BufReader` were
removed in favor of just using `Cursor<Vec<u8>>`.

A few liberties were taken during this implementation which were not explicitly
spelled out in the RFC:

* The old `LineBufferedWriter` is now named `LineWriter`
* The internal representation of `Error` now favors OS error codes (a
  0-allocation path) and contains a `Box` for extra semantic data.
* The io prelude currently reexports `Seek` as `NewSeek` to prevent conflicts
  with the real prelude reexport of `old_io::Seek`
* The `chars` method was moved from `BufReadExt` to `ReadExt`.
* The `chars` iterator returns a custom error with a variant that explains that
  the data was not valid UTF-8.
alexcrichton added a commit to alexcrichton/rust that referenced this pull request Feb 4, 2015
@aturon aturon mentioned this pull request Mar 23, 2015
@Centril Centril added the A-input-output Proposals relating to std{in, out, err}. label Nov 23, 2018