Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Review of IO blocking behaviour #24526

Open
2 of 14 tasks
samoconnor opened this issue Nov 8, 2017 · 23 comments
Open
2 of 14 tasks

Review of IO blocking behaviour #24526

samoconnor opened this issue Nov 8, 2017 · 23 comments
Labels
io Involving the I/O subsystem: libuv, read, write, etc.

Comments

@samoconnor
Copy link
Contributor

samoconnor commented Nov 8, 2017

I've recently been helping @quinnj to debug his new HTTP.jl package, in particular a new FIFOBuffer type that is intended to behave like the IO buffers in Base (#87, #86, #76, #75, #74). The process of trying to be consistent with Base has highlighted a number of IO behaviour inconsistencies. I've hacked up a script that runs through a sequence of IO operations for various types and produces a MD table of the results (see below).

The two main issues are with the blocking behaviour of read() and eof().

The spec for eof() says: "If the stream is not yet exhausted, this function will block to wait for more data if necessary, and then return false."

  • eof() works per spec for BufferStream and TCPSocket.
  • For Filesystem.File, IOStream, and PipeBuffer, eof() does not block to wait for more data. Instead it returns true without blocking if there is not currently data available to be read.
  • For IOBuffer it seems that eof() just always returns true.

The spec for read(::IO, ::Int) says: "[By default] this function will block repeatedly trying to read all requested bytes, until an error or end-of-file occurs."

  • For BufferStream and TCPSocket, read() behaves as specified.
  • However, for the other types read() seems to just return however many bytes are available at the time and does not block.
  • For IOBuffer, read() always returns an empty array.

Other issues:

  • For BufferStream and IOStream, isreadable() returns true after close() is called.
  • For BufferStream, iswriteable() returns true after close().
  • For BufferStream and IOStream, read() after close() returns empty data, whereas the other types throw an error.
  • For BufferStream and TCPSocket, read(io, String) blocks until the stream is closed (this seems consistent with the blocking behaviour of read(io, nb) , however the other types return immediately with a string containing however many bytes are available at the time.
  • mark/reset don't work for BufferStream mark/reset broken for BufferStream #24465
  • There is no API for sending TCP FIN, e.g. shutdown(fd, SHUT_WR) or uv_shutdown(). If a TCP server waits for a request to be sent before responding, a Julia client would have to call close() to signal end of request, but would then be unable to read the response.
  • Perhaps the missing shutdown is related to the inconsistencies with isreadable() and iswriteable(). It seem like maybe there should be a closeread() and closewrite() that respectively cause isreadable() and iswriteable() to return false.
  • Calling close on BufferStream causes eof() = true and isopen() = false, but iswriteable() and isreadable() are both still true, and in fact reads and writes continue to work with the closed stream. It seems that BufferStream would benefit from a seperate closewrite() for signalling eof() to the reader.
type IOBuffer PipeBuffer BufferStream File (IOStream) Filesystem
init io = IOBuffer() io = PipeBuffer() io = BufferStream() echo Hello > file echo Hello > file
write(io, "Hello") write(io, "Hello") write(io, "Hello") open("file") Filesystem. open("file")
isreadable() true true true true MethodError
isopen() true true true true true
eof() true❗️ false false false false
position 5 0 MethodError 0 0
read(io, 5) ""❗️ "Hello" "Hello" "Hello" "Hello"
eof() true true❗️ blocked true true
"Again" write(io, "Again") write(io, "Again") write(io, "Again") echo Again >> file echo Again >> file
position(io) 10 0 MethodError 5 5
eof(io) true❗️ false false false false
read(io, 5) ""❗️ "Again" "Again" "Again" "Again"
eof(io) true true❗️ blocked true true
"Again" write(io, "Again") write(io, "Again") write(io, "Again") echo Again >> file echo Again >> file
eof(io) true❗️ false false false false
read(io, String) ""❗️ "Again" blocked "Again" "Again"
iswritable(io) true true true false MethodError
isreadable(io) true true true true MethodError
close(io); isopen(io) false false false false false
iswritable(io) false false true❗️ false MethodError
isreadable(io) false false true❗️ true❗️ MethodError
eof(io) true true true true Base.UVError
read(io, 5) Argument Error Argument Error ""❗️ ""❗️ Base.UVError

Note that TCPSocket behaves almost the same as BufferStream but without the isopen() and isreadable() issues:

type BufferStream TCPSocket
init io = BufferStream() srv = accept(); io = connect()
write(io, "Hello") write(srv, "Hello")
isreadable() true true
isopen() true true
eof() false false
position MethodError MethodError
read(io, 5) "Hello" "Hello"
eof(io) blocked blocked
close(io); isopen(io) true ❗️ false
eof(io) true true
isreadable(io) true❗️ false
read(io, 5) "" ""
@timholy
Copy link
Member

timholy commented Nov 8, 2017

I'm not an IO guru so have nothing concrete to add, but just have to say wow, really nice analysis.

@fredrikekre fredrikekre added the io Involving the I/O subsystem: libuv, read, write, etc. label Nov 8, 2017
@stevengj
Copy link
Member

stevengj commented Nov 8, 2017

What do you want eof() to do for File and IOBuffer? Block forever since there's no telling when additional bytes could be written to the file/buffer?

@samoconnor
Copy link
Contributor Author

Hi @stevengj,

I'm not sure what the right interface is, but I am sure that it needs to be well specified and unambiguous so that compatible implementations can be dropped in and "just work".

If read(io, nb) was made to block for simple files, callers who wanted to read just what is in the file now would have to do read(myfile, filesize(myfile)). To be consistent with the current doc, eof() for a simple local file should probably be false/block unless the file is unlinked and there are no other open write file handles on the file.

I understand that a very common use case for simple files is to want to read to the end of what is in the file now. However, the less common use case of reading form a file that is occasionally appended to by a 3rd party needs to be handled too.

Consider also that the IOStream returned by the simple open() function is not always a simple file. See the example below of opening a named pipe (mkfifo). In this case eof() = true seems wrong because a pipe can be expected to have more data at some future time. Likewise, the read(io, 6) call should have blocked. If I'm reading a fixed number of bytes from a named pipe it is probably because there is an agreed protocol where I'm expecting the other end to write that many bytes.

type File NamedPipe
init echo Hello > file mkfifo fifo; echo Hello > fifo
open("file") open("fifo")
typeof(io) IOStream IOStream
isopen() true true
eof() false false
read(io, 5) "Hello" "Hello"
eof(io) true true
"Again" echo Again >> file echo Again > fifo
eof(io) false false
read(io, 6) "Again" "Again"
eof(io) true true

Maybe the answer is to have a user selectable blocking/non-blocking mode (O_NONBLOCK).

Maybe it would be better to have clearly distinct types for simple-file-like-things and stream-like-things.

Or maybe it's best to enforce blocking everywhere and encourage the use of tasks and/or patterns like read(myfile, filesize(myfile)).

Another alternative would be to have a WouldBlockError exception that would be thrown by anything that would block when the "i don't want things to block" option has been set (this would allow writing code that acts as if nothing ever blocks, but not have it deadlock in odd ways when there is a problem).

@bicycle1885
Copy link
Member

For IOBuffer it seems that eof() just always returns true.

What does this mean? eof of IOBuffer may return false.

julia> buf = IOBuffer(b"foobar");

julia> eof(buf)
false

@samoconnor
Copy link
Contributor Author

Hi @bicycle1885,
I think the difference is that my test script uses the IOBuffer() constructor.
So, I guess my finding is that for the default IOBuffer() constructor, it seems that eof() always returns true for the series of function calls in my test scenario. e.g.

julia> io = IOBuffer()
IOBuffer(data=UInt8[...], readable=true, writable=true, seekable=true, append=false, size=0, maxsize=Inf, ptr=1, mark=-1)

julia> eof(io)
true

julia> write(io, "Hello")
5

julia> eof(io)
true

julia> write(io, " World!")
7

julia> eof(io)
true

julia> read(io)
0-element Array{UInt8,1}

julia> String(take!(io))
"Hello World!"

@bicycle1885
Copy link
Member

write changes the offset of a buffer so it is quite natural that a sequence of write operations does not change the return value. However, seek operations (e.g. seekstart) will change the return value:

julia> buf = IOBuffer()
IOBuffer(data=UInt8[...], readable=true, writable=true, seekable=true, append=false, size=0, maxsize=Inf, ptr=1, mark=-1)

julia> write(buf, b"foo")
3

julia> eof(buf)
true

julia> seekstart(buf)
IOBuffer(data=UInt8[...], readable=true, writable=true, seekable=true, append=false, size=3, maxsize=Inf, ptr=1, mark=-1)

julia> eof(buf)
false

@samoconnor
Copy link
Contributor Author

@bicycle1885 I accept that the behaviour of IOBuffer() in isolation could seem "quite natural" in a certain context. i.e. If you expect it to behave like a local disk file. However, as things stand the documentation about how each IO type is supposed to behave is not clear.

The point of posting this issue is to highlight behaviour inconsistencies between the various IO types. I realised that I'd kind of figured out how to use the IO types I was familiar with, but that I this was largely a matter of trial and error and copying usage patterns from other code. When it came to trying to debug the HTTP.jl package, and coming up against questions of "how should function this behave", it wasn't a simple matter of "read and follow the spec". The task seemed to come down to figuring out which of the many IO variants in Base was most like what we wanted and emulating it as best we could.

Maybe the solution is just to have clearer documentation, or as I said above "Maybe it would be better to have clearly distinct types for simple-file-like-things and stream-like-things.". Maybe there needs to be seperate documentation for eof(::AbstractFile) and eof(::AbstractStream).

@StefanKarpinski StefanKarpinski added the triage This should be discussed on a triage call label Nov 16, 2017
@StefanKarpinski
Copy link
Member

StefanKarpinski commented Nov 16, 2017

This is a great analysis and we do want to make sure all of this is sane and consistent, but it's a bit too much to bite off for the impending 1.0 release, so based on discussion on triage, we're going to say that I/O blocking behavior is not yet stable and may change in the 1.x series. We will have to be careful to make sure that any changes we make don't break key libraries (e.g. HTTP).

@StefanKarpinski StefanKarpinski removed the triage This should be discussed on a triage call label Nov 16, 2017
@StefanKarpinski StefanKarpinski added this to the 1.x milestone Nov 16, 2017
@quinnj
Copy link
Member

quinnj commented Nov 16, 2017

Luckily, HTTP.jl currently uses it's own streaming buffer type (HTTP.FIFOBuffer), so that should keep it arms-distance from changes in Base until things settle.

@samoconnor
Copy link
Contributor Author

Maybe we need something like @oxinabox's https://github.com/oxinabox/InterfaceTesting.jl for IO.

@samoconnor
Copy link
Contributor Author

#24242

@samoconnor
Copy link
Contributor Author

#10292

@ssfrr
Copy link
Contributor

ssfrr commented Feb 20, 2018

A bit ago I wrote up some notes trying to wrap my brain around the streaming interface in the context of trying to make my SampledSignals.jl stream semantics better match Base. Definitely not as nicely-organized as @samoconnor's work here, but might be useful.

Also relevant is JuliaIO/FileIO.jl#78, where I'd like to support opening up media files in a streaming way, where you end up with a stream of images or audio samples, rather than a stream of bytes.

[edit: premature submission and wrong issue link]

@EricForgy
Copy link

I suspect if we add WebSocket to this analysis, it probably would not fare too well 😅

@ssfrr
Copy link
Contributor

ssfrr commented Feb 20, 2018

The main difference between the way I designed the SampledSignals streams and Base is that I don't have an eof method, and streams will always block on a [read|write] if there's not enough [data|space] available. So reading to the end of a stream looks like:

const N = 4096 # amount to read each time
while true
    buf = read(stream, N)
    # do stuff with buf
    length(buf) == N || break
end

read! and write both return the number of elements [read|written], which you can check similarly. This works well in the context of a stream that's always flowing until it reaches its final end (like an audio device), but maybe not in the case where you have occasional bits and pieces of unknown size coming in and want to process them in a timely manner (though maybe in that case you just set N=1?)

@samoconnor
Copy link
Contributor Author

Another thing: readbytes! stands out as an odd name now that readstring has been replaced by read(io, String).
Why not replace readbytes!(io, b::AbstractVector{UInt8}, n) with
read!(io, b::AbstractVector{UInt8}, n)?

@samoconnor
Copy link
Contributor Author

samoconnor commented Oct 20, 2018

After the refinements proposed above, we would end up with this:

Allocation Termination Blocking Function
alloc n bytes read(io, n)
read(io, T)
alloc n bytes readavailable(io, n)
alloc eof read(io)
alloc eof readavailable(io)
copy n bytes read!(io, b, n)
read!(io, array)
unsafe_read(io, p, n)
copy n bytes readavailable!(io, b, n)
unsafe_readavailable(io, p, n)
copy eof read!(io, b, typemax(Int))
copy eof readavailable!(io, b, typemax(Int))

@samoconnor
Copy link
Contributor Author

it's a bit too much to bite off for the impending 1.0 release

@StefanKarpinski where do you see this fitting into the 1.x release plan?

This kind of bug comes up quite frequently: JuliaLang/MbedTLS.jl#186 where it kind of works but seems a bit glitchy or slow sometimes. The root cause is often that someone wrote some generic code that assumed a read would be non-blocking, but by the time the call makes its way through a few generic base/io.jl methods it ends up in an external library read method that is blocking (or that blocks in some situations).

Julia should be a great language for building easily compossible pieces of IO machinery; and for building data consumers and producers that work regardless of where data came from (or how it was buffered, encrypted, proxied, cached, etc). But, unless the Base.IO API has unambiguous and complete behaviour contracts, it will never work reliably.

@StefanKarpinski
Copy link
Member

I don't know, I'm not really the person to ask. @JeffBezanson, @vtjnash, @Keno, opinions on this?

@ssfrr
Copy link
Contributor

ssfrr commented Nov 3, 2018

Awesome work - I think this is a really nice and consistent architecture, and I've also been bitten by the current state of things. I'm doubtful that this would be a 1.x sort of thing though, right? Seems like some of these changes would be breaky enough that they'd need to wait until 2.0.

Questions

  1. Why is read!(io, array) listed under "alloc"? Shouldn't that be a "copy" function?
  2. In read!(io, b, n), do you need the n, or do you just pass in a View if you want to fill a subset of your array? (I guess it's useful for larger n)?
  3. In read!(io, b, n), what do you do if n > length(b)? Do we keep the current behavior of reallocating a larger b? I'm guessing yes because otherwise I don't think the typemax(Int) versions make sense, but I wanted to make sure. I'd be happy dropping this behavior, never reallocating, and also removing the n argument, but perhaps the auto-reallocation comes in handy for some folks.
  4. Should unsafe_read be unsafe_read!? (also unsafe_readavailable!)
  5. is there a way to avoid special-casing read!(io, b, typemax(Int))? If it's going to be a sentinel value maybe -1 is better because it's not a valid length. (again this becomes a non-issue if the auto-reallocation behavior is removed).

Comments

A fourth aspect is return type (e.g. read(io, T) converts the bytes read to type T and read!(io, array) fills in a fixed type array). However, this is largely handled by generic wrapper methods and should not influence the design of the fundamental API too much.

Another asymmetry in the allocating/non-allocating versions is when you want to get a Vector{T}, e.g. you can do read!(io, arr::Vector{Float64}) and it will fill the array, but there's no read equivalent. Perhaps there could also be a read(::IO, ::Type, ::Integer) method that would return a Vector? So in that case it could be defined read(io::IO, T::Type=UInt8, n::Integer).

In the typical use pattern (while !eof(io) process(readavailable(io)) ; end) it is the job of eof to be blocking. It seems odd for readavailable to block as well.

I agree it's weird for readavailable to block if there are 0 bytes available, but TBH it also seems weird for eof to block - checking whether a stream has ended seems orthogonal to waiting for data to be available. If anything I'd propose making the wait(io) function be the way to wait for data, and have eof be nonblocking.

@samoconnor
Copy link
Contributor Author

  1. Why is read!(io, array) listed under "alloc"? Shouldn't that be a "copy" function?

Yes, corrected above.

  1. In read!(io, b, n), do you need the n, or do you just pass in a View if you want to fill a subset of your array?

I think we should keep the n (and probably add copyto!(b, offset, io, n)).
If the compiler (and GC) becomes smart enough one day to ensure that View never causes allocation, then those could become undocumented internal functions and the doc could say "just use a View". But I think it makes sense to defined the IO primitives in terms of the simplest possible types.

  1. In read!(io, b, n), what do you do if n > length(b)? Do we keep the current behavior of reallocating a larger b?

To me having a read! method that reallocates its argument smells a bit wrong. I would prefer to seperate the auto-sizing by having a special AbstractArray type that auto-grows on access; or by just calling sizehint!(array, bytesavailable(io)) before the read.

I'm not sure that I understand what use case the auto-sizeing is supposed to help with.
Consider this scenario:

buf = Vector{UInt8}(undef, 4096)
while !eof(io)
    readbytes!(io, buf)
    render(my_display, buf)
end

I start out reading small chunks of data and achieving low output latency.
This works nicely until my code gets busy doing something else and a few MB end up in the OS input buffers...
Then my next read sees that there is lots of data available and ends up reallocating my buffer to several MB!
Worse, after that every subsequent readbytes! call is now blocking to wait for several MB (because the default nb is length(buf) and readbytes! is blocking).

  1. Should unsafe_read be unsafe_read!? (also unsafe_readavailable!)

The current unsafe_read has no bang. I have no strong opinion on this. The bang convention is not applied very consistently as far as I can see.

  1. ... this becomes a non-issue if the auto-reallocation behavior is removed

That would be my preference.

Perhaps there could also be a read(::IO, ::Type, ::Integer) method that would return a Vector? So in that case it could be defined read(io::IO, T::Type=UInt8, n::Integer).

That sounds pretty sane. I guess the readavailable version would read what is available modulo the size of T.

it also seems weird for eof to block - checking whether a stream has ended seems orthogonal to waiting for data to be available. If anything I'd propose making the wait(io) function be the way to wait for data, and have eof be nonblocking.

Yes. I agree. eof(io) should be just !isreadable(io) && bytesavailable(io) == 0.
... and this brings up the issue of separating open/closed state for read and write halves of the stream.
(closeread(io) ensures isreadable(io) == false, closewrite(io) ensures iswriteable(io) == false ?)

@norru
Copy link

norru commented Feb 11, 2020

Question, as of 2020-02-11 what is the recommended, geneeral way to read from a stream multiple times onto a preallocated array (possibly without extra allocations)? (eg, if read provided the start/end range)

buf =  Vector{UInt8}(undef, 30)
read!(buf, start=1, end=10)
read!(buf, start=11, end=20)
read!(buf, start=21, end=30)

Would slicing with view do the job in Julia? In C you would do this with pointer arithmetics,

ssize_t read(int fd, void *buf, size_t count);
size_t fread(void *ptr, size_t size, size_t nmemb, FILE *stream);

vtjnash added a commit that referenced this issue May 11, 2021
vtjnash added a commit that referenced this issue May 11, 2021
vtjnash added a commit that referenced this issue May 11, 2021
vtjnash added a commit that referenced this issue Jul 28, 2021
vtjnash added a commit that referenced this issue Jul 28, 2021
vtjnash added a commit that referenced this issue Jul 28, 2021
vtjnash added a commit that referenced this issue Jul 29, 2021
c42f added a commit that referenced this issue Aug 25, 2021
This name was suggested in #24526 and

* Has a good analogy to close(), so people are likely to be able to
  guess what it means.
* Is more specific to IO (conversely, it's easy to imagine shutdown()
  being wanted for any number of things unrelated to closing the write
  side of an IO stream).
c42f added a commit that referenced this issue Aug 25, 2021
This name was suggested in #24526 and

* Has a good analogy to close(), so people are likely to be able to
  guess what it means.
* Is more specific to IO (conversely, it's easy to imagine shutdown()
  being wanted for any number of things unrelated to closing the write
  side of an IO stream).
c42f added a commit that referenced this issue Aug 25, 2021
This name was suggested in #24526 and

* Has a good analogy to close(), so people are likely to be able to
  guess what it means.
* Is more specific to IO (conversely, it's easy to imagine shutdown()
  being wanted for any number of things unrelated to closing the write
  side of an IO stream).
vtjnash pushed a commit that referenced this issue Aug 26, 2021
This name was suggested in #24526 and

* Has a good analogy to close(), so people are likely to be able to
  guess what it means.
* Is more specific to IO (conversely, it's easy to imagine shutdown()
  being wanted for any number of things unrelated to closing the write
  side of an IO stream).
LilithHafner pushed a commit to LilithHafner/julia that referenced this issue Feb 22, 2022
This name was suggested in JuliaLang#24526 and

* Has a good analogy to close(), so people are likely to be able to
  guess what it means.
* Is more specific to IO (conversely, it's easy to imagine shutdown()
  being wanted for any number of things unrelated to closing the write
  side of an IO stream).
LilithHafner pushed a commit to LilithHafner/julia that referenced this issue Mar 8, 2022
This name was suggested in JuliaLang#24526 and

* Has a good analogy to close(), so people are likely to be able to
  guess what it means.
* Is more specific to IO (conversely, it's easy to imagine shutdown()
  being wanted for any number of things unrelated to closing the write
  side of an IO stream).
@DilumAluthge DilumAluthge removed this from the 1.x milestone Mar 13, 2022
Keno pushed a commit that referenced this issue Jun 5, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
io Involving the I/O subsystem: libuv, read, write, etc.
Projects
None yet
Development

No branches or pull requests