Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Downloads backend] are we closeing streams correctly? #433

Closed
ericphanson opened this issue Aug 26, 2021 · 12 comments
Closed

[Downloads backend] are we closeing streams correctly? #433

ericphanson opened this issue Aug 26, 2021 · 12 comments

Comments

@ericphanson
Copy link
Member

Ref #396 (comment)

@christopher-dG
Copy link
Member

I think maybe one thing to do is to just replace all usage of BufferStream with IOBuffer. I dunno if there is a specific reason that we are currently using BufferStream, but it is the only thing I know of with the weird "must be closed to read" requirement.

@c42f
Copy link
Contributor

c42f commented Sep 1, 2021

I think maybe one thing to do is to just replace all usage of BufferStream with IOBuffer. I dunno if there is a specific reason that we are currently using BufferStream

I think I may have found the reason for this. It turns out that HTTP.StreamLayer actually closes the stream:

https://github.com/JuliaWeb/HTTP.jl/blob/cfcc595449b9de8869b28f04dcb25ad1f14438e3/src/StreamRequest.jl#L149

But a closed IOBuffer may not have its buffer read:

julia> io = IOBuffer()
       close(io)
       read(io)
ERROR: ArgumentError: read failed, IOBuffer is not readable
...

The underlying problem here may be that Base had no distinction between closeread() and closewrite().

  • HTTP.jl calls close and assumes it means closewrite (I guess). But this makes it unusable with IOBuffer
  • Downloads.jl doesn't close the stream at all which means the caller needs to close it if it's a BufferStream.

See also JuliaLang/julia#24526

@c42f
Copy link
Contributor

c42f commented Sep 1, 2021

Another cross ref: It's a known bug that IOBuffer doesn't work with HTTP.jl 😢 JuliaWeb/HTTP.jl#543

@vtjnash
Copy link

vtjnash commented Sep 28, 2021

IOBuffer() is a file-like object, so it is incompatible with streaming modification. Use BufferStream for a stream-like object.

@omus
Copy link
Member

omus commented Sep 29, 2021

IOBuffer() is a file-like object, so it is incompatible with streaming modification. Use BufferStream for a stream-like object.

@vtjnash can you elaborate more? Unfortunately Base.BufferStream is undocumented and unexported and it would be great to know more about when it should be used and why.

@vtjnash
Copy link

vtjnash commented Sep 29, 2021

Using IOBuffer leaves you vulnerable to truncation and data loss. Base.BufferStream needs to be exported, since it does not have that vulnerability, but is otherwise pretty much a drop-in replacement.

@vtjnash
Copy link

vtjnash commented Sep 29, 2021

The exception is if you know that the data is being written synchronously (as HTTP does), as then you can treat the output as a file, instead of expecting a stream.

@omus
Copy link
Member

omus commented Sep 29, 2021

I'll mention I find Base.BufferStream rather quirky as it needs to be closed to use read(::IO) (as opposed to IOBuffer which can only be read while open) and it isn't seekable

@omus
Copy link
Member

omus commented Sep 29, 2021

The reason I was asking about Base.BufferStream was because #384 moves away from using it in favour of internally using IOBuffer.

@vtjnash
Copy link

vtjnash commented Sep 29, 2021

Yes, because you can read immediately, it is vulnerable to losing your data (if it is concurrently used with .writable = true). Generally, HTTP avoids this though, by blocking the main .get call until IOBuffer is fully written.

bors bot added a commit that referenced this issue Sep 29, 2021
384: Use `AWS.Response` to handle streaming/raw/parsing r=omus a=omus

The idea is to use a struct in AWS.jl that can be used to handle the automatic parsing that is currently used to turn XML/JSON into a dict while also giving the option of accessing the raw output as a string or stream without all the keywords currently needed to be specified.

Depends on:
- #457

Related:
- #346
- #348
- #438

Closes:
- #224
- #433
- #468 (when using `use_response_type` the passed in I/O is not closed)
- #471
- #433

Update: The tests in this PR run using the deprecated behaviour. Mainly I did that here to prove this change is non-breaking. For the updated tests see #458 

Co-authored-by: Curtis Vogt <curtis.vogt@gmail.com>
@omus
Copy link
Member

omus commented Sep 29, 2021

As of AWS.jl 1.63.0 (#384) both the DownloadBackend and HTTPBackend no longer closes the streams when using the feature use_response_type (e.g. @service S3 use_response_type=true). By default the response type uses an IOBuffer but any user supplied IO can be provided via "response_stream" (note: I/O types that are seekable work best). If you enable this feature and pass in a Base.BufferStream you're responsible for closing it yourself.

When not enabling the use_response_type feature the current behaviour as of AWS 1.62.1 remains unchanged.

@omus omus closed this as completed Sep 29, 2021
@iamed2
Copy link
Member

iamed2 commented Jun 21, 2022

any user supplied IO can be provided via "response_stream" (note: I/O types that are seekable work best).

non-seekable I/O types specifically do not work when encountering errors: JuliaCloud/AWSS3.jl#249

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants