Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Set CURLOPT_NOBODY automatically for HEADs? #131

Closed
ericphanson opened this issue Jul 12, 2021 · 5 comments
Closed

Set CURLOPT_NOBODY automatically for HEADs? #131

ericphanson opened this issue Jul 12, 2021 · 5 comments

Comments

@ericphanson
Copy link
Contributor

ericphanson commented Jul 12, 2021

In

set_body(easy, have_output)
we decide whether or not to set CURLOPT_NOBODY based on whether or not the user specifies an output. On the one hand, that makes sense, and lets the caller decide whether or not to set CURLOPT_NOBODY in a simple and intuitive way. On the other hand, HEAD responses should reply with a Content-Length if the GET would but should not return a body; if additionally CURLOPT_NOBODY is not set, this this will cause hangs. Thus, I wonder if Downloads.jl should set CURLOPT_NOBODY for HEADs automatically.

xref JuliaCloud/AWS.jl@a481a46

@StefanKarpinski
Copy link
Member

Can you give a self-contained example of the problematic behavior?

@ericphanson
Copy link
Contributor Author

Can you give a self-contained example of the problematic behavior?

julia> server = "https://httpbingo.julialang.org"
"https://httpbingo.julialang.org"

julia> output = IOBuffer()
IOBuffer(data=UInt8[...], readable=true, writable=true, seekable=true, append=false, size=0, maxsize=Inf, ptr=1, mark=-1)

julia> request(server * "/image/jpeg"; method="HEAD", output)
ERROR: HTTP/1.1 200 OK (Operation too slow. Less than 1 bytes/sec transferred the last 20 seconds) while requesting https://httpbingo.julialang.org/image/jpeg
Stacktrace:
 [1] (::Downloads.var"#9#18"{IOBuffer, Base.DevNull, String, Vector{Pair{String, String}}, Float64, Nothing, Bool, Bool, String, Int64, Bool, Bool})(easy::Downloads.Curl.Easy)
   @ Downloads /Users/julia/buildbot/worker/package_macos64/build/usr/share/julia/stdlib/v1.7/Downloads/src/Downloads.jl:369
 [2] with_handle(f::Downloads.var"#9#18"{IOBuffer, Base.DevNull, String, Vector{Pair{String, String}}, Float64, Nothing, Bool, Bool, String, Int64, Bool, Bool}, handle::Downloads.Curl.Easy)
   @ Downloads.Curl /Users/julia/buildbot/worker/package_macos64/build/usr/share/julia/stdlib/v1.7/Downloads/src/Curl/Curl.jl:64
 [3] #8
   @ /Users/julia/buildbot/worker/package_macos64/build/usr/share/julia/stdlib/v1.7/Downloads/src/Downloads.jl:311 [inlined]
 [4] arg_write(f::Downloads.var"#8#17"{Base.DevNull, String, Vector{Pair{String, String}}, Float64, Nothing, Bool, Bool, String, Int64, Bool, Bool}, arg::IOBuffer)
   @ ArgTools /Users/julia/buildbot/worker/package_macos64/build/usr/share/julia/stdlib/v1.7/ArgTools/src/ArgTools.jl:112
 [5] #7
   @ /Users/julia/buildbot/worker/package_macos64/build/usr/share/julia/stdlib/v1.7/Downloads/src/Downloads.jl:310 [inlined]
 [6] arg_read
   @ /Users/julia/buildbot/worker/package_macos64/build/usr/share/julia/stdlib/v1.7/ArgTools/src/ArgTools.jl:61 [inlined]
 [7] request(url::String; input::Nothing, output::IOBuffer, method::String, headers::Vector{Pair{String, String}}, timeout::Float64, progress::Nothing, verbose::Bool, throw::Bool, downloader::Nothing)
   @ Downloads /Users/julia/buildbot/worker/package_macos64/build/usr/share/julia/stdlib/v1.7/Downloads/src/Downloads.jl:309
 [8] top-level scope
   @ REPL[82]:1

@StefanKarpinski
Copy link
Member

Ok, I see the issue. If you make a HEAD request and don't specify any output argument, then it works fine but if you make a HEAD request with an output argument then the request will hang. I guess the question is why are you making a HEAD request with an output argument, but I guess it would make sense to change this. Seems like changing that line to set_body(easy, have_output && method != "HEAD") would fix this issue. Then the other thing to do would be to add your test case to the tests. All the other tests seem to still pass for me with this. Want to make a pull request?

@ericphanson
Copy link
Contributor Author

I guess the question is why are you making a HEAD request with an output argument

Yeah, I think that's a valid question. In my case, it was because AWS.jl decides what kind of request to make in one of a zillion autogenerated functions, and then passes that to a generic submit_request function which just translates it to HTTP.jl (currently) or Downloads.jl (my PR) and wraps it in retry loops etc. So that submit_request function previously was just always passing an output and assuming it will be empty when there is no body, leading to confusing errors (well, confusing for me, partly because whether or not you got the error depended on whether you called isfile first which issued a HEAD vs just s3_get, and I didn't notice the difference in the two code paths at first).

I updated my PR to inspect what kind of request we are making and choose whether or not to pass an output accordingly, which is a fine solution for AWS.jl. But I thought it was kind of a "gotcha" because it seems like it is never useful to wait for a body on a HEAD request (though if you really needed to, you could set it in easy_hook), and I thought this scenario of "generically" wrapping Downloads for some downstream task will probably come up again.

Seems like changing that line to set_body(easy, have_output && method != "HEAD") would fix this issue. Then the other thing to do would be to add your test case to the tests. All the other tests seem to still pass for me with this. Want to make a pull request?

Sure, I'll give it a go.

@StefanKarpinski
Copy link
Member

I agree that it's best for us to try to automatically do what will work here. If someone wanted to override that behavior and unset the CURLOPT_NOBODY option, they could always use the easy_hook escape hatch.

ericphanson added a commit to ericphanson/Downloads.jl that referenced this issue Jul 13, 2021
ericphanson added a commit to ericphanson/Downloads.jl that referenced this issue Jul 13, 2021
ericphanson added a commit to ericphanson/Downloads.jl that referenced this issue Jan 26, 2022
ericphanson added a commit to ericphanson/Downloads.jl that referenced this issue Jan 27, 2022
DilumAluthge added a commit that referenced this issue Mar 3, 2022
* Before building and testing the package, make sure that the UUID has not been edited (#128)

(cherry picked from commit 21843d0)

* CI: Standardize the workflow for testing and changing the UUID (#129)

(cherry picked from commit cd002c3)

* fix #131 and add test (#132)

(cherry picked from commit adbb974)

* Improve inferability of download() (#133)

(cherry picked from commit 848d374)

* fix ci badge (#137)

(cherry picked from commit 3870614)

* Fix a handful of invalidations in expression-checking (#138)

ChainRulesCore defines `==(a, b::AbstractThunk)` and its converse,
and this invalidates a couple of poorly-typed Symbol checks.
This more "SSA-like" way of writing the code is easier to infer.

(cherry picked from commit 25f7af3)

* tests: skip wrong host test for SSL_NO_VERIFY (fix #139) (#140)

Since #114, we only turn
off peer verification, not host verification when the `SSL_NO_VERIFY`
variables are set. This means that the last set of tests in the "SSL no
verify override" testset *should* fail for `wrong.host.badssl.com`. That
is not what I was seeing, however — the test was still passing — which I
found puzzling but just moved on with my life at the time. It turns out
that the test *does* fail if libcurl is build with OpenSSL. Since
whether the test passes or not for that host depends on how things are
built, this change simply skips the test (by popping the URL from the
set of tested URLS for that testset).

The tests above that which use the easy hook mechanism are fixed in a
different way: for those I made the hook disable both host and peer
verification, which should fix the tests for any bad host including when
the server sends the wrong host name.

(cherry picked from commit e22219f)

* Fix input body size detection for IOBuffer(codeunits(str)) (#143)

Somewhat surprisingly, the type of this is not IOBuffer, but a related
type (Base.GenericIOBuffer{Base.CodeUnits{UInt8, String}}).

(cherry picked from commit 470b7f0)

* Typo fix: indiation -> indication (#144)

(cherry picked from commit 5f1509d)

* use Timer instead of libuv timer API

(cherry picked from commit 11493ff)

* use FDWatcher instead of libuv poll API

(cherry picked from commit 4c1d2af)

* fix wrong definition of curl_socket_t on Windows

(cherry picked from commit 2eb0491)

* Revert "stop using raw libuv API" (#156)

(cherry picked from commit c91876a)

* Revert "Revert "stop using raw libuv API" (#156)"

This reverts commit c91876a.

(cherry picked from commit 69acc13)

* add missing locks during Timer callbacks

(cherry picked from commit 43a3484)

* fix Timer usage (#158)

(cherry picked from commit 62b497e)

* Workaround for missing isopen check in FDWatcher (#161)

(possible multithread race with this still needs to be fixed)

(cherry picked from commit 7f91b8a)

* Check for timer isopen correctly (#162)

(cherry picked from commit 4250b35)

* remove trailing whitespace

(cherry picked from commit d8c626b)

* Avoid infinite recursion in `timer_callback` (#164)

Fixes #163

(cherry picked from commit a55825b)

* should also look into headers for input_size (#167)

If no content length is set while uploading some contents, Curl defaults to use
chunked transfer encoding. In some cases we want to prevent that because the
server may not support chunked transfers.

With this change, the request method will also look at the headers while
determining the input size and if found call `set_upload_size` as usual. So to
switch off chunked transfers, one must also know and set the content length
header while invoking `download` or `request` methods.

(cherry picked from commit ab628ab)

* rename: singularize add_{upload,seek}_callback

These only add one callback so having them be plural is weird.

(cherry picked from commit 5bd0826)

* add support for setting a debug callback

(cherry picked from commit 55a0c39)

* end-to-end tests for #167

This adds end-to-end tests for the changes introduced in #167.

Verbose mode is switched off for these tests, but switching it on would show that not setting content-length headers results in chunked transfer encoding while setting it prevents that. Both tests should pass.

(cherry picked from commit 911368d)

* tests: use debug option to test for non/chunked uploads

This combines the functionality from the previous two commits to not
only trigger both chunked and non-chunked uploads, but also test for
that difference by capturing and inspecting the debug events.

(cherry picked from commit 4e0408a)

* bump patch

Co-authored-by: Dilum Aluthge <dilum@aluthge.com>
Co-authored-by: Jakob Nybo Nissen <jakobnybonissen@gmail.com>
Co-authored-by: Yuto Horikawa <hyrodium@gmail.com>
Co-authored-by: Tim Holy <tim.holy@gmail.com>
Co-authored-by: Stefan Karpinski <stefan@karpinski.org>
Co-authored-by: Chris Foster <chris42f@gmail.com>
Co-authored-by: Benoît Legat <benoit.legat@gmail.com>
Co-authored-by: Jameson Nash <vtjnash@gmail.com>
Co-authored-by: Tanmay Mohapatra <tanmaykm@gmail.com>
StefanKarpinski pushed a commit that referenced this issue Mar 24, 2022
(cherry picked from commit adbb974)
StefanKarpinski pushed a commit that referenced this issue Mar 24, 2022
(cherry picked from commit adbb974)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants