Future of Base.download #27043

simonbyrne · 2018-05-09T04:31:39Z

We currently export a download function from Base for downloading files. It has some issues:

it shells out to an external program, currently wget, curl or fetch on unix, or a PowerShell command on Windows. Since these utilities are not bundled with Julia, they can cause problems if they don't exist on the system or are otherwise misconfigured (New download implementation doesn't work on Win7 #26459, file testset fails unless curl is installed (0.6.0) #22783, RPATH issue with bundled CURL binary #17889)
it is fairly basic feature wise: e.g. it doesn't support:
- "silent" mode for hiding output (Adding support for silent option to download #15930)
- capturing of stdout/stderr
- consistent progress bar
- logging
- useful error codes/messages
- restarting terminated downloads
the whole GitHub TLS mess

As a result, several packages implement their own functionality, notably BinDeps.jl, BinaryProvider.jl, and Pkg3.jl (pending JuliaLang/Pkg.jl#280). In fact, it doesn't appear to be currently used anywhere in Base or stdlib (it is mentioned in a couple of comments in the REPL code).

Some options:

Move it to a stdlib package to make it easier to update, add features etc (at least it will once we have versioned stdlib packages).
Add the functionality to HTTP.jl, and move that to stdlib. This would have the advantage of getting rid of the mish-mash of shell code we currently use (suggested in Add Sys.which(), use that to find curl in download() #26559 (comment))
Get rid of it altogether.

cc: @staticfloat @tkelman @samoconnor

The text was updated successfully, but these errors were encountered:

staticfloat · 2018-05-09T05:14:21Z

Personally, I think having a first-class implementation of HTTP in the stdlib makes sense.

Petr-Hlavenka · 2018-05-09T06:33:36Z

I do not like the fact that so many packages implement their own version of download. I think using a single implementation in stdlib everywhere makes the most sense. And in an edge case, where it is necessary to hack this functionality to get through some firewall/policies on corporate network one has to modify only one function and the rest of the ecosystem will just work.

StefanKarpinski · 2018-05-09T13:11:20Z

Should we as a bare minimum for 1.0 unexport download so that we're free to delete it in a 1.x version when we have a Download stdlib package that implements the good version of this?

JeffBezanson · 2018-05-09T13:39:17Z

Yes I think it could make sense to put HTTP in the stdlib. It also seems reasonable to me to separate the client functionality, since that's needed much more often than server.

JeffBezanson · 2018-05-10T18:27:41Z

Though I have to say this doesn't seem urgent enough to me to block 0.7.

Keno · 2018-05-10T18:28:07Z

Triage: It seems too late right now to do anything about this at the moment, and doesn't seem super urgent since packages can always just use HTTP themselves. We can also likely just start using HTTP as the backend in 1.x if we move HTTP in.

staticfloat · 2018-05-10T19:38:00Z

Should we as a bare minimum for 1.0 unexport download so that we're free to delete it in a 1.x version when we have a Download stdlib package that implements the good version of this?

I think we don't even need to do that; let's just leave download(url, path) as it is, and then when we move HTTP.jl into stdlib (assuming that's what we do) we can @deprecate download(url, path) -> HTTP.download(url, path) or something similar.

EDIT Aaaand, that's pretty much exactly what Keno said.

staticfloat · 2018-10-26T07:57:14Z

Now that we're in 1.1 land, let's revisit this. I think it makes sense to have a julia-native HTTP stack in stdlib; and I'd love to not rely on curl, wget, fetch, etc... like we do. Of course, in order to deal with TLS we'll have to bundle MbedTLS again, but that shouldn't be too big of a problem; we use curl or similar to bootstrap HTTP.jl and MbedTLS.jl (similarly to how we bootstrap Pkg in #29615) then use HTTP for all downloads of other packages from them on.

musm · 2018-11-12T20:43:25Z

Could we just bundle https://github.com/curl/curl-for-win (https://bintray.com/vszakats/generic/curl) with windows installs? It's only ~3MB.

Versions of windows newer than Build 17063
will come installed with curl by default
https://blogs.technet.microsoft.com/virtualization/2017/12/19/tar-and-curl-come-to-windows/

musm · 2018-11-16T16:39:51Z

curl specializes in data transfers of a single url. Studying usages of Base.download the functions most common usages are consistent with curl's main purpose. It supports a couple of really nice features that @simonbyrne mentioned in the first post, such as restarting terminated downloads. I don't think I need to add more on the library, as it's distributed by a lot of software and has been well tested.

I don't necessarily see including a base download function that relies on curl as orthogonal to a julia-native HTTP stack with it's own download function.

We can either include and update the Base.download function to consistently use curl, while waiting for the julia-native HTTP stack to stabilize, after which, we can either keep Base.download as a 'lightweight' basic download function or deprecate that function to recommend users to use HTTP.download .

Thoughts?

StefanKarpinski · 2018-11-16T20:50:14Z

Why don't we just ship curl everywhere? That seems like it would solve this issue. We originally wrote the download tool to avoid needing to ship it but at this point it just seems easier to ship it.

simonbyrne · 2018-11-16T22:33:31Z

Or use https://github.com/JuliaWeb/LibCURL.jl?

StefanKarpinski · 2018-11-16T22:49:17Z

Seems like a non-starter since we need to download things before Julia is built.

simonbyrne · 2018-11-16T23:00:59Z

What do we need to download before Julia is built?

StefanKarpinski · 2018-11-17T00:01:56Z

Tarballs in deps during the build.

simonbyrne · 2018-11-17T00:05:27Z

Is there any reason why that needs to be the same as Base.download?

StefanKarpinski · 2018-11-17T00:26:53Z

No, but if we're going to ship curl why not use if for everything?

simonbyrne · 2018-11-17T00:59:03Z

No, but if we're going to ship curl why not use if for everything?

Because the deps download build step has different needs and problems (e.g. how do you download curl if it isn't installed?)

The main reason for using libcurl over curl is that it was designed exactly for this purpose, and so we would be able to provide better experience, i.e. we could hook into a standard Julia progress meter, directly handle logging rather than capturing output, etc.

staticfloat · 2018-11-17T14:11:49Z

Using libCURL is definitely an option, but why not just bundle HTTP.jl directly? In my usage, client usage of HTTP.jl has been very stable, and "simply" downloading files should be pretty well ironed out by now, IMO.

EDIT: In case this wasn't clear, I'm specifically talking about Base.download(), and not talking about build-time downloading of deps; that should continue to be handled by jldownload.

musm · 2018-11-17T20:12:44Z

Maybe @oxinabox can provide his feedback since he implemented HTTP.download https://github.com/JuliaWeb/HTTP.jl/pull/273/files How does proxy support look like? and also downloads of large files and retries.

LibCurl supports retry from an existingly stopped download and numerous features that could provide a robust download function.

Personally I'm in favor of using LibCurl (I'm not against using HTTP.download but have some reservations)

EDIT: In case this wasn't clear, I'm specifically talking about Base.download(), and not talking about build-time downloading of deps; that should continue to be handled by jldownload.

Indeed, I think this discussion should focus solely on the future of the user facing Base.download function, not how the makefile downloads deps, which is orthogonal.

oxinabox · 2018-11-18T10:31:27Z

So i added HTTP.download
For purposes of servicing DataDeps.jl

Its features are

more robust across platforms vs Base.download , I was getting occasional errors in AppVeyor before.
better error messages. While HTTP.jl gives huge error messages they are better than what Base.download gives in windows on AppVeyor which is an error code that you need to change into hex and search some Microsoft docs.
Use the Julia Logging/Progress bars stuff. It's progress bars work with OhMyREPL and Juno (I think).
It knows about filenames. This is kinda a big one for DataDeps, since the filename can be (and often is) specified in http response headers. Getting it is kinda involved. I believe curl can do this too so I guess LibCurl should have this functionality. Right now Base.download gives everything a random filename, unless one is passed in.

If Julia's Base.download gained these functionalities I would drop DataDeps dependency on HTTP.jl

nalimilan · 2018-11-18T12:28:53Z

@oxinabox How does HTTP.jl fare with proxies? That's the main concern AFAICT.

oxinabox · 2018-11-18T12:50:14Z

@nalimilan I have no idea.
I've never been in a position to need a proxy in julia.
It looks like it generally works for most proxies
JuliaWeb/HTTP.jl#218 (comment)
see also https://github.com/JuliaWeb/HTTP.jl/issues?utf8=%E2%9C%93&q=++proxy+
I'm not sure if it automatically reads the HTTP_PROXY enviroment var. @samoconnor ?

Considering the word proxy has been mentioned in this thread exactly once before your comment now, I wouldn't have thought it was the main concern (I would say main concern is getting off the hack that is shelling out to commandline tools, and all related issues that come from that).
But I do know for some people that is a killer concern, so you are right is should be considered.

If it doesn't work right in HTTP that probably also should be fixed, even if HTTP.download isn't used in Base.
(HTTP.download is more or less a wrapper around HTTP.request so if one works then the other will)

farrellm · 2019-01-02T23:10:39Z

Yes, proxies are a huge concern. I want to be able to use Julia at work, and, like many people, must access the internet through a proxy at work. In my experience, wget is most likely to work from behind a proxy.

ZHENGSY · 2019-06-14T06:26:02Z

For some reasons curl is very slow and fail almost all the time in China. Some guys successfully used aria2c instead especially when downloading binary dependences by redefining Base.download.

Is it possible to add aria2c into the list of external programs? And it would be nice if we can choose the tool from the list via say config entry.

oxinabox · 2019-06-14T08:24:58Z

HTTP.download now respects the proxy environment variables.

giordano · 2019-10-13T14:44:29Z

I'd like to point out that now that more and more packages are being provided through BinaryProvider users are starting to get all sorts of problems with their local download client. Some recent examples:

In the two first cases the users claimed that curl worked fine outside Julia. Moving the installation problems from the package manager to the download client doesn't seem a big improvement in the user experience, but hopefully fewer people are getting into troubles now.

A pure Julia client would likely improve the situation -- no more strange interaction with random libraries floating on the system.

What do other languages (R, Python, etc...) use to download files from the Internet?

KristofferC · 2019-10-13T15:16:26Z

Here is R https://stat.ethz.ch/R-manual/R-devel/library/utils/html/download.file.html. Seems to be libcurl for Linux and wininet for Windows.

kkmann · 2020-01-13T12:12:41Z

in the two first cases the users claimed that curl worked fine outside Julia.

same here for me: https://discourse.julialang.org/t/curl-certificate-issue-with-pkg3-on-mac/33299/7.
I would definitely appreciate a more 'official' solution than the sys-level 'insecure' hack ;)

tk3369 · 2020-09-05T18:25:47Z

The download function is hopelessly broken behind a corporate proxy due to the SSL certificate verification check.

┌ Error: Download failed: curl: (35) schannel: next InitializeSecurityContext failed: Unknown error (0x80092012) - The revocation function was unable to check revocation for the certificate.
└ @ Base download.jl:43

How about we just enhance the following line of code to accept a flag from the environment e.g. SSL_VERIFY? If the value of SSL_VERIFY is 0, then add a -k option to curl command.

julia/base/download.jl

Line 40 in 1728ada

    
           process = run(pipeline(`$curl_exe -s -S -g -L -f -o $filename $url`, stderr=err), wait=false)

StefanKarpinski · 2020-09-05T18:51:02Z

#37340 may help since we will fully control the download process and it will be libcurl properly wrapped from Julia.

KristofferC · 2020-09-05T18:55:58Z

Since #37340 adds it as a stdlib will Base.download "reach out" for the stdlib and use it if it exists or will part of it move into Base?

StefanKarpinski · 2020-09-05T19:45:35Z

That was my plan rather than putting it in Base.

KristofferC · 2020-09-06T10:25:42Z

Coupling Base -> stdlibs is a bit similar to type piracy in the sense that "normal" Bas functionality that doesn't require the stdlib start working differently when the stdlib is removed. So if we make a new sysimage that doesn't include the Download stdlib (perhaps because no package in our project depends on it), then what will happen to the Base.download function (that might still be used in our project)? This is a bit similar to how a project can do rand(2,2) * rand(2,2) without depending on Random or LinearAlgebra but breaks if either of those stdlibs are not in the sysimage.

StefanKarpinski · 2020-09-12T16:39:01Z

It will fail: if you want to use Base.download you need to have Downloads.

StefanKarpinski · 2020-10-21T20:00:43Z

I think I can safely close this now. The answer is that the future of downloading things is here and it is Downloads.jl, which internally uses libcurl to present a simple but very scalable download API. This package works on Julia 1.3-1.5 as an external package and is a new stdlib in 1.6 (and used to implement Base.download) and what Pkg uses to download things. Having a single, common, cross-platform way to download things should make it much more reliable, and since libcurl can deal with proxies, etc. this should allow us to make downloads just work.

StefanKarpinski added the triage label May 9, 2018

Keno removed the triage label May 10, 2018

JeffBezanson mentioned this issue May 21, 2018

RPATH issue with bundled CURL binary #17889

Closed

kshyatt added the io label May 27, 2018

stevengj mentioned this issue Jun 13, 2018

Endless miniconda install loop JuliaPy/Conda.jl#79

Open

simonbyrne mentioned this issue Oct 21, 2018

BinaryProvider.download not checking for Windows Management Framework JuliaPackaging/BinaryProvider.jl#127

Open

simonbyrne mentioned this issue Nov 12, 2018

Use busybox wget for downloading on Windows #29752

Closed

c42f mentioned this issue Mar 27, 2019

Add function to query windows known folders #31331

Closed

fredrikekre mentioned this issue May 10, 2019

This functionality should be in Julia base library JuliaWeb/HTTP.jl#415

Closed

3ch01c mentioned this issue Jun 14, 2019

ERROR: ArgumentError: Unknown time zone "America/Denver" JuliaTime/TimeZones.jl#206

Closed

fredrikekre mentioned this issue Aug 6, 2019

unify BinaryProvider.download with Base.download #32804

Closed

fredrikekre mentioned this issue Aug 27, 2019

Allow download to take a local directory to download into #33088

Closed

stevengj mentioned this issue Sep 6, 2019

default to /usr/bin/curl on macOS JuliaPackaging/BinaryProvider.jl#179

Merged

BeastyBlacksmith mentioned this issue Sep 26, 2019

Wrong download-path for FFTW? JuliaPackaging/BinaryProvider.jl#182

Closed

giordano mentioned this issue Oct 29, 2019

Installation of package plots fails JuliaPlots/Plots.jl#2184

Closed

stevengj mentioned this issue Dec 18, 2019

Libcurl errors with download() #33950

Closed

tk3369 mentioned this issue Sep 5, 2020

Unable to download from URL from behind a corporate proxy server fonsp/Pluto.jl#365

Closed

fonsp mentioned this issue Sep 8, 2020

Bad Query, Failed Process when trying to open hw0 Pluto Notebook fonsp/Pluto.jl#386

Closed

StefanKarpinski closed this as completed Oct 21, 2020

Future of Base.download #27043

Future of Base.download #27043

Comments

simonbyrne commented May 9, 2018 • edited Loading

staticfloat commented May 9, 2018

Petr-Hlavenka commented May 9, 2018

StefanKarpinski commented May 9, 2018

JeffBezanson commented May 9, 2018

JeffBezanson commented May 10, 2018

Keno commented May 10, 2018

staticfloat commented May 10, 2018 • edited Loading

staticfloat commented Oct 26, 2018

musm commented Nov 12, 2018

musm commented Nov 16, 2018

StefanKarpinski commented Nov 16, 2018

simonbyrne commented Nov 16, 2018

StefanKarpinski commented Nov 16, 2018

simonbyrne commented Nov 16, 2018

StefanKarpinski commented Nov 17, 2018

simonbyrne commented Nov 17, 2018

StefanKarpinski commented Nov 17, 2018

simonbyrne commented Nov 17, 2018

staticfloat commented Nov 17, 2018 • edited Loading

musm commented Nov 17, 2018

oxinabox commented Nov 18, 2018

nalimilan commented Nov 18, 2018

oxinabox commented Nov 18, 2018

farrellm commented Jan 2, 2019

ZHENGSY commented Jun 14, 2019 • edited Loading

oxinabox commented Jun 14, 2019

giordano commented Oct 13, 2019 • edited Loading

KristofferC commented Oct 13, 2019 • edited Loading

kkmann commented Jan 13, 2020

tk3369 commented Sep 5, 2020

StefanKarpinski commented Sep 5, 2020

KristofferC commented Sep 5, 2020

StefanKarpinski commented Sep 5, 2020

KristofferC commented Sep 6, 2020 • edited Loading

StefanKarpinski commented Sep 12, 2020

StefanKarpinski commented Oct 21, 2020

simonbyrne commented May 9, 2018 •

edited

Loading

staticfloat commented May 10, 2018 •

edited

Loading

staticfloat commented Nov 17, 2018 •

edited

Loading

ZHENGSY commented Jun 14, 2019 •

edited

Loading

giordano commented Oct 13, 2019 •

edited

Loading

KristofferC commented Oct 13, 2019 •

edited

Loading

KristofferC commented Sep 6, 2020 •

edited

Loading