Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Future of Base.download #27043

Closed
simonbyrne opened this issue May 9, 2018 · 36 comments
Closed

Future of Base.download #27043

simonbyrne opened this issue May 9, 2018 · 36 comments
Labels
io Involving the I/O subsystem: libuv, read, write, etc.

Comments

@simonbyrne
Copy link
Contributor

simonbyrne commented May 9, 2018

We currently export a download function from Base for downloading files. It has some issues:

As a result, several packages implement their own functionality, notably BinDeps.jl, BinaryProvider.jl, and Pkg3.jl (pending JuliaLang/Pkg.jl#280). In fact, it doesn't appear to be currently used anywhere in Base or stdlib (it is mentioned in a couple of comments in the REPL code).

Some options:

  • Move it to a stdlib package to make it easier to update, add features etc (at least it will once we have versioned stdlib packages).
  • Add the functionality to HTTP.jl, and move that to stdlib. This would have the advantage of getting rid of the mish-mash of shell code we currently use (suggested in Add Sys.which(), use that to find curl in download() #26559 (comment))
  • Get rid of it altogether.

cc: @staticfloat @tkelman @samoconnor

@staticfloat
Copy link
Member

Personally, I think having a first-class implementation of HTTP in the stdlib makes sense.

@Petr-Hlavenka
Copy link
Contributor

I do not like the fact that so many packages implement their own version of download. I think using a single implementation in stdlib everywhere makes the most sense. And in an edge case, where it is necessary to hack this functionality to get through some firewall/policies on corporate network one has to modify only one function and the rest of the ecosystem will just work.

@StefanKarpinski
Copy link
Member

Should we as a bare minimum for 1.0 unexport download so that we're free to delete it in a 1.x version when we have a Download stdlib package that implements the good version of this?

@StefanKarpinski StefanKarpinski added the triage This should be discussed on a triage call label May 9, 2018
@JeffBezanson
Copy link
Member

Yes I think it could make sense to put HTTP in the stdlib. It also seems reasonable to me to separate the client functionality, since that's needed much more often than server.

@JeffBezanson
Copy link
Member

Though I have to say this doesn't seem urgent enough to me to block 0.7.

@Keno
Copy link
Member

Keno commented May 10, 2018

Triage: It seems too late right now to do anything about this at the moment, and doesn't seem super urgent since packages can always just use HTTP themselves. We can also likely just start using HTTP as the backend in 1.x if we move HTTP in.

@Keno Keno removed the triage This should be discussed on a triage call label May 10, 2018
@staticfloat
Copy link
Member

staticfloat commented May 10, 2018

Should we as a bare minimum for 1.0 unexport download so that we're free to delete it in a 1.x version when we have a Download stdlib package that implements the good version of this?

I think we don't even need to do that; let's just leave download(url, path) as it is, and then when we move HTTP.jl into stdlib (assuming that's what we do) we can @deprecate download(url, path) -> HTTP.download(url, path) or something similar.

EDIT Aaaand, that's pretty much exactly what Keno said.

@staticfloat
Copy link
Member

Now that we're in 1.1 land, let's revisit this. I think it makes sense to have a julia-native HTTP stack in stdlib; and I'd love to not rely on curl, wget, fetch, etc... like we do. Of course, in order to deal with TLS we'll have to bundle MbedTLS again, but that shouldn't be too big of a problem; we use curl or similar to bootstrap HTTP.jl and MbedTLS.jl (similarly to how we bootstrap Pkg in #29615) then use HTTP for all downloads of other packages from them on.

@musm
Copy link
Contributor

musm commented Nov 12, 2018

Could we just bundle https://github.com/curl/curl-for-win (https://bintray.com/vszakats/generic/curl) with windows installs? It's only ~3MB.

Versions of windows newer than Build 17063
will come installed with curl by default
https://blogs.technet.microsoft.com/virtualization/2017/12/19/tar-and-curl-come-to-windows/

@musm
Copy link
Contributor

musm commented Nov 16, 2018

curl specializes in data transfers of a single url. Studying usages of Base.download the functions most common usages are consistent with curl's main purpose. It supports a couple of really nice features that @simonbyrne mentioned in the first post, such as restarting terminated downloads. I don't think I need to add more on the library, as it's distributed by a lot of software and has been well tested.

I don't necessarily see including a base download function that relies on curl as orthogonal to a julia-native HTTP stack with it's own download function.

We can either include and update the Base.download function to consistently use curl, while waiting for the julia-native HTTP stack to stabilize, after which, we can either keep Base.download as a 'lightweight' basic download function or deprecate that function to recommend users to use HTTP.download .

Thoughts?

@StefanKarpinski
Copy link
Member

Why don't we just ship curl everywhere? That seems like it would solve this issue. We originally wrote the download tool to avoid needing to ship it but at this point it just seems easier to ship it.

@simonbyrne
Copy link
Contributor Author

@StefanKarpinski
Copy link
Member

Seems like a non-starter since we need to download things before Julia is built.

@simonbyrne
Copy link
Contributor Author

What do we need to download before Julia is built?

@StefanKarpinski
Copy link
Member

Tarballs in deps during the build.

@simonbyrne
Copy link
Contributor Author

Is there any reason why that needs to be the same as Base.download?

@StefanKarpinski
Copy link
Member

No, but if we're going to ship curl why not use if for everything?

@simonbyrne
Copy link
Contributor Author

No, but if we're going to ship curl why not use if for everything?

Because the deps download build step has different needs and problems (e.g. how do you download curl if it isn't installed?)

The main reason for using libcurl over curl is that it was designed exactly for this purpose, and so we would be able to provide better experience, i.e. we could hook into a standard Julia progress meter, directly handle logging rather than capturing output, etc.

@staticfloat
Copy link
Member

staticfloat commented Nov 17, 2018

Using libCURL is definitely an option, but why not just bundle HTTP.jl directly? In my usage, client usage of HTTP.jl has been very stable, and "simply" downloading files should be pretty well ironed out by now, IMO.

EDIT: In case this wasn't clear, I'm specifically talking about Base.download(), and not talking about build-time downloading of deps; that should continue to be handled by jldownload.

@musm
Copy link
Contributor

musm commented Nov 17, 2018

Maybe @oxinabox can provide his feedback since he implemented HTTP.download https://github.com/JuliaWeb/HTTP.jl/pull/273/files How does proxy support look like? and also downloads of large files and retries.

LibCurl supports retry from an existingly stopped download and numerous features that could provide a robust download function.

Personally I'm in favor of using LibCurl (I'm not against using HTTP.download but have some reservations)

EDIT: In case this wasn't clear, I'm specifically talking about Base.download(), and not talking about build-time downloading of deps; that should continue to be handled by jldownload.

Indeed, I think this discussion should focus solely on the future of the user facing Base.download function, not how the makefile downloads deps, which is orthogonal.

@oxinabox
Copy link
Contributor

So i added HTTP.download
For purposes of servicing DataDeps.jl

Its features are

  • more robust across platforms vs Base.download , I was getting occasional errors in AppVeyor before.
  • better error messages. While HTTP.jl gives huge error messages they are better than what Base.download gives in windows on AppVeyor which is an error code that you need to change into hex and search some Microsoft docs.
  • Use the Julia Logging/Progress bars stuff. It's progress bars work with OhMyREPL and Juno (I think).
  • It knows about filenames. This is kinda a big one for DataDeps, since the filename can be (and often is) specified in http response headers. Getting it is kinda involved. I believe curl can do this too so I guess LibCurl should have this functionality. Right now Base.download gives everything a random filename, unless one is passed in.

If Julia's Base.download gained these functionalities I would drop DataDeps dependency on HTTP.jl

@nalimilan
Copy link
Member

@oxinabox How does HTTP.jl fare with proxies? That's the main concern AFAICT.

@oxinabox
Copy link
Contributor

@nalimilan I have no idea.
I've never been in a position to need a proxy in julia.
It looks like it generally works for most proxies
JuliaWeb/HTTP.jl#218 (comment)
see also https://github.com/JuliaWeb/HTTP.jl/issues?utf8=%E2%9C%93&q=++proxy+
I'm not sure if it automatically reads the HTTP_PROXY enviroment var. @samoconnor ?

Considering the word proxy has been mentioned in this thread exactly once before your comment now, I wouldn't have thought it was the main concern (I would say main concern is getting off the hack that is shelling out to commandline tools, and all related issues that come from that).
But I do know for some people that is a killer concern, so you are right is should be considered.

If it doesn't work right in HTTP that probably also should be fixed, even if HTTP.download isn't used in Base.
(HTTP.download is more or less a wrapper around HTTP.request so if one works then the other will)

@farrellm
Copy link

farrellm commented Jan 2, 2019

Yes, proxies are a huge concern. I want to be able to use Julia at work, and, like many people, must access the internet through a proxy at work. In my experience, wget is most likely to work from behind a proxy.

@ZHENGSY
Copy link

ZHENGSY commented Jun 14, 2019

For some reasons curl is very slow and fail almost all the time in China. Some guys successfully used aria2c instead especially when downloading binary dependences by redefining Base.download.

Is it possible to add aria2c into the list of external programs? And it would be nice if we can choose the tool from the list via say config entry.

@oxinabox
Copy link
Contributor

HTTP.download now respects the proxy environment variables.

@giordano
Copy link
Contributor

giordano commented Oct 13, 2019

I'd like to point out that now that more and more packages are being provided through BinaryProvider users are starting to get all sorts of problems with their local download client. Some recent examples:

In the two first cases the users claimed that curl worked fine outside Julia. Moving the installation problems from the package manager to the download client doesn't seem a big improvement in the user experience, but hopefully fewer people are getting into troubles now.

A pure Julia client would likely improve the situation -- no more strange interaction with random libraries floating on the system.

What do other languages (R, Python, etc...) use to download files from the Internet?

@KristofferC
Copy link
Member

KristofferC commented Oct 13, 2019

Here is R https://stat.ethz.ch/R-manual/R-devel/library/utils/html/download.file.html. Seems to be libcurl for Linux and wininet for Windows.

@kkmann
Copy link

kkmann commented Jan 13, 2020

in the two first cases the users claimed that curl worked fine outside Julia.

same here for me: https://discourse.julialang.org/t/curl-certificate-issue-with-pkg3-on-mac/33299/7.
I would definitely appreciate a more 'official' solution than the sys-level 'insecure' hack ;)

@tk3369
Copy link
Contributor

tk3369 commented Sep 5, 2020

The download function is hopelessly broken behind a corporate proxy due to the SSL certificate verification check.

┌ Error: Download failed: curl: (35) schannel: next InitializeSecurityContext failed: Unknown error (0x80092012) - The revocation function was unable to check revocation for the certificate.
└ @ Base download.jl:43

How about we just enhance the following line of code to accept a flag from the environment e.g. SSL_VERIFY? If the value of SSL_VERIFY is 0, then add a -k option to curl command.

process = run(pipeline(`$curl_exe -s -S -g -L -f -o $filename $url`, stderr=err), wait=false)

@StefanKarpinski
Copy link
Member

#37340 may help since we will fully control the download process and it will be libcurl properly wrapped from Julia.

@KristofferC
Copy link
Member

Since #37340 adds it as a stdlib will Base.download "reach out" for the stdlib and use it if it exists or will part of it move into Base?

@StefanKarpinski
Copy link
Member

That was my plan rather than putting it in Base.

@KristofferC
Copy link
Member

KristofferC commented Sep 6, 2020

Coupling Base -> stdlibs is a bit similar to type piracy in the sense that "normal" Bas functionality that doesn't require the stdlib start working differently when the stdlib is removed. So if we make a new sysimage that doesn't include the Download stdlib (perhaps because no package in our project depends on it), then what will happen to the Base.download function (that might still be used in our project)? This is a bit similar to how a project can do rand(2,2) * rand(2,2) without depending on Random or LinearAlgebra but breaks if either of those stdlibs are not in the sysimage.

@StefanKarpinski
Copy link
Member

It will fail: if you want to use Base.download you need to have Downloads.

@StefanKarpinski
Copy link
Member

I think I can safely close this now. The answer is that the future of downloading things is here and it is Downloads.jl, which internally uses libcurl to present a simple but very scalable download API. This package works on Julia 1.3-1.5 as an external package and is a new stdlib in 1.6 (and used to implement Base.download) and what Pkg uses to download things. Having a single, common, cross-platform way to download things should make it much more reliable, and since libcurl can deal with proxies, etc. this should allow us to make downloads just work.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
io Involving the I/O subsystem: libuv, read, write, etc.
Projects
None yet
Development

No branches or pull requests