Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow download to take a local directory to download into #33088

Closed
wants to merge 2 commits into from

Conversation

oxinabox
Copy link
Contributor

With this PR if the user does download(url, localdir) where localdir is a path to a directory that currently exiists,
then download will create a file with a tempname within that directory and download into it.

This has a few minor uses. Like easier debugging if you can specify all your downloads go somewhere specific so you can manually inspect them; and the ability to clean them all up after by deleting that known directory.

In the future we might be able to provide the full functionality HTTP.download has
where the filename is worked out based on the HTTP rules, and then placed into the directory, with that name.
https://github.com/JuliaWeb/HTTP.jl/blob/668e7e68747bb333ebde13af8d16add5b82b3b8a/src/download.jl#L78-L85
(There are some flags for this in curl and wget, fetch had an argument about adding this 12 years ago, idk if they did add it or not, but anyway that is a future PR)

This is non-breaking, as without this PR providing a local-path to a directory would cause an error to be thrown.

@fredrikekre
Copy link
Member

#27043, #32804

@oxinabox
Copy link
Contributor Author

For reference of others reading this later:
(As I am pretty sure @fredrikekre already knows):

This PR is largely orthogonal to #27043 and #32804
Changes that would come from #32804 BinaryProvider.download was moved into Base today,
would be around how to download to a given filepath;
where as this PR changes what happens just before that: how to determine the filepath to download to.

And re: #32804 which more broadly considers options.
One option discussed there is HTTP.download was broaght in to replace Base.download.
HTTP.download already has this behavour.
Another is LibCurl, which falls into the same case as discussed for BinaryProvider.jl.

@StefanKarpinski
Copy link
Member

StefanKarpinski commented Aug 27, 2019

I don't really like that the meaning of the second argument changes based on whether it exists and is a directory. That feels like it's just asking for bugs and confusion. Having the meaning of code depend on global mutable state in a program is generally bad; having it depend on global mutable state this is in the file system is so much worse than that.

@oxinabox
Copy link
Contributor Author

oxinabox commented Aug 27, 2019

That is a fair-point.
(Though technically the behavour is already affected by this -- if it exist and is a directory an error is thrown right now).

What do you think would be a better way to achieve this goal?

To me downloading into a user provided directory is something that download should be able to do.
Or more precisely: downloading into a user provided directory, and using the server provided filename
is something that download should be able to do.

Getting the second part (server-provided filename)is a bit harder (but not nesc all that hard, with the right commandline flags).
But it requires a way to do that first part (user provided directory).
And this seemed like a reasonable non-breaking way to do it.

I guess there are other options like a dir kwarg?

@StefanKarpinski
Copy link
Member

StefanKarpinski commented Aug 27, 2019

if it exist and is a directory an error is thrown right now

Which is the reasonable thing to do. The meaning of the command doesn't change, all that changes is whether it can be carried out or not.

What do you think would be a better way to achieve this goal?
To me downloading into a user provided directory is something that download should be able to do. Or more precisely: downloading into a user provided directory, and using the server provided filename is something that download should be able to do.

What's the reason to want this? Note that Base.download never uses the server-provided file name for anything. It either uses a temporary file name or a caller-provided file name. If you use a server-provided file name, then there's the potential of name collision if the server happens to give two of the same file name. If you want all the downloads in one place, you can do this:

d = mktempdir()
file1 = download(url1, mktemp(d)[1])
file2 = download(url2, mktemp(d)[1])

Now all the files are in d.

@StefanKarpinski
Copy link
Member

It does occur to me that using mktemp(d)[1] here is pretty annoying. It would be better if tempname() took an optional parent directory argument like mktemp/dir, then you could write this as this:

d = mktempdir()
file1 = download(url1, tempname(d))
file2 = download(url2, tempname(d))

@StefanKarpinski
Copy link
Member

I've made a pull request to improve tempname this way: #33090.

@oxinabox
Copy link
Contributor Author

It does occur to me that using mktemp(d)[1] here is pretty annoying. It would be better if tempname() took an optional parent directory argument like mktemp/dir, then you could write this as this:

For interest that change is in this PR also

@StefanKarpinski
Copy link
Member

Changes to public functions like that should not be snuck with other changes 😐

@oxinabox
Copy link
Contributor Author

oxinabox commented Aug 27, 2019

Changes to public functions like that should not be snuck with other changes 😐

Yes, indeed
I didn't know tempname was exported.
But now I do

@oxinabox
Copy link
Contributor Author

oxinabox commented Aug 27, 2019

downloading into a user provided directory, and using the server provided filename is something that download should be able to do.

What's the reason to want this? Note that Base.download never uses the server-provided file name for anything. It either uses a temporary file name or a caller-provided file name. If you use a server-provided file name, then there's the potential of name collision if the server happens to give two of the same file name.

The short-answer is
Getting the file-extension right is important.
This came up recently JuliaIO/FileIO.jl#89

I was thinking in case of name collision we would do a extension preserving rename.
e.g.
on collision foo.csv.gz would become foo1.csv.gz which would become foo2.csv.gz etc.
Not fully fleshes this idea out, since it was a future PR.

@KristofferC
Copy link
Member

Need to be moved to Downloads.jl if you are still interested in moving forward with this.

@KristofferC KristofferC closed this Jun 9, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants