Difficulty accessing ECCO file #179

Open
waywardpidgeon opened this issue Sep 11, 2024 · 24 comments · May be fixed by #183

Comments

@waywardpidgeon

On trying the example generate_surface_fluxes.jl at the line

mask=ecco_mask()
I get the error (in this minimal example)
run(wget --http-user=kevin_broughan --http-passwd=Nasa44earth-data 'https://ecco.jpl.nasa.gov/drive/files/ECCO2/cube92_latlon_quart_90S90N/daily/THETA\THETA.1440x720x50.19930101.nc')
ERROR: IOError: could not spawn wget --http-user=kevin_broughan --http-passwd=Nasa44earth-data 'https://ecco.jpl.nasa.gov/drive/files/ECCO2/cube92_latlon_quart_90S90N/daily/THETA\THETA.1440x720x50.19930101.nc': no such file or directory (ENOENT)
However, when I log in to NASA Earthdata and navigate to the file, it is in place at 74.5 MB.

I tried running Julia as su, with no change; the same goes for other problematic examples.

Thanks for looking at this -- Kevin

@glwagner
Member

Hi @waywardpidgeon, apologies for the sparse documentation... we are working on it. This page (which is on a PR) explains how to set up your ECCO_USERNAME and ECCO_PASSWORD environment variables:

https://github.com/CliMA/ClimaOcean.jl/tree/glw/clean-up-ECCO/src/DataWrangling/ECCO

Note that the username and password are not your login, but rather the "programmatic API" credentials which are generated by Earthdata.

PS you may want to change your login password now, because it is visible here.

@waywardpidgeon
Author

Thanks. I've changed the login password and set up the API credentials. However, I'm running under Windows (10 and 11) in Windows PowerShell, and from a brief perusal of the web this may be the problem: wget (like most other shell commands) seems to need special treatment on Windows.

wgetERROR.txt

However, I checked the directory given in the file argument, C:\Users\kab\.julia\scratchspaces\0376089a-ecfe-4b0e-a64f-9c555d74d754\Bathymetry,
and found that it exists but is empty. I have attached the full error report from Julia 1.10.5.

Cheers - Kevin

@waywardpidgeon
Author

Also, might we use HTTP.jl instead of run and wget to access and download these sorts of files? See

https://stackoverflow.com/questions/61862089/how-to-use-wget-with-julia

Cheers - Kevin

@glwagner
Member

Ok ok

Can you confirm this is the error? If you run

ecco_mask()

on Windows, you'll get the error

run(`wget --http-user=kevin_broughan --http-passwd=Nasa44earth-data 'https://ecco.jpl.nasa.gov/drive/files/ECCO2/cube92_latlon_quart_90S90N/daily/THETA\THETA.1440x720x50.19930101.nc'`)
ERROR: IOError: could not spawn wget --http-user=kevin_broughan --http-passwd=Nasa44earth-data 'https://ecco.jpl.nasa.gov/drive/files/ECCO2/cube92_latlon_quart_90S90N/daily/THETA\THETA.1440x720x50.19930101.nc': no such file or directory (ENOENT)

I think this is coming from here:

cmd = `wget --http-user=$(username) --http-passwd=$(password) $(fileurl)`
run(cmd)

@waywardpidgeon you can confirm if you paste more of the error.
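
The `ENOENT` in the error above is raised when Julia cannot find an executable named `wget` to spawn; it does not mean the remote file is missing. A minimal sketch of how to check this from the REPL, where the helper name `has_executable` is hypothetical:

```julia
# Hypothetical helper: report whether an external program can be found on PATH.
# `Sys.which` returns the full path to the executable, or `nothing` if absent.
has_executable(name::AbstractString) = Sys.which(name) !== nothing

if has_executable("wget")
    @info "wget found" path = Sys.which("wget")
else
    @warn "wget is not on PATH; run(`wget ...`) will fail with ENOENT"
end
```

On a stock Windows installation the warning branch is the likely outcome, which would be consistent with the error reported here.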

@simone-silvestri why don't we use Base.download here?
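
As a rough illustration of the standard-library route, the sketch below replaces the `wget` call with `Downloads.download` and passes the credentials as an HTTP Basic `Authorization` header. The function names are hypothetical, and whether the ECCO server accepts credentials supplied this way is an assumption that has not been tested in this thread.

```julia
using Base64: base64encode
using Downloads

# Build an HTTP Basic Authorization header from a username and password.
basic_auth_header(username, password) =
    "Authorization" => "Basic " * base64encode(string(username, ":", password))

# Hypothetical replacement for the wget call: download `fileurl` to `filepath`
# using only the standard library, so no external executable is needed.
# ASSUMPTION: the server accepts Basic authentication sent up front.
function download_with_credentials(fileurl, filepath;
                                   username = ENV["ECCO_USERNAME"],
                                   password = ENV["ECCO_PASSWORD"])
    headers = [basic_auth_header(username, password)]
    return Downloads.download(fileurl, filepath; headers)
end
```

Because nothing is spawned, this approach would behave the same on Windows, macOS, and Linux.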

@glwagner
Member

Hmm it also looks like we apparently would like to test this on Windows, but are not right now,

TEST_GROUP: "downloading"
ECCO_USERNAME: ${{ secrets.ECCO_USERNAME }} # To download ECCO data from the podaac website
ECCO_PASSWORD: ${{ secrets.ECCO_PASSWORD }} # To download ECCO data from the podaac website
- uses: julia-actions/julia-processcoverage@v1

this only tests JRA55

https://github.com/CliMA/ClimaOcean.jl/blob/main/test/test_downloading.jl

@waywardpidgeon
Author

Yes, that looks very promising. I think HTTP.jl and maybe other options call wget. Page 985 of the Julia 1.10.5 manual (julia-1.10.5.pdf) indicates that Downloads.download is a currently supported function, with Downloads.jl included in the standard library (not Base).

I've tested this on an HTML file from my website kevinbroughan.nz under Windows 10 and it works ok.

I'll try Greg's test next, but note that for some of the JRA55 requests, e.g. in generate_atmos_dataset.jl, I needed to replace
time_indices = 1:1 with time_indices = Colon().
Given this change, the downloads for JRA55 files appeared to work without error. I'll report on this change once the more difficult wget issue is sorted.
Cheers - Kevin

@waywardpidgeon
Author

> Hmm it also looks like we apparently would like to test this on Windows, but are not right now,
>
> TEST_GROUP: "downloading"
> ECCO_USERNAME: ${{ secrets.ECCO_USERNAME }} # To download ECCO data from the podaac website
> ECCO_PASSWORD: ${{ secrets.ECCO_PASSWORD }} # To download ECCO data from the podaac website
> - uses: julia-actions/julia-processcoverage@v1
>
> this only tests JRA55
>
> https://github.com/CliMA/ClimaOcean.jl/blob/main/test/test_downloading.jl

I ran this code and here is the output:
@testset "Availability of JRA55 data" begin
    @info "Testing that we can download all the JRA55 data..."
    for name in ClimaOcean.DataWrangling.JRA55.JRA55_variable_names
        fts = ClimaOcean.JRA55.JRA55_field_time_series(name; time_indices=2:3)
    end
end
[ Info: Testing that we can download all the JRA55 data...
Test Summary: | Time
Availability of JRA55 data | None 6m31.4s
Test.DefaultTestSet("Availability of JRA55 data", Any[], 0, false, false, true, 1.726181921246e9, 1.726182312606e9, false, "REPL[15]")

Does this mean no JRA55 data is available to my process?

Thanks - Kevin
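
A note on reading that summary: the testset body above runs code but contains no `@test` assertions, so the `Test` standard library has no results to count. That is probably what "None" reflects, rather than a statement about the data. A minimal sketch with made-up testset names:

```julia
using Test

# A testset whose body runs code but never calls `@test` records no results,
# so the summary has nothing to count.
empty_ts = @testset "no assertions" begin
    x = sum(1:10)   # runs fine, but nothing is asserted
end

# Adding an assertion makes the outcome show up as a pass (or a failure).
checked_ts = @testset "with assertion" begin
    @test sum(1:10) == 55
end
```

If the loop over the JRA55 variable names had thrown, the testset would have reported an error, so finishing without one suggests the calls themselves succeeded.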

@waywardpidgeon
Author

I looked again at one of the original errors and observed that the URL was wrapped in single, not double, quotes. Also, there was a backslash rather than a forward slash before the file name: another gremlin. With these two URL fixes I tried Downloads.download(url), and it downloaded a file of over 470 MB into a temporary file whose name was returned by the function. So I assume it works for ECCO files.
There was no request for a username or password. These are now in my user environment, so it might have picked them up automatically.
Cheers - Kevin
PS I'm not confident enough yet to make a pull request and attempt changes.

Sorry about Windows, but there must be many users who have it so I'm sticking with this OS.
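
The backslash before the file name is what one would expect if the URL were assembled with `joinpath`, which inserts the platform path separator (`\` on Windows). Whether ClimaOcean builds the URL that way is an assumption. A minimal sketch of joining URL segments explicitly, with a hypothetical helper `url_join`:

```julia
# Hypothetical helper: join URL segments with "/" regardless of operating system.
# `joinpath` is meant for filesystem paths and inserts "\" on Windows.
url_join(parts::AbstractString...) = join(strip.(parts, '/'), "/")

base = "https://ecco.jpl.nasa.gov/drive/files/ECCO2/cube92_latlon_quart_90S90N/daily"
fileurl = url_join(base, "THETA", "THETA.1440x720x50.19930101.nc")
```

Keeping `joinpath` for local file paths and an explicit `"/"` join for URLs avoids this class of platform-dependent bug.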

@glwagner
Member

No need to apologize!

I think a more direct test will be to paste this code in your REPL

using ClimaOcean
using Oceananigans
JRA55_temperature = ClimaOcean.JRA55.JRA55_field_time_series(:temperature; time_indices=2:3)

Then you can try to play around with, inspect, or plot JRA55_temperature to see if indeed JRA55 data was downloaded.

PS: to make the errors easier to read when you paste them here, try surrounding them in triple backticks (```). This is how you format "code blocks" in markdown (googling "code blocks in markdown" may also help).

@simone-silvestri
Collaborator

Sorry for the late reply. I forgot that wget does not work on Windows machines. Good catch. It would be great if you could make a PR to correct this oversight.

@waywardpidgeon
Author

I think an expert should fix this with a PR, not me yet.

Here is my workaround for the wget error on Windows: I used Downloads.download, and the ECCO data for the example generated_bathymetry.jl ended up in my Downloads folder. I moved the file (about 0.5 GB) into the scratchspaces subdirectory of .julia; that subdirectory was possibly created by the generate_atmos_dataset.jl example. I then ran the example generated_bathymetry.jl again and it picked up the file ("existing file") without any further difficulty (other than the warning "resolution->size" for the Makie Figure function).

@simone-silvestri
Collaborator

I do not have immediate access to a Windows machine. If I open a PR to fix this, can you check that downloading works? We should probably add Windows tests in the CI to catch these architecture-dependent bugs.

@waywardpidgeon
Author

OK I will test.

Windows tests would be a good idea. It's nice to be alert to the problem. I think I have struck it in the past and not known it!

I am working on the generate_surface_fluxes.jl example, having downloaded the ECCO2 data directly from the JPL data store. Is there a path that I can find or establish on my machine where I can put this data and the program (ecco_mask() and its calls) will find it?
Thanks - Kevin

@glwagner
Member

@simone-silvestri we do have access to Windows via GitHub Actions. We need to add a test there.

@waywardpidgeon
Author

Another case: in the example inspect_ecco_data.jl at line 20 we see a reference to the "default location" from which ecco_field is expected to access the data if it exists (see item (1) in the doc for ecco_field in ECCO.jl). I want to place the data (which I have) at this location, but I don't know what it is. What is it, and how in general can I find it?
Thanks - Kevin
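
One way to look for such a location, going only by the Bathymetry path reported earlier in this thread, is to list the package's scratch space inside the Julia depot. The sketch below assumes the layout `<depot>/scratchspaces/<UUID>/<key>` and takes the UUID from that reported path; the helper `scratch_dir` is hypothetical, and the key under which ECCO files are stored is not confirmed here.

```julia
# ASSUMPTION: layout inferred from the path reported earlier in this thread,
# <depot>/scratchspaces/<package UUID>/<key>; not confirmed against ClimaOcean.
# Hypothetical helper: candidate scratch directory for a given key.
scratch_dir(key; depot = first(DEPOT_PATH),
            uuid = "0376089a-ecfe-4b0e-a64f-9c555d74d754") =  # UUID from the reported path
    joinpath(depot, "scratchspaces", uuid, key)

dir = scratch_dir("Bathymetry")
println(dir, isdir(dir) ? " exists" : " does not exist")

# List the contents so that downloaded files (if any) can be spotted.
isdir(dir) && foreach(println, readdir(dir))
```

Listing the parent directory (`dirname(dir)`) would show which keys exist on a given machine.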

@glwagner
Member

Can you put hyperlinks to the source code here?

@waywardpidgeon
Author

@glwagner
Member

glwagner commented Sep 18, 2024

Ah yes, it's also possible to generate a link to the line itself (which is nice because GitHub will display that line directly in the comment, convenient because we don't have to click on it). A link to the file is helpful too though, thank you.

To generate a link to a line look for the three dots that appear when hovering your mouse over the line numbers on the left side of a file being viewed on GitHub.

@glwagner
Member

@waywardpidgeon you mean these lines?

# The function `ecco_field` provided by `ClimaOcean.DataWrangling.ECCO` will automatically
# download ECCO data if it doesn't already exist at the default location.
T = ECCO.ecco_field(:temperature)
S = ECCO.ecco_field(:salinity)

@glwagner
Member

Ok ok I understand the question.

It looks like you can provide a filename:

function ecco_field(metadata::ECCOMetadata;
                    architecture = CPU(),
                    horizontal_halo = (3, 3),
                    filename = metadata_filename(metadata))
    shortname = short_name(metadata)
    download_dataset!(metadata)
    ds = Dataset(filename)

But it also seems that it will still try to download the dataset even if the filename is provided.

This seems really buggy and fragile but maybe there is an underlying purpose to this design that @simone-silvestri can explain.

Sorry to say @waywardpidgeon that this stuff is in flux a little bit and we need to clean it up for sure. From your experience I understand that:

  1. We need to test auto-downloading on Windows.
  2. We should allow users to pass data paths, and allow downloading to be skipped if the data exists at that custom path.

What else?

@simone-silvestri
Collaborator

simone-silvestri commented Sep 18, 2024

Yeah, the path cannot be specified; that, in theory, was one of the objectives of #180.
download_dataset! checks if the file exists, and if it does not, it downloads the file:

if !isfile(filename)
    isnothing(username) && throw(ArgumentError("Could not find the username for $(url). Please provide a username in the ECCO_USERNAME environment variable."))
    isnothing(password) && throw(ArgumentError("Could not find the username for $(url). Please provide a password in the ECCO_PASSWORD environment variable."))
    # Version specific download file url
    if data.version isa ECCO2Monthly || data.version isa ECCO2Daily
.
So we should change the download command to use Downloads and add a path for the ECCO files.
In theory, in #180 we provide a directory to the ECCOMetadata type that specifies the path of the files. That will help us with generalizing the data-wrangling module.
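
The "check first, download only if missing" behaviour combined with a user-supplied directory could look roughly like the sketch below. It is not the code in #180; `ensure_downloaded` and its arguments are hypothetical names used for illustration only.

```julia
# Hypothetical sketch: `fetch` is any function that writes the remote file to
# the path it is given (for example a wrapper around Downloads.download).
# The directory `dir` is assumed to exist already.
function ensure_downloaded(fetch, dir, filename)
    filepath = joinpath(dir, filename)
    if isfile(filepath)
        @info "Using existing file" filepath
    else
        fetch(filepath)   # only reached when the file is missing
    end
    return filepath
end
```

With this shape, a manually downloaded file placed in `dir` under the expected name is picked up without any network access or credentials.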

@simone-silvestri
Collaborator

simone-silvestri commented Sep 18, 2024

We can also probably remove the filename keyword argument. The user should not be able to choose the filename, I think.

@simone-silvestri simone-silvestri linked a pull request Sep 18, 2024 that will close this issue
@glwagner
Member

I think we should polish this implementation first, and then we can add new features, e.g. supporting manually-downloaded files.

But yeah in general it is good if everyone uses the same filename, it will help communication.

@waywardpidgeon
Author

OK, I'll stop with this work now for a while. Plenty else to do. Here is where I'm at. The function urlread works under Windows 11:

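using HTTP
using Downloads: download

# Download `url` to `output`, embedding the ECCO credentials in the URL.
# `un` is the API username; the password is read from ENV["ECCO_PASSWORD"].
function urlread(url, un, output)
    pw = HTTP.escapeuri(ENV["ECCO_PASSWORD"])  # percent-encode special characters
    # Drop the original scheme and prepend https://username:password@
    cred_url = "https://$un:$pw@" * replace(url, r"^.*://" => "")
    download(cred_url, output)  # returns the path of the downloaded file
end

url = "http://ecco.jpl.nasa.gov/drive/files/ECCO2/cube92_latlon_quart_90S90N/daily/THETA/THETA.1440x720x50.19930101.nc"
un = "your_ecco_name"
output = "path_including_file"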

The download function did not pick up the username or password from the environment, and it did not appear to read them from .netrc or _netrc, but building both API credentials into the URL worked (I picked up the method from SpaceLiDAR.jl). The function returned a path and created the file, but the subdirectories in the output path seemed to need to exist already for it to work.

Other tasks: getting the output into Julia, including the Sys.iswindows() conditional, creating the canonical subdirectories of .julia/scratchspaces, doing a good check of other wget occurrences, etc.
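
For the subdirectory task, a minimal sketch is to create the parent directory of the output path before downloading. The helper name `prepare_output` is hypothetical.

```julia
# Make sure the parent directory of `output` exists, then return `output`
# so the call can be wrapped around the path passed to a download function.
function prepare_output(output::AbstractString)
    mkpath(dirname(abspath(output)))   # no-op if the directory already exists
    return output
end
```

It could then be used as `download(cred_url, prepare_output(output))`, so that a fresh machine without the scratch subdirectories behaves the same as one that already has them.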

Thanks for the advice. I'll try to get more up to speed with git next time - Kevin
PS my git password should change to broughan.kevin@gmail.com soon.

3 participants