Difficulty accessing ECCO file #179

waywardpidgeon · 2024-09-11T03:10:48Z

On trying the example generate_surface_fluxes.jl, at the line

mask = ecco_mask()

I get the error (in this minimal example)

run(`wget --http-user=kevin_broughan --http-passwd=Nasa44earth-data 'https://ecco.jpl.nasa.gov/drive/files/ECCO2/cube92_latlon_quart_90S90N/daily/THETA\THETA.1440x720x50.19930101.nc'`)
ERROR: IOError: could not spawn wget --http-user=kevin_broughan --http-passwd=Nasa44earth-data 'https://ecco.jpl.nasa.gov/drive/files/ECCO2/cube92_latlon_quart_90S90N/daily/THETA\THETA.1440x720x50.19930101.nc': no such file or directory (ENOENT)
However, when I log in to NASA Earthdata and navigate to the file, it is in place at 74.5 MB.

I tried running Julia as su with no change in other problematic examples.

Thanks for looking at this -- Kevin

glwagner · 2024-09-11T19:58:49Z

Hi @waywardpidgeon, apologies for the sparse documentation... we are working on it. This page (which is on a PR) explains how to set up your ECCO_USERNAME and ECCO_PASSWORD environment variables:

https://github.com/CliMA/ClimaOcean.jl/tree/glw/clean-up-ECCO/src/DataWrangling/ECCO

Note that the username and password are not your login, but rather the "programmatic API" credentials which are generated by Earthdata.

PS you may want to change your login password now, because it is visible here.

waywardpidgeon · 2024-09-11T23:46:11Z

Thanks. I've changed the login password and set up the API creds. However, I'm running under Windows (10 and 11) in Windows PowerShell, and this may be the problem: from a brief perusal of the web, wget (like most other Windows commands) seems to need special treatment.
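
A quick way to check whether this is the cause: the `ENOENT` in `could not spawn wget` means Julia could not find an executable named `wget` to launch. A minimal sketch using `Sys.which` from Base (the helper name `has_executable` is hypothetical, not part of ClimaOcean):

```julia
# Hypothetical helper: true if `name` resolves to an executable on the PATH.
# `Sys.which` returns `nothing` when no such program is found.
has_executable(name::AbstractString) = Sys.which(name) !== nothing

if has_executable("wget")
    println("wget found at ", Sys.which("wget"))
else
    println("wget is not on the PATH, so spawning it will fail with ENOENT")
end
```

If this reports that wget is missing, the download code never gets as far as contacting the server, so the credentials are not the issue at this stage.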

wgetERROR.txt

However, I checked the file argument for C:\Users\kab.julia\scratchspaces\0376089a-ecfe-4b0e-a64f-9c555d74d754\Bathymetry
and found that this directory exists but is empty. I have attached the full error report from Julia 1.10.5.

Cheers - Kevin

waywardpidgeon · 2024-09-12T00:03:32Z

Also, might we use HTTP.jl instead of run and wget to access and download these sorts of files? See

https://stackoverflow.com/questions/61862089/how-to-use-wget-with-julia

Cheers - Kevin

glwagner · 2024-09-12T12:33:34Z

Ok ok

Can you confirm this is the error? If you run

ecco_mask()

on Windows, you'll get the error

run(`wget --http-user=kevin_broughan --http-passwd=Nasa44earth-data 'https://ecco.jpl.nasa.gov/drive/files/ECCO2/cube92_latlon_quart_90S90N/daily/THETA\THETA.1440x720x50.19930101.nc'`)
ERROR: IOError: could not spawn wget --http-user=kevin_broughan --http-passwd=Nasa44earth-data 'https://ecco.jpl.nasa.gov/drive/files/ECCO2/cube92_latlon_quart_90S90N/daily/THETA\THETA.1440x720x50.19930101.nc': no such file or directory (ENOENT)

I think this is coming from here:

ClimaOcean.jl/src/DataWrangling/ecco_metadata.jl

Lines 167 to 169 in b3b44b7

    
    cmd = `wget --http-user=$(username) --http-passwd=$(password) $(fileurl)`
    run(cmd)

@waywardpidgeon you can confirm if you paste more of the error.

@simone-silvestri why don't we use Base.download here?
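
For reference, a minimal sketch of what replacing the `wget` call with the `Downloads` standard library could look like (`Base.download` itself has been a deprecated thin wrapper around `Downloads.download` since Julia 1.6). It assumes the server accepts HTTP Basic authentication sent as a header, which has not been confirmed here for the ECCO drive; `basic_auth_header` and `fetch_with_credentials` are hypothetical names, not ClimaOcean functions.

```julia
using Base64: base64encode
using Downloads

# Hypothetical helper: build an HTTP Basic `Authorization` header pair.
basic_auth_header(username, password) =
    "Authorization" => "Basic " * base64encode("$(username):$(password)")

# Hypothetical replacement for the `run(cmd)` call; no external program is needed,
# so it should behave the same on Windows, macOS, and Linux.
function fetch_with_credentials(fileurl, filepath, username, password)
    headers = [basic_auth_header(username, password)]
    return Downloads.download(fileurl, filepath; headers)
end
```

Sending the credentials as a header rather than on a command line also keeps them out of the error message when the download fails.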

glwagner · 2024-09-12T12:56:22Z

Hmm it also looks like we apparently would like to test this on Windows, but are not right now,

ClimaOcean.jl/.github/workflows/ci.yml

Lines 55 to 58 in b3b44b7

    
        TEST_GROUP: "downloading"
        ECCO_USERNAME: ${{ secrets.ECCO_USERNAME }} # To download ECCO data from the podaac website
        ECCO_PASSWORD: ${{ secrets.ECCO_PASSWORD }} # To download ECCO data from the podaac website
    - uses: julia-actions/julia-processcoverage@v1

this only tests JRA55

https://github.com/CliMA/ClimaOcean.jl/blob/main/test/test_downloading.jl

waywardpidgeon · 2024-09-12T22:48:14Z

Yes, that looks very promising. HTTP.jl and maybe other options, I think, call wget. The doc julia-1.10.5.pdf on p985 indicates Downloads.download is a currently supported function, with Downloads.jl included in the standard library (not Base).

I've tested this on an HTML file from my website kevinbroughan.nz under Windows 10 and it works ok.

I'll try Greg's test next, but note that for some of the JRA55 requests, e.g. in generate_atmos_dataset.jl, I needed to replace time_indices = 1:1 by time_indices = Colon(). Given this change, the downloads for JRA55 files appeared to work without error. I'll report on this change once the more difficult wget issue is sorted.
Cheers - Kevin

waywardpidgeon · 2024-09-12T23:22:45Z

> Hmm it also looks like we apparently would like to test this on Windows, but are not right now,
>
> ClimaOcean.jl/.github/workflows/ci.yml
>
> Lines 55 to 58 in b3b44b7
>
>         TEST_GROUP: "downloading"
>         ECCO_USERNAME: ${{ secrets.ECCO_USERNAME }} # To download ECCO data from the podaac website
>         ECCO_PASSWORD: ${{ secrets.ECCO_PASSWORD }} # To download ECCO data from the podaac website
>     - uses: julia-actions/julia-processcoverage@v1
>
> this only tests JRA55
>
> https://github.com/CliMA/ClimaOcean.jl/blob/main/test/test_downloading.jl

I ran this code and here is the output:
@testset "Availability of JRA55 data" begin
    @info "Testing that we can download all the JRA55 data..."
    for name in ClimaOcean.DataWrangling.JRA55.JRA55_variable_names
        fts = ClimaOcean.JRA55.JRA55_field_time_series(name; time_indices=2:3)
    end
end

[ Info: Testing that we can download all the JRA55 data...
Test Summary:              | Time
Availability of JRA55 data | None  6m31.4s
Test.DefaultTestSet("Availability of JRA55 data", Any[], 0, false, false, true, 1.726181921246e9, 1.726182312606e9, false, "REPL[15]")

Does this mean no JRA55 data is available to my process?

Thanks - Kevin

waywardpidgeon · 2024-09-12T23:47:54Z

I looked again at one of the original errors and observed that the URL had single, not double, quotes. Also, before the file name there was a backslash rather than a forward slash, another gremlin. With these two URL fixes I tried Downloads.download(url), and it downloaded a file of over 470 MB into a temporary file whose name was returned by the function. So I assume it works for ECCO files.
There was no request for a username or password. These are now in my user environment, so it might have picked them up automatically.
Cheers - Kevin
PS I'm not confident enough yet to make a pull request and attempt changes.

Sorry about Windows, but there must be many users who have it so I'm sticking with this OS.
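
The backslash before the file name is consistent with the URL being assembled by `joinpath`, which uses `\` as the separator on Windows; that is a guess from the error message, not something confirmed in the source. A sketch of a separator-safe alternative (the helper `urljoin` is hypothetical):

```julia
# Hypothetical helper: join URL pieces with "/" regardless of operating system.
# `joinpath` would insert "\" on Windows, which is wrong inside a URL.
urljoin(parts::AbstractString...) = join((strip(p, '/') for p in parts), "/")

base = "https://ecco.jpl.nasa.gov/drive/files/ECCO2/cube92_latlon_quart_90S90N/daily"
url  = urljoin(base, "THETA", "THETA.1440x720x50.19930101.nc")
```

Local file paths should still go through `joinpath`; only the remote URL needs the fixed `/` separator.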

glwagner · 2024-09-13T00:37:56Z

No need to apologize!

I think a more direct test will be to paste this code in your REPL

using ClimaOcean
using Oceananigans
JRA55_temperature = ClimaOcean.JRA55.JRA55_field_time_series(:temperature; time_indices=2:3)

Then you can try to play around with, inspect, or plot JRA55_temperature to see if indeed JRA55 data was downloaded.
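
On the `None` in the test summary above: the testset in question contains no `@test` assertions, so there is nothing to count as passed or failed. On its own it does not say the data is unavailable. A small sketch with an explicit check (the temporary file only stands in for a downloaded dataset and is purely illustrative):

```julia
using Test

# A temporary file stands in for a downloaded dataset (illustrative only).
stand_in_file = tempname()
write(stand_in_file, "placeholder")

results = @testset "File is present after download" begin
    # With an explicit @test, the summary reports a pass or fail count.
    @test isfile(stand_in_file)
end
```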

PS: to make the errors easier to read when you paste them here, try surrounding them in triple backticks (```). This is how you format "code blocks" in markdown (googling "code blocks in markdown" may also help).

simone-silvestri · 2024-09-16T15:05:37Z

Sorry for the late reply. I forgot that wget does not work on Windows machines. Good catch. If you make a PR to correct this oversight, it would be great.

waywardpidgeon · 2024-09-16T23:12:23Z

I think an expert should fix this with a PR, not me yet.

Here is my workaround for wget producing an error in Windows: I used Downloads.download and the ECCO data for the example generated_bathymetry.jl ended up in my Downloads folder. I moved it into the scratchspaces subdirectory of .julia (the file was about 0.5 Gbytes), the subdirectory possibly having been created by the generate_atmos_dataset.jl example. I then ran the example generated_bathymetry.jl again and it picked up the file ("existing file") without any further difficulty (other than the warning "resolution->size" for the Figure Makie function).

simone-silvestri · 2024-09-16T23:24:35Z

I do not have immediate access to a Windows machine. If I open a PR to fix this, can you check that downloading works? We should probably add Windows tests in the CI to catch these architecture-dependent bugs.

waywardpidgeon · 2024-09-17T00:26:16Z

OK I will test.

Windows tests would be a good idea. It's nice to be alert to the problem - I think I have struck it in the past and not known it!

I am working on the generate_surface_fluxes.jl example, having downloaded the ECCO2 data directly from the JPL data store. Is there a path that I can find or establish on my machine where I can put this data and the program (ecco_mask() and its calls) will find it?
Thanks - Kevin

glwagner · 2024-09-17T17:54:47Z

@simone-silvestri we do have access to Windows via GitHub Actions. We need to add a test there.

waywardpidgeon · 2024-09-17T18:01:07Z

Another case: in the example inspect_ecco_data.jl at line 20 we see a reference to the "default location" from which ecco_field is expected to access the data if it exists (see the doc for ecco_field in ECCO.jl item (1)). I want to place the data (which I have) at this location, but I don't know what it is. What is it, and how in general can I find it?
Thanks - Kevin
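
Going by the path reported earlier in this thread, downloaded data appears to live under the `scratchspaces` directory of the Julia depot. A sketch for listing what is there, assuming the first entry of `DEPOT_PATH` is the depot in use (typically `.julia` in the home directory); `scratchspaces_dir` is a hypothetical helper:

```julia
# Hypothetical helper: the scratchspaces directory of the primary Julia depot.
scratchspaces_dir() = joinpath(first(Base.DEPOT_PATH), "scratchspaces")

dir = scratchspaces_dir()
if isdir(dir)
    # Print every directory below scratchspaces that contains files.
    for (root, dirs, files) in walkdir(dir)
        isempty(files) || println(root, " => ", files)
    end
else
    println("No scratchspaces directory at ", dir)
end
```

Which subdirectory and file name a given ECCO variable is expected under is not shown by this listing; that part depends on the package internals.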

glwagner · 2024-09-17T18:03:12Z

Can you put hyperlinks to the source code here?

waywardpidgeon · 2024-09-17T20:26:24Z

Is
https://github.com/CliMA/ClimaOcean.jl/examples/inspect_ecco_data.jl
at line 22 what you mean?

glwagner · 2024-09-18T02:46:03Z

Ah yes, it's also possible to generate a link to the line itself (which is nice because GitHub will display that line directly in the comment, convenient because we don't have to click on it). A link to the file is helpful too though, thank you.

To generate a link to a line, look for the three dots that appear when hovering your mouse over the line numbers on the left side of a file being viewed on GitHub.

glwagner · 2024-09-18T02:49:02Z

@waywardpidgeon you mean these lines?

ClimaOcean.jl/examples/inspect_ecco_data.jl

Lines 19 to 23 in ddea939

    
    # The function `ecco_field` provided by `ClimaOcean.DataWrangling.ECCO` will automatically
    # download ECCO data if it doesn't already exist at the default location.
    T = ECCO.ecco_field(:temperature)
    S = ECCO.ecco_field(:salinity)

glwagner · 2024-09-18T02:54:23Z

Ok ok I understand the question.

It looks like you can provide a filename:

ClimaOcean.jl/src/DataWrangling/ECCO.jl

Lines 133 to 142 in ddea939

    
    function ecco_field(metadata::ECCOMetadata;
                        architecture = CPU(),
                        horizontal_halo = (3, 3),
                        filename = metadata_filename(metadata))

        shortname = short_name(metadata)

        download_dataset!(metadata)

        ds = Dataset(filename)

But it also seems that it will still try to download the dataset even if the filename is provided.

This seems really buggy and fragile but maybe there is an underlying purpose to this design that @simone-silvestri can explain.

Sorry to say @waywardpidgeon that this stuff is in flux a little bit and we need to clean it up for sure. From your experience I understand that:

- We need to test auto-downloading on Windows
- We should allow users to pass data paths, and allow downloading to be skipped if the data exists at that custom path.
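
The second point could be sketched roughly as follows. This is not the ClimaOcean implementation; `download_if_missing` is a hypothetical name, and the `fetch` keyword exists only so the logic can be tried without network access.

```julia
using Downloads

# Hypothetical helper: download `url` to `filepath` only when the file is missing.
function download_if_missing(url, filepath; fetch = Downloads.download)
    isfile(filepath) && return filepath   # data already present: skip the download
    fetch(url, filepath)
    return filepath
end
```

With a user-supplied `filepath`, manually downloaded data placed at that path would be picked up without any attempt to contact the server.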

What else?

simone-silvestri · 2024-09-18T11:12:51Z

Yeah, the path cannot be specified; that, in theory, was one of the objectives of #180.
download_dataset! checks whether the file exists and, if it does not, downloads it:

ClimaOcean.jl/src/DataWrangling/ecco_metadata.jl

Lines 154 to 160 in ddea939

    
    if !isfile(filename)

        isnothing(username) && throw(ArgumentError("Could not find the username for $(url). Please provide a username in the ECCO_USERNAME environment variable."))
        isnothing(password) && throw(ArgumentError("Could not find the username for $(url). Please provide a password in the ECCO_PASSWORD environment variable."))

        # Version specific download file url
        if data.version isa ECCO2Monthly || data.version isa ECCO2Daily

So we should change the download command to Downloads and add a path for the ECCO files.
In theory, in #180 we provide a directory to the ECCOMetadata type that specifies the path of the files. That will help us with generalizing the data-wrangling module.

simone-silvestri · 2024-09-18T11:22:56Z

We can also probably remove the filename keyword argument. The user should not be able to choose the filename, I think.

glwagner · 2024-09-18T17:54:34Z

I think we should polish this implementation first, and then we can add new features, e.g. supporting manually downloaded files.

But yeah in general it is good if everyone uses the same filename, it will help communication.

waywardpidgeon · 2024-09-18T22:07:55Z

OK, I'll stop with this work now for a while. Plenty else to do. Here is where I'm at. The function urlread works under Windows 11:

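using HTTP
using Downloads: download

# Download `url` to `output`, embedding the ECCO credentials in the URL.
function urlread(url, un, output)
    # Percent-encode the password so special characters survive inside a URL.
    pw = HTTP.escapeuri(ENV["ECCO_PASSWORD"])
    # Replace the original scheme with https://user:password@
    cred_url = "https://$un:$pw@" * replace(url, r"^.*://" => "")
    download(cred_url, output)
end

url = "http://ecco.jpl.nasa.gov/drive/files/ECCO2/cube92_latlon_quart_90S90N/daily/THETA/THETA.1440x720x50.19930101.nc"
un = "your_ecco_name"
output = "path_including_file"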

The download function did not pick up the username or password from the environment, and it did not appear to read these from .netrc or _netrc, but building both API credentials into the URL worked (I picked up the method from SpaceLiDAR.jl). This function returned a path and created the file, but seemed to need existing subdirectories to actually work.
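
The need for existing subdirectories fits how `Downloads.download` behaves: it writes to the given output path but, as far as I know, does not create missing parent directories. A sketch that creates them first (`download_into` is a hypothetical wrapper, and the `fetch` keyword is there only to allow trying the logic offline):

```julia
using Downloads

# Hypothetical wrapper: make sure the parent directory exists, then download.
function download_into(url, output; fetch = Downloads.download)
    mkpath(dirname(abspath(output)))  # no-op when the directory already exists
    return fetch(url, output)
end
```

The same `mkpath` call could be used to create the canonical subdirectories of .julia/scratchspaces mentioned among the remaining tasks.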

Other tasks: getting the output into Julia, including the Sys.iswindows() conditional, creating the canonical subdirectories of .julia/scratchspaces, doing a good check of other wget occurrences, etc.

Thanks for the advice. I'll try to come more up to speed with git next time - Kevin
PS my git password should change to broughan.kevin@gmail.com soon.

simone-silvestri linked a pull request Sep 18, 2024 that will close this issue

Add a download test on windows #183

Open
