Sync testdata folder to THREDDS testdata/raven #185

huard · 2019-12-06T13:18:02Z

So we can run tests entirely on the platform.

huard · 2020-08-26T20:16:48Z

Related: Ouranosinc/xclim#525
We could use the same strategy of creating a stand-alone repo for test data. It might make the sync process cleaner.

tlvu · 2020-09-21T19:15:49Z

I am trying to wrap my head around what the real root problem is here.

Basically, we do not want tests to use a relative path to the testdata folder in this repo. How about just turning that relative path into an HTTP path pointing directly to GitHub (ex: https://github.com/Ouranosinc/raven/raw/master/tests/testdata/hydro_simulations/raven-gr4j-cemaneige-sim_gr4jcn-0_Hydrographs.nc)? This way, no other server is needed and the .ipynb file becomes standalone.

For optimization, we can detect whether the relative path exists (dev full-checkout mode) and use it, keeping the HTTP path only as a fallback (tutorial mode).
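A minimal sketch of that local-first lookup. The helper name `resolve_testdata` and its `local_root` argument are hypothetical; the URL prefix is taken from the example link above and assumes the files stay on `master`.

```python
from pathlib import Path

# Prefix copied from the example link; assumes files are served from master.
GITHUB_RAW = "https://github.com/Ouranosinc/raven/raw/master/tests/testdata"


def resolve_testdata(relpath: str, local_root: Path) -> str:
    """Return a local path if the file exists (full checkout), else a GitHub URL.

    `relpath` is relative to the testdata folder, e.g. "gr4j_cemaneige/evap.nc".
    """
    candidate = local_root / relpath
    if candidate.is_file():
        # Dev full-checkout mode: no network access needed.
        return str(candidate)
    # Tutorial mode: fall back to the raw file served by GitHub over HTTP.
    return f"{GITHUB_RAW}/{relpath}"
```

On a dev workstation, `resolve_testdata("gr4j_cemaneige/evap.nc", Path("tests/testdata"))` returns the local file; in a bare notebook environment it returns the GitHub URL.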

I'd rather our Thredds server not become a single point of failure for running tests.

huard · 2020-09-21T19:50:58Z

I think we want to split the test data from the code. This would mean creating a raven-testdata repo.
Some of the tests and tutorials should exercise DAP URLs, so for some tests, we'll need a THREDDS server.

One idea is to have a github raven-testdata repo that gets synced on THREDDS when there is a release of raven-testdata. Then, we'll need a client API to fetch data either from THREDDS or from github.
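The sync-on-release step could look roughly like the copy below, assuming the THREDDS server publishes a plain directory on its filesystem and that a tagged release of the proposed raven-testdata repo has been checked out locally. `sync_testdata` and its arguments are hypothetical names, not an existing tool.

```python
import shutil
from pathlib import Path


def sync_testdata(src: Path, dst: Path, suffixes=None) -> list:
    """Copy files from a raven-testdata checkout `src` into the THREDDS-published
    directory `dst`, keeping relative paths so URLs mirror the repo layout.

    If `suffixes` is given (e.g. (".nc",)), only files with those suffixes are
    copied. Files removed from the repo are NOT deleted from `dst`.
    Returns the relative POSIX paths that were copied.
    """
    copied = []
    for path in sorted(src.rglob("*")):
        if not path.is_file():
            continue
        if suffixes is not None and path.suffix not in suffixes:
            continue
        rel = path.relative_to(src)
        target = dst / rel
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(path, target)  # copy2 also preserves modification times
        copied.append(rel.as_posix())
    return copied
```

Passing `suffixes=(".nc",)` restricts the copy to netCDF files; the default copies everything.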

tlvu · 2020-09-21T20:27:36Z

Some of the tests and tutorials should exercise DAP URLs, so for some tests, we'll need a THREDDS server.

Maybe Raven could spawn its own Thredds server, like Emu does (https://github.com/bird-house/emu/blob/56def2684fc28fee09089382de192075f065f3f2/docker-compose.yml#L12-L23)?

Again, I don't want all the tests to suddenly fail on Travis-CI and on every local dev workstation just because Thredds is down for maintenance. And say there is a new or updated dataset that is not yet synced to Thredds: how can a dev continue their work?

So my point is: yes, we'll still need to sync all the data to Thredds for tutorials, but the day-to-day dev workflow should not rely on Thredds.

Which test(s) need a DAP link right now? Is some data already manually uploaded to Thredds?

tlvu · 2020-09-21T20:36:45Z

Can Intake provide what we need, i.e. "abstract the backend storage (local file, http link, dap link)"?

huard · 2020-09-21T20:45:43Z

I'm wary of making the development environment more complex than it already is, but I think you raise valid issues.
I agree on the need to split the test environment (stand-alone) and the tutorial environment (connected to existing data on THREDDS).

Intake: possibly. There could be an access_type field, taking the values http or dap, that we could filter on. Or two different catalogs.
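A plain-Python sketch of the access_type idea. It does not use the Intake API; `CatalogEntry`, `filter_entries` and the thredds.example.org URLs are placeholders. In a real Intake catalog, the field could plausibly live in each source's `metadata` mapping.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CatalogEntry:
    name: str
    url: str
    access_type: str  # "http" or "dap", as suggested above


def filter_entries(entries, access_type):
    """Keep only entries with the requested access_type (e.g. "dap" for DAP tests)."""
    return [e for e in entries if e.access_type == access_type]


# Placeholder host; fileServer/dodsC are the usual THREDDS HTTP/OPeNDAP paths.
CATALOG = [
    CatalogEntry("evap-http",
                 "https://thredds.example.org/thredds/fileServer/testdata/raven/gr4j_cemaneige/evap.nc",
                 "http"),
    CatalogEntry("evap-dap",
                 "https://thredds.example.org/thredds/dodsC/testdata/raven/gr4j_cemaneige/evap.nc",
                 "dap"),
]
```

The "two different catalogs" alternative amounts to splitting `CATALOG` on this field ahead of time instead of filtering at runtime.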

tlvu · 2020-10-15T03:05:01Z

One idea is to have a github raven-testdata repo that gets synced on THREDDS when there is a release of raven-testdata. Then, we'll need a client API to fetch data either from THREDDS or from github.

Analysis:
Ouranosinc/xclim-testdata#1 (comment)

Conclusion:
Ouranosinc/xclim-testdata#1 (comment)

tlvu · 2020-10-15T15:27:49Z

@huard Houston, we have a problem.

Looking at raven/docs/source/notebooks/example_data.py, lines 31 to 33 in 58e8978:

    TESTDATA["raven-mohyse-rv"] = tuple(
        (TD / "raven-mohyse").glob("raven-mohyse-salmon.rv?")
    )

There are many file types other than .nc, so syncing .nc files to Thredds will not solve the entire problem, unless you tell me Thredds can also handle .rvt, .gml, .zip, .gpkg, .tiff and .csv files.
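To illustrate why the glob above only works on a local checkout: `Path.glob` needs a directory listing, and `?` matches exactly one character, so the single pattern picks up every Raven config file of that model. The file names below are assumed for illustration. A raw GitHub URL offers no such listing, so each file would have to be named explicitly.

```python
import tempfile
from pathlib import Path

# Assumed file names; "?" in the pattern matches exactly one character.
names = [
    "raven-mohyse-salmon.rvi", "raven-mohyse-salmon.rvh",
    "raven-mohyse-salmon.rvp", "raven-mohyse-salmon.rvc",
    "raven-mohyse-salmon.rvt", "salmon.nc",
]

with tempfile.TemporaryDirectory() as d:
    folder = Path(d) / "raven-mohyse"
    folder.mkdir()
    for n in names:
        (folder / n).touch()
    # Works only because the filesystem can list the folder's contents.
    matched = sorted(p.name for p in folder.glob("raven-mohyse-salmon.rv?"))

print(matched)
```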

I see 2 possible solutions to avoid having to clone the entire Raven repo for tutorial notebooks:

1 - Sync only the testdata folder together with the tutorial notebooks, so we avoid syncing the entire repo, and add a fallback in that example_data.py file: "if not available at the usual location, search for a raven-testdata folder in the same folder".

2 - Add a fallback to the raw file served directly by GitHub over HTTP (ex: https://github.com/Ouranosinc/raven/raw/master/tests/testdata/gr4j_cemaneige/evap.nc). This route implies hardcoding each and every testdata file, since the glob trick on the local filesystem no longer works, and it also means no deleting or modifying existing testdata, otherwise old revisions of the notebooks would break. The upside of this option is that each .ipynb will only need example_data.py next to it, not the entire raven-testdata/ folder. But it is still not 100% standalone. To be 100% standalone, we would need to duplicate the logic of example_data.py inside each .ipynb file; I'm not sure that's a good idea either, but it's an option if we really want 100% standalone .ipynb files.
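Option 1's fallback could be sketched as follows. `find_testdata_root` is a hypothetical helper; `usual` stands for the full-checkout location and `here` for the folder holding the notebooks and example_data.py. Because the result is still a local directory, the existing glob keeps working.

```python
from pathlib import Path


def find_testdata_root(usual: Path, here: Path) -> Path:
    """Return the testdata folder to use (hypothetical helper for option 1).

    1. `usual`: the location in a full checkout (dev workstation, Travis-CI).
    2. `here / "raven-testdata"`: a copy synced next to the tutorial notebooks.
    """
    if usual.is_dir():
        return usual
    fallback = here / "raven-testdata"
    if fallback.is_dir():
        return fallback
    raise FileNotFoundError(f"no testdata folder at {usual} or {fallback}")
```

Inside example_data.py, `here` would plausibly be `Path(__file__).parent`, and in a full checkout the usual location would be `here.parents[2] / "tests" / "testdata"` (from docs/source/notebooks up to the repo root).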

I would favor option 1, unless you have a 3rd option to suggest or you prefer option 2 and can live with its limitations.

huard · 2020-10-15T15:41:43Z

Suggestion:

Sync only netCDF files to THREDDS.
Create a function raven.tutorial.get_file that knows how to handle different file types. For netCDF, it returns either an HTTP or a DAP link from THREDDS; for other files, it returns the raw GitHub link.
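A minimal sketch of the proposed dispatch, written as a hypothetical stand-in rather than the actual raven.tutorial.get_file. It assumes the files are published under testdata/raven (from this issue's title) on a placeholder THREDDS host, using the usual `fileServer` (HTTP) and `dodsC` (OPeNDAP) service paths, and reuses the raw GitHub prefix from earlier in the thread.

```python
from pathlib import PurePosixPath

# Placeholder host and assumed publication prefix.
THREDDS_BASE = "https://thredds.example.org/thredds"
THREDDS_PREFIX = "testdata/raven"
GITHUB_RAW = "https://github.com/Ouranosinc/raven/raw/master/tests/testdata"


def get_file(name: str, dap: bool = False) -> str:
    """Hypothetical stand-in for the proposed raven.tutorial.get_file.

    netCDF files resolve to THREDDS (OPeNDAP if `dap`, else plain HTTP);
    everything else (.rv?, .gml, .zip, .gpkg, .tiff, .csv, ...) resolves to GitHub.
    """
    if PurePosixPath(name).suffix == ".nc":
        service = "dodsC" if dap else "fileServer"
        return f"{THREDDS_BASE}/{service}/{THREDDS_PREFIX}/{name}"
    return f"{GITHUB_RAW}/{name}"
```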

tlvu · 2020-10-15T16:18:24Z

Sync only netCDF files to THREDDS.

Already done.

Create a function raven.tutorial.get_file that knows how to handle different file types. For netCDF, it returns either an HTTP or a DAP link from THREDDS; for other files, it returns the raw GitHub link.

Just to be sure, this is only a fallback for when the full checkout is not there. On dev workstations and Travis-CI, the full checkout will be there, so I'd rather not force external dependencies when everything is available locally. I don't want devs to be unable to run tests, or Travis-CI to fail, just because our Thredds is in maintenance mode.
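One way to keep DAP tests from failing the whole suite while THREDDS is down is to probe the server first and skip those tests when it does not answer. This was not proposed in the thread; `thredds_available` is a hypothetical helper and the URL passed to it is up to the caller.

```python
import urllib.request


def thredds_available(url: str, timeout: float = 5.0) -> bool:
    """Return True if `url` answers a HEAD request with a non-error status.

    Connection failures, timeouts and HTTP error statuses (URLError, HTTPError
    and socket timeouts are all OSError subclasses), as well as malformed URLs
    (ValueError), count as "unavailable".
    """
    try:
        request = urllib.request.Request(url, method="HEAD")
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status < 400
    except (OSError, ValueError):
        return False
```

With pytest, the result could feed `pytest.mark.skipif(not thredds_available(url), reason="THREDDS unreachable")` on the DAP-only tests, while everything else keeps running from the local checkout.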

Where would you want this raven.tutorial.get_file function? Will it ship as part of Raven? If a new testdata file is added or an existing one renamed, will we need to release a new Raven? Or should it go in the same old example_data.py, shipped together with all the tutorial notebooks?

huard · 2020-10-15T17:15:43Z

Yes, as part of Raven. I think this matches the philosophy of keeping the notebooks in sync with the code.
Yes. I don't think there is a use case for a new test data file that is not explicitly used by the code.

The example_data model is really not ideal, I believe. I think it's a source of user confusion (I answered a question about it today...).

tlvu · 2020-10-15T17:37:19Z

Yes, as part of Raven. I think this matches the philosophy of keeping the notebooks in sync with the code.

Perfect, as long as we are ready to release and deploy Raven often. The notebooks are set to auto-deploy every hour. If a notebook needs testdata that requires a new Raven release, it will break.

Yes. I don't think there is a use case for a new test data file that is not explicitly used by the code.

I didn't mean unused new testdata. I meant new testdata needed by a notebook but not yet available in the currently deployed Raven (i.e. a Raven that is not up to date yet). Not a problem if we release Raven often for deployment.

huard assigned tlvu Dec 6, 2019

tlvu mentioned this issue Jun 10, 2020

Make notebooks more standalone #278

Closed

tlvu mentioned this issue Sep 21, 2020

Clean-up test and doctests datasets Ouranosinc/xclim#525

Closed

tlvu mentioned this issue Oct 15, 2020

Sync Raven testdata to Thredds for Raven tutorial notebooks bird-house/birdhouse-deploy#72

Merged

tlvu mentioned this issue Nov 24, 2020

Create a raven-testdata repo #318

Closed

Zeitsperre mentioned this issue Nov 26, 2020

First pass at a get_file() function #322

Merged

huard closed this as completed Dec 14, 2020
