Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Providing a datasets.download_test_data() method #1111

Closed
maxrjones opened this issue Mar 23, 2021 · 2 comments · Fixed by #1310
Closed

Providing a datasets.download_test_data() method #1111

maxrjones opened this issue Mar 23, 2021 · 2 comments · Fixed by #1310
Labels
maintenance Boring but important stuff for the core devs
Milestone

Comments

@maxrjones
Copy link
Member

maxrjones commented Mar 23, 2021

Description of the desired feature

Tests are more reliable if the remote datasets are downloaded before running the tests, in particular if any are run in parallel. Developers only really need to worry about this once, but it still could be confusing for new contributors if tests fail because of the remote file mechanism. On the GMT-side gmt get -Dcache and gmt get -Ddata=earth -I1m -N can be used to download the data required for tests, but PyGMT does not need all these data for its tests.

Is it worth adding a new method datasets.download_test_data() that new contributors can use before testing their code to download all the necessary datasets? I think this would contain a PyGMT version of the contents of the 'Download remote data' task in the cache data workflow - then perhaps that workflow could use this method to avoid redundancy.

Are you willing to help implement and maintain this feature? Yes

@maxrjones maxrjones added the question Further information is requested label Mar 23, 2021
@seisman
Copy link
Member

seisman commented Mar 28, 2021

After we finish the migration of the baseline images to DVC, the source tarball and binary wheels won't contain baseline images. It means that users who install PyGMT via pip or conda cannot run tests.

I'm thinking if we should have a pygmt/testing.py module. The module can provide two functions:

  1. pygmt.testing.download_baseline_images(): runs dvc init, dvc remote to setup the remote dvc repository, and dvc pull to download the baseline images
  2. pygmt.testing.download_test_data(): downloads the data used in tests.

@weiji14 weiji14 added feature request New feature wanted and removed question Further information is requested labels Apr 6, 2021
@seisman
Copy link
Member

seisman commented Apr 10, 2021

  1. pygmt.testing.download_baseline_images(): runs dvc init, dvc remote to setup the remote dvc repository, and dvc pull to download the baseline images

This is not a good idea. To download the baseline images, the function must change the directory into the PyGMT installation path, runs the dvc commands to set up the dvc repository and download images. Such processes changes the pygmt installation directory (e.g., lib/python3.8/site-packages/pygmt directory). When uninstall pygmt, these files are not deleted by default, which may cause trouble for future pygmt installations.

@seisman seisman added this to the 0.4.0 milestone Jun 2, 2021
@seisman seisman added maintenance Boring but important stuff for the core devs and removed feature request New feature wanted labels Jun 2, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
maintenance Boring but important stuff for the core devs
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants