Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Version datasets individually #18

Merged
merged 9 commits into from
Feb 18, 2022
Merged

Version datasets individually #18

merged 9 commits into from
Feb 18, 2022

Conversation

leouieda
Copy link
Member

Instead of using the entire data bundle and versioning all datasets based on the module name, version them individually. To do so, pull from the release and DOIs in https://github.com/fatiando-data instead of https://github.com/fatiando/data This means that functions won't have to be repeated and updating 1 dataset doesn't mean copying all of the others along with it (since the collection would be new). Versions are now specified as a required version argument in all fetch_* functions.

The implementation still bundles information for all v1 datasets in a variable to make it easier to manage. When updating a dataset, entries for the others don't need to be repeated. Also only need 2 environment variables for setting the cache location and the data source (instead of 2 per version). A downside is that we can no longer accept a variable to set custom data source URL since each dataset has a different one. The new environment variable only sets fetching from GitHub or not.

Reminders:

  • Run make format and make check to make sure the code follows the style guide.
  • Add tests for new features or tests that would have caught the bug that you're fixing.
  • Add new public functions/methods/classes to doc/api/index.rst and the base __init__.py file for the package.
  • Write detailed docstrings for all functions/classes/methods. It often helps to design better code if you write the docstrings first.
  • If adding new functionality, add an example to the docstring, gallery, and/or tutorials.
  • Add your full name, affiliation, and ORCID (optional) to the AUTHORS.md file (if you haven't already) in case you'd like to be listed as an author on the Zenodo archive of the next release.

@leouieda leouieda marked this pull request as ready for review January 25, 2022 16:24
@leouieda leouieda requested a review from santisoler January 25, 2022 16:27
@leouieda
Copy link
Member Author

@santisoler could you take a quick look at ensaio/_fetchers.py? All the rest is updates to docs and moving things around so not much changed there.

The implementation I went for is to centralize the URLs, hashes, and DOIs in a single REGISTRY dictionary and have a function that generates a pooch.Pooch from it. I find it a bit clunky right now since Pooch is designed to fetch from single source mostly. Not sure if it would be best to put the information inside the fetch_ functions and wrap pooch.retrieve (which would mean reimplementing the cache environment variable code here).

Any thoughts would be welcome. Not crucial for merging this since it's all backend stuff that we can change later.

The registry now groups by data instead of version, which is more
intuitive and keeps different versions of the same data close to each
other. Had to change the implementation a bit but nothing major.
@leouieda leouieda merged commit 87db21f into main Feb 18, 2022
@leouieda leouieda deleted the new-source branch February 18, 2022 16:04
@leouieda
Copy link
Member Author

Merging this in since @santisoler reviewed and gave feedback live in a developer meeting (see the meeting notes).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant