Use pooch.os_cache and pkg_resources in datasets #220

Merged
merged 11 commits into from
Jan 20, 2020
10 changes: 6 additions & 4 deletions .azure-pipelines.yml
@@ -77,6 +77,7 @@ jobs:
 CONDA_REQUIREMENTS: requirements.txt
 CONDA_REQUIREMENTS_DEV: requirements-dev.txt
 CONDA_INSTALL_EXTRA: "codecov"
+VERDE_DATA_DIR: "$HOME/.verde/data/master"

 strategy:
 matrix:
@@ -127,8 +128,8 @@ jobs:
 # Copy the test data to the cache folder
 - bash: |
 set -x -e
-mkdir -p $HOME/.verde/data/master
-cp -r data/* $HOME/.verde/data/master
+mkdir -p $VERDE_DATA_DIR
+cp -r data/* $VERDE_DATA_DIR
 displayName: Copy test data to cache

 # Install the package
@@ -178,6 +179,7 @@ jobs:
 CONDA_REQUIREMENTS: requirements.txt
 CONDA_REQUIREMENTS_DEV: requirements-dev.txt
 CONDA_INSTALL_EXTRA: "codecov"
+VERDE_DATA_DIR: "~/.verde/data/master"

 strategy:
 matrix:
@@ -222,8 +224,8 @@ jobs:
 # Copy the test data to the cache folder
 - bash: |
 set -x -e
-mkdir -p ~/.verde/data/master
-cp -r data/* ~/.verde/data/master
+mkdir -p $VERDE_DATA_DIR
+cp -r data/* $VERDE_DATA_DIR
 displayName: Copy test data to cache

 # Install the package that we want to test
5 changes: 3 additions & 2 deletions .travis.yml
@@ -23,6 +23,7 @@ env:
 # PyPI password for deploying releases (TWINE_PASSWORD)
 - secure: "Gvd2kH5bGIng7Wz3R4Md5d48qU0vYo0Sb4g7A1UAn8EOuWAcbkdSAq5yDiAp4pENeGceHQG0+jX+GQBZoSOMUpwAfhPkWG5HBIc+P/G+iTUyF2oELLCekcGgccPzwNgQt574FzM0PkC9L4hINNRjVtnFa+SIx72D2r1OdTvmk2+c4jXBZl52e4l5dU+Hjzwh22KNzAMtXDVuvr3NVdJZHA/ldTwEBUQfiLo2CGkgls6o8ZLixK0tCRGIFKlZko9WeBTzQYidloSo3EQx0eqiTz7qydm3UfCezA9UYPefGOtUaA/4ysqs8tgG8xrnx8NhhRqH9pfPAhgsCMwfmtibslNwH+C7gtbERT8lLY5NfU1xyDC4UxkjbwbzKQno/vPhiqEJ/uR458IdZbzUeWXlt+Rz+Dyj1lW7FqPLOl3Zpfgfv1swWqxjVwduV46c3nlgu9fEkAiEH2SzAtBlsQ2qwbJCZKXj+8Ps9FmaqvQ+SCOTAycgR9WnYoIIutpn0cs3k8zqqQyBq2zXJLkPHflVich8wKKaOsaFMCIKLWaOODCw5fLkfxck/QtlolGGFi3lh5W5p4Zxxr7KdL8f+UrkAb6gY9LStvqwe2rSG2olqc95+zozsMY/YHXTIG092WB3EmptwO9jL67D3AIVBKOdvcRYFetWMyY61ZmEK0s/43I="
 - TWINE_USERNAME=Leonardo.Uieda
+- VERDE_DATA_DIR="$HOME/.verde/data/master"
 # The file with the listed requirements to be installed by conda
 - CONDA_REQUIREMENTS=requirements.txt
 - CONDA_REQUIREMENTS_DEV=requirements-dev.txt
@@ -65,8 +66,8 @@ matrix:
 # Setup the build environment
 before_install:
 # Copy sample data to the verde data dir to avoid downloading all the time
-- mkdir -p $HOME/.verde/data/master
-- cp -r data/* $HOME/.verde/data/master
+- mkdir -p $VERDE_DATA_DIR
+- cp -r data/* $VERDE_DATA_DIR
 # Get the Fatiando CI scripts
 - git clone --branch=1.2.0 --depth=1 https://github.com/fatiando/continuous-integration.git
 # Download and install miniconda and setup dependencies
30 changes: 13 additions & 17 deletions data/examples/README.txt
@@ -3,27 +3,23 @@
 Sample Data
 ===========

-Verde provides some sample data and ways of generating synthetic data through the
-:mod:`verde.datasets` module. The sample data are automatically downloaded from the `Github
-repository <https://github.com/fatiando/verde>`__ to a folder on your computer the first
-time you use them. After that, the data are loaded from this folder. The download is
-managed by the :mod:`pooch` package.
+Verde provides some sample data and ways of generating synthetic data through
+the :mod:`verde.datasets` module.

+Where are my data files?
+------------------------

-Where is my data?
------------------

-The data files are downloaded to a folder ``~/.verde/data/`` by default. This is the
-*base data directory*. :mod:`pooch` will create a separate folder in the base directory
-for each version of Verde. So for Verde 0.1, the base data dir is ``~/.verde/data/0.1``.
-If you're using the latest development version from Github, the version is ``master``.

-You can change the base data directory by setting the ``VERDE_DATA_DIR`` environment
-variable to a different path.
+The sample data files are downloaded automatically by :mod:`pooch` the first
+time you load them. The files are saved to the default cache location on your
+operating system. The location varies depending on your system and
+configuration. We provide the :func:`verde.datasets.locate` function if you
+need to find the data storage location on your system.

+You can change the base data directory by setting the ``VERDE_DATA_DIR``
+environment variable to the desired path.

 Available datasets
 ------------------

-These are the datasets currently available. Most also come with a function for setting
-up a Cartopy map to display the data.
+These are the datasets currently available. Most also come with a companion
+function for setting up a Cartopy map to display the data.
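
The override described in the README text can be pictured without Pooch at all. The snippet below is a simplified, standard-library imitation of how a non-empty environment variable such as ``VERDE_DATA_DIR`` might replace a default base folder, with one subfolder per version as the earlier README wording describes. ``resolve_data_dir`` and its parameters are hypothetical names used only for illustration, and the paths are made up; this is not Pooch's implementation.

```python
import os
from pathlib import Path


def resolve_data_dir(default, version, env_var="VERDE_DATA_DIR", environ=None):
    """Return the folder where data files for ``version`` would be stored.

    Hypothetical helper: a sketch of the override rule, not Pooch's code.
    """
    environ = os.environ if environ is None else environ
    # A non-empty environment variable replaces the default base folder.
    base = environ.get(env_var) or default
    # Each version gets its own subfolder inside the base folder.
    return Path(base).expanduser() / version


# Without the variable set, the default base folder is used.
print(resolve_data_dir("/home/user/.cache/verde", "master", environ={}))
# With the variable set, it takes precedence over the default.
print(
    resolve_data_dir(
        "/home/user/.cache/verde",
        "master",
        environ={"VERDE_DATA_DIR": "/tmp/verde-data"},
    )
)
```

Passing ``environ`` explicitly keeps the sketch easy to check without touching the real process environment.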
1 change: 1 addition & 0 deletions doc/api/index.rst
@@ -93,6 +93,7 @@ Datasets
 .. autosummary::
 :toctree: generated/

+datasets.locate
 datasets.CheckerBoard
 datasets.fetch_baja_bathymetry
 datasets.setup_baja_bathymetry_map
2 changes: 1 addition & 1 deletion requirements.txt
@@ -3,4 +3,4 @@ scipy
 pandas
 xarray
 scikit-learn
-pooch
+pooch>=0.7.0
9 changes: 8 additions & 1 deletion setup.py
@@ -43,7 +43,14 @@
     "verde.datasets": ["registry.txt"],
     "verde.tests": ["data/*", "baseline/*"],
 }
-INSTALL_REQUIRES = ["numpy", "scipy", "pandas", "xarray", "scikit-learn", "pooch"]
+INSTALL_REQUIRES = [
+    "numpy",
+    "scipy",
+    "pandas",
+    "xarray",
+    "scikit-learn",
+    "pooch>=0.7.0",
+]
 PYTHON_REQUIRES = ">=3.6"

 if __name__ == "__main__":
1 change: 1 addition & 0 deletions verde/datasets/__init__.py
@@ -1,6 +1,7 @@
 # pylint: disable=missing-docstring
 from .synthetic import CheckerBoard
 from .sample_data import (
+    locate,
     fetch_baja_bathymetry,
     setup_baja_bathymetry_map,
     fetch_rio_magnetic,
27 changes: 24 additions & 3 deletions verde/datasets/sample_data.py
@@ -1,9 +1,9 @@
 """
 Functions to load sample data
 """
-import os
 import warnings

+import pkg_resources
 import numpy as np
 import pandas as pd
 import pooch
@@ -22,13 +22,34 @@
 warnings.simplefilter("default")

 POOCH = pooch.create(
-    path=["~", ".verde", "data"],
+    path=pooch.os_cache("verde"),
     base_url="https://github.com/fatiando/verde/raw/{version}/data/",
     version=full_version,
     version_dev="master",
     env="VERDE_DATA_DIR",
 )
-POOCH.load_registry(os.path.join(os.path.dirname(__file__), "registry.txt"))
+POOCH.load_registry(pkg_resources.resource_stream("verde.datasets", "registry.txt"))
+
+
+def locate():
+    r"""
+    The absolute path to the sample data storage location on disk.
+
+    This is where the data are saved on your computer. The location is
+    dependent on the operating system. The folder locations are defined by the
+    ``appdirs`` package (see the `appdirs documentation
+    <https://github.com/ActiveState/appdirs>`__).
+
+    The location can be overridden by setting the ``VERDE_DATA_DIR``
+    environment variable to the desired destination.
+
+    Returns
+    -------
+    path : str
+        The local data storage location.
+
+    """
+    return str(POOCH.abspath)


 def _setup_map(
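
Switching from a file path to ``pkg_resources.resource_stream`` means the registry is handed over as an open binary stream rather than a file name. As a rough illustration of what consuming such a stream involves, the sketch below parses lines of the form ``<file name> <hash>`` from a bytes stream. The line format, the ``parse_registry`` helper, and the file names and hashes are all assumptions made for illustration, and ``io.BytesIO`` stands in for the packaged ``registry.txt``; Pooch's own ``load_registry`` may behave differently.

```python
import io


def parse_registry(stream):
    """Map file names to hashes from a binary stream of '<name> <hash>' lines.

    Hypothetical helper: a sketch only, not Pooch's registry loader.
    """
    registry = {}
    for raw_line in stream:
        # A stream opened in binary mode yields bytes, so decode each line.
        line = raw_line.decode("utf-8").strip()
        # Skip blank lines and comment lines.
        if not line or line.startswith("#"):
            continue
        name, file_hash = line.split()[:2]
        registry[name] = file_hash
    return registry


# Placeholder content standing in for a packaged registry file.
fake_registry = io.BytesIO(b"example-a.csv 0000\n\nexample-b.csv 1111\n")
print(parse_registry(fake_registry))
```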
12 changes: 12 additions & 0 deletions verde/tests/test_datasets.py
@@ -1,12 +1,15 @@
 """
 Test data fetching routines.
 """
+import os
+
 import matplotlib.pyplot as plt
 import cartopy.crs as ccrs

 import pytest

 from ..datasets.sample_data import (
+    locate,
     fetch_baja_bathymetry,
     setup_baja_bathymetry_map,
     fetch_rio_magnetic,
@@ -18,6 +21,15 @@
 )


+def test_datasets_locate():
+    "Make sure the data cache location has the right package name"
+    path = locate()
+    assert os.path.exists(path)
+    # This is the most we can check in a platform independent way without
+    # testing appdirs itself.
+    assert "verde" in path
+
+
 def test_fetch_texas_wind():
     "Make sure the data are loaded properly"
     data = fetch_texas_wind()
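
The new test can only check that the package name appears in the path because the cache folder differs by platform. For orientation, here is a simplified sketch of the conventions that ``appdirs`` is generally understood to follow on Linux and macOS. ``sketch_cache_dir`` is a hypothetical helper and the home folders are made up; Windows is left out, and the real ``appdirs`` handles more cases than this.

```python
from pathlib import PurePosixPath


def sketch_cache_dir(app_name, platform, home, environ=None):
    """Approximate the per-user cache folder for ``app_name``.

    Hypothetical helper: covers only the Linux and macOS conventions.
    """
    environ = {} if environ is None else environ
    home = PurePosixPath(home)
    if platform == "darwin":
        # macOS convention: ~/Library/Caches/<app>
        return home / "Library" / "Caches" / app_name
    if platform.startswith("linux"):
        # Linux convention: $XDG_CACHE_HOME/<app>, else ~/.cache/<app>
        base = environ.get("XDG_CACHE_HOME") or home / ".cache"
        return PurePosixPath(base) / app_name
    raise NotImplementedError(f"platform {platform!r} is not covered by this sketch")


for platform, home in [("linux", "/home/user"), ("darwin", "/Users/user")]:
    path = sketch_cache_dir("verde", platform, home)
    # The package name is the one component shared by both layouts.
    assert "verde" in str(path)
    print(platform, path)
```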