Merge pull request dynamicslab#71 from neuralhydrology/staging
Rework docs, add issue templates, fix minor bugs
kratzert authored Feb 9, 2022
2 parents 31bd284 + 8606cb7 commit b6d43b2
Showing 17 changed files with 417 additions and 262 deletions.
31 changes: 31 additions & 0 deletions .github/ISSUE_TEMPLATE/bug_report.md
@@ -0,0 +1,31 @@
---
name: Bug report
about: Create a report to help us improve
title: "[BUG] Title describing the bug"
labels: ''
assignees: ''

---

**Describe the bug**
A clear and concise description of what the bug is.

**To Reproduce**
Steps to reproduce the behavior, e.g., which data did you use, what commands did you execute, did you modify the code, etc.

**Expected behavior**
A clear and concise description of what you expected to happen.

**Description of the dataset**
Provide details on the dataset that you use. Is it, e.g., an out-of-the-box CAMELS dataset, or did you create your own csv/netCDF files? In the latter case, providing a data sample might be helpful.

**Logs & Screenshots**
Please provide the full stack trace if any exception occurred. If applicable, add screenshots to help explain your problem.

**Desktop & Environment (please complete the following information):**
- OS: [e.g. Linux, Windows, macOS]
- The git commit if you cloned the repo, or the version number in `neuralhydrology/__about__.py`
- The Python version and a list of installed Python packages. If you use conda, you can create this list via `conda env export`.

**Additional context**
Add any other context about the problem here.
20 changes: 20 additions & 0 deletions .github/ISSUE_TEMPLATE/feature_request.md
@@ -0,0 +1,20 @@
---
name: Feature request
about: Suggest an idea for this project
title: "[REQUEST] Title describing the feature request"
labels: ''
assignees: ''

---

**Is your feature request related to a problem? Please describe.**
A clear and concise description of what the problem is. Ex. I'm always frustrated when [...]

**Describe the solution you'd like**
A clear and concise description of what you want to happen.

**Describe alternatives you've considered**
A clear and concise description of any alternative solutions or features you've considered.

**Additional context**
Add any other context or screenshots about the feature request here.
3 changes: 3 additions & 0 deletions docs/source/tutorials/data-prerequisites.nblink
@@ -0,0 +1,3 @@
{
"path": "../../../examples/00-Data-Prerequisites/prerequisites.ipynb"
}
4 changes: 4 additions & 0 deletions docs/source/tutorials/index.rst
@@ -4,6 +4,9 @@ Tutorials
All tutorials are based on Jupyter notebooks that are hosted on GitHub.
If you want to run the code yourself, you can find the notebooks in the `examples folder <https://github.com/neuralhydrology/neuralhydrology/tree/master/examples>`__ of the NeuralHydrology GitHub repository.

| **Data Prerequisites**
| For most of our tutorials you will need some data to train and evaluate models. In all of these examples we use the publicly available CAMELS US dataset. :doc:`This tutorial <data-prerequisites>` will guide you through the download process of the different dataset pieces and explain the local folder structure that the code expects.
| **Introduction to NeuralHydrology**
| If you're new to the NeuralHydrology package, :doc:`this tutorial <introduction>` is the place to get started. It walks you through the basic command-line and API usage patterns, and you get to train and evaluate your first model.
@@ -26,6 +29,7 @@ If you want to run the code yourself, you can find the notebooks in the `example
:maxdepth: 1
:caption: Contents:

data-prerequisites
introduction
adding-gru
add-dataset
4 changes: 1 addition & 3 deletions docs/source/usage/quickstart.rst
@@ -56,9 +56,7 @@ Data
Training and evaluating models requires a dataset.
If you're unsure where to start, a common dataset is CAMELS US, available at
`CAMELS US (NCAR) <https://ral.ucar.edu/solutions/products/camels>`_.
Download the "CAMELS time series meteorology, observed flow, meta data" and place the actual data folder
(``basin_dataset_public_v1p2``) in a directory.
This directory will be referred to as the "data directory", or ``data_dir``.
This dataset is used in all of our tutorials, and we have a `dedicated tutorial <../tutorials/data-prerequisites.nblink>`_ with download instructions that you might want to look at.


Training a model
78 changes: 78 additions & 0 deletions examples/00-Data-Prerequisites/prerequisites.ipynb
@@ -0,0 +1,78 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Data Prerequisites \n",
"\n",
"All of our tutorials in which you train and evaluate a model use the [CAMELS US](https://ral.ucar.edu/solutions/products/camels) data set, either in its original form or with some extensions. \n",
"In this notebook, we will guide you through the process of downloading all essential pieces of the dataset and explain the CAMELS US folder structure that NeuralHydrology expects, so that you can run all of the tutorials.\n",
"\n",
"## CAMELS US meteorological time series and streamflow data\n",
"\n",
"The meteorological time series serve in most of our tutorials as model inputs, while the streamflow time series are the target values. You can download both from the [NCAR Homepage](https://ral.ucar.edu/solutions/products/camels). Click on \"CAMELS time series meteorology, observed flow, meta data (.zip)\" under \"CAMELS hydrometeorological time series\" or use [this](https://ral.ucar.edu/sites/default/files/public/product-tool/camels-catchment-attributes-and-meteorology-for-large-sample-studies-dataset-downloads/basin_timeseries_v1p2_metForcing_obsFlow.zip) direct link. The downloaded zip file, called `basin_timeseries_v1p2_metForcing_obsFlow.zip`, contains two folders: `basin_dataset_public` (empty, 0 bytes) and `basin_dataset_public_v1p2` (not empty, 14.9 GB). Extract the second one (`basin_dataset_public_v1p2`) to any location you like and, ideally, rename it to something more meaningful, like `CAMELS_US`. This folder is referred to as the root directory of the CAMELS US dataset. Among others, it should contain the following subdirectories:\n",
"\n",
"```\n",
"CAMELS_US/ # originally named basin_dataset_public_v1p2\n",
"- basin_mean_forcings/ # contains the meteorological time series data \n",
"- usgs_streamflow/ # contains the streamflow data\n",
"- ...\n",
"```\n",
"\n",
"**NOTE**: In the default configs of our tutorials, we assume that the data is stored in `neuralhydrology/data/CAMELS_US`. If you stored the data elsewhere, either create a symbolic link to this location or change the `data_dir` argument in the `.yml` configs of the corresponding tutorials to point to your local CAMELS US root directory.\n",
"\n",
"\n",
"## Hourly forcing and streamflow data for CAMELS US basins\n",
"\n",
"(required for Tutorial 04 - Multi-Timescale Prediction)\n",
"\n",
"To be able to run this example yourself, you will need to download the [hourly NLDAS forcings and the hourly streamflow data](https://doi.org/10.5281/zenodo.4072700). Within the CAMELS US root directory, place the `nldas_hourly` and `usgs-streamflow` folders into a directory called `hourly` (`/path/to/CAMELS_US/hourly/{nldas_hourly,usgs-streamflow}`).\n",
"Alternatively, you can place the hourly netCDF file (`usgs-streamflow-nldas_hourly.nc`) from [Zenodo](https://zenodo.org/badge/DOI/10.5281/zenodo.4072700) inside the `hourly/` folder instead of the NLDAS and streamflow csv files. Loading from netCDF will be faster than from the csv files. In case of the first option (downloading the two folders), the CAMELS US folder structure from above would extend to:\n",
"\n",
"```\n",
"CAMELS_US/ # originally named basin_dataset_public_v1p2\n",
"- basin_mean_forcings/ # contains the meteorological time series data \n",
"- usgs_streamflow/ # contains the streamflow data\n",
"- hourly/ # newly created folder to store the hourly forcing and streamflow data\n",
" - nldas_hourly/ # NLDAS hourly forcing data\n",
" - usgs-streamflow/ # hourly streamflow data\n",
"- ...\n",
"```\n",
"\n",
"If you downloaded the `usgs-streamflow-nldas_hourly.nc` file instead, it should look like this:\n",
"\n",
"```\n",
"CAMELS_US/ # originally named basin_dataset_public_v1p2\n",
"- basin_mean_forcings/ # contains the meteorological time series data \n",
"- usgs_streamflow/ # contains the streamflow data\n",
"- hourly/ # newly created folder to store the hourly forcing and streamflow data\n",
" - usgs-streamflow-nldas_hourly.nc # netCDF file containing hourly forcing and streamflow data\n",
"- ...\n",
"```\n",
"\n",
"## CAMELS US catchment attributes\n",
"\n",
"(required for Tutorial 06 - How-to Finetuning)\n",
"\n",
"When training a deep learning model, such as an LSTM, on data from more than one basin, it is recommended to also use static catchment attributes as model inputs, alongside the meteorological forcings (see e.g. [this paper](https://hess.copernicus.org/articles/23/5089/2019/)). In tutorial 06, we use the static catchment attributes from the CAMELS US dataset that can be downloaded on the [same homepage](https://ral.ucar.edu/solutions/products/camels), a bit further down. Search for the section called \"CAMELS catchment attributes\". Here, download the only listed zip file \"CAMELS Attributes (.zip)\" or use [this](https://ral.ucar.edu/sites/default/files/public/product-tool/camels-catchment-attributes-and-meteorology-for-large-sample-studies-dataset-downloads/camels_attributes_v2.0.zip) direct link. The downloaded archive contains a folder called `camels_attributes_v2.0`. Extract this folder into the CAMELS US root directory (at the same level as `basin_mean_forcings` and `usgs_streamflow`). Your folder structure should then look at least like this:\n",
"\n",
"```\n",
"CAMELS_US/ # originally named basin_dataset_public_v1p2\n",
"- basin_mean_forcings/ # contains the meteorological time series data \n",
"- usgs_streamflow/ # contains the streamflow data\n",
"- camels_attributes_v2.0/ # extracted catchment attributes\n",
"- ...\n",
"```\n"
]
}
],
"metadata": {
"language_info": {
"name": "python"
},
"orig_nbformat": 4
},
"nbformat": 4,
"nbformat_minor": 2
}
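The folder layout described in the notebook above can be sanity-checked with a short script. Here is a minimal sketch; the `data_dir` location is a hypothetical placeholder (adjust it to your setup), while the subdirectory names come from the folder layouts shown above:

```python
from pathlib import Path

# Subdirectories the tutorials expect inside the CAMELS US root directory,
# per the folder layouts shown in the notebook above.
EXPECTED_SUBDIRS = [
    "basin_mean_forcings",     # meteorological time series
    "usgs_streamflow",         # streamflow data
    "camels_attributes_v2.0",  # catchment attributes (needed for tutorial 06)
]


def missing_camels_subdirs(root: Path, required=EXPECTED_SUBDIRS) -> list:
    """Return the names of required subdirectories missing under root."""
    return [name for name in required if not (root / name).is_dir()]


# Hypothetical location; point this at your local CAMELS US root directory.
data_dir = Path("neuralhydrology/data/CAMELS_US")
missing = missing_camels_subdirs(data_dir)
if missing:
    print(f"Missing under {data_dir}: {missing}")
else:
    print("CAMELS US folder layout looks complete.")
```

Running such a check before starting a training run catches a misplaced or incompletely extracted dataset early, instead of failing partway through data loading.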
26 changes: 18 additions & 8 deletions examples/01-Introduction/Introduction.ipynb
@@ -5,7 +5,11 @@
"metadata": {},
"source": [
"# Introduction to NeuralHydrology\n",
"Before we start: This tutorial is rendered from a Jupyter notebook that is hosted on GitHub. If you want to run the code yourself, you can find the notebook and configuration files [here](https://github.com/neuralhydrology/neuralhydrology/tree/master/examples/01-Introduction).\n",
"\n",
"**Before we start**\n",
"\n",
"- This tutorial is rendered from a Jupyter notebook that is hosted on GitHub. If you want to run the code yourself, you can find the notebook and configuration files [here](https://github.com/neuralhydrology/neuralhydrology/tree/master/examples/01-Introduction).\n",
"- To be able to run this notebook locally, you need to download the publicly available CAMELS US rainfall-runoff dataset. See the [Data Prerequisites Tutorial](data-prerequisites.nblink) for a detailed description on where to download the data and how to structure your local dataset folder.\n",
"\n",
"The Python package NeuralHydrology was developed with a strong focus on research. The main application area is hydrology; however, in principle the code can be used with any data. To allow fast iteration of research ideas, we tried to make the package as modular as possible, so that new models, datasets, loss functions, regularizations, metrics, etc. can be integrated with minor effort.\n",
"\n",
@@ -20,10 +24,6 @@
"\n",
"For every run that you start, a new folder will be created. This folder is used to store the model and optimizer checkpoints, train data means/stds (needed for scaling during inference), tensorboard log file (can be used to monitor and compare training runs visually), validation results (optionally) and training progress figures (optionally, e.g., model predictions and observations for _n_ random basins). During inference, the evaluation results will also be stored in this directory (e.g., test period results).\n",
"\n",
"### Data requirements\n",
"\n",
"This tutorial uses data from the publicly available [CAMELS US dataset](https://ral.ucar.edu/solutions/products/camels). If you want to run this tutorial yourself, make sure to download the dataset (streamflow data, meteorological forcings and attributes) from the NCAR homepage.\n",
"\n",
"\n",
"### TensorBoard logging\n",
"By default, the training progress is logged in TensorBoard files (add `log_tensorboard: False` to the config to disable TensorBoard logging). If you installed a Python environment from one of our environment files, you have TensorBoard already installed. If not, you can install TensorBoard with:\n",
@@ -100,6 +100,7 @@
"from pathlib import Path\n",
"\n",
"import matplotlib.pyplot as plt\n",
"import torch\n",
"from neuralhydrology.evaluation import metrics\n",
"from neuralhydrology.nh_run import start_run, eval_run"
]
@@ -110,7 +111,10 @@
"source": [
"### Train a model for a single config file\n",
"\n",
"The config file assumes that the CAMELS US dataset is stored under `data/CAMELS_US` (relative to the main directory of this repository) or a symbolic link exists at this location. Make sure that this folder contains the required subdirectories `basin_mean_forcing`, `usgs_streamflow` and `camels_attributes_v2.0`. If your data is stored at a different location and you can't or don't want to create a symbolic link, you will need to change the `data_dir` argument in the `1_basin.yml` config file that is located in the same directory as this notebook."
"**Note**\n",
"\n",
"- The config file assumes that the CAMELS US dataset is stored under `data/CAMELS_US` (relative to the main directory of this repository) or a symbolic link exists at this location. Make sure that this folder contains the required subdirectories `basin_mean_forcings`, `usgs_streamflow` and `camels_attributes_v2.0`. If your data is stored at a different location and you can't or don't want to create a symbolic link, you will need to change the `data_dir` argument in the `1_basin.yml` config file that is located in the same directory as this notebook.\n",
"- By default, the config (`1_basin.yml`) assumes that you have a CUDA-capable NVIDIA GPU (see config argument `device`). In case you don't have any or you have one but want to train on the CPU, you can either change the config argument to `device: cpu` or pass `gpu=-1` to the `start_run()` function."
]
},
{
@@ -313,15 +317,21 @@
}
],
"source": [
"start_run(config_file=Path(\"1_basin.yml\"))"
"# by default we assume that you have at least one CUDA-capable NVIDIA GPU\n",
"if torch.cuda.is_available():\n",
" start_run(config_file=Path(\"1_basin.yml\"))\n",
"\n",
"# fall back to CPU-only mode\n",
"else:\n",
" start_run(config_file=Path(\"1_basin.yml\"), gpu=-1)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Evaluate run on test set\n",
"The run directory that needs to be specified for evaluation is printed in the output log above. Since the folder name is created dynamically (including the date and time of the start of the run) you will need to change the `run_dir` argument according to your local directory name."
"The run directory that needs to be specified for evaluation is printed in the output log above. Since the folder name is created dynamically (including the date and time of the start of the run) you will need to change the `run_dir` argument according to your local directory name. By default, it will use the same device as during the training process."
]
},
{
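The note in the notebook cell above suggests a symbolic link at `data/CAMELS_US` as an alternative to editing `data_dir` in each config. A minimal standard-library sketch of creating that link; both paths in the commented usage are hypothetical placeholders:

```python
from pathlib import Path


def ensure_camels_link(dataset_root: Path, link_location: Path) -> None:
    """Create a symlink at link_location pointing to dataset_root, if absent.

    A symlink avoids copying the ~15 GB dataset into the repository tree.
    Calling the function again when the link already exists is a no-op.
    """
    link_location.parent.mkdir(parents=True, exist_ok=True)
    if not link_location.exists():
        link_location.symlink_to(dataset_root, target_is_directory=True)


# Hypothetical paths; adjust both to your local setup.
# ensure_camels_link(Path("/path/to/CAMELS_US"), Path("data/CAMELS_US"))
```

On Windows, creating symbolic links may require elevated privileges; in that case, changing the `data_dir` argument in the config is the simpler route.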
2 changes: 0 additions & 2 deletions examples/02-Adding-Models/adding-gru.ipynb
@@ -21,7 +21,6 @@
"outputs": [],
"source": [
"import inspect\n",
"from pathlib import Path\n",
"from typing import Dict\n",
"\n",
"import torch\n",
@@ -30,7 +29,6 @@
"from neuralhydrology.modelzoo import get_model\n",
"from neuralhydrology.modelzoo.head import get_head\n",
"from neuralhydrology.modelzoo.basemodel import BaseModel\n",
"from neuralhydrology.modelzoo.template import TemplateModel\n",
"from neuralhydrology.utils.config import Config"
]
},
3 changes: 0 additions & 3 deletions examples/03-Adding-Datasets/adding-camels-cl.ipynb
@@ -35,14 +35,11 @@
"metadata": {},
"outputs": [],
"source": [
"import sys\n",
"from pathlib import Path\n",
"from typing import List, Dict, Union\n",
"\n",
"import numpy as np\n",
"import pandas as pd\n",
"import xarray\n",
"from tqdm import tqdm\n",
"\n",
"from neuralhydrology.datasetzoo.basedataset import BaseDataset\n",
"from neuralhydrology.utils.config import Config"
11 changes: 0 additions & 11 deletions examples/04-Multi-Timescale/1_basin.yml
@@ -136,16 +136,10 @@ data_dir: ../../data/CAMELS_US
# can be either a list of forcings or a single forcing product
forcings:
- nldas_hourly
- maurer_extended
- daymet

dynamic_inputs:
1D:
- prcp(mm/day)_maurer_extended
- srad(W/m2)_maurer_extended
- tmax(C)_maurer_extended
- tmin(C)_maurer_extended
- vp(Pa)_maurer_extended
- prcp(mm/day)_daymet
- srad(W/m2)_daymet
- tmax(C)_daymet
@@ -163,11 +157,6 @@
- total_precipitation_nldas_hourly
- wind_u_nldas_hourly
- wind_v_nldas_hourly
- prcp(mm/day)_maurer_extended
- srad(W/m2)_maurer_extended
- tmax(C)_maurer_extended
- tmin(C)_maurer_extended
- vp(Pa)_maurer_extended
- prcp(mm/day)_daymet
- srad(W/m2)_daymet
- tmax(C)_daymet
