From 2909c66c584a5d5be38b9609353215a8c7b41f07 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Alejandro=20=C2=A9?= Date: Mon, 25 Oct 2021 15:54:19 +0100 Subject: [PATCH 01/11] add IceNet notebook --- IceNet/polar-modelling-icenet.ipynb | 1395 +++++++++++++++++++++++++++ 1 file changed, 1395 insertions(+) create mode 100644 IceNet/polar-modelling-icenet.ipynb diff --git a/IceNet/polar-modelling-icenet.ipynb b/IceNet/polar-modelling-icenet.ipynb new file mode 100644 index 0000000..e2284df --- /dev/null +++ b/IceNet/polar-modelling-icenet.ipynb @@ -0,0 +1,1395 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "source": [ + "# Sea ice forecasting using IceNet\n", + "\n", + "## Context\n", + "### Purpose\n", + "Demonstrate IceNet, a deep learning sea ice forecasting system trained using climate simulations and observational data.\n", + "\n", + "### Modelling approach\n", + "**IceNet** is a probabilistic, deep learning sea ice forecasting system. The model, an ensemble of U-Net networks, learns how sea ice changes from climate simulations and observational data to forecast up to 6 months of monthly-averaged sea ice concentration maps at 25 km resolution. IceNet advances the range of accurate sea ice forecasts, outperforming a state-of-the-art dynamical model in seasonal forecasts of summer sea ice, particularly for extreme sea ice events. IceNet was implemented in Python 3.7 using TensorFlow v2.2.0. 
Further details can be found in the Nature Communications paper [*Seasonal Arctic sea ice forecasting with probabilistic deep learning*](https://www.nature.com/articles/s41467-021-25257-4).\n", + "\n", + "### Highlights\n", + "* Clone and access IceNet's codebase to produce seasonal Arctic sea ice forecasts using 3 out of 25 pre-trained IceNet models [downloaded from the Polar Data Centre](https://doi.org/10.5285/71820e7d-c628-4e32-969f-464b7efb187c).\n", + "* Forecast a single year, 2020, using IceNet's preprocessed environmental input data downloaded from a Zenodo repository.\n", + "* Visualise IceNet’s seasonal ice edge predictions at 4- to 1-month lead times.\n", + "* Compare IceNet predictions against ECMWF SEAS5 physics-based sea ice concentration and a linear trend statistical benchmark in interactive plots.\n", + "\n", + "### Notebook contributions\n", + "#### Author\n", + "Alejandro Coca-Castro, The Alan Turing Institute, [@acocac](https://github.com/acocac)\n", + "\n", + "#### Reviewers\n", + "Tom R. Andersson, British Antarctic Survey, [@tom-andersson](https://github.com/tom-andersson), 21/10/21 (latest revision)\n", + "\n", + "#### Version\n", + "The initial version of this notebook was generated through the Environmental AI book (see commit [dbfb9cf](https://github.com/acocac/environmental-ai-book/commits/master/book/polar/modelling/polar-modelling-icenet.ipynb)). This version was adapted for the Pangeo examples repo.\n", + " \n", + "### Modelling contributions\n", + "#### Codebase\n", + "- Tom R. Andersson (author), British Antarctic Survey, [@tom-andersson](https://github.com/tom-andersson)\n", + "- James Byrne (contributor), British Antarctic Survey, [@JimCircadian](https://github.com/JimCircadian)\n", + "- Tony Phillips (contributor), British Antarctic Survey\n", + "\n", + "#### Paper\n", + "Tom R. Andersson, J. Scott Hosking, María Pérez-Ortiz, Brooks Paige, Andrew Elliott, Chris Russell, Stephen Law, Daniel C. 
Jones, Jeremy Wilkinson, Tony Phillips, James Byrne, Steffen Tietsche, Beena Balan Sarojini, Eduardo Blanchard-Wrigglesworth, Yevgeny Aksenov, Rod Downie & Emily Shuckburgh. See [here](https://www.nature.com/articles/s41467-021-25257-4#author-information) for further author information (affiliations and contributions).\n", + "\n", + "#### Version\n", + "The version of the IceNet codebase explored here is 1.0.0, commit [9d69ad7](https://github.com/tom-andersson/icenet-paper/compare/v1.0.0...main).\n", + "\n", + "### Funding\n", + "The IceNet project was supported by Wave 1 of The UKRI Strategic Priorities Fund under the EPSRC Grant EP/T001569/1, particularly the ‘AI for Science’ theme within that grant, and The Alan Turing Institute.\n", + "\n", + ":::{note}\n", + "The notebook contributors acknowledge the IceNet developers for providing fully reproducible, public code, available at [https://github.com/tom-andersson/icenet-paper](https://github.com/tom-andersson/icenet-paper). Some snippets from IceNet's source code were adapted for this notebook.\n", + ":::" + ], + "metadata": { + "collapsed": false + } + }, + { + "cell_type": "markdown", + "source": [ + "## Clone the IceNet GitHub repo" + ], + "metadata": { + "collapsed": false + } + }, + { + "cell_type": "code", + "execution_count": null, + "outputs": [], + "source": [ + "!git clone -q https://github.com/tom-andersson/icenet-paper.git polar-modelling-icenet" + ], + "metadata": { + "collapsed": false, + "pycharm": { + "name": "#%%\n" + } + } + }, + { + "cell_type": "markdown", + "source": [ + "## Set the virtual environment" + ], + "metadata": { + "collapsed": false + } + }, + { + "cell_type": "code", + "execution_count": null, + "outputs": [], + "source": [ + "# Import the required packages\n", + "import virtualenv\n", + "import pip\n", + "import os\n", + "\n", + "# Define and create the base directory in which to install virtual environments\n", + "venvs_dir = os.path.join(os.path.expanduser(\"~\"), \"nb-venvs\")\n", + "\n", + "if 
not os.path.isdir(venvs_dir):\n", + " os.makedirs(venvs_dir)\n", + "\n", + "# Define the venv directory\n", + "venv_dir = os.path.join(venvs_dir, 'venv-icenet')\n", + "\n", + "if not os.path.exists(venv_dir):\n", + " # Create the virtual environment\n", + " virtualenv.create_environment(venv_dir)\n", + "\n", + " # Install a set of required packages via `pip`\n", + " requirements = ['matplotlib', 'urllib3', 'tqdm', 'xarray','tensorflow==2.2.0', 'hvplot','geoviews']\n", + "\n", + " for pkg in requirements:\n", + " pip.main([\"install\", \"--prefix\", venv_dir, pkg])\n", + "\n", + "# Activate the venv\n", + "activate_file = os.path.join(venv_dir, \"bin\", \"activate_this.py\")\n", + "exec(open(activate_file).read(), dict(__file__=activate_file))" + ], + "metadata": { + "collapsed": false, + "pycharm": { + "name": "#%%\n" + } + } + }, + { + "cell_type": "markdown", + "source": [ + "## Load libraries" + ], + "metadata": { + "collapsed": false + } + }, + { + "cell_type": "code", + "execution_count": null, + "outputs": [], + "source": [ + "# system\n", + "import os\n", + "import sys\n", + "sys.path.insert(0, os.path.join(os.getcwd(), 'polar-modelling-icenet', 'icenet'))\n", + "\n", + "# data\n", + "import json\n", + "import pandas as pd\n", + "import numpy as np\n", + "import xarray as xr\n", + "\n", + "# custom functions from the icenet repo\n", + "from utils import IceNetDataLoader, create_results_dataset_index, arr_to_ice_edge_arr\n", + "\n", + "# modelling\n", + "from tensorflow.keras.models import load_model\n", + "\n", + "# plotting\n", + "import matplotlib.pyplot as plt\n", + "from matplotlib.figure import Figure\n", + "from matplotlib.backends.backend_agg import FigureCanvas\n", + "from matplotlib.offsetbox import AnchoredText\n", + "\n", + "import holoviews as hv\n", + "\n", + "import hvplot.pandas\n", + "import hvplot.xarray\n", + "\n", + "from bokeh.models.formatters import DatetimeTickFormatter\n", + "\n", + "import panel as pn\n", + "pn.extension()\n", + 
"\n", + "# utils\n", + "import urllib.request\n", + "import re\n", + "from tqdm.notebook import tqdm\n", + "import calendar\n", + "from pprint import pprint\n", + "\n", + "pd.options.display.max_columns = 10\n", + "hv.extension('bokeh', width=100)" + ], + "metadata": { + "collapsed": false, + "pycharm": { + "name": "#%%\n" + } + } + }, + { + "cell_type": "markdown", + "source": [ + "## Set project structure\n", + "\n", + "Let's follow the structure of the IceNet paper as it is indicated in the source code [config.py](https://github.com/tom-andersson/icenet-paper/blob/main/icenet/config.py) file. The structure allows conveniently using IceNet's custom data loader." + ], + "metadata": { + "collapsed": false + } + }, + { + "cell_type": "code", + "execution_count": null, + "outputs": [], + "source": [ + "# data folder\n", + "data_folder = './data'\n", + "\n", + "config = {\n", + " 'obs_data_folder': os.path.join(data_folder, 'obs'),\n", + " 'mask_data_folder': os.path.join(data_folder, 'masks'),\n", + " 'forecast_data_folder': os.path.join(data_folder, 'forecasts'),\n", + " 'network_dataset_folder': os.path.join(data_folder, 'network_datasets'),\n", + " 'dataloader_config_folder': './polar-modelling-icenet/dataloader_configs',\n", + " 'network_h5_files_folder': './polar-modelling-icenet/networks',\n", + " 'forecast_results_folder': './polar-modelling-icenet/results',\n", + "}\n", + "\n", + "# Generate the folder structure through a list of comprehension\n", + "[os.makedirs(val) for key, val in config.items() if not os.path.exists(val)]" + ], + "metadata": { + "collapsed": false, + "pycharm": { + "name": "#%%\n" + } + } + }, + { + "cell_type": "markdown", + "source": [ + "## Download input data and models\n", + "\n", + "IceNet consists of 25 ensemble members i.e. models. For this demonstrator, we only download three of them to reduce computational cost (note that this will reduce performance compared with the full ensemble). We also fetch analysis-ready i.e. 
preprocessed data of climate observations, ground truth sea ice concentration (SIC) and an IceNet project configuration file from a Zenodo repository. Finally, we call a script from the IceNet paper repo to generate masks required for computing metrics and visualisation." + ], + "metadata": { + "collapsed": false + } + }, + { + "cell_type": "markdown", + "source": [ + "### Download pretrained IceNet models\n", + "\n", + "Let's download 3 out of 25 ensemble members [retrieved from the Polar Data Centre](https://doi.org/10.5285/71820e7d-c628-4e32-969f-464b7efb187c). The models are numbered from 36 to 60. For this example we use networks 36, 42 and 53. It is worth mentioning that other pre-computed results from the Nature Communications paper can also be downloaded, including the output results table, uncertainty and netCDF forecasts of the 25 ensemble members, among others." + ], + "metadata": { + "collapsed": false + } + }, + { + "cell_type": "code", + "execution_count": null, + "outputs": [], + "source": [ + "url = 'https://ramadda.data.bas.ac.uk/repository/entry/get/'\n", + "\n", + "target_networks = [36, 42, 53]\n", + "\n", + "for network in target_networks:\n", + " urllib.request.urlretrieve(url + f'network_tempscaled_{network}.h5?entryid=synth%3A71820e7d-c628-4e32-969f-464b7efb187c%3AL25ldXJhbF9uZXR3b3JrX21vZGVsL25ldHdvcmtfdGVtcHNjYWxlZF8zNi5oNQ%3D%3D',\n", + " os.path.join(config['network_h5_files_folder'],f'network_tempscaled_{network}.h5'))" + ], + "metadata": { + "collapsed": false, + "pycharm": { + "name": "#%%\n" + } + } + }, + { + "cell_type": "markdown", + "source": [ + "### Download ERA5 data (climate observations)\n", + "\n", + "Let's download analysis-ready i.e. 
preprocessed ERA5 observations from a Zenodo repository.\n", + "\n", + ":::{note}\n", + "The analysis-ready data were generated by running the script `python3 icenet/preproc_icenet_data.py` in step **3.2) Preprocess the raw data** of the [icenet-paper repository](https://github.com/tom-andersson/icenet-paper). The script normalises the raw NetCDF data, downloaded using the bash file `./download_era5_data_in_parallel.sh` (see step **2) Download data**), and saves it as monthly `NumPy` files.\n", + ":::" + ], + "metadata": { + "collapsed": false + } + }, + { + "cell_type": "code", + "execution_count": null, + "outputs": [], + "source": [ + "filename = 'dataset1.zip'\n", + "url = f'https://zenodo.org/record/5516869/files/{filename}?download=1'\n", + "\n", + "if not os.path.isfile(config['network_dataset_folder'] + '/dataset1.zip') or os.path.getsize(config['network_dataset_folder'] + '/dataset1.zip') == 0:\n", + " urllib.request.urlretrieve(url, config['network_dataset_folder'] + '/dataset1.zip')\n", + " !unzip -qq ./data/network_datasets/dataset1.zip -d ./data/network_datasets" + ], + "metadata": { + "collapsed": false, + "pycharm": { + "name": "#%%\n" + } + } + }, + { + "cell_type": "markdown", + "source": [ + "### Download ground truth SIC" + ], + "metadata": { + "collapsed": false + } + }, + { + "cell_type": "code", + "execution_count": null, + "outputs": [], + "source": [ + "filename = 'siconca_EASE.nc'\n", + "url = f'https://zenodo.org/record/5516869/files/{filename}?download=1'\n", + "\n", + "if not os.path.isfile(config['obs_data_folder'] + '/' + filename) or os.path.getsize(config['obs_data_folder'] + '/' + filename) == 0:\n", + " urllib.request.urlretrieve(url, config['obs_data_folder'] + '/' + filename)" + ], + "metadata": { + "collapsed": false, + "pycharm": { + "name": "#%%\n" + } + } + }, + { + "cell_type": "markdown", + "source": [ + "### Download mask\n", + "\n", + "The script `icenet/gen_masks.py` generates masks for land, the polar holes, OSI-SAF monthly maximum ice extent (the *active\n", + "grid cell 
region*), and the Arctic regions & coastline. Figures of the\n", + "masks are saved in the **./figures** folder." + ], + "metadata": { + "collapsed": false + } + }, + { + "cell_type": "code", + "execution_count": null, + "outputs": [], + "source": [ + "!python polar-modelling-icenet/icenet/gen_masks.py" + ], + "metadata": { + "collapsed": false, + "pycharm": { + "name": "#%%\n" + } + } + }, + { + "cell_type": "markdown", + "source": [ + "## Data loader\n", + "\n", + "The following lines show how to download an IceNet configuration `JSON` file and read it into a custom loader, **IceNetDataLoader**. The loader dictates which variables are input to the networks, which climate simulations are used for pre-training, and how far ahead to forecast." + ], + "metadata": { + "collapsed": false + } + }, + { + "cell_type": "code", + "execution_count": null, + "outputs": [], + "source": [ + "dataloader_ID = '2021_09_03_1300_icenet_demo.json'\n", + "url = f'https://zenodo.org/record/5516869/files/{dataloader_ID}?download=1'\n", + "\n", + "if not os.path.isfile(config['dataloader_config_folder'] + '/' + dataloader_ID) or os.path.getsize(config['dataloader_config_folder'] + '/' + dataloader_ID) == 0:\n", + " urllib.request.urlretrieve(url, config['dataloader_config_folder'] + '/' + dataloader_ID)\n", + "\n", + "with open(config['dataloader_config_folder'] + '/' + dataloader_ID, 'r') as readfile:\n", + " dataloader_config = json.load(readfile)\n", + "\n", + "pprint(dataloader_config['input_data'])" + ], + "metadata": { + "collapsed": false, + "pycharm": { + "name": "#%%\n" + } + } + }, + { + "cell_type": "markdown", + "source": [ + "The `input_data` element of the IceNet `JSON` file lists the input variables and their corresponding settings. 
We use the same input data as the Nature Communications paper, which consist of SIC, 11 climate variables, statistical SIC forecasts, and metadata (see [Supplementary Table 2](https://static-content.springer.com/esm/art%3A10.1038%2Fs41467-021-25257-4/MediaObjects/41467_2021_25257_MOESM1_ESM.pdf)). These layers are stacked in an identical manner to the RGB channels of a traditional image, amounting to 50 channels in total." + ], + "metadata": { + "collapsed": false + } + }, + { + "cell_type": "code", + "execution_count": null, + "outputs": [], + "source": [ + "# Load dataloader\n", + "dataloader_config_fpath = os.path.join(config['dataloader_config_folder'], dataloader_ID)\n", + "\n", + "# Data loader\n", + "print(\"\\nSetting up the data loader with config file: {}\\n\\n\".format(dataloader_ID))\n", + "dataloader = IceNetDataLoader(dataloader_config_fpath)\n", + "print('\\n\\nDone.\\n')" + ], + "metadata": { + "collapsed": false, + "pycharm": { + "name": "#%%\n" + } + } + }, + { + "cell_type": "markdown", + "source": [ + "## Load networks\n", + "\n", + "Let's also load the IceNet ensemble members using the `load_model` function from the Keras API with the TensorFlow backend." + ], + "metadata": { + "collapsed": false + } + }, + { + "cell_type": "code", + "execution_count": null, + "outputs": [], + "source": [ + "network_regex = re.compile('^network_tempscaled_([0-9]*).h5$')\n", + "\n", + "network_fpaths = [os.path.join(config['network_h5_files_folder'], f) for f in\n", + " sorted(os.listdir(config['network_h5_files_folder'])) if network_regex.match(f)]\n", + "\n", + "ensemble_seeds = [network_regex.match(f)[1] for f in\n", + " sorted(os.listdir(config['network_h5_files_folder'])) if network_regex.match(f)]\n", + "\n", + "networks = []\n", + "for network_fpath in network_fpaths:\n", + " print('Loading model from {}... 
'.format(network_fpath), end='', flush=True)\n", + " networks.append(load_model(network_fpath, compile=False))\n", + " print('Done.')" + ], + "metadata": { + "collapsed": false, + "pycharm": { + "name": "#%%\n" + } + } + }, + { + "cell_type": "markdown", + "source": [ + "## Modelling\n", + "\n", + "### Forecast settings\n", + "Now let's set the target model and forecast dates, start `forecast_start` (Jan 2020) and end `forecast_end` (Dec 2020). We also extract the number of forecast months from the IceNet's custom dataloader." + ], + "metadata": { + "collapsed": false + } + }, + { + "cell_type": "code", + "execution_count": null, + "outputs": [], + "source": [ + "model = 'IceNet'\n", + "\n", + "forecast_start = pd.Timestamp('2020-01-01')\n", + "forecast_end = pd.Timestamp('2020-12-01')\n", + "\n", + "n_forecast_months = dataloader.config['n_forecast_months']\n", + "print('\\n# of forecast months: {}\\n'.format(n_forecast_months))" + ], + "metadata": { + "collapsed": false, + "pycharm": { + "name": "#%%\n" + } + } + }, + { + "cell_type": "markdown", + "source": [ + "### Set up forecast folder" + ], + "metadata": { + "collapsed": false + } + }, + { + "cell_type": "code", + "execution_count": null, + "outputs": [], + "source": [ + "forecast_folder = os.path.join(config['forecast_data_folder'], 'icenet', dataloader_ID, model)\n", + "\n", + "if not os.path.exists(forecast_folder):\n", + " os.makedirs(forecast_folder)" + ], + "metadata": { + "collapsed": false, + "pycharm": { + "name": "#%%\n" + } + } + }, + { + "cell_type": "markdown", + "source": [ + "### Load ground truth SIC" + ], + "metadata": { + "collapsed": false + } + }, + { + "cell_type": "code", + "execution_count": null, + "outputs": [], + "source": [ + "print('Loading ground truth SIC... 
', end='', flush=True)\n", + "true_sic_fpath = os.path.join(config['obs_data_folder'], 'siconca_EASE.nc')\n", + "true_sic_da = xr.open_dataarray(true_sic_fpath)\n", + "print('Done.')" + ], + "metadata": { + "collapsed": false, + "pycharm": { + "name": "#%%\n" + } + } + }, + { + "cell_type": "markdown", + "source": [ + "### Set up forecast DataArray dictionary\n", + "\n", + "Now we are setting up an empty `xarray DataArray` object that we will use to store IceNet's forecasts. `DataArrays` let you conveniently handle, query and visualise spatio-temporal data as the forecast predictions generated by the IceNet system." + ], + "metadata": { + "collapsed": false + } + }, + { + "cell_type": "code", + "execution_count": null, + "outputs": [], + "source": [ + "# define list of lead times\n", + "leadtimes = np.arange(1, n_forecast_months+1)\n", + "\n", + "# add ensemble to the list of models\n", + "ensemble_seeds_and_mean = ensemble_seeds.copy()\n", + "ensemble_seeds_and_mean.append('ensemble')\n", + "\n", + "all_target_dates = pd.date_range(\n", + " start=forecast_start,\n", + " end=forecast_end,\n", + " freq='MS'\n", + ")\n", + "\n", + "all_start_dates = pd.date_range(\n", + " start=forecast_start - pd.DateOffset(months=n_forecast_months-1),\n", + " end=forecast_end,\n", + " freq='MS'\n", + ")\n", + "\n", + "shape = (len(all_target_dates),\n", + " *dataloader.config['raw_data_shape'],\n", + " n_forecast_months)\n", + "\n", + "coords = {\n", + " 'time': all_target_dates, # To be sliced to target dates\n", + " 'yc': true_sic_da.coords['yc'],\n", + " 'xc': true_sic_da.coords['xc'],\n", + " 'lon': true_sic_da.isel(time=0).coords['lon'],\n", + " 'lat': true_sic_da.isel(time=0).coords['lat'],\n", + " 'leadtime': leadtimes,\n", + " 'seed': ensemble_seeds_and_mean,\n", + " 'ice_class': ['no_ice', 'marginal_ice', 'full_ice']\n", + "}\n", + "\n", + "# Probabilistic SIC class forecasts\n", + "dims = ('seed', 'time', 'yc', 'xc', 'leadtime', 'ice_class')\n", + "shape = 
(len(ensemble_seeds_and_mean), *shape, 3)\n", + "\n", + "model_forecast = xr.DataArray(\n", + " data=np.zeros(shape, dtype=np.float32),\n", + " coords=coords,\n", + " dims=dims\n", + ")" + ], + "metadata": { + "collapsed": false, + "pycharm": { + "name": "#%%\n" + } + } + }, + { + "cell_type": "markdown", + "source": [ + "### Build up forecasts\n", + "\n", + "In this step, we generate IceNet's forecast for the target period and write it into the empty `DataArrays` object. IceNet’s outputs are forecasts of three sea ice concentration (SIC) classes: open-water (SIC ≤ 15%), marginal ice (15% < SIC < 80%) and full ice (SIC ≥ 80%) for the following 6 months in the form of discrete probability distributions at each grid cell." + ], + "metadata": { + "collapsed": false + } + }, + { + "cell_type": "code", + "execution_count": null, + "outputs": [], + "source": [ + "for start_date in tqdm(all_start_dates):\n", + "\n", + " # Target forecast dates for the forecast beginning at this `start_date`\n", + " target_dates = pd.date_range(\n", + " start=start_date,\n", + " end=start_date + pd.DateOffset(months=n_forecast_months-1),\n", + " freq='MS'\n", + " )\n", + "\n", + " X, y, sample_weights = dataloader.data_generation([start_date])\n", + " mask = sample_weights > 0\n", + " pred = np.array([network.predict(X)[0] for network in networks])\n", + " pred *= mask # mask outside active grid cell region to zero\n", + " # concat ensemble mean to the set of network predictions\n", + " ensemble_mean_pred = pred.mean(axis=0, keepdims=True)\n", + " pred = np.concatenate([pred, ensemble_mean_pred], axis=0)\n", + "\n", + " for i, (target_date, leadtime) in enumerate(zip(target_dates, leadtimes)):\n", + " if target_date in all_target_dates:\n", + " model_forecast.\\\n", + " loc[:, target_date, :, :, leadtime] = pred[..., i]\n", + "\n", + "print('Saving forecast NetCDF for {}... 
'.format(model), end='', flush=True)\n", + "\n", + "forecast_fpath = os.path.join(forecast_folder, f'{model.lower()}_forecasts.nc')\n", + "model_forecast.to_netcdf(forecast_fpath) # export forecasts as a NetCDF file\n", + "\n", + "print('Done.')" + ], + "metadata": { + "collapsed": false, + "pycharm": { + "name": "#%%\n" + } + } + }, + { + "cell_type": "markdown", + "source": [ + "## Results" + ], + "metadata": { + "collapsed": false + } + }, + { + "cell_type": "markdown", + "source": [ + "### Settings\n", + "\n", + "The IceNet codebase allows computations to run either in memory or with `dask`. Computing with dask is better suited to longer target periods (see further info in [icenet/analyse_heldout_predictions.py](https://github.com/tom-andersson/icenet-paper/blob/27ca44694eaa3cb5f02fd824c618c46a6701a301/icenet/analyse_heldout_predictions.py#L23)). The following lines show how to compute in memory." + ], + "metadata": { + "collapsed": false + } + }, + { + "cell_type": "markdown", + "source": [ + "### Setup" + ], + "metadata": { + "collapsed": false + } + }, + { + "cell_type": "code", + "execution_count": null, + "outputs": [], + "source": [ + "metric_compute_list = ['Binary accuracy', 'SIE error']\n", + "\n", + "forecast_fpath = os.path.join(forecast_folder, f'{model.lower()}_forecasts.nc')\n", + "\n", + "chunks = {'seed': 1}\n", + "icenet_forecast_da = xr.open_dataarray(forecast_fpath, chunks=chunks)\n", + "icenet_seeds = icenet_forecast_da.seed.values" + ], + "metadata": { + "collapsed": false, + "pycharm": { + "name": "#%%\n" + } + } + }, + { + "cell_type": "markdown", + "source": [ + "### Monthly masks (active grid cell regions to compute metrics over)" + ], + "metadata": { + "collapsed": false + } + }, + { + "cell_type": "code", + "execution_count": null, + "outputs": [], + "source": [ + "mask_fpath_format = os.path.join(config['mask_data_folder'], 'active_grid_cell_mask_{}.npy')\n", + "\n", + "month_mask_da = 
xr.DataArray(np.array(\n", + " [np.load(mask_fpath_format.format('{:02d}'.format(month))) for\n", + " month in np.arange(1, 12+1)],\n", + "))" + ], + "metadata": { + "collapsed": false, + "pycharm": { + "name": "#%%\n" + } + } + }, + { + "cell_type": "markdown", + "source": [ + "### Download previous results" + ], + "metadata": { + "collapsed": false + } + }, + { + "cell_type": "code", + "execution_count": null, + "outputs": [], + "source": [ + "url = 'https://ramadda.data.bas.ac.uk/repository/entry/get/'\n", + "fn = '2021_07_01_183913_forecast_results.csv'\n", + "fn_suffix = '?entryid=synth%3A71820e7d-c628-4e32-969f-464b7efb187c%3AL3Jlc3VsdHMvZm9yZWNhc3RfcmVzdWx0cy8yMDIxXzA3XzAxXzE4MzkxM19mb3JlY2FzdF9yZXN1bHRzLmNzdg%3D%3D'\n", + "\n", + "if not os.path.isfile(os.path.join(config['forecast_results_folder'],fn)):\n", + " urllib.request.urlretrieve(url + fn + fn_suffix, os.path.join(config['forecast_results_folder'],fn))" + ], + "metadata": { + "collapsed": false, + "pycharm": { + "name": "#%%\n" + } + } + }, + { + "cell_type": "markdown", + "source": [ + "### Initialise results dataframe\n", + "\n", + "Now we write the new forecast results over an old results file generated for IceNet's Nature Communications paper. The old results file contains the performance of all 25 ensemble models, the ECMWF SEAS5 physics-based sea ice probability forecast and the linear trend benchmark. For the purposes of this demonstrator, we remove the existing IceNet records and replace them with the performance of the 3 ensemble models assessed here." 
+ ], + "metadata": { + "collapsed": false + } + }, + { + "cell_type": "code", + "execution_count": null, + "outputs": [], + "source": [ + "now = pd.Timestamp.now()\n", + "new_results_df_fname = now.strftime('%Y_%m_%d_%H%M%S_forecast_results.csv')\n", + "new_results_df_fpath = os.path.join(config['forecast_results_folder'], new_results_df_fname)\n", + "\n", + "print('New results will be saved to {}\\n\\n'.format(new_results_df_fpath))\n", + "\n", + "results_df_fnames = sorted([f for f in os.listdir(config['forecast_results_folder']) if re.compile('.*.csv').match(f)])\n", + "if len(results_df_fnames) >= 1:\n", + " old_results_df_fname = results_df_fnames[-1]\n", + " old_results_df_fpath = os.path.join(config['forecast_results_folder'], old_results_df_fname)\n", + " print('\\n\\nLoading previous results dataset from {}'.format(old_results_df_fpath))\n", + "\n", + "# Load previous results, do not interpret 'NA' as NaN\n", + "results_df = pd.read_csv(old_results_df_fpath, keep_default_na=False, comment='#')\n", + "\n", + "# Remove existing IceNet results\n", + "results_df = results_df[~results_df['Model'].str.startswith('IceNet')]\n", + "\n", + "# Drop spurious index column if present\n", + "results_df = results_df.drop('Unnamed: 0', axis=1, errors='ignore')\n", + "results_df['Forecast date'] = [pd.Timestamp(date) for date in results_df['Forecast date']]\n", + "\n", + "results_df = results_df.set_index(['Model', 'Ensemble member', 'Leadtime', 'Forecast date'])\n", + "\n", + "# Add new models to the dataframe\n", + "multi_index = create_results_dataset_index([model], leadtimes, all_target_dates, model, icenet_seeds)\n", + "results_df = results_df.append(pd.DataFrame(index=multi_index)).sort_index()" + ], + "metadata": { + "collapsed": false, + "pycharm": { + "name": "#%%\n" + } + } + }, + { + "cell_type": "markdown", + "source": [ + "### Load forecasts and compute SIC" + ], + "metadata": { + "collapsed": false + } + }, + { + "cell_type": "code", + "execution_count": 
null, + "outputs": [], + "source": [ + "target_forecasts = icenet_forecast_da.sel(time=all_target_dates)" + ], + "metadata": { + "collapsed": false, + "pycharm": { + "name": "#%%\n" + } + } + }, + { + "cell_type": "markdown", + "source": [ + "We obtain the sea ice probability, P(SIC > 15%), by summing IceNet’s marginal ice (15% < SIC < 80%) and full ice (SIC ≥ 80%) probabilities." + ], + "metadata": { + "collapsed": false + } + }, + { + "cell_type": "code", + "execution_count": null, + "outputs": [], + "source": [ + "sip_da = target_forecasts.sel(ice_class=['marginal_ice', 'full_ice']).sum('ice_class')" + ], + "metadata": { + "collapsed": false, + "pycharm": { + "name": "#%%\n" + } + } + }, + { + "cell_type": "markdown", + "source": [ + "### Ground truth SIC" + ], + "metadata": { + "collapsed": false + } + }, + { + "cell_type": "code", + "execution_count": null, + "outputs": [], + "source": [ + "true_sic_fpath = os.path.join(config['obs_data_folder'], 'siconca_EASE.nc')\n", + "true_sic_da = xr.open_dataarray(true_sic_fpath, chunks={})\n", + "true_sic_da = true_sic_da.load()\n", + "true_sic_da = true_sic_da.sel(time=all_target_dates)\n", + "\n", + "if 'Binary accuracy' in metric_compute_list:\n", + " binary_true_da = true_sic_da > 0.15" + ], + "metadata": { + "collapsed": false, + "pycharm": { + "name": "#%%\n" + } + } + }, + { + "cell_type": "markdown", + "source": [ + "### Monthwise masks" + ], + "metadata": { + "collapsed": false + } + }, + { + "cell_type": "markdown", + "source": [ + "As shown in the next section, the monthly masks, stacked into a `DataArray` object, are used to restrict the metrics to the active grid cell region." 
+ ], + "metadata": { + "collapsed": false + } + }, + { + "cell_type": "code", + "execution_count": null, + "outputs": [], + "source": [ + "months = [pd.Timestamp(date).month - 1 for date in all_target_dates]\n", + "mask_da = xr.DataArray(\n", + " [month_mask_da[month] for month in months],\n", + " dims=('time', 'yc', 'xc'),\n", + " coords={\n", + " 'time': true_sic_da.time.values,\n", + " 'yc': true_sic_da.yc.values,\n", + " 'xc': true_sic_da.xc.values,\n", + " }\n", + ")" + ], + "metadata": { + "collapsed": false, + "pycharm": { + "name": "#%%\n" + } + } + }, + { + "cell_type": "markdown", + "source": [ + "### Compute performance metrics\n", + "\n", + "To analyse forecast performance, the IceNet researchers compute two metrics, `Binary accuracy` and `Sea Ice Extent (SIE) error`. The former is computed over the active grid cell region for a given calendar month and can be seen as a normalised version of the integrated ice edge error (IIEE) (see the Methods section of the IceNet [*Nature Communications*](https://www.nature.com/articles/s41467-021-25257-4) paper for further details). The latter, SIE error, is the difference between the overpredicted area and the underpredicted area. The two metrics are complementary, with binary accuracy being more robust for assessing IceNet’s relative seasonal forecast skill for September." 
+ ], + "metadata": { + "collapsed": false + } + }, + { + "cell_type": "code", + "execution_count": null, + "outputs": [], + "source": [ + "print('Analysing forecasts: \\n\\n')\n", + "\n", + "print('Computing metrics:')\n", + "print(metric_compute_list)\n", + "\n", + "binary_forecast_da = sip_da > 0.5\n", + "\n", + "compute_ds = xr.Dataset()\n", + "for metric in metric_compute_list:\n", + "\n", + " if metric == 'Binary accuracy':\n", + " binary_correct_da = (binary_forecast_da == binary_true_da).astype(np.float32)\n", + " binary_correct_weighted_da = binary_correct_da.weighted(mask_da)\n", + "\n", + " # Mean percentage of correct classifications over the active\n", + " # grid cell area\n", + " ds_binacc = (binary_correct_weighted_da.mean(dim=['yc', 'xc']) * 100)\n", + " compute_ds[metric] = ds_binacc\n", + "\n", + " elif metric == 'SIE error':\n", + " binary_forecast_weighted_da = binary_forecast_da.astype(int).weighted(mask_da)\n", + " binary_true_weighted_da = binary_true_da.astype(int).weighted(mask_da)\n", + "\n", + " ds_sie_error = (\n", + " binary_forecast_weighted_da.sum(['xc', 'yc']) -\n", + " binary_true_weighted_da.sum(['xc', 'yc'])\n", + " ) * 25**2\n", + "\n", + " compute_ds[metric] = ds_sie_error\n", + "\n", + "print('Writing to results dataset...')\n", + "for compute_da in iter(compute_ds.data_vars.values()):\n", + " metric = compute_da.name\n", + "\n", + " compute_df_index = results_df.loc[\n", + " pd.IndexSlice[model, :, leadtimes, all_target_dates], metric].\\\n", + " droplevel(0).index\n", + "\n", + " # Ensure indexes are aligned for assigning to results_df\n", + " compute_df = compute_da.to_dataframe().reset_index().\\\n", + " set_index(['seed', 'leadtime', 'time']).\\\n", + " reindex(index=compute_df_index)\n", + "\n", + " results_df.loc[pd.IndexSlice[model, :, leadtimes, all_target_dates], metric] = \\\n", + " compute_df.values\n", + "\n", + "print('\\nCheckpointing results dataset... 
', end='', flush=True)\n", + "results_df.to_csv(new_results_df_fpath)\n", + "print('Done.')" + ], + "metadata": { + "collapsed": false, + "pycharm": { + "name": "#%%\n" + } + } + }, + { + "cell_type": "markdown", + "source": [ + "## Analysis\n", + "\n", + "In this section, we explore the forecast results and provide some interpretation. Note we use a sample data so the results are only for demonstration purposes." + ], + "metadata": { + "collapsed": false + } + }, + { + "cell_type": "markdown", + "source": [ + "### Plot settings" + ], + "metadata": { + "collapsed": false + } + }, + { + "cell_type": "code", + "execution_count": null, + "outputs": [], + "source": [ + "settings_lineplots = dict(padding=0.1, height=400, width=700, fontsize={'title': '120%','labels': '120%', 'ticks': '100%'})" + ], + "metadata": { + "collapsed": false, + "pycharm": { + "name": "#%%\n" + } + } + }, + { + "cell_type": "markdown", + "source": [ + "### Preprocess results dataset" + ], + "metadata": { + "collapsed": false + } + }, + { + "cell_type": "code", + "execution_count": null, + "outputs": [], + "source": [ + "# Reset index to preprocess results dataset\n", + "results_df = results_df.reset_index()\n", + "\n", + "results_df['Forecast date'] = pd.to_datetime(results_df['Forecast date'])\n", + "\n", + "month_names = np.array(['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',\n", + " 'Jul', 'Aug', 'Sept', 'Oct', 'Nov', 'Dec'])\n", + "forecast_month_names = month_names[results_df['Forecast date'].dt.month.values - 1]\n", + "results_df['Calendar month'] = forecast_month_names\n", + "\n", + "results_df = results_df.set_index(['Model', 'Ensemble member', 'Leadtime', 'Forecast date'])\n", + "\n", + "# subset target period\n", + "results_df = results_df.loc(axis=0)[pd.IndexSlice[:, :, :, slice(forecast_start, forecast_end)]]\n", + "\n", + "results_df = results_df.sort_index()" + ], + "metadata": { + "collapsed": false, + "pycharm": { + "name": "#%%\n" + } + } + }, + { + "cell_type": "markdown", + 
"source": [ + "Let's inspect the results `pandas data.frame` reporting the monthly performance of each ensemble member for the target period." + ], + "metadata": { + "collapsed": false + } + }, + { + "cell_type": "code", + "execution_count": null, + "outputs": [], + "source": [ + "results_df.head()" + ], + "metadata": { + "collapsed": false, + "pycharm": { + "name": "#%%\n" + } + } + }, + { + "cell_type": "markdown", + "source": [ + "### Ice edge\n", + "\n", + "The following figure shows a method to interactively plotting how **IceNet** updates its forecasts using new initial conditions as the lead time decreases, with the predicted ice edge approaching the true ice edge. The observed ice edge (in black) is defined as the sea ice concentration (SIC)=15% contour. IceNet’s predicted ice edge (in green) is determined from its sea ice probability forecast as the P(SIC>15%)=0.5 contour.\n", + "\n", + "The dashboard (sliders + figure) is generated through the `panel` library, [an open-source Python library that lets you create custom interactive web apps and dashboards](https://panel.holoviz.org/index.html). In the settings below, we define two sliders which essentially allow us to interact with two variables, the month and lead time." 
+ ], + "metadata": { + "collapsed": false + } + }, + { + "cell_type": "code", + "execution_count": null, + "outputs": [], + "source": [ + "# set target year\n", + "year = 2020\n", + "\n", + "# set sliders\n", + "month_name = [f'{calendar.month_name[m]} {year}' for m in list(range(1, 13))]\n", + "\n", + "month_slider = pn.widgets.DiscreteSlider(name=\"Month\", options=month_name, value='September 2020', direction='rtl' ,width=200)\n", + "\n", + "lead_slider = pn.widgets.IntSlider(name=\"Lead time (months)\", start=1, end=4, step=1, direction='rtl', value=4, width=200)" + ], + "metadata": { + "collapsed": false, + "pycharm": { + "name": "#%%\n" + } + } + }, + { + "cell_type": "markdown", + "source": [ + "::::{important}\n", + "The interactive figure below essentially reproduces [Figure 2](https://www.nature.com/articles/s41467-021-25257-4/figures/2) of the IceNet paper, however it covers a larger geographical extent i.e. in March when the ice edge extent is largest. Also, we visualise each month of the target period of this demonstrator (January to December 2020). Some script snippets were extracted from the IceNet script `python3 icenet/plot_paper_figures.py` (see [line 182](https://github.com/tom-andersson/icenet-paper/blob/main/icenet/plot_paper_figures.py)). Note we define alpha and colours for coastline and land mask object. 
These configurations allow overlapping these layers correctly to differenciate IceNet predictions and SIC ground thruth.\n", + "::::" + ], + "metadata": { + "collapsed": false + } + }, + { + "cell_type": "code", + "execution_count": null, + "outputs": [], + "source": [ + "## set boundaries\n", + "mask = np.load(os.path.join(config['mask_data_folder'],\n", + " 'active_grid_cell_mask_{}.npy'.format('03')))\n", + "\n", + "min_0 = np.min(np.argwhere(mask)[:, 0])\n", + "max_0 = np.max(np.argwhere(mask)[:, 0])\n", + "mid_0 = np.mean((min_0, max_0)).astype(int)\n", + "min_1 = np.min(np.argwhere(mask)[:, 1])\n", + "max_1 = np.max(np.argwhere(mask)[:, 1])\n", + "mid_1 = np.mean((min_1, max_1)).astype(int)\n", + "max_diff = np.max([mid_0-min_0, mid_1-min_1])\n", + "max_diff *= .85 # Zoom in\n", + "max_diff = int(max_diff)\n", + "top = mid_0 - max_diff + 10\n", + "bot = mid_0 + max_diff + 10\n", + "left = mid_1 - max_diff\n", + "right = mid_1 + max_diff\n", + "\n", + "## land and region masks\n", + "land_mask = np.load(os.path.join(config['mask_data_folder'], 'land_mask.npy'))\n", + "region_mask = np.load(os.path.join(config['mask_data_folder'], 'region_mask.npy'))\n", + "\n", + "## define coastline and land layers\n", + "arr = region_mask == 13\n", + "coastline_rgba_arr = np.zeros((*arr.shape, 4))\n", + "coastline_rgba_arr[:, :, 3] = arr # alpha channel\n", + "coastline_rgba_arr[:, :, :3] = .5 # black coastline\n", + "land_mask_rgba_arr = np.zeros((*arr.shape, 4))\n", + "land_mask_rgba_arr[:, :, 3] = land_mask # alpha channel\n", + "land_mask_rgba_arr[:, :, :3] = .5 # gray land\n", + "\n", + "## line colours\n", + "pred_ice_edge_rgb = 'green'\n", + "true_ice_edge_rgb = 'black'\n", + "\n", + "## define plot function\n", + "@pn.depends(month_slider.param.value, lead_slider.param.value)\n", + "def plot_forecast(month, leadtime):\n", + " tdate = pd.Timestamp(year,month_name.index(month)+1,1)\n", + "\n", + " fig0 = Figure(figsize=(8, 8))\n", + " ax0 = fig0.subplots()\n", + " 
FigureCanvas(fig0) # not needed for mpl >= 3.1\n", + "\n", + " ax0.imshow(coastline_rgba_arr[top:bot, left:right, :], zorder=20)\n", + " ax0.imshow(land_mask_rgba_arr[top:bot, left:right, :], zorder=1)\n", + "\n", + " icenet_sip = icenet_forecast_da.sel(time=tdate, leadtime=leadtime).data\n", + " ax0.contour(\n", + " icenet_sip[0, top:bot, left:right, 0],\n", + " levels=[0.5],\n", + " colors=[pred_ice_edge_rgb],\n", + " zorder=1,\n", + " linewidths=1.5,\n", + " )\n", + "\n", + " groundtruth_sic = true_sic_da.sel(time=tdate)\n", + " gt_img = (groundtruth_sic>0.15).data\n", + "\n", + " ax0.contour(\n", + " gt_img[top:bot, left:right],\n", + " levels=[0.5],\n", + " colors=[true_ice_edge_rgb],\n", + " zorder=1,\n", + " linewidths=1.5\n", + " )\n", + " ax0.tick_params(which='both', bottom=False, left=False, labelbottom=False, labelleft=False)\n", + "\n", + " proxy = [plt.Line2D([0], [1], color=true_ice_edge_rgb),\n", + " plt.Line2D([0], [1], color=pred_ice_edge_rgb)]\n", + "\n", + " ax0.legend(proxy, ['Observed', 'Predicted'],\n", + " loc='upper left', fontsize=11)\n", + "\n", + " ax0.set_title(f'Date = {month} & Lead time = {leadtime} months')\n", + "\n", + " acc = results_df.loc['IceNet', 'ensemble', leadtime, tdate]['Binary accuracy']\n", + " sie_err = results_df.loc['IceNet', 'ensemble', leadtime, tdate]['SIE error']\n", + "\n", + " Afont = {\n", + " 'backgroundcolor': 'lightgray',\n", + " 'color': 'black',\n", + " 'weight': 'normal',\n", + " 'size': 11,\n", + " }\n", + "\n", + " t = AnchoredText('Binary acc: {:.1f}% \\nSIE error: {:+.3f} mil km$^2$'.format(acc,sie_err/1e6), prop=Afont, loc='lower right', pad=0.5, borderpad=0.4, frameon=False)\n", + " t = ax0.add_artist(t)\n", + " t.zorder = 21\n", + "\n", + " return pn.pane.Matplotlib(fig0, tight=True, dpi=150)\n", + "\n", + "plot_ie = pn.Row(\n", + " plot_forecast,\n", + " pn.Column(pn.Spacer(height=5), month_slider, pn.Spacer(height=15), lead_slider, background='#f0f0f0', sizing_mode=\"fixed\"),\n", + " 
width_policy='max', height_policy='max',\n", + ")\n", + "\n", + "plot_ie.embed()" + ], + "metadata": { + "collapsed": false, + "pycharm": { + "name": "#%%\n" + } + } + }, + { + "cell_type": "markdown", + "source": [ + "### Model performance comparison\n", + "\n", + "The figure below shows the mean binary accuracy versus lead time over the 12 forecasted dates for IceNet, SEAS5 and linear trend benchmark. We observe IceNet outperform SEAS5 and linear trend models at lead times of 2 months and beyond." + ], + "metadata": { + "collapsed": false + } + }, + { + "cell_type": "code", + "execution_count": null, + "outputs": [], + "source": [ + "results_mean = results_df['Binary accuracy'].groupby(['Model','Ensemble member','Leadtime']).mean().reset_index()\n", + "results_mean = results_mean[results_mean['Ensemble member'].isin(['NA','ensemble'])]\n", + "\n", + "plot_ba = results_mean.hvplot(x='Leadtime', y='Binary accuracy', by='Model',\n", + " label='Lead times comparison',\n", + " ylabel='Binary accuracy',\n", + " xlabel='Lead time (months)',\n", + " color=['#1f77b4', 'gray', '#d62728'])\n", + "plot_ba.opts(legend_position='top_right', **settings_lineplots)\n", + "plot_ba" + ], + "metadata": { + "collapsed": false, + "pycharm": { + "name": "#%%\n" + } + } + }, + { + "cell_type": "markdown", + "source": [ + "### Monthly performance comparison\n", + "\n", + "Different to the previous plot, the following figure compares the performance of the three models from January to December in 2020 by seasonal lead time. We confirm IceNet’s ability of seasonal forecast of summer ice (August, September and October) at lead times of two months and beyond outperforming both SEAS5 and the linear trend.\n", + "\n", + "We also observe SEAS5 outperforms IceNet at a 1-month lead time over time, except in October. 
According to the Nature communications paper, this is likely because IceNet only receives monthly averages as input, smearing the weather phenomena and initial conditions that dominate predictability on such short timescales." + ], + "metadata": { + "collapsed": false + } + }, + { + "cell_type": "code", + "execution_count": null, + "outputs": [], + "source": [ + "lead_slider = pn.widgets.IntSlider(name=\"Lead time (months)\", start=1, end=4, step=1, width=150)\n", + "\n", + "results_plot = results_df.reset_index()\n", + "\n", + "formatter = DatetimeTickFormatter(months='%b')\n", + "\n", + "@pn.depends(lead_slider.param.value)\n", + "def plot_month(leadtime):\n", + "\n", + " results_lt = results_plot[results_plot.Leadtime==leadtime]\n", + " plot_ba = results_lt.hvplot(x='Forecast date',\n", + " y='Binary accuracy',\n", + " by='Model',\n", + " label='Monthly comparison',\n", + " ylabel='Binary accuracy',\n", + " xlabel='Forecast month',\n", + " color=['#1f77b4', 'gray', '#d62728'],\n", + " xformatter=formatter)\n", + "\n", + " return plot_ba.opts(legend_position='bottom_left', **settings_lineplots)\n", + "\n", + "plot_month = pn.Row(\n", + " plot_month,\n", + " pn.Column(pn.Spacer(height=5), lead_slider, background='#f0f0f0'),\n", + " width_policy='max', height_policy='max'\n", + ")\n", + "\n", + "plot_month.embed()" + ], + "metadata": { + "collapsed": false, + "pycharm": { + "name": "#%%\n" + } + } + }, + { + "cell_type": "markdown", + "source": [ + "## Summary\n", + "\n", + "This notebook has demonstrated the use of:\n", + " - A custom dataloader, `IceNetDataLoader`, to conveniently dictate which variables are input to the networks, which climate simulations are used for pre-training, and how far ahead to forecast.\n", + " - How to append, filter, and manipulate new forecast results using `pandas`.\n", + " - `matplotlib` framed into a `panels` dashboard to visualise the IceNet forecast within the modelled period and four lead times.\n", + " - `hvplot` to plot time 
series data for comparing the performance of IceNet predictions against ECMWF SEAS5 physics-based sea ice concentration and a linear trend statistical benchmark.\n", + "\n", + "The IceNet's Nature Communications paper [*Seasonal Arctic sea ice forecasting with probabilistic deep learning*](https://www.nature.com/articles/s41467-021-25257-4) provides\n", + "further information of other key aspects e.g. variable importance, model calibration, etc. which for sake of simplicity are not covered in the demonstrator." + ], + "metadata": { + "collapsed": false + } + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.8.10" + }, + "widgets": { + "application/vnd.jupyter.widget-state+json": { + "state": {}, + "version_major": 2, + "version_minor": 0 + } + } + }, + "nbformat": 4, + "nbformat_minor": 4 +} \ No newline at end of file From c30ed0dd9ab9cec02e4cac5f9d2fffd15195fb80 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Alejandro=20=C2=A9?= Date: Mon, 25 Oct 2021 16:35:55 +0100 Subject: [PATCH 02/11] add environment.yml --- IceNet/environment.yml | 11 +++++++++++ 1 file changed, 11 insertions(+) create mode 100644 IceNet/environment.yml diff --git a/IceNet/environment.yml b/IceNet/environment.yml new file mode 100644 index 0000000..bebf332 --- /dev/null +++ b/IceNet/environment.yml @@ -0,0 +1,11 @@ +name: IceNet +channels: + - conda-forge +dependencies: + - python=3.8 + - matplotlib + - numpy + - hvplot + - tensorflow==2.2.0 + - geoviews + - urllib3 \ No newline at end of file From b30025843b212b4f3ed76d054f353939cfddcf30 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Alejandro=20=C2=A9?= Date: Tue, 26 Oct 2021 09:33:22 +0100 Subject: [PATCH 03/11] add 
environment.yml --- IceNet/environment.yml | 1 + 1 file changed, 1 insertion(+) diff --git a/IceNet/environment.yml b/IceNet/environment.yml index bebf332..2958e56 100644 --- a/IceNet/environment.yml +++ b/IceNet/environment.yml @@ -8,4 +8,5 @@ dependencies: - hvplot - tensorflow==2.2.0 - geoviews + - iris==3.0.1 - urllib3 \ No newline at end of file From 6a365530b7c2819ab57323370c333aaaf2d90e4f Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Alejandro=20=C2=A9?= Date: Tue, 26 Oct 2021 09:33:33 +0100 Subject: [PATCH 04/11] add conda --- IceNet/polar-modelling-icenet.ipynb | 14 +++++++++++++- 1 file changed, 13 insertions(+), 1 deletion(-) diff --git a/IceNet/polar-modelling-icenet.ipynb b/IceNet/polar-modelling-icenet.ipynb index e2284df..8d2aa05 100644 --- a/IceNet/polar-modelling-icenet.ipynb +++ b/IceNet/polar-modelling-icenet.ipynb @@ -26,7 +26,7 @@ "Tom R. Andersson, British Antarctic Survey, [@tom-andersson](https://github.com/tom-andersson), 21/10/21 (latest revision)\n", "\n", "#### Version\n", - "The initial version of this notebook was generated through the Environmental AI book, see the commit [dbfb9cf](https://github.com/acocac/environmental-ai-book/commits/master/book/polar/modelling/polar-modelling-icenet.ipynb). The version was adapted to the Pangeo examples repo.\n", + "The initial version of this notebook was generated through the Environmental AI book, see the commit [2495c64](https://github.com/acocac/environmental-ai-book/commits/master/book/polar/modelling/polar-modelling-icenet.ipynb). 
The version was adapted to the Pangeo examples repo.\n", " \n", "### Modelling contributions\n", "#### Codebase\n", @@ -51,6 +51,18 @@ "collapsed": false } }, + { + "cell_type": "markdown", + "source": [ + "## Set the conda environment\n", + "\n", + "!conda env create -q -f environment.yml\n", + "!activate IceNet-repo" + ], + "metadata": { + "collapsed": false + } + }, { "cell_type": "markdown", "source": [ From 1c140e163045cd8291a3546e847c8df490ce1fcf Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Alejandro=20=C2=A9?= Date: Wed, 27 Oct 2021 18:48:54 +0100 Subject: [PATCH 05/11] working version with the contour --- IceNet/polar-modelling-icenet.ipynb | 201 ++++++++++++---------------- 1 file changed, 87 insertions(+), 114 deletions(-) diff --git a/IceNet/polar-modelling-icenet.ipynb b/IceNet/polar-modelling-icenet.ipynb index 8d2aa05..d94ab80 100644 --- a/IceNet/polar-modelling-icenet.ipynb +++ b/IceNet/polar-modelling-icenet.ipynb @@ -86,55 +86,6 @@ } } }, - { - "cell_type": "markdown", - "source": [ - "## Set the virtual environment" - ], - "metadata": { - "collapsed": false - } - }, - { - "cell_type": "code", - "execution_count": null, - "outputs": [], - "source": [ - "# Import the required packages\n", - "import virtualenv\n", - "import pip\n", - "import os\n", - "\n", - "# Define and create the base directory install virtual environments\n", - "venvs_dir = os.path.join(os.path.expanduser(\"~\"), \"nb-venvs\")\n", - "\n", - "if not os.path.isdir(venvs_dir):\n", - " os.makedirs(venvs_dir)\n", - "\n", - "# Define the venv directory\n", - "venv_dir = os.path.join(venvs_dir, 'venv-icenet')\n", - "\n", - "if not os.path.exists(venv_dir):\n", - " # Create the virtual environment\n", - " virtualenv.create_environment(venv_dir)\n", - "\n", - " # Install a set of required packages via `pip`\n", - " requirements = ['matplotlib', 'urllib3', 'tqdm', 'xarray','tensorflow==2.2.0', 'hvplot','geoviews']\n", - "\n", - " for pkg in requirements:\n", - " pip.main([\"install\", 
\"--prefix\", venv_dir, pkg])\n", - "\n", - "# Activate the venv\n", - "activate_file = os.path.join(venv_dir, \"bin\", \"activate_this.py\")\n", - "exec(open(activate_file).read(), dict(__file__=activate_file))" - ], - "metadata": { - "collapsed": false, - "pycharm": { - "name": "#%%\n" - } - } - }, { "cell_type": "markdown", "source": [ @@ -257,13 +208,14 @@ "Let's download 3 out of 25 ensemble members [retrieved from the Polar Data Centre](https://doi.org/10.5285/71820e7d-c628-4e32-969f-464b7efb187c). The models are numbered from 36 to 60. For this example we use the networks 36, 42 and 53. It is worth to mention other pre-computed results from the Nature Communications paper can be downloaded including output results table, uncertainty, netCDF forecast of the 25 ensemble members, among others." ], "metadata": { - "collapsed": false + "collapsed": false, + "pycharm": { + "name": "#%% md\n" + } } }, { "cell_type": "code", - "execution_count": null, - "outputs": [], "source": [ "url = 'https://ramadda.data.bas.ac.uk/repository/entry/get/'\n", "\n", @@ -278,7 +230,9 @@ "pycharm": { "name": "#%%\n" } - } + }, + "execution_count": null, + "outputs": [] }, { "cell_type": "markdown", @@ -688,7 +642,10 @@ "The IceNet codebase allows computing operations in the memory or with `dask`. The computation in dask is optimal for predicting longer target periods (see further info in [icenet/analyse_heldout_predictions.py](https://github.com/tom-andersson/icenet-paper/blob/27ca44694eaa3cb5f02fd824c618c46a6701a301/icenet/analyse_heldout_predictions.py#L23)). The following lines show how to compute in the memory." 
], "metadata": { - "collapsed": false + "collapsed": false, + "pycharm": { + "name": "#%% md\n" + } } }, { @@ -702,8 +659,6 @@ }, { "cell_type": "code", - "execution_count": null, - "outputs": [], "source": [ "metric_compute_list = ['Binary accuracy', 'SIE error']\n", "\n", @@ -718,7 +673,9 @@ "pycharm": { "name": "#%%\n" } - } + }, + "execution_count": null, + "outputs": [] }, { "cell_type": "markdown", @@ -830,48 +787,37 @@ { "cell_type": "markdown", "source": [ - "### Load forecasts and compute SIC" + "### Compute IceNet SIC" ], "metadata": { "collapsed": false } }, { - "cell_type": "code", - "execution_count": null, - "outputs": [], + "cell_type": "markdown", "source": [ - "target_forecasts = icenet_forecast_da.sel(time=all_target_dates)" + "We obtain the sea ice probability (SIC>15%) for each ensemble member and ensemble mean by summing IceNet’s marginal ice (15%80%) probabilities." ], "metadata": { "collapsed": false, "pycharm": { - "name": "#%%\n" + "name": "#%% md\n" } } }, - { - "cell_type": "markdown", - "source": [ - "We obtain the sea ice probability (SIC>15%) by summing IceNet’s marginal ice (15%80%) probabilities." 
- ], - "metadata": { - "collapsed": false - } - }, { "cell_type": "code", - "execution_count": null, - "outputs": [], "source": [ - "sip_da = target_forecasts.sel(ice_class=['marginal_ice', 'full_ice']).sum('ice_class')" + "icenet_sip_da = icenet_forecast_da.sel(ice_class=['marginal_ice', 'full_ice']).sum('ice_class')" ], "metadata": { "collapsed": false, "pycharm": { "name": "#%%\n" } - } + }, + "execution_count": null, + "outputs": [] }, { "cell_type": "markdown", @@ -879,13 +825,14 @@ "### Ground truth SIC" ], "metadata": { - "collapsed": false + "collapsed": false, + "pycharm": { + "name": "#%% md\n" + } } }, { "cell_type": "code", - "execution_count": null, - "outputs": [], "source": [ "true_sic_fpath = os.path.join(config['obs_data_folder'], 'siconca_EASE.nc')\n", "true_sic_da = xr.open_dataarray(true_sic_fpath, chunks={})\n", @@ -900,7 +847,9 @@ "pycharm": { "name": "#%%\n" } - } + }, + "execution_count": null, + "outputs": [] }, { "cell_type": "markdown", @@ -908,7 +857,10 @@ "### Monthwise masks" ], "metadata": { - "collapsed": false + "collapsed": false, + "pycharm": { + "name": "#%% md\n" + } } }, { @@ -956,15 +908,13 @@ }, { "cell_type": "code", - "execution_count": null, - "outputs": [], "source": [ "print('Analysing forecasts: \\n\\n')\n", "\n", "print('Computing metrics:')\n", "print(metric_compute_list)\n", "\n", - "binary_forecast_da = sip_da > 0.5\n", + "binary_forecast_da = icenet_sip_da > 0.5\n", "\n", "compute_ds = xr.Dataset()\n", "for metric in metric_compute_list:\n", @@ -1014,7 +964,9 @@ "pycharm": { "name": "#%%\n" } - } + }, + "execution_count": null, + "outputs": [] }, { "cell_type": "markdown", @@ -1024,7 +976,10 @@ "In this section, we explore the forecast results and provide some interpretation. Note we use a sample data so the results are only for demonstration purposes." 
], "metadata": { - "collapsed": false + "collapsed": false, + "pycharm": { + "name": "#%% md\n" + } } }, { @@ -1061,8 +1016,6 @@ }, { "cell_type": "code", - "execution_count": null, - "outputs": [], "source": [ "# Reset index to preprocess results dataset\n", "results_df = results_df.reset_index()\n", @@ -1086,7 +1039,9 @@ "pycharm": { "name": "#%%\n" } - } + }, + "execution_count": null, + "outputs": [] }, { "cell_type": "markdown", @@ -1094,13 +1049,14 @@ "Let's inspect the results `pandas data.frame` reporting the monthly performance of each ensemble member for the target period." ], "metadata": { - "collapsed": false + "collapsed": false, + "pycharm": { + "name": "#%% md\n" + } } }, { "cell_type": "code", - "execution_count": null, - "outputs": [], "source": [ "results_df.head()" ], @@ -1109,7 +1065,9 @@ "pycharm": { "name": "#%%\n" } - } + }, + "execution_count": null, + "outputs": [] }, { "cell_type": "markdown", @@ -1121,13 +1079,14 @@ "The dashboard (sliders + figure) is generated through the `panel` library, [an open-source Python library that lets you create custom interactive web apps and dashboards](https://panel.holoviz.org/index.html). In the settings below, we define two sliders which essentially allow us to interact with two variables, the month and lead time." 
], "metadata": { - "collapsed": false + "collapsed": false, + "pycharm": { + "name": "#%% md\n" + } } }, { "cell_type": "code", - "execution_count": null, - "outputs": [], "source": [ "# set target year\n", "year = 2020\n", @@ -1135,16 +1094,18 @@ "# set sliders\n", "month_name = [f'{calendar.month_name[m]} {year}' for m in list(range(1, 13))]\n", "\n", - "month_slider = pn.widgets.DiscreteSlider(name=\"Month\", options=month_name, value='September 2020', direction='rtl' ,width=200)\n", + "month_slider = pn.widgets.DiscreteSlider(name=\"Month\", options=month_name, value='September 2020', width=200)\n", "\n", - "lead_slider = pn.widgets.IntSlider(name=\"Lead time (months)\", start=1, end=4, step=1, direction='rtl', value=4, width=200)" + "lead_slider = pn.widgets.IntSlider(name=\"Lead time (months)\", start=1, end=4, step=1, value=4, direction='rtl', width=200)" ], "metadata": { "collapsed": false, "pycharm": { "name": "#%%\n" } - } + }, + "execution_count": null, + "outputs": [] }, { "cell_type": "markdown", @@ -1154,13 +1115,14 @@ "::::" ], "metadata": { - "collapsed": false + "collapsed": false, + "pycharm": { + "name": "#%% md\n" + } } }, { "cell_type": "code", - "execution_count": null, - "outputs": [], "source": [ "## set boundaries\n", "mask = np.load(os.path.join(config['mask_data_folder'],\n", @@ -1209,9 +1171,9 @@ " ax0.imshow(coastline_rgba_arr[top:bot, left:right, :], zorder=20)\n", " ax0.imshow(land_mask_rgba_arr[top:bot, left:right, :], zorder=1)\n", "\n", - " icenet_sip = icenet_forecast_da.sel(time=tdate, leadtime=leadtime).data\n", + " icenet_sip = icenet_sip_da.sel(time=tdate, leadtime=leadtime, seed='ensemble').data\n", " ax0.contour(\n", - " icenet_sip[0, top:bot, left:right, 0],\n", + " icenet_sip[top:bot, left:right],\n", " levels=[0.5],\n", " colors=[pred_ice_edge_rgb],\n", " zorder=1,\n", @@ -1267,7 +1229,9 @@ "pycharm": { "name": "#%%\n" } - } + }, + "execution_count": null, + "outputs": [] }, { "cell_type": "markdown", @@ -1277,13 +1241,14 
@@ "The figure below shows the mean binary accuracy versus lead time over the 12 forecasted dates for IceNet, SEAS5 and linear trend benchmark. We observe IceNet outperform SEAS5 and linear trend models at lead times of 2 months and beyond." ], "metadata": { - "collapsed": false + "collapsed": false, + "pycharm": { + "name": "#%% md\n" + } } }, { "cell_type": "code", - "execution_count": null, - "outputs": [], "source": [ "results_mean = results_df['Binary accuracy'].groupby(['Model','Ensemble member','Leadtime']).mean().reset_index()\n", "results_mean = results_mean[results_mean['Ensemble member'].isin(['NA','ensemble'])]\n", @@ -1301,7 +1266,9 @@ "pycharm": { "name": "#%%\n" } - } + }, + "execution_count": null, + "outputs": [] }, { "cell_type": "markdown", @@ -1313,13 +1280,14 @@ "We also observe SEAS5 outperforms IceNet at a 1-month lead time over time, except in October. According to the Nature communications paper, this is likely because IceNet only receives monthly averages as input, smearing the weather phenomena and initial conditions that dominate predictability on such short timescales." ], "metadata": { - "collapsed": false + "collapsed": false, + "pycharm": { + "name": "#%% md\n" + } } }, { "cell_type": "code", - "execution_count": null, - "outputs": [], "source": [ "lead_slider = pn.widgets.IntSlider(name=\"Lead time (months)\", start=1, end=4, step=1, width=150)\n", "\n", @@ -1355,7 +1323,9 @@ "pycharm": { "name": "#%%\n" } - } + }, + "execution_count": null, + "outputs": [] }, { "cell_type": "markdown", @@ -1372,7 +1342,10 @@ "further information of other key aspects e.g. variable importance, model calibration, etc. which for sake of simplicity are not covered in the demonstrator." 
], "metadata": { - "collapsed": false + "collapsed": false, + "pycharm": { + "name": "#%% md\n" + } } } ], From 024c50f17564ee92d6a801ee8682b306dac38f72 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Alejandro=20=C2=A9?= Date: Thu, 28 Oct 2021 15:02:23 +0100 Subject: [PATCH 06/11] add bash to set conda env --- IceNet/env.sh | 52 +++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 52 insertions(+) create mode 100644 IceNet/env.sh diff --git a/IceNet/env.sh b/IceNet/env.sh new file mode 100644 index 0000000..702cf50 --- /dev/null +++ b/IceNet/env.sh @@ -0,0 +1,52 @@ +#! /usr/bin/env bash + +ENV_NAME=$1 +ENV_SPEC=$2 + +function create_env { + echo "creating $ENV_NAME from $ENV_SPEC" + conda env create -n $ENV_NAME + conda env update -n $ENV_NAME --file $ENV_SPEC + eval "$(conda shell.bash hook)" && conda activate $ENV_NAME + conda install ipykernel -y + python -m ipykernel install --user --name $ENV_NAME --display-name $ENV_NAME + jupyter nbextension enable --py widgetsnbextension +} + +# Code below mostly from stackoverflow +# https://stackoverflow.com/questions/60115420/check-for-existing-conda-environment-in-makefile + +RED='\033[1;31m' +GREEN='\033[1;32m' +CYAN='\033[1;36m' +NC='\033[0m' # No Color + +if ! (return 0 2>/dev/null) ; then + # If return is used in the top-level scope of a non-sourced script, + # an error message is emitted, and the exit code is set to 1 + echo + echo -e $RED"This script should be sourced like"$NC + echo " . ./env.sh ENV_NAME ENV_SPEC" + echo + exit 1 # we detected we are NOT source'd so we can use exit +fi + +if type conda 2>/dev/null; then + if conda info --envs | grep ${ENV_NAME}; then + echo -e $CYAN"activating environment ${ENV_NAME}"$NC + else + echo + echo -e $RED"(!) Will install the conda environment ${ENV_NAME}"$NC + echo + create_env + return 1 # we are source'd so we cannot use exit + fi +else + echo + echo -e $RED"(!)
Please install anaconda"$NC + echo + return 1 # we are source'd so we cannot use exit +fi + +eval "$(conda shell.bash hook)" && conda activate $ENV_NAME +echo -e $RED"Change kernel to $ENV_NAME, refresh browser if not available."$NC \ No newline at end of file From da42efab2ee99746910bc00b9f2e7ec1debbd804 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Alejandro=20=C2=A9?= Date: Thu, 28 Oct 2021 15:02:51 +0100 Subject: [PATCH 07/11] remove env name --- IceNet/environment.yml | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-) diff --git a/IceNet/environment.yml b/IceNet/environment.yml index 2958e56..22fff20 100644 --- a/IceNet/environment.yml +++ b/IceNet/environment.yml @@ -1,4 +1,3 @@ -name: IceNet channels: - conda-forge dependencies: @@ -8,5 +7,8 @@ dependencies: - hvplot - tensorflow==2.2.0 - geoviews - - iris==3.0.1 - - urllib3 \ No newline at end of file + - iris + - urllib3 + - imageio + - ipywidgets + - tqdm \ No newline at end of file From e02291bea8b21ee967d4e214bc4779a1732e7052 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Alejandro=20=C2=A9?= Date: Thu, 28 Oct 2021 15:03:41 +0100 Subject: [PATCH 08/11] add conda env setting and force mask creation in the new environment --- IceNet/polar-modelling-icenet.ipynb | 45 ++++++++++++++++++++++++----- 1 file changed, 38 insertions(+), 7 deletions(-) diff --git a/IceNet/polar-modelling-icenet.ipynb b/IceNet/polar-modelling-icenet.ipynb index d94ab80..f3a7b9f 100644 --- a/IceNet/polar-modelling-icenet.ipynb +++ b/IceNet/polar-modelling-icenet.ipynb @@ -7,7 +7,7 @@ "\n", "## Context\n", "### Purpose\n", - "Demonstrate IceNet, a deep learning sea ice forecasting system trained using climate simulations and observational data.\n", + "Demonstrate IceNet, a deep learninßg sea ice forecasting system trained using climate simulations and observational data.\n", "\n", "### Modelling approach\n", "**IceNet** is a probabilistic, deep learning sea ice forecasting system.
The model, an ensemble of U-Net networks, learns how sea ice changes from climate simulations and observational data to forecast up to 6 months of monthly-averaged sea ice concentration maps at 25 km resolution. IceNet advances the range of accurate sea ice forecasts, outperforming a state-of-the-art dynamical model in seasonal forecasts of summer sea ice, particularly for extreme sea ice events. IceNet was implemented in Python 3.7 using TensorFlow v2.2.0. Further details can be found in the Nature Communications paper [*Seasonal Arctic sea ice forecasting with probabilistic deep learning*](https://www.nature.com/articles/s41467-021-25257-4).\n", @@ -54,15 +54,46 @@ { "cell_type": "markdown", "source": [ - "## Set the conda environment\n", - "\n", - "!conda env create -q -f environment.yml\n", - "!activate IceNet-repo" + "## Set the conda environment" ], "metadata": { "collapsed": false } }, + { + "cell_type": "code", + "source": [ + "# Settings\n", + "env_name = \"IceNet-dep\"" + ], + "metadata": { + "collapsed": false, + "pycharm": { + "name": "#%%\n" + } + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "code", + "execution_count": null, + "outputs": [], + "source": [ + "# Use this cell if the conda environment is not already set up\n", + "# You will then be able to select the env as a kernel in the jupyter notebook.\n", + "# This is controlled mainly by environment.yml,\n", + "# but env.sh installs the kernel for the jupyter notebook.\n", + "# You will probably not need to change env.sh.\n", + "!. 
env.sh {env_name} environment.yml\n" + ], + "metadata": { + "collapsed": false, + "pycharm": { + "name": "#%%\n" + } + } + }, { "cell_type": "markdown", "source": [ @@ -136,7 +167,7 @@ "# utils\n", "import urllib.request\n", "import re\n", - "from tqdm.notebook import tqdm\n", + "from tqdm import tqdm\n", "import calendar\n", "from pprint import pprint\n", "\n", @@ -313,7 +344,7 @@ "execution_count": null, "outputs": [], "source": [ - "!python polar-modelling-icenet/icenet/gen_masks.py" + "!/anaconda/envs/IceNet-dep/bin/python polar-modelling-icenet/icenet/gen_masks.py" ], "metadata": { "collapsed": false, From f79d9aa7f743d84fbfd8ecfb1b7b63d1feb851c2 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Alejandro=20=C2=A9?= Date: Thu, 28 Oct 2021 15:04:23 +0100 Subject: [PATCH 09/11] env name IceNet-repo --- IceNet/polar-modelling-icenet.ipynb | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/IceNet/polar-modelling-icenet.ipynb b/IceNet/polar-modelling-icenet.ipynb index f3a7b9f..b23c015 100644 --- a/IceNet/polar-modelling-icenet.ipynb +++ b/IceNet/polar-modelling-icenet.ipynb @@ -64,7 +64,7 @@ "cell_type": "code", "source": [ "# Settings\n", - "env_name = \"IceNet-dep\"" + "env_name = \"IceNet-repo\"" ], "metadata": { "collapsed": false, @@ -344,7 +344,7 @@ "execution_count": null, "outputs": [], "source": [ - "!/anaconda/envs/IceNet-dep/bin/python polar-modelling-icenet/icenet/gen_masks.py" + "!/anaconda/envs/IceNet-repo/bin/python polar-modelling-icenet/icenet/gen_masks.py" ], "metadata": { "collapsed": false, From d0b1b8baa31c00de0739d658bf6ebf1125f99c4e Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Alejandro=20=C2=A9?= Date: Wed, 10 Nov 2021 10:50:28 +0000 Subject: [PATCH 10/11] move some dependencies to install via pip --- IceNet/environment.yml | 24 ++++++++++++++---------- 1 file changed, 14 insertions(+), 10 deletions(-) diff --git a/IceNet/environment.yml b/IceNet/environment.yml index 22fff20..d7e98a4 100644 --- a/IceNet/environment.yml +++ 
b/IceNet/environment.yml @@ -2,13 +2,17 @@ channels: - conda-forge dependencies: - python=3.8 - - matplotlib - - numpy - - hvplot - - tensorflow==2.2.0 - - geoviews - - iris - - urllib3 - - imageio - - ipywidgets - - tqdm \ No newline at end of file + - iris==3.0.1 + - pip + - pip: + - matplotlib + - numpy + - hvplot + - tensorflow==2.2.0 + - geoviews + - urllib3 + - imageio + - ipywidgets + - pandas + - tqdm + - xarray \ No newline at end of file From cea3f56eabbd013d83a4d49b315b42bd8aa3c2bd Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Alejandro=20=C2=A9?= Date: Wed, 10 Nov 2021 10:51:07 +0000 Subject: [PATCH 11/11] add Nick as reviewer, fix typos, more details of ground truth data --- IceNet/polar-modelling-icenet.ipynb | 67 +++++++++++++++++------------ 1 file changed, 40 insertions(+), 27 deletions(-) diff --git a/IceNet/polar-modelling-icenet.ipynb b/IceNet/polar-modelling-icenet.ipynb index b23c015..18c4582 100644 --- a/IceNet/polar-modelling-icenet.ipynb +++ b/IceNet/polar-modelling-icenet.ipynb @@ -7,7 +7,7 @@ "\n", "## Context\n", "### Purpose\n", - "Demonstrate IceNet, a deep learninßg sea ice forecasting system trained using climate simulations and observational data.\n", + "Demonstrate IceNet, a deep learning sea ice forecasting system trained using climate simulations and observational data.\n", "\n", "### Modelling approach\n", "**IceNet** is a probabilistic, deep learning sea ice forecasting system. The model, an ensemble of U-Net networks, learns how sea ice changes from climate simulations and observational data to forecast up to 6 months of monthly-averaged sea ice concentration maps at 25 km resolution. IceNet advances the range of accurate sea ice forecasts, outperforming a state-of-the-art dynamical model in seasonal forecasts of summer sea ice, particularly for extreme sea ice events. IceNet was implemented in Python 3.7 using TensorFlow v2.2.0. 
Further details can be found in the Nature Communications paper [*Seasonal Arctic sea ice forecasting with probabilistic deep learning*](https://www.nature.com/articles/s41467-021-25257-4).\n", @@ -18,29 +18,22 @@ "* Visualise IceNet’s seasonal ice edge predictions at 4- to 1-month lead times.\n", "* Interactive plots comparing IceNet predictions against ECMWF SEAS5 physics-based sea ice concentration and a linear trend statistical benchmark.\n", "\n", - "### Notebook contributions\n", - "#### Author\n", - "Alejandro Coca-Castro, The Alan Turing Institute, [@acocac](https://github.com/acocac)\n", + "### Contributions\n", "\n", - "#### Reviewers\n", - "Tom R. Andersson, British Antarctic Survey, [@tom-andersson](https://github.com/tom-andersson), 21/10/21 (latest revision)\n", + "#### Notebook\n", + "* Alejandro Coca-Castro (author), The Alan Turing Institute, [@acocac](https://github.com/acocac)\n", + "* Tom R. Andersson (reviewer), British Antarctic Survey, [@tom-andersson](https://github.com/tom-andersson), 26/10/21 (latest revision)\n", + "* Nick Barlow (reviewer), The Alan Turing Institute, [@nbarlowATI](https://github.com/nbarlowATI), 04/11/21 (latest revision)\n", "\n", - "#### Version\n", - "The initial version of this notebook was generated through the Environmental AI book, see the commit [2495c64](https://github.com/acocac/environmental-ai-book/commits/master/book/polar/modelling/polar-modelling-icenet.ipynb). The version was adapted to the Pangeo examples repo.\n", - " \n", - "### Modelling contributions\n", - "#### Codebase\n", - "- Tom R. Andersson (author), British Antarctic Survey, [@tom-andersson](https://github.com/tom-andersson)\n", - "- James Byrne (contributor), British Antarctic Survey, [@JimCircadian](https://github.com/JimCircadian)\n", - "- Tony Phillips (contributor), British Antarctic Survey\n", + "#### Modelling codebase\n", + "* Tom R. 
Andersson (author), British Antarctic Survey, [@tom-andersson](https://github.com/tom-andersson)\n", + "* James Byrne (contributor), British Antarctic Survey, [@JimCircadian](https://github.com/JimCircadian)\n", + "* Tony Phillips (contributor), British Antarctic Survey\n", "\n", - "#### Paper\n", - "Tom R. Andersson, J. Scott Hosking, María Pérez-Ortiz, Brooks Paige, Andrew Elliott, Chris Russell, Stephen Law, Daniel C. Jones, Jeremy Wilkinson, Tony Phillips, James Byrne, Steffen Tietsche, Beena Balan Sarojini, Eduardo Blanchard-Wrigglesworth, Yevgeny Aksenov, Rod Downie & Emily Shuckburgh. See [here](https://www.nature.com/articles/s41467-021-25257-4#author-information) further author information (affiliations and contributions).\n", + "#### Modelling publications\n", + "* Tom R Andersson, J Scott Hosking, María Pérez-Ortiz, Brooks Paige, Andrew Elliott, Chris Russell, Stephen Law, Daniel C Jones, Jeremy Wilkinson, Tony Phillips, James Byrne, Steffen Tietsche, Beena Balan Sarojini, Eduardo Blanchard-Wrigglesworth, Yevgeny Aksenov, Rod Downie, and Emily Shuckburgh. Seasonal arctic sea ice forecasting with probabilistic deep learning. Nature Communications, 12:5124, 2021. URL: [https://doi.org/10.1038/s41467-021-25257-4](https://doi.org/10.1038/s41467-021-25257-4).\n", "\n", - "#### Version\n", - "The version explored of the IceNet codebase is 1.0.0 commit [9d69ad7](https://github.com/tom-andersson/icenet-paper/compare/v1.0.0...main)\n", - "\n", - "### Funding\n", + "#### Modelling funding\n", "The IceNet project was supported by Wave 1 of The UKRI Strategic Priorities Fund under the EPSRC Grant EP/T001569/1, particularly the AI for Science’ theme within that grant and The Alan Turing Institute.\n", "\n", ":::{note}\n", @@ -225,7 +218,7 @@ "source": [ "## Download input data and models\n", "\n", - "IceNet consists of 25 ensemble members i.e. models. 
For this demonstrator, we only download three of them to reduce computational cost (note that this will reduce performance compared with the full ensemble). We also fetch analysis-ready i.e. preprocessed data of climate observations, ground thruth sea ice concentration (SIC) and a IceNet's project configuration file from a Zenodo repository. Finally, we call a script from the IceNet paper repo to generate masks required for computing metrics and visualisation." + "IceNet consists of 25 ensemble members i.e. models. For this demonstrator, we only download three of them to reduce computational cost (note that this will reduce performance compared with the full ensemble). We also fetch analysis-ready i.e. preprocessed data of climate observations, ground truth sea ice concentration (SIC) and the IceNet project configuration file from a Zenodo repository. Finally, we call a script from the IceNet paper repo to generate masks required for computing metrics and visualisation." ], "metadata": { "collapsed": false @@ -302,7 +295,11 @@ { "cell_type": "markdown", "source": [ - "### Download ground truth SIC" + "### Download ground truth SIC\n", + "\n", + ":::{note}\n", + "The analysis-ready ground truth SIC data were generated by running `python3 icenet/download_sic_data.py`, step **2) Download data** in the [icenet-paper repository](https://github.com/tom-andersson/icenet-paper). The script downloads and concatenates [OSI-SAF SIC data](https://osisaf-hl.met.no/v2p1-sea-ice-index), OSI-450 (1979-2015) and OSI-430-b (2016-onwards), and saves them as monthly averages in a netCDF file.\n", + ":::" ], "metadata": { "collapsed": false @@ -333,7 +330,11 @@ "\n", "The script `icenet/gen_masks.py` generates masks for land, the polar holes, OSI-SAF monthly maximum ice extent (the *active\n", "grid cell region*), and the Arctic regions & coastline. Figures of the\n", - "masks are saved in the **./figures** folder."
+ "masks are saved in the **./figures** folder.\n", + "\n", + ":::{note}\n", + "The Python path after the exclamation mark in the cell below works on the AzureML VM. Change the path on other systems (local or remote).\n", + ":::" ], "metadata": { "collapsed": false @@ -390,7 +391,7 @@ { "cell_type": "markdown", "source": [ - "The `input_data` element of the IceNet's `JSON` file lists input variables and corresponding settings. We use the same input data of Nature Communications' paper which consists of SIC, 11 climate variables, statistical SIC forecasts, and metadata (see [Supplementary Table 2](https://static-content.springer.com/esm/art%3A10.1038%2Fs41467-021-25257-4/MediaObjects/41467_2021_25257_MOESM1_ESM.pdf)). These layers are stacked in an identical manner to the RGB channels of a traditional image, amounting to 50 channels in total." + "The `input_data` element of IceNet's `JSON` file lists input variables and corresponding settings. We use the same input data as the Nature Communications paper, which consist of SIC, 11 climate variables, statistical SIC forecasts, and metadata (see [Supplementary Table 2](https://static-content.springer.com/esm/art%3A10.1038%2Fs41467-021-25257-4/MediaObjects/41467_2021_25257_MOESM1_ESM.pdf)). These layers are stacked in an identical manner to the RGB channels of a traditional image, amounting to 50 channels in total." ], "metadata": { "collapsed": false @@ -542,7 +543,7 @@ "source": [ "### Set up forecast DataArray dictionary\n", "\n", - "Now we are setting up an empty `xarray DataArray` object that we will use to store IceNet's forecasts. `DataArrays` let you conveniently handle, query and visualise spatio-temporal data as the forecast predictions generated by the IceNet system." + "Now we are setting up an empty `xarray DataArray` object that we will use to store IceNet's forecasts.
`DataArrays` let you conveniently handle, query and visualise spatio-temporal data, such as the forecast predictions generated by the IceNet system." ], "metadata": { "collapsed": false @@ -1004,7 +1005,7 @@ "source": [ "## Analysis\n", "\n", - "In this section, we explore the forecast results and provide some interpretation. Note we use a sample data so the results are only for demonstration purposes." + "In this section, we explore the forecast results and provide some interpretation. Note we use a small sample of the data so the results are only for demonstration purposes." ], "metadata": { "collapsed": false, @@ -1142,7 +1143,7 @@ "cell_type": "markdown", "source": [ "::::{important}\n", - "The interactive figure below essentially reproduces [Figure 2](https://www.nature.com/articles/s41467-021-25257-4/figures/2) of the IceNet paper, however it covers a larger geographical extent i.e. in March when the ice edge extent is largest. Also, we visualise each month of the target period of this demonstrator (January to December 2020). Some script snippets were extracted from the IceNet script `python3 icenet/plot_paper_figures.py` (see [line 182](https://github.com/tom-andersson/icenet-paper/blob/main/icenet/plot_paper_figures.py)). Note we define alpha and colours for coastline and land mask object. These configurations allow overlapping these layers correctly to differenciate IceNet predictions and SIC ground thruth.\n", + "The interactive figure below essentially reproduces [Figure 2](https://www.nature.com/articles/s41467-021-25257-4/figures/2) of the IceNet paper, however it covers a larger geographical extent i.e. in March when the ice edge extent is largest. Also, we visualise each month of the target period of this demonstrator (January to December 2020). Some script snippets were extracted from the IceNet script `python3 icenet/plot_paper_figures.py` (see [line 182](https://github.com/tom-andersson/icenet-paper/blob/main/icenet/plot_paper_figures.py)). 
Note we define alpha and colours for the coastline and land mask objects. These settings let the layers overlap correctly so that IceNet predictions can be distinguished from the SIC ground truth.\n", "::::" ], "metadata": { @@ -1378,6 +1379,18 @@ "name": "#%% md\n" } } + }, + { + "cell_type": "markdown", + "source": [ + "## Version\n", + "\n", + "* Notebook: commit [9e1fb10](https://github.com/acocac/environmental-ai-book/commits/master/book/polar/modelling/polar-modelling-icenet.ipynb)\n", + "* Codebase: 1.0.0 with commit [9d69ad7](https://github.com/tom-andersson/icenet-paper/compare/v1.0.0...main)" + ], + "metadata": { + "collapsed": false + } } ], "metadata": {