IceNet demonstrator #6

Merged
merged 18 commits into from
Oct 24, 2021

acocac (Member) commented Sep 20, 2021

A PR contributing an IceNet sea ice forecasting demonstrator to the Environmental AI book. The notebook:

  • Clones and accesses the IceNet codebase to produce seasonal Arctic sea ice forecasts using 3 of the 25 pre-trained IceNet models.
  • Forecasts a single year, 2020, from Apr 2016 to Dec 2020 ERA5 observations.
  • Assesses IceNet’s ice edge predictions for September forecasts at 4- to 1-month lead times.
  • Compares IceNet predictions against the ECMWF SEAS5 physics-based sea ice probability forecast and a linear trend statistical benchmark.

Notes:
The notebook maintains the project structure of the IceNet paper repository. The mask data are generated by the gen_masks.py script. The remaining sample data and configuration file are downloaded from a public Zenodo repository.
I have added TensorFlow 2.2.0 to the environment.yml file.


@acocac acocac added the modelling Modelling Notebooks label Sep 21, 2021
@review-notebook-app
Copy link

review-notebook-app bot commented Sep 23, 2021

View / edit / reply to this conversation on ReviewNB

tom-andersson commented on 2021-09-23T16:16:10Z
----------------------------------------------------------------

  • The word 'excels' is not correct in 'IceNet excels the range of accurate sea ice forecasts'. You could use 'advances' (as we do in the paper abstract) or a synonym!
  • It should be 'to produce seasonal Arctic sea ice forecasts'
  • It is unclear why you use ERA5 data going back to April 2016 if only forecasting 2020
  • Perhaps mention in 'Highlights' that you download sample data from a Zenodo repository?
  • Can you frame this as a 'demonstrator of the IceNet forecasting system' or similar
  • Tell people that at the end of the notebook there will be some interactive visualisations, so people should 'make sure to get to the end!' or something :-)

Aside from my minor comments, my main comment at this point in the review is that the notebook is super long. Personally, I prefer shorter notebooks that have a good amount of markdown text and the code cells are not too long. The notebook has become this long because of copying code directly from my codebase, which has additional content necessary for making the scripts generic to various model/computations. But in this case we should exploit two things to reduce length: 1) we know we only want to forecast/analyse with IceNet and what metrics we want, 2) we know the notebook will only be run once, linearly from start to finish.

People lose attention easily so let's try to make it as short as we can!


acocac commented on 2021-09-24T16:00:08Z
----------------------------------------------------------------

The word 'excels' is not correct in 'IceNet excels the range of accurate sea ice forecasts'. You could use 'advances' (as we do in the paper abstract) or a synonym!

Done. IceNet advances...

It should be 'to produce seasonal Arctic sea ice forecasts'

Done.

It is unclear why you use ERA5 data going back to April 2016 if only forecasting 2020 & Perhaps mention in 'Highlights' that you download sample data from a Zenodo repository?

Done. * Forecast a single year, 2020, from Apr 2019 to Dec 2020 analysis-ready data downloaded from a Zenodo repository.

Can you frame this as a 'demonstrator of the IceNet forecasting system' or similar

Done. The purpose now frames it: Demonstrate IceNet, a deep learning sea ice forecasting system trained using climate simulations and observational data.

Tell people that at the end of the notebook there will be some interactive visualisations, so people should 'make sure to get to the end!' or something :-)

The last two items of the highlights:

* Visualise IceNet’s seasonal ice edge predictions at 4- to 1-month lead times.

* Interactive plots comparing IceNet predictions against ECMWF SEAS5 physics-based sea ice probability forecast and linear trend statistical benchmark.

tom-andersson commented on 2021-10-19T15:56:19Z
----------------------------------------------------------------

  • The word 'Polar:' in the title reads strangely to me - I understand it is in the Polar section of the Environmental AI book, but perhaps this could be removed if it is clear from the book subsections?
  • 'Forecast a single year, 2020, from Apr 2019 to Dec 2020' confused me initially, perhaps: 'Forecast a single year, 2020, using IceNet's preprocessed environmental input data downloaded from a Zenodo repository.'
  • 'ECMWF SEAS5 physics-based sea ice probability' should be 'ECMWF SEAS5 physics-based sea ice concentration and a linear trend statistical benchmark'

acocac commented on 2021-10-20T19:21:37Z
----------------------------------------------------------------

  • Thanks for the point about the section name in the title. The new version includes tags (badges) indicating the primary (environment) and secondary (modelling, sensor) sections. I'll work on the other levels of tags available in sphinx-panels in future versions of the EnvAI book.
  • Changed.
  • Changed.


tom-andersson commented on 2021-09-23T16:16:11Z
----------------------------------------------------------------

Is there a way to remove the blank bokeh(?) outputs below this import cell?


acocac commented on 2021-09-24T10:35:03Z
----------------------------------------------------------------

I am not sure whether they can be removed. Why do you suggest removing them?


tom-andersson commented on 2021-09-23T16:16:12Z
----------------------------------------------------------------

Line #5.    network_exist = [f for f in target_networks if os.path.isfile(f'./networks/network_tempscaled_{f}.h5')];

f is the network number, so perhaps you could use a more informative variable name than f which typically means 'filename'


acocac commented on 2021-09-24T16:14:19Z
----------------------------------------------------------------

removed to make the cell shorter...

url = 'https://ramadda.data.bas.ac.uk/repository/entry/get/'

target_networks = [36, 42, 53]

for network in target_networks:
    urllib.request.urlretrieve(url + f'network_tempscaled_{network}.h5?entryid=synth%3A71820e7d-c628-4e32-969f-464b7efb187c%3AL25ldXJhbF9uZXR3b3JrX21vZGVsL25ldHdvcmtfdGVtcHNjYWxlZF8zNi5oNQ%3D%3D', f'./networks/network_tempscaled_{network}.h5')


tom-andersson commented on 2021-09-23T16:16:12Z
----------------------------------------------------------------

Line #11.    if len(network_non_exist) > 0:

This reads strangely... If there are 1 or more networks that don't exist, do the download? Is there a variable name typo?


acocac commented on 2021-09-24T10:44:12Z
----------------------------------------------------------------

Made the step simpler by removing the network_exist and network_non_exist variables:

url = 'https://ramadda.data.bas.ac.uk/repository/entry/get/'

target_networks = [36, 42, 53]

if not os.path.exists(config['network_h5_files_folder']):
    os.makedirs(config['network_h5_files_folder'])

for target in target_networks:
    urllib.request.urlretrieve(url + f'network_tempscaled_{target}.h5?entryid=synth%3A71820e7d-c628-4e32-969f-464b7efb187c%3AL25ldXJhbF9uZXR3b3JrX21vZGVsL25ldHdvcmtfdGVtcHNjYWxlZF8zNi5oNQ%3D%3D', f'./networks/network_tempscaled_{target}.h5')
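
One detail worth checking in the download loop: the `entryid` query string is identical for every network, and its base64 tail decodes to `/neural_network_model/network_tempscaled_36.h5`. This thread does not say whether the server selects the file from the `entryid` or from the file name in the path, so the following is only a sketch, assuming the `entryid` should follow the network number. The helper name `network_url` and the constant `ENTRY_PREFIX` are hypothetical.

```python
import base64
from urllib.parse import quote

BASE_URL = 'https://ramadda.data.bas.ac.uk/repository/entry/get/'
# Prefix copied (URL-decoded) from the entryid used in the notebook cell.
ENTRY_PREFIX = 'synth:71820e7d-c628-4e32-969f-464b7efb187c:'


def network_url(network: int) -> str:
    """Build a download URL whose entryid encodes this network's own file path.

    Assumption: the server-side path is '/neural_network_model/<file name>',
    which is what the entryid in the original cell decodes to for network 36.
    """
    fname = f'network_tempscaled_{network}.h5'
    path_b64 = base64.b64encode(f'/neural_network_model/{fname}'.encode()).decode()
    entry_id = quote(ENTRY_PREFIX + path_b64, safe='')  # percent-encode ':' and '='
    return f'{BASE_URL}{fname}?entryid={entry_id}'


for network in [36, 42, 53]:
    print(network_url(network))
```

For network 36 this reproduces the URL used in the cell exactly; for 42 and 53 only the encoded path differs. If the server ignores the `entryid` path, the original loop is already fine.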


tom-andersson commented on 2021-09-23T16:16:13Z
----------------------------------------------------------------

Should be 'ERA5' in title


acocac commented on 2021-09-24T10:44:31Z
----------------------------------------------------------------

done, thanks!


tom-andersson commented on 2021-09-23T16:16:14Z
----------------------------------------------------------------

Line #1.    # very slow > plan B: to provide the analysis-ready file, siconca_EASE.nc

Can delete comment now :-)


acocac commented on 2021-09-24T12:25:07Z
----------------------------------------------------------------

done


tom-andersson commented on 2021-09-23T16:16:14Z
----------------------------------------------------------------

perhaps 'data loader configuration file'?


acocac commented on 2021-09-24T12:25:44Z
----------------------------------------------------------------

change to

Download data loader configuration file


tom-andersson commented on 2021-09-23T16:16:15Z
----------------------------------------------------------------

Perhaps you could load the file with json and print a snippet? But no worries if it prints too much stuff.

e.g.

import json
from pprint import pprint

# Load the data loader configuration and print only the input data section
with open(dataloader_config_fpath, 'r') as readfile:
    dataloader_config = json.load(readfile)

pprint(dataloader_config['input_data'])



acocac commented on 2021-10-18T10:28:27Z
----------------------------------------------------------------

done


tom-andersson commented on 2021-09-23T16:16:15Z
----------------------------------------------------------------

Line #9.    ensemble_seeds_and_mean = ensemble_seeds.copy()

Delete L9 and L10 and move to DataArray setup cell below where it is used?


acocac commented on 2021-09-24T12:44:37Z
----------------------------------------------------------------

done


tom-andersson commented on 2021-09-23T16:16:16Z
----------------------------------------------------------------

  1. Can we briefly explain the benefits of using xarray here? Can you state that we are setting up an empty xarray.DataArray object that we will use to store the forecasts in with informative coordinate labels? E.g. "Now we are setting up an empty xarray DataArray object that we will use to store IceNet's forecasts. DataArrays let you conveniently..."
  2. I would replace 'heldout' with 'forecast' for ease of reading. 'Heldout' makes more sense in the context of a validation set for training the model.

acocac commented on 2021-10-18T10:41:43Z
----------------------------------------------------------------

  1. The description was extended to explain the benefits of using xarray: "Now we are setting up an empty xarray DataArray object that we will use to store IceNet's forecasts. DataArrays let you conveniently handle, query and visualise spatio-temporal data such as the forecast predictions generated by the IceNet system."
  2. Replaced 'heldout' with 'forecast'.


tom-andersson commented on 2021-09-23T16:16:17Z
----------------------------------------------------------------

Line #5.    

Add

ensemble_seeds_and_mean = ensemble_seeds.copy()

ensemble_seeds_and_mean.append('ensemble')




tom-andersson commented on 2021-09-23T16:16:17Z
----------------------------------------------------------------

Line #35.    coords['seed'] = ensemble_seeds_and_mean

Add these bits to the lines above where the coords dictionary is instantiated


acocac commented on 2021-10-18T13:58:33Z
----------------------------------------------------------------

done!


tom-andersson commented on 2021-09-23T16:16:18Z
----------------------------------------------------------------

Line #37.    shape = (len(ensemble_seeds_and_mean), *shape, 3)

Replace *shape with the raw tuple above (and delete the first shape = lines)


acocac commented on 2021-09-24T12:48:06Z
----------------------------------------------------------------

which raw tuple?
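
To illustrate the 'raw tuple' suggestion: rather than defining `shape` once and then rebuilding it with `*shape`, the final shape can be written in a single place, or derived from the coordinate dictionary so that the two cannot drift apart. This also puts the `seed` coordinate into the dictionary where it is instantiated. The coordinate names and sizes below (a 432×432 grid, six lead times, three classes) are assumptions for illustration, not values taken from the notebook.

```python
ensemble_seeds = [36, 42, 53]
ensemble_seeds_and_mean = ensemble_seeds.copy()
ensemble_seeds_and_mean.append('ensemble')

# Hypothetical coordinates; insertion order defines the dimension order.
coords = {
    'seed': ensemble_seeds_and_mean,
    'time': [f'2020-{month:02d}-01' for month in range(1, 13)],
    'yc': list(range(432)),
    'xc': list(range(432)),
    'leadtime': list(range(1, 7)),
    'ice_class': ['no_ice', 'marginal_ice', 'full_ice'],
}

# One tuple, built directly from the coordinates instead of `*shape`.
shape = tuple(len(values) for values in coords.values())
print(shape)
```

Deriving the shape from `coords` means that adding or removing an ensemble member changes the array size automatically.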


tom-andersson commented on 2021-09-23T16:16:18Z
----------------------------------------------------------------

Line #39.    model_forecast = xr.DataArray(

Would it be good to print(model_forecast) ?



tom-andersson commented on 2021-09-23T16:16:19Z
----------------------------------------------------------------

Can you explain how the ensemble mean prediction of the three chosen ensemble members is computed by averaging the outputs? Can you state how the predictions of P(SIC<15%), P(15%<SIC<80%), P(SIC>80%) are converted to the 'sea ice probability', P(SIC>15%), by summing the latter two classes?
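
The two steps asked about here can be sketched in a few lines of NumPy. This is a minimal illustration on random stand-in data, not the notebook's code: the array layout `(member, y, x, class)` and the class ordering are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for three ensemble members' outputs:
# shape (member, y, x, class), with classes ordered as
# [P(SIC<15%), P(15%<SIC<80%), P(SIC>80%)].
logits = rng.normal(size=(3, 4, 4, 3))
member_probs = np.exp(logits) / np.exp(logits).sum(axis=-1, keepdims=True)

# Ensemble mean: average the class probabilities over the member axis.
ensemble_probs = member_probs.mean(axis=0)

# Sea ice probability P(SIC>15%): sum the two classes with SIC above 15%.
sip = ensemble_probs[..., 1] + ensemble_probs[..., 2]

# Consistency check: the classes sum to one, so SIP = 1 - P(SIC<15%).
assert np.allclose(sip, 1.0 - ensemble_probs[..., 0])
```

Because averaging preserves the sum over classes, the ensemble mean is still a valid probability distribution at each grid cell.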



tom-andersson commented on 2021-09-23T16:16:19Z
----------------------------------------------------------------

Line #1.    start_date = all_start_dates[0]

Remove


acocac commented on 2021-09-24T12:53:49Z
----------------------------------------------------------------

done


tom-andersson commented on 2021-09-23T16:16:20Z
----------------------------------------------------------------

Line #44.    del(model_forecast) ## remove model_forecast from the memory

Apparently we should call gc.collect() after this statement to run the garbage collector: https://stackoverflow.com/questions/1316767/how-can-i-explicitly-free-memory-in-python


acocac commented on 2021-10-18T14:00:43Z
----------------------------------------------------------------

It no longer seems necessary, as we're doing a small computation.
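
For background on the `del` / `gc.collect()` question: in CPython, `del` only removes a name, and an object is freed as soon as its reference count reaches zero. An explicit `gc.collect()` matters mainly for objects caught in reference cycles. The sketch below demonstrates this with a hypothetical `Node` class standing in for a large object; it relies on CPython's reference counting and may behave differently on other interpreters.

```python
import gc
import weakref


class Node:
    """Hypothetical stand-in for an object holding a large array."""

    def __init__(self):
        self.partner = None


def cycle_is_freed_only_by_gc():
    """Return (alive after del, alive after gc.collect()) for a reference cycle."""
    gc.disable()  # keep the automatic collector out of the way for the demo
    try:
        a, b = Node(), Node()
        a.partner, b.partner = b, a  # a <-> b reference cycle
        probe = weakref.ref(a)       # observes `a` without keeping it alive

        del a, b                     # names are gone, but the cycle keeps both alive
        alive_after_del = probe() is not None

        gc.collect()                 # the cycle collector reclaims both objects
        alive_after_collect = probe() is not None
    finally:
        gc.enable()
    return alive_after_del, alive_after_collect


print(cycle_is_freed_only_by_gc())
```

A plain array with no cycles, such as a forecast array referenced by a single name, is released by `del` alone, which is consistent with dropping the explicit collection here.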


tom-andersson commented on 2021-09-23T16:16:21Z
----------------------------------------------------------------

Finish markdown cell - explain what metrics we are computing to analyse forecast performance


acocac commented on 2021-10-18T14:19:31Z
----------------------------------------------------------------

Added:

To analyse forecast performance, the IceNet researchers compute two metrics: binary accuracy and sea ice extent (SIE) error. Binary accuracy is computed over an active grid cell region for a given calendar month and can be seen as a normalised version of the integrated ice edge error (IIEE); see the Methods section of the IceNet Nature Communications paper for further details. SIE error is the difference between the overpredicted area and the underpredicted area. The two metrics are complementary, with binary accuracy being the more robust for assessing IceNet’s relative seasonal forecast skill for September.
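
A minimal sketch of the two metrics, following the definitions in the paragraph above, might look like this. It assumes equal-area grid cells, boolean ice/no-ice maps, and a normalisation of IIEE by the active region's area; the function name `edge_metrics` and the cell area value are hypothetical, and this is not the IceNet codebase's implementation.

```python
import numpy as np

CELL_AREA_KM2 = 25.0 * 25.0  # assumed equal-area grid cells (illustrative value)


def edge_metrics(pred_ice, true_ice, active_mask, cell_area=CELL_AREA_KM2):
    """Return (binary accuracy in %, SIE error in km^2) over the active region."""
    pred = pred_ice[active_mask]
    true = true_ice[active_mask]

    overpredicted = np.sum(pred & ~true) * cell_area   # ice forecast, none observed
    underpredicted = np.sum(~pred & true) * cell_area  # ice observed, none forecast

    iiee = overpredicted + underpredicted              # integrated ice edge error
    active_area = active_mask.sum() * cell_area

    binary_accuracy = 100.0 * (1.0 - iiee / active_area)
    sie_error = overpredicted - underpredicted
    return binary_accuracy, sie_error


# Tiny hand-made example: 2x3 grid with one cell outside the active region.
pred_ice = np.array([[True, True, False], [False, True, False]])
true_ice = np.array([[True, False, False], [True, True, True]])
active = np.array([[True, True, True], [True, True, False]])

acc, sie_err = edge_metrics(pred_ice, true_ice, active)
print(acc, sie_err)
```

In this toy case one cell is overpredicted and one is underpredicted, so the two errors cancel in the SIE error while binary accuracy still registers both. That cancellation is one way to see why the two metrics are complementary.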


tom-andersson commented on 2021-09-23T16:16:21Z
----------------------------------------------------------------

Line #11.    icenet_ID = 'unet_tempscale'

Sorry I think this should be architecture_ID - there is a typo in my analyse_ script. I use icenet_ID = 'IceNet{}{}'.format(dataloader_ID, architecture_ID). This lets you uniquely refer to a certain data/architecture combination.


acocac commented on 2021-10-18T14:20:59Z
----------------------------------------------------------------

the new notebook contains a simpler notation for the model, IceNet


tom-andersson commented on 2021-09-23T16:16:22Z
----------------------------------------------------------------

Can you quickly explain how dask chunking works here or in a code comment?


acocac commented on 2021-10-18T14:21:23Z
----------------------------------------------------------------

No longer necessary, as the computations are now done in memory.


tom-andersson commented on 2021-09-23T16:16:22Z
----------------------------------------------------------------

Line #11.    if icenet_ID in model_compute_list:

I guess we don't need this since the point of the notebook is to analyse IceNet forecasts?


acocac commented on 2021-09-24T13:14:17Z
----------------------------------------------------------------

removed


tom-andersson commented on 2021-09-23T16:16:23Z
----------------------------------------------------------------

Needs a markdown cell for more context about how we are downloading all the pre-computed results from the Nature Communications paper from the Polar Data Centre.


acocac commented on 2021-10-18T14:24:37Z
----------------------------------------------------------------

Added:

It is worth mentioning that other pre-computed results from the Nature Communications paper can also be downloaded, including the output results table, uncertainty, and netCDF forecasts of the 25 ensemble members, among others.


tom-andersson commented on 2021-09-23T16:16:23Z
----------------------------------------------------------------

This cell is very long...


acocac commented on 2021-10-18T14:25:46Z
----------------------------------------------------------------

The cell is considerably shorter in the new version:

import os
import re

import pandas as pd

# Timestamped file name for the new results table
now = pd.Timestamp.now()
new_results_df_fname = now.strftime('%Y_%m_%d_%H%M%S_forecast_results.csv')
new_results_df_fpath = os.path.join(config['forecast_results_folder'], new_results_df_fname)

print('New results will be saved to {}\n\n'.format(new_results_df_fpath))

# Existing results files; the timestamped names sort chronologically
results_df_fnames = sorted([f for f in os.listdir(config['forecast_results_folder']) if re.compile(r'.*\.csv$').match(f)])
if len(results_df_fnames) >= 1:
    old_results_df_fname = results_df_fnames[-1]
    old_results_df_fpath = os.path.join(config['forecast_results_folder'], old_results_df_fname)
    print('\n\nLoading previous results dataset from {}'.format(old_results_df_fpath))

    # Load previous results, do not interpret 'NA' as NaN
    results_df = pd.read_csv(old_results_df_fpath, keep_default_na=False, comment='#')

    # Remove existing IceNet results
    results_df = results_df[~results_df['Model'].str.startswith('IceNet')]

    # Drop spurious index column if present
    results_df = results_df.drop('Unnamed: 0', axis=1, errors='ignore')
    results_df['Forecast date'] = [pd.Timestamp(date) for date in results_df['Forecast date']]

    results_df = results_df.set_index(['Model', 'Ensemble member', 'Leadtime', 'Forecast date'])

    # Add new models to the dataframe
    multi_index = create_results_dataset_index([model], leadtimes, all_target_dates, model, icenet_seeds)
    # pd.concat is used because DataFrame.append was removed in pandas 2.0
    results_df = pd.concat([results_df, pd.DataFrame(index=multi_index)]).sort_index()

@tom-andersson

Thanks for all this @acocac! notebooksharing.space is a nice option to view the plots, although unfortunately the slides do not work. I'll stick to using the .zip files you sent.

I'm taking a look at ReviewNB now.



tom-andersson commented on 2021-10-19T16:14:47Z
----------------------------------------------------------------

  • Maybe: For this demonstrator, we only download three of them to reduce computational cost (note that this will reduce performance compared with the full ensemble).
  • Should be: We call a script from the IceNet paper repo


tom-andersson commented on 2021-10-19T16:14:48Z
----------------------------------------------------------------

  • For some reason the contour lines appear faint, could you increase the line width and the alpha? Or is alpha already 1 by default? It might just be the colours..
  • Can a legend be added?
  • I think the axis size is slightly not square, it looks stretched horizontally. Is there a way to make the aspect ratio equal?


tom-andersson commented on 2021-10-19T16:14:48Z
----------------------------------------------------------------

Line #6.    month_slider = pn.widgets.DiscreteSlider(name="Month", options=month_name, value='September')

It would be cool if you could make the slider say '<month> 2020' so that the actual time is known.
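
One way to get '<month> 2020' labels is to build the option strings up front and pass them to the slider. The sketch below uses only the standard library to build the labels; the names `FORECAST_YEAR`, `month_labels` and `label_to_month` are hypothetical, and the widget call is shown as a comment adapted from the line quoted above rather than as tested code.

```python
import calendar

FORECAST_YEAR = 2020  # the single year forecast in the demonstrator

# Month labels with the year attached (month names follow the current locale).
month_labels = [f'{name} {FORECAST_YEAR}' for name in calendar.month_name[1:]]

# Map each label back to its month number for indexing the forecast data.
label_to_month = {label: number for number, label in enumerate(month_labels, start=1)}

# The labels could then replace `month_name` in the existing widget call, e.g.
# month_slider = pn.widgets.DiscreteSlider(name="Month", options=month_labels,
#                                          value=f'September {FORECAST_YEAR}')
```

Keeping a label-to-number mapping avoids parsing the label string back into a month inside the plotting callback.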



tom-andersson commented on 2021-10-19T16:14:49Z
----------------------------------------------------------------

Can you increase the label and ticks fontsize so they are basically equal to the notebook markdown fontsize?


acocac commented on 2021-10-20T21:05:43Z
----------------------------------------------------------------

- font size increased.


tom-andersson commented on 2021-10-19T16:14:50Z
----------------------------------------------------------------

  • Slider doesn't seem to work in the HTML file you sent
  • Same comment as above regarding fontsize


acocac (Member, Author) commented Oct 24, 2021

@tom-andersson, thanks for the great feedback. I found it very interesting to reproduce the IceNet paper. I've worked on most of the latest suggestions, including:

  • Using sphinx-panels tags to indicate the environmental setting and notebook category. The tags feature is a nice one to implement in the existing and future notebooks of the EnvAI book.
  • Adding a list of contributors (notebook and IceNet modelling).
  • Using matplotlib contour to generate Fig 1 instead of hvplot contour.
  • Fig 1 and Fig 3 now work using panel. The developers suggested embedding the dashboard. This is a great feature and a take-away for future demonstrators.

I'll close the issue once I confirm that the IceNet notebook on the master branch works in Pangeo Binder.

Thanks!! ❄️

acocac (Member, Author) commented Oct 24, 2021

The reviewer has recommended publication. Some future work is needed to improve the text.

@acocac acocac merged commit 1090e11 into master Oct 24, 2021
acocac (Member, Author) commented Nov 10, 2021

Nick @nbarlowATI suggested some changes for the IceNet demo (see PR#6 in the Turing Pangeo Examples repository). I incorporated them and added Nick as a reviewer.

@acocac acocac modified the milestone: 0.01 Dec 2, 2021
@acocac acocac deleted the acoca-icenet branch January 31, 2022 12:13
@acocac acocac added notebook Add/update notebook ready labels Jun 23, 2022