Re-executability - notes from my journey #23

Open
consideRatio opened this issue Aug 22, 2019 · 4 comments

@consideRatio

I'm very excited that this is available on GitHub! Thank you everybody involved in this!

I'd love to help make it runnable at the click of a badge on mybinder.org, like the one linked here, which lets you re-execute part of the LIGO team's analysis of gravitational-wave sensor data.

https://mybinder.org/v2/gh/minrk/ligo-binder/master?urlpath=lab/tree/index.ipynb

Below are my notes on trying to run the notebooks without any previous experience of this work.

Notes

In short, given a GitHub repo, mybinder.org builds a Docker image and installs the dependencies found in typical configuration files such as conda's environment.yml. As far as I can tell, though, this repo does not yet contain any information on its Python package dependencies, other than perhaps that pyam should be version 0.2.0.

So, I set out to try to figure out what made the code runnable, create an environment.yml file, and add a mybinder.org badge to the README.md in a PR.

But I ran into some issues.

pyam=0.2.0 pandas=0.24 and ipython
I started out by cloning the repo, creating a fresh conda environment with Python 3.7, and installing pandas and pyam. That led to the following error when importing pyam.

[image: screenshot of the error raised when importing pyam]

Apparently that issue is resolved by using pandas=0.24 instead of pandas=0.25. Another issue went away after installing ipython.
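The pins described above could be captured in an environment.yml, which is one of the files mybinder.org looks for. Below is a minimal sketch that renders such a file from Python. Only the Python 3.7, pandas 0.24, pyam 0.2.0 and ipython entries come from these notes; the environment name and the conda-forge channel are assumptions and would need checking against where pyam 0.2.0 is actually published.

```python
def render_environment_yml(name, channels, dependencies):
    """Return the text of a conda environment.yml file."""
    lines = ["name: " + name, "channels:"]
    lines += ["  - " + channel for channel in channels]
    lines.append("dependencies:")
    lines += ["  - " + dependency for dependency in dependencies]
    return "\n".join(lines) + "\n"


ENVIRONMENT_YML = render_environment_yml(
    name="sr15-analysis",      # hypothetical environment name
    channels=["conda-forge"],  # assumption: pyam is installed from conda-forge
    dependencies=[
        "python=3.7",
        "pandas=0.24",  # pandas=0.25 triggered the pyam import error
        "pyam=0.2.0",
        "ipython",
    ],
)
```

Writing `ENVIRONMENT_YML` to a file named environment.yml in the repository root and running `conda env create -f environment.yml` would then recreate the environment locally.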

Downloading of data

  1. I needed to register an account here: https://data.ene.iiasa.ac.at/iamc-1.5c-explorer/
  2. I needed to go to https://data.ene.iiasa.ac.at/iamc-1.5c-explorer/#/downloads and manually download the data
  3. I needed to make it available in my data directory

Minor renaming of metadata file imports
I needed to adjust the import of the metadata.

# old failing
# sr1p5.load_metadata('sr15_metadata_indicators.xlsx')
# new functional
sr1p5.load_metadata('../data/sr15_metadata_indicators_r1.1.xlsx')
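To avoid editing the path in every notebook, the lookup could be made tolerant of both locations. This is only a sketch: the helper `first_existing` is hypothetical, the two paths are the ones from the snippet above, and the search order is an assumption.

```python
from pathlib import Path


def first_existing(candidates):
    """Return the first candidate path that is an existing file, or None."""
    for candidate in candidates:
        path = Path(candidate)
        if path.is_file():
            return path
    return None


# The two paths from the snippet above; the order is an assumption.
METADATA_CANDIDATES = [
    "sr15_metadata_indicators.xlsx",
    "../data/sr15_metadata_indicators_r1.1.xlsx",
]

metadata_path = first_existing(METADATA_CANDIDATES)
# In a notebook one could then call, if a file was found:
#     sr1p5.load_metadata(str(metadata_path))
```

If `metadata_path` is `None`, neither file is present and the data still needs to be downloaded or generated.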

Notebook execution success
After the above, no errors occurred running spm_sr15_figure_3a_global_emissions_pathways.ipynb and all files down to the one with sr15_2.4.1 in its name.

Summary of impressions

  • It wasn't so easy to get going
  • I had to rely on some faith. When I saw the lack of an environment.yml, requirements.txt, Pipfile, etc. specifying dependencies, I knew I was probably in for some unknown amount of additional work.
  • I would have liked to be able to get an API token so that the data could be downloaded to the /data folder automatically; the manual steps could then be reduced significantly. Perhaps that is already possible without me knowing about it? Hmmm...
@danielhuppmann
Member

Thanks @consideRatio for the feedback - it's great to see that the assessment notebooks are being used!

General response

The notebooks and the repository, as well as the pyam package, were developed throughout the IPCC SR15 process to support the assessment, and they changed a lot over time in response to shifts of focus in the report. That is the reason for some structural decisions that seem non-intuitive in hindsight. I apologize for the inconvenience.

This repository wasn't intended to be executed as is, hence the missing environment files. I have no experience with mybinder.org, so I'm not sure if this is worth the effort; I have basically done a poor man's version of that by posting the executed notebooks here. But I would really appreciate any PR from you to improve the user experience. Maybe just adding to the README would already be a big help for the next user...

Detailed responses

Folder structure

The file sr15_metadata_indicators.xlsx is created by the sr15_2.0_categories_indicators.ipynb notebook in the assessment folder, therefore it made sense (at the time) that this file is located there rather than in data. The downloadable version was intended only for those users not interested in running the notebooks. This could be explained better in the README.

If you run the categorisation-notebook first, all notebooks should run fine (a related issue about the plotting-settings was raised by @nworbmot in #15).

Why do we make you download the data?

See the "Why did we choose this license?" section on the IAMC 1.5°C Scenario Explorer License page for why we thought it was important to keep track of users (TL;DR: to send announcements when we fix data errors so that users can update their analysis accordingly).

Having said that, pyam supports reading data directly from the IIASA database server, see this tutorial (shoutout to @gidden and @zikolach for their heroic effort!). However, this is not ideal in terms of reproducibility, because you will get the latest version of the data on the IIASA server, not necessarily the data that was used for the SR15 assessment as printed. (In principle, the ixmp backend also supports data versioning, but that goes a bit too far here and is not fully integrated with pyam.) Downloading a specific version is a better guarantee of exact replication.
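One way to make "a specific version" checkable on the user's side is to record a checksum of the downloaded file and compare it before re-running an analysis. This is not something the repository or pyam does; it is a generic sketch using Python's standard hashlib, and the function names are made up for illustration.

```python
import hashlib
import io


def sha256_of_stream(stream, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a binary stream, read in chunks."""
    digest = hashlib.sha256()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def sha256_of_file(path):
    """Return the hex SHA-256 digest of the file at `path`."""
    with open(path, "rb") as handle:
        return sha256_of_stream(handle)


# Self-check on an in-memory stream, so the sketch runs without a data file.
_sample = b"scenario data"
assert sha256_of_stream(io.BytesIO(_sample)) == hashlib.sha256(_sample).hexdigest()
```

Storing the digest of the downloaded data file next to the analysis, and comparing it with `sha256_of_file(...)` later, would show whether the local data still matches the version originally used.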

The main dependency: the pyam package

My understanding was that with a standard Anaconda install plus pyam=0.2.0, all notebooks run smoothly. The issue with pandas=0.25.0 was discussed (and solved) here.

That said, pyam is still under heavy development, though it is converging towards a more mature project. Any contribution there would be appreciated even more!

@danielhuppmann
Member

Not closing this issue for now, hoping for a PR from you, @consideRatio... Please close this issue if you feel you won't have time for it.

@whatfuturedotlol

So i have just gone through and re-created this project on my own laptop, and after quite a lot of debugging, got it all working.

i am on an m1 mac, and had to use rosetta for an intel environment so i could install python 3.7 as well as some of the older packages.

there were a couple errors in the notebook for generating the .yaml file (which you have corrected on the web version, but not pushed to master).

I also had to remove a line from the .yaml file that was causing a constructor error, but with that removed, I was able to reproduce everything, and am now playing about with it myself.

i'm looking into mybinder/docker, or at the very least producing some requirements files for a working version.

@whatfuturedotlol

just to follow up here. my working branch is here https://github.com/whatfuturedotlol/ipcc_sr15_scenario_analysis/tree/iam-update

it includes

  • pipfile.txt
  • environments.yml
  • correct input data in /data
  • correct generated data sr15_specs.yaml & sr15_metadata_indicators.xlsx in /assessments
  • fixed sr15_2.0_categories_indicators.ipynb if you want to generate your own data
  • ENV_CONFIG.md explaining this
  • updated README.md with fixed notes

I couldn't get a mybinder.org version working as it ran out of memory with error 137. I may look at this later, but for now this branch provides the relevant fixes to get it working.
