Fix: Rerun the full workflow in 2023 #14
Draft
This PR comprises all changes I had to make to rerun the workflow in summer 2023. Once it is approved, I can also upload a new pre-built version to Zenodo, preferably including the generated base data, not just the model files.
To download the hydro-basins data, I had to adapt a rule in the euro-calliope subworkflow (also described in calliope-project/euro-calliope#267). I am not yet sure how to commit this change, since euro-calliope is pinned to a specific commit. We could either change the reference to the current euro-calliope with the proposed fix, or, if that introduces new bugs, to a new commit/tag on euro-calliope branching from the original reference.
before
after
Full log of bug fixing
Trying to reproduce the pre-built 2022-02-08 release (or something similar) to get all preprocessed resources.
On the latest sector-coupled-euro-calliope main, I ran:
```sh
snakemake --configfile config/euro-calliope-2050.yaml --use-conda --profile default --cores 1 "build/model/ehighways/model-2016.yaml"
```
The key seems to have been present in the latest tagged version, but is no longer there on main.
Compare main and last tag 2022-06-01: https://github.com/calliope-project/sector-coupled-euro-calliope/compare/2022-06-01..main
I realized that statistical-units-to-custom-regions is present in the most recent config/euro-calliope-2050.yaml, but with underscores.
After changing the underscores to dashes, I ran into this bug:
, which is a known snakemake bug: snakemake/snakemake#1899.
Solved by adding this to environment.yaml.
Alternatively, one could upgrade snakemake. I am not doing this now because it may lead to other errors.
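The added snippet itself is missing above. If the linked issue is the tabulate 0.9.0 incompatibility that broke older snakemake versions in late 2022, the usual workaround is a pin like the following (an assumption on my part, not the original snippet):

```yaml
dependencies:
  # hypothetical workaround: pin tabulate below 0.9.0, whose release
  # broke table rendering in older snakemake versions
  - tabulate=0.8.10
```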
Later that day, I ran into the known error in euro-calliope: failing to download hydro-basins from Dropbox. Fixed it by adapting
to
in rule download_basins_database (euro-calliope/rules/hydro.smk).
Next, fiona failed with `ImportError: libnsl.so.1: cannot open shared object file: No such file or directory`.
Installing libnsl in the environment responsible for geodata: adapt envs/geodata.yaml by adding libnsl and lifting the fiona constraint:
```yaml
- fiona #=1.8.13
- libnsl
```
fiona=1.8.13 does not work. Maybe it worked before libnsl got updated, so one could try to find a matching older version of libnsl.
fiona=1.9.1 seems to work, but is incompatible with the other pinned versions.
Solved it by copying euro-calliope/envs/geo.yaml and adding xlrd to envs/geodata.yaml:
Next error in sector-coupled-euro-calliope/src/construct/annual_subnational_demand.py:
KeyError: "labels ['Heavy duty vehicles'] not found in level"
For the second of the three dataframes looped over, "Heavy duty vehicles" is not in the index. Fixed by setting errors="ignore" in df.drop.
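A minimal sketch of that fix, assuming the drop targets one level of a MultiIndex (the data and names here are made up, not the script's real dataframes):

```python
import pandas as pd

idx = pd.MultiIndex.from_tuples(
    [("AUT", "Passenger cars"), ("AUT", "Motorcycles")],
    names=["country", "vehicle"],
)
df = pd.DataFrame({"demand": [1.0, 2.0]}, index=idx)

# With the default errors="raise", dropping a label absent from the level
# raises: KeyError: "labels ['Heavy duty vehicles'] not found in level".
# errors="ignore" silently skips the missing label instead.
result = df.drop("Heavy duty vehicles", level="vehicle", errors="ignore")
```

The trade-off: errors="ignore" also hides genuine typos in the label, so it is only safe when the label is legitimately optional in some of the looped-over dataframes.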
Next error in /home/jlauner/repos/sector-coupled-euro-calliope/src/construct/annual_subnational_demand.py, line 332:
```python
(...
    .loc[activity_codes_df['Eurostat sector'].dropna().index]  # <- this line fails
    .unstack()
    .groupby(activity_codes_df['Eurostat sector'].to_dict()).sum(min_count=1)
    .stack([0, 1])
    .rename_axis(index=['subsector', 'year', 'region'])
)
```
```
KeyError: "Index(['2', '3', '4', '5', '6', '7', '8', '9', '21', '22', '23', '24', '25',
       '26', '27', '28', '29', '30', '31', '32', '33', '34', '35', '36', '37',
       '38', '39', '40', '41', '42', '43', '44', '99', '2(a)', '2(b)', '2(c)',
       '2(c)(i)', '2(c)(ii)', '2(c)(iii)', '2(d)', '2(e)', '2(e)(i)',
       '2(e)(ii)', '3(a)', '3(b)', '3(c)', '3(c)(i)', '3(c)(ii)', '3(c)(iii)',
       '3(e)', '3(f)', '3(g)', '4(a)', '4(a)(i)', '4(a)(ii)', '4(a)(iii)',
       '4(a)(iv)', '4(a)(v)', '4(a)(vi)', '4(a)(vii)', '4(a)(viii)',
       '4(a)(ix)', '4(a)(x)', '4(a)(xi)', '4(b)', '4(b)(i)', '4(b)(ii)',
       '4(b)(iii)', '4(b)(iv)', '4(b)(v)', '4(c)', '4(d)', '4(e)', '4(f)',
       '6(a)', '6(b)', '6(c)', '8(b)', '8(b)(i)', '8(b)(ii)', '8(c)', '9(a)',
       '9(b)', '9(c)', '9(e)'],
      dtype='object', name='Activity code') not in index"
```
The two datasets being wrangled are
The problem is that many of the codes do not exist in the index of the other dataframe. It could be that Bryn used pandas <1.0.0 (1.0.0 was released 2020-01-29), in which case this did not raise a KeyError.
The script actually runs in the geodata environment, which I adapted. But even before, it used pandas=1.0.5, so that does not explain why it should have run before.
Tried it with pandas=1.0.5. Seems to work?! Strange, because the data is not different. And this fails when introduced into the script.
This, however, works for a MultiIndex on pandas 1.0.5:

```python
import pandas as pd
df = pd.DataFrame({1: [2, 4, 5]}, index=pd.MultiIndex.from_tuples([(1, 1), (2, 2), (3, 2)]))
print(df.loc[[0, 2]])
```
So I will continue with pandas 1.0.5 for the moment. It would be better to make the .loc safer, maybe by changing it to reindex.
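A sketch of what that safer variant could look like, on toy data rather than the real activity codes:

```python
import pandas as pd

df = pd.DataFrame(
    {"value": [1.0, 2.0]},
    index=pd.Index(["2", "3"], name="Activity code"),
)

# On pandas >= 1.0, .loc with a list containing missing labels raises KeyError
try:
    df.loc[["2", "3", "99"]]
except KeyError:
    print("loc raised on missing labels")

# .reindex keeps the rows that exist and fills missing labels with NaN,
# so the pipeline keeps running regardless of the pandas version
safe = df.reindex(["2", "3", "99"])
```

Since the subsequent groupby/sum uses min_count=1, the NaN rows introduced by reindex would propagate as missing sums rather than silently becoming zeros, which seems like the desired behavior here.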