Fix: Rerun the full workflow in 2023 #14

jnnr · 2023-08-29T12:54:06Z

This PR comprises all the changes I had to make to rerun the workflow in summer 2023. Once it is approved, I can also upload a new pre-built version to Zenodo, preferably including the generated base data and not just the model files.

To download the hydro-basins, I had to adapt a rule in the euro-calliope subworkflow (also described in calliope-project/euro-calliope#267). I do not yet know how to commit this change, because euro-calliope is included as a reference to a specific commit. We could either point the reference to the current euro-calliope with the proposed change or, if that introduces new bugs, to a new commit/tag on euro-calliope that branches from the original reference.

before

    "curl -sLo {output} '{params.url}'"

after

    "curl --http1.1 -sLo {output} '{params.url}'"

Full log of bugfixing

Trying to reproduce the pre-build 2022-02-08 or something similar to get all preprocessed resources.

On the latest main of sector-coupled-euro-calliope, I ran:

snakemake --configfile config/euro-calliope-2050.yaml --use-conda --profile default --cores 1 "build/model/ehighways/model-2016.yaml"

rules/shapes.smk:109:        nuts_to_regions = lambda wildcards: config["data-sources"]["statistical-units-to-custom-regions"][wildcards.resolution]

This raised a KeyError: the key seems to have been present in the latest tagged version, but is no longer there on main.

Compare main and last tag 2022-06-01: https://github.com/calliope-project/sector-coupled-euro-calliope/compare/2022-06-01..main

I realized that statistical-units-to-custom-regions is present in the most recent config/euro-calliope-2050.yaml, but spelled with underscores instead of dashes.
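A mismatch like this can be caught before Snakemake starts evaluating rules. The following is only a minimal sketch and not part of the PR: the helper name `find_spelling_variant` and the mapping path are hypothetical, and the config is written as a plain dict instead of being loaded from config/euro-calliope-2050.yaml.

```python
def find_spelling_variant(mapping, expected_key):
    """Return a key of `mapping` that equals `expected_key` up to '_' vs '-', else None."""
    def normalise(key):
        return key.replace("_", "-")

    for key in mapping:
        if key != expected_key and normalise(key) == normalise(expected_key):
            return key
    return None


# Hypothetical excerpt of the config; the path is a placeholder.
config = {
    "data-sources": {
        "statistical_units_to_custom_regions": {"ehighways": "path/to/mapping.csv"}
    }
}

expected = "statistical-units-to-custom-regions"
variant = find_spelling_variant(config["data-sources"], expected)
if expected not in config["data-sources"] and variant is not None:
    print(f"Config uses '{variant}', but the rules look up '{expected}'.")
```

Such a check would only point at the spelling difference; the actual fix is still to rename the key in the config file.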

After changing underscores to dashes, I ran into this bug:

.conda/envs/sector-coupled-euro-calliope/lib/python3.8/site-packages/snakemake/rules.py", line 1138, in __eq__
    return self.name == other.name and self.output == other.output
AttributeError: 'str' object has no attribute 'name'

This is a known snakemake bug: snakemake/snakemake#1899

Solved by adding this to environment.yaml:

- tabulate=0.8.10 # fixes incompatibility of tabulate 0.9 and snakemake 6.1.1. Should be fixed after snakemake 7.15.2

Alternatively, snakemake could be upgraded. I am not doing this now because it may lead to other errors.

Later the same day, I ran into the known error in euro-calliope: the download of the hydro-basins from Dropbox fails. Fixed it by changing

shell: "curl -sLo {output} '{params.url}'"

to

shell: "curl --http1.1 -sLo {output} '{params.url}'"

in the rule download_basins_database (euro-calliope/rules/hydro.smk).

fiona ImportError: libnsl.so.1: cannot open shared object file: No such file or directory

Tried installing libnsl in the environment responsible for geodata.

Adapted envs/geodata.yaml by adding libnsl and lifting the fiona constraint:
- fiona #=1.8.13
- libnsl

fiona=1.8.13 does not work. Maybe it worked before libnsl got updated, so one could try to find a matching older version of libnsl.
fiona=1.9.1 seems to work, but is incompatible with the other pinned versions.

Solved it by copying euro-calliope/envs/geo.yaml into envs/geodata.yaml and adding xlrd:

name: geodata
channels:
    - conda-forge
dependencies:
    - python=3.8
    - numpy=1.20.2
    - scipy=1.6.2
    - pandas=1.2.3
    - gdal=3.2.1
    - libgdal=3.2.1
    - fiona=1.8.18
    - rasterio=1.2.1
    - rasterstats=0.14.0
    - geos=3.9.1
    - geopandas=0.9.0
    - netcdf4=1.5.6
    - xarray=0.17.0
    - jinja2=2.11.3
    - networkx=2.2
    - pycountry=19.8.18
    - pip=21.0.1
    - xlrd=1.2.0
    - pip:
        - -e ../../lib[geo]

Next error in sector-coupled-euro-calliope/src/construct/annual_subnational_demand.py:

    passenger_df = align_and_scale(
        df.rename(util.get_alpha3, level='country_code')
        .drop(['Heavy duty vehicles', 'Light duty vehicles'], level=0),
        population_intensity, units
    )

KeyError: "labels ['Heavy duty vehicles'] not found in level"

For the second of the three dataframes looped over, 'Heavy duty vehicles' is not in the index. Fixed by setting errors="ignore" in df.drop.
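The effect of that change can be reproduced on a toy frame. This is a sketch with made-up data, not the real transport dataframe: the level names and values are assumptions, and only the `drop` call mirrors the script.

```python
import pandas as pd

# Toy frame in which 'Heavy duty vehicles' is absent from level 0 (made-up values).
index = pd.MultiIndex.from_tuples(
    [
        ("Light duty vehicles", "DEU"),
        ("Passenger cars", "DEU"),
        ("Passenger cars", "FRA"),
    ],
    names=["vehicle_type", "country_code"],
)
df = pd.DataFrame({"value": [1.0, 2.0, 3.0]}, index=index)
to_drop = ["Heavy duty vehicles", "Light duty vehicles"]

# Default behaviour: a label missing from the level raises a KeyError.
try:
    df.drop(to_drop, level=0)
    raised = False
except KeyError:
    raised = True

# With errors="ignore", labels that exist are dropped and missing ones are skipped.
passenger = df.drop(to_drop, level=0, errors="ignore")
print(raised)
print(passenger)
```

One trade-off of errors="ignore" is that a misspelled label would also be skipped silently, so it only fits cases where the label is legitimately absent from some of the dataframes.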

Next error in /home/jlauner/repos/sector-coupled-euro-calliope/src/construct/annual_subnational_demand.py, line 332:

    industry_employees = (
        industry_employees
        .loc[activity_codes_df['Eurostat sector'].dropna().index]  # <- this line raises
        .unstack()
        .groupby(activity_codes_df['Eurostat sector'].to_dict()).sum(min_count=1)
        .stack([0, 1])
        .rename_axis(index=['subsector', 'year', 'region'])
    )

    raise KeyError(f"{keyarr[mask]} not in index")

KeyError: "Index(['2', '3', '4', '5', '6', '7', '8', '9', '21', '22', '23', '24', '25',\n '26', '27', '28', '29', '30', '31', '32', '33', '34', '35', '36', '37',\n '38', '39', '40', '41', '42', '43', '44', '99', '2(a)', '2(b)', '2(c)',\n '2(c)(i)', '2(c)(ii)', '2(c)(iii)', '2(d)', '2(e)', '2(e)(i)',\n '2(e)(ii)', '3(a)', '3(b)', '3(c)', '3(c)(i)', '3(c)(ii)', '3(c)(iii)',\n '3(e)', '3(f)', '3(g)', '4(a)', '4(a)(i)', '4(a)(ii)', '4(a)(iii)',\n '4(a)(iv)', '4(a)(v)', '4(a)(vi)', '4(a)(vii)', '4(a)(viii)',\n '4(a)(ix)', '4(a)(x)', '4(a)(xi)', '4(b)', '4(b)(i)', '4(b)(ii)',\n '4(b)(iii)', '4(b)(iv)', '4(b)(v)', '4(c)', '4(d)', '4(e)', '4(f)',\n '6(a)', '6(b)', '6(c)', '8(b)', '8(b)(i)', '8(b)(ii)', '8(c)', '9(a)',\n '9(b)', '9(c)', '9(e)'],\n dtype='object', name='Activity code') not in index"

The two datasets involved are:

"activity_codes_df": data/industry/industry_activity_codes.csv
"industry_employees": data/automatic/eurostat-employees.tsv.gz

The problem is that many of the codes do not exist in the index of the other dataframe. It could be that Bryn used pandas <1.0.0 (1.0.0 was released 2020-01-29), so this did not raise a KeyError.

The script actually runs in the geodata environment, which I adapted. But even before my change, that environment used pandas=1.0.5, so this does not explain why it would have run before.

Tried it with pandas=1.0.5. Seems to work?! Strange, because the data is not different. And this minimal example fails when introduced into the script:

df=pd.DataFrame({1: [2,4,5]}, index=[1,2,3])
print(df.loc[pd.Index([0,2,3])])

However, this works for a MultiIndex on pandas 1.0.5:

"import pandas as pd; df=pd.DataFrame({1: [2,4,5]}, index=pd.MultiIndex.from_tuples([(1,1),(2,2),(3,2)])); print(df.loc[[0,2]])"

So I will continue with pandas 1.0.5 for the moment. It would be better to replace the loc with something safer, maybe reindex.
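Two ways such a safer selection could look are sketched below on a flat index. This is an illustration with made-up codes and numbers, not the real Eurostat data, and the real `industry_employees` frame may have a different index structure.

```python
import pandas as pd

# Toy stand-ins for the two datasets; the codes and numbers are made up.
employees = pd.DataFrame(
    {"employees": [2, 4, 5]},
    index=pd.Index(["C10", "C11", "C12"], name="Activity code"),
)
wanted = pd.Index(["C10", "C12", "2(a)"], name="Activity code")  # "2(a)" is absent

# Strict selection: raises KeyError because one requested label is missing.
try:
    employees.loc[wanted]
    loc_failed = False
except KeyError:
    loc_failed = True

# Option 1: reindex keeps every requested label and fills missing rows with NaN.
by_reindex = employees.reindex(wanted)

# Option 2: select only the labels that exist in both indexes.
present = employees.index.intersection(wanted)
by_intersection = employees.loc[present]

print(loc_failed)
print(by_reindex)
print(by_intersection)
```

Which option fits depends on whether the missing codes should appear downstream as NaN rows (reindex) or be left out entirely (intersection). Because the original chain sums with min_count=1, groups consisting only of NaN rows would stay NaN rather than turning into 0.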


irm-codebase · 2023-10-26T10:17:25Z

Adding some comments on this fix, based on what I had to do to get this running on DelftBlue:

b14a8ba: should also be applied to euro-calliope-2030.yaml and euro-calliope-2040.yaml.
b1dd6bb: should also be applied to the euro-calliope environment.yaml file.
b18e80a: the same xlrd version should also be added to the geo.yaml file in euro-calliope.

Unfortunately I cannot pinpoint which of the last two eventually fixed my runs, but they should be considered if this is ever implemented.

sjpfenninger · 2024-03-26T17:33:18Z

I had to add the additional requirement python=3.8 to shell.yaml; otherwise I get conflicts because, for reasons unknown to me, mamba/conda tries to use python=3.12.

jnnr added 4 commits August 29, 2023 14:38

Change underscores to dashes to fix KeyError in rules/shapes.smk

b14a8ba

Fix known snakemake bug snakemake/snakemake#1899

b1dd6bb

Fix libsnl.so.1 ImportError by adapting envs/geodata.yaml from euro-c…

b18e80a

…alliope/envs/geo.yaml

Fix KeyError for 'Heavy Duty Vehicles'

290ef59
