Fix: Rerun the full workflow in 2023 #14

Draft: wants to merge 4 commits into base: main


@jnnr jnnr commented Aug 29, 2023

This PR comprises all changes I had to make to rerun the workflow in summer 2023. Once this is approved, I can also upload a new pre-built version to Zenodo, preferably including the generated base data, not just the model files.

To download the hydro-basins, I had to adapt a rule in the euro-calliope subworkflow (also described in calliope-project/euro-calliope#267). I do not yet know how to commit this change, as euro-calliope is referenced at a specific commit. We could either update the reference to the current euro-calliope with the proposed change or, if that introduces new bugs, point it to a new commit/tag on euro-calliope that branches from the original reference.

before

    "curl -sLo {output} '{params.url}'"

after

    "curl --http1.1 -sLo {output} '{params.url}'"

Full log of bugfixing

Trying to reproduce the pre-build 2022-02-08 or something similar to get all preprocessed resources.

On the latest main of sector-coupled-euro-calliope, I ran:

snakemake --configfile config/euro-calliope-2050.yaml --use-conda --profile default --cores 1 "build/model/ehighways/model-2016.yaml"

This fails at:

rules/shapes.smk:109:        nuts_to_regions = lambda wildcards: config["data-sources"]["statistical-units-to-custom-regions"][wildcards.resolution]

The key seems to have been present in the latest tagged version, but is no longer there on main.


Compare main and last tag 2022-06-01: https://github.com/calliope-project/sector-coupled-euro-calliope/compare/2022-06-01..main

I realized that statistical-units-to-custom-regions is present in the most recent config/euro-calliope-2050.yaml, but with underscores.
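A quick way to spot this kind of mismatch is to list config keys that contain underscores. The following is only a sketch: the `config` dict is a hypothetical excerpt standing in for the loaded YAML file, `some/path.csv` is a placeholder, and `find_underscore_keys` is a helper name made up for illustration. Keys may legitimately contain underscores, so the result is a list of candidates to check, not a list of errors.

```python
def find_underscore_keys(config, path=()):
    """Yield the key paths in a nested dict whose last key contains an underscore."""
    for key, value in config.items():
        if isinstance(key, str) and "_" in key:
            yield path + (key,)
        if isinstance(value, dict):
            # Recurse into nested sections such as "data-sources".
            yield from find_underscore_keys(value, path + (key,))


# Hypothetical excerpt; the real config has many more entries.
config = {
    "data-sources": {
        "statistical_units_to_custom_regions": {"ehighways": "some/path.csv"},
    }
}

mismatches = list(find_underscore_keys(config))
print(mismatches)
```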

After changing underscores to dashes, I ran into this bug:

.conda/envs/sector-coupled-euro-calliope/lib/python3.8/site-packages/snakemake/rules.py", line 1138, in __eq__
    return self.name == other.name and self.output == other.output
AttributeError: 'str' object has no attribute 'name'

This is a known snakemake bug (snakemake/snakemake#1899).

I solved it by adding this to environment.yaml:

- tabulate=0.8.10 # fixes incompatibility of tabulate 0.9 and snakemake 6.1.1. Should be fixed after snakemake 7.15.2

Alternatively, snakemake could be upgraded. I am not doing this now because it may lead to other errors.

Later the same day, I ran into the known error in euro-calliope: failing to download the hydro-basins from Dropbox. I fixed it by changing

shell: "curl -sLo {output} '{params.url}'" 

to

shell: "curl --http1.1 -sLo {output} '{params.url}'"

in rule download_basins_database (euro-calliope/rules/hydro.smk).


fiona ImportError: libnsl.so.1: cannot open shared object file: No such file or directory

Fix attempt: install libnsl in the environment responsible for geodata.

Adapt envs/geodata.yaml by adding libnsl and lifting the fiona constraint:
- fiona #=1.8.13
- libnsl

fiona=1.8.13 does not work. Maybe it worked before libnsl got updated, so one could try to find a matching older version of libnsl.
fiona=1.9.1 seems to work, but is incompatible with the other pinned versions.


I solved it by copying euro-calliope/envs/geo.yaml into envs/geodata.yaml and adding xlrd:

name: geodata
channels:
    - conda-forge
dependencies:
    - python=3.8
    - numpy=1.20.2
    - scipy=1.6.2
    - pandas=1.2.3
    - gdal=3.2.1
    - libgdal=3.2.1
    - fiona=1.8.18
    - rasterio=1.2.1
    - rasterstats=0.14.0
    - geos=3.9.1
    - geopandas=0.9.0
    - netcdf4=1.5.6
    - xarray=0.17.0
    - jinja2=2.11.3
    - networkx=2.2
    - pycountry=19.8.18
    - pip=21.0.1
    - xlrd=1.2.0
    - pip:
        - -e ../../lib[geo]

Next error in sector-coupled-euro-calliope/src/construct/annual_subnational_demand.py:

    passenger_df = align_and_scale(
        df.rename(util.get_alpha3, level='country_code')
        .drop(['Heavy duty vehicles', 'Light duty vehicles'], level=0),
        population_intensity, units
    )

KeyError: "labels ['Heavy duty vehicles'] not found in level"

For the second of the three dataframes looped over, 'Heavy duty vehicles' is not in the index. I fixed this by setting errors="ignore" in df.drop.
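The behaviour can be reproduced on a toy frame. This is a minimal sketch, not the actual data: the index values, level names and the `value` column are made up, and only the `drop(..., level=0, errors="ignore")` call mirrors the fix described above.

```python
import pandas as pd

# Toy stand-in for one of the looped dataframes: it contains
# 'Light duty vehicles' but no 'Heavy duty vehicles'.
index = pd.MultiIndex.from_tuples(
    [("Passenger cars", "DEU"), ("Light duty vehicles", "DEU")],
    names=["vehicle_type", "country_code"],
)
df = pd.DataFrame({"value": [1.0, 2.0]}, index=index)

to_drop = ["Heavy duty vehicles", "Light duty vehicles"]

# Default behaviour (errors="raise"): a missing label raises KeyError.
try:
    df.drop(to_drop, level=0)
    raised = False
except KeyError:
    raised = True

# With errors="ignore", labels that are present are dropped and
# missing ones are silently skipped.
dropped = df.drop(to_drop, level=0, errors="ignore")
print(raised)
print(dropped)
```

One caveat of errors="ignore" is that a misspelled label would also be skipped silently, so it is worth checking that the labels are spelled as in the source data.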

Next error in /home/jlauner/repos/sector-coupled-euro-calliope/src/construct/annual_subnational_demand.py, line 332:

    industry_employees = (
        industry_employees
        .loc[activity_codes_df['Eurostat sector'].dropna().index]  # <- fails here
        .unstack()
        .groupby(activity_codes_df['Eurostat sector'].to_dict()).sum(min_count=1)
        .stack([0, 1])
        .rename_axis(index=['subsector', 'year', 'region'])
    )

    raise KeyError(f"{keyarr[mask]} not in index")

KeyError: "Index(['2', '3', '4', '5', '6', '7', '8', '9', '21', '22', '23', '24', '25',\n '26', '27', '28', '29', '30', '31', '32', '33', '34', '35', '36', '37',\n '38', '39', '40', '41', '42', '43', '44', '99', '2(a)', '2(b)', '2(c)',\n '2(c)(i)', '2(c)(ii)', '2(c)(iii)', '2(d)', '2(e)', '2(e)(i)',\n '2(e)(ii)', '3(a)', '3(b)', '3(c)', '3(c)(i)', '3(c)(ii)', '3(c)(iii)',\n '3(e)', '3(f)', '3(g)', '4(a)', '4(a)(i)', '4(a)(ii)', '4(a)(iii)',\n '4(a)(iv)', '4(a)(v)', '4(a)(vi)', '4(a)(vii)', '4(a)(viii)',\n '4(a)(ix)', '4(a)(x)', '4(a)(xi)', '4(b)', '4(b)(i)', '4(b)(ii)',\n '4(b)(iii)', '4(b)(iv)', '4(b)(v)', '4(c)', '4(d)', '4(e)', '4(f)',\n '6(a)', '6(b)', '6(c)', '8(b)', '8(b)(i)', '8(b)(ii)', '8(c)', '9(a)',\n '9(b)', '9(c)', '9(e)'],\n dtype='object', name='Activity code') not in index"

The two datasets involved are:

"activity_codes_df": data/industry/industry_activity_codes.csv
"industry_employees": data/automatic/eurostat-employees.tsv.gz

The problem is that many of the codes do not exist in the index of the other dataframe. It could be that Bryn used pandas <1.0.0 (1.0.0 was released 2020-01-29), so this did not raise a KeyError.

The script actually runs in the geodata environment, which I adapted. But even before that, it used pandas=1.0.5, so this does not explain why it would have run before.

I tried it with pandas=1.0.5 and it seems to work?! Strange, because the data is no different. Yet this minimal example fails when inserted into the script:

import pandas as pd

# Label 0 is not in the index, so .loc raises a KeyError.
df = pd.DataFrame({1: [2, 4, 5]}, index=[1, 2, 3])
print(df.loc[pd.Index([0, 2, 3])])

This works, however, for a MultiIndex on pandas 1.0.5:

"import pandas as pd; df=pd.DataFrame({1: [2,4,5]}, index=pd.MultiIndex.from_tuples([(1,1),(2,2),(3,2)])); print(df.loc[[0,2]])"

So I will continue with pandas 1.0.5 for the moment. It would be better to replace the loc call with something safer, perhaps reindex.
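As a rough illustration of what a safer lookup could look like, here are two options on the flat-index toy example from above. This is a sketch under assumptions: the column name `employees` is made up, the real `industry_employees` frame appears to have several index levels and would need adapting, and it is not clear from the log which behaviour the original script intended.

```python
import pandas as pd

df = pd.DataFrame({"employees": [2, 4, 5]}, index=[1, 2, 3])
wanted = pd.Index([0, 2, 3])  # label 0 does not exist in df

# On pandas >= 1.0, .loc raises a KeyError as soon as any label is missing.
try:
    df.loc[wanted]
    loc_raised = False
except KeyError:
    loc_raised = True

# Option 1: reindex keeps every requested label and fills missing rows with NaN.
reindexed = df.reindex(wanted)

# Option 2: select only the labels that actually exist.
present_only = df.loc[df.index.intersection(wanted)]

print(loc_raised)
print(reindexed)
print(present_only)
```

Option 1 keeps the missing codes visible as NaN rows, while option 2 drops them silently. Which one fits depends on how the later groupby and sum(min_count=1) steps are meant to treat missing codes.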


irm-codebase commented Oct 26, 2023

Adding some comments on this fix based on stuff I had to do to get this running on DelftBlue:

  • b14a8ba should be added to euro-calliope-2030.yaml and euro-calliope-2040.yaml too.
  • b1dd6bb should also be applied to the euro-calliope environment.yaml file.
  • b18e80a the same xlrd version should also be added to the geo.yaml file in euro-calliope.

Unfortunately, I cannot pinpoint which of the last two changes above eventually fixed my runs, but both should be considered if this is ever implemented.

@sjpfenninger
Member

I had to add the additional requirement python=3.8 to shell.yaml. Otherwise I get conflicts because, for reasons unknown to me, mamba/conda tries to use python=3.12.
