Port Scotland #58

Merged
merged 72 commits into from
Jul 10, 2024
Changes from 65 commits
d2c0901
Initial application of asset to scotland methods
sgreenbury Feb 28, 2024
9969cf8
Merge branch 'main' into 36-port-scotland
sgreenbury Mar 5, 2024
714ceab
Add zipfile-deflate64 dep
sgreenbury Mar 6, 2024
cb8e86b
Add key_prefix
sgreenbury Mar 6, 2024
2e74291
Begin refactor with dagster
sgreenbury Mar 6, 2024
647534f
Merge branch 'main' into 36-port-scotland
sgreenbury Mar 7, 2024
9dadda2
Format, make partition keys unique, add geographies asset
sgreenbury Mar 7, 2024
b6e8e1d
Add config to asset_job
sgreenbury Mar 8, 2024
65d2704
Initial dagster rewrite for Scotland
sgreenbury Mar 9, 2024
6b20ab1
Add function to add metadata and metadata index asset
sgreenbury Mar 15, 2024
9cef0c3
Use markdown_from_plot util
sgreenbury Mar 18, 2024
799f8c7
Merge branch 'main' into 36-port-scotland
sgreenbury Mar 25, 2024
eafa238
Add required tables
sgreenbury Apr 17, 2024
5f65c91
Add catalog_metadata, revise catalog towards metric metadata
sgreenbury Apr 18, 2024
687c33f
Fix extracted zip file names
sgreenbury Apr 18, 2024
739c9d1
Rename as df column to partition_key
sgreenbury Apr 22, 2024
b9e76b0
Add initial census_derived for Scotland
sgreenbury Apr 22, 2024
b365a14
Add ISO3116-2 field, move download_file to module import
sgreenbury Apr 23, 2024
a44e744
Add ISO3116-2 field to Belgium and UK
sgreenbury Apr 23, 2024
f65cbae
Rename country metadata asset
sgreenbury Apr 23, 2024
dd5dcfd
Refactor and fix derived module, add geometry module
sgreenbury Apr 23, 2024
6844c6f
Rename modules to match Belgium
sgreenbury Apr 23, 2024
c4bd1ef
Fix imports, refactor Scotland catalog asset names
sgreenbury Apr 23, 2024
596729c
Add data publisher for Scotland
sgreenbury Apr 23, 2024
6a2d1b6
Fix column names
sgreenbury Apr 25, 2024
af3d00d
Add source data releases metadata, fix parquet_column_name field
sgreenbury Apr 25, 2024
0093dff
Merge branch 'main' into 36-port-scotland
sgreenbury Apr 25, 2024
06f0e10
Fix for CI
sgreenbury Apr 25, 2024
55402e6
Merge branch 'main' into 36-port-scotland
yongrenjie May 2, 2024
d7991e2
Update Scotland metadata to match new changes in #82
yongrenjie May 2, 2024
59cfde9
Merge branch 'main' into 36-port-scotland
sgreenbury May 23, 2024
a5ed66f
Merge branch '36-port-ni' into 36-port-scotland
sgreenbury May 29, 2024
167055a
Merge branch 'main' into 36-port-scotland
sgreenbury Jun 11, 2024
bacdcda
Merge remote-tracking branch 'origin/main' into 36-port-scotland
sgreenbury Jun 20, 2024
629548f
Add todo
sgreenbury Jun 20, 2024
8f3c7da
Comment out old versions
sgreenbury Jun 20, 2024
30af440
Add update to use country class
sgreenbury Jun 20, 2024
479324c
Fix geometry
sgreenbury Jun 20, 2024
582f578
Revise geographies with overload providing lookups
sgreenbury Jun 22, 2024
3a3ba28
Add dep
sgreenbury Jun 22, 2024
2281b6f
Fix arg, add todo
sgreenbury Jun 22, 2024
e518f20
Update derived metrics
sgreenbury Jun 22, 2024
24a4dd5
Fix non-integer cases
sgreenbury Jun 22, 2024
86ba91b
Remove obsolete modules
sgreenbury Jun 22, 2024
2149324
Remove obsolete code
sgreenbury Jun 22, 2024
1d282f0
Merge branch '120-filepaths' into 36-port-scotland
sgreenbury Jun 26, 2024
dfbef87
Rename module, add country metadata to class
sgreenbury Jun 26, 2024
450d7cf
Update metrics file name
sgreenbury Jun 26, 2024
ceccf58
Fix loop over geometry
sgreenbury Jun 27, 2024
2812184
Replace 'en' with 'eng'
sgreenbury Jun 27, 2024
452ad09
Add source_data_releases, fix geo output
sgreenbury Jun 27, 2024
1d28c55
Fix derived metric output
sgreenbury Jun 27, 2024
786bcd3
Fix module name
sgreenbury Jun 27, 2024
fec9af8
Fix test
sgreenbury Jun 27, 2024
31ac586
Add first modifications to ensure that runs for all tables
sgreenbury Jun 27, 2024
5522d17
Filter from catalog partition that is missing
sgreenbury Jun 27, 2024
33832b3
Add option to allow ok return from derived_metrics is partition fails
sgreenbury Jun 27, 2024
9b24856
Create try/except to optionally allow derived metrics with failures
sgreenbury Jun 27, 2024
7614f5f
Merge branch 'main' into 36-port-scotland
sgreenbury Jul 2, 2024
6ebdf72
Replace GEO_ID with COL enum
sgreenbury Jul 2, 2024
d62bfc8
Use tempfile.mkdtemp() for cache_dir
sgreenbury Jul 2, 2024
070c817
Add utils, rename static variables upper case
sgreenbury Jul 2, 2024
b61d5e7
Merge branch 'main' into 36-port-scotland
sgreenbury Jul 2, 2024
e15ddff
Revert tempfile
sgreenbury Jul 2, 2024
d561c40
Ensure CACHE_DIR made
sgreenbury Jul 2, 2024
27514ec
Update deps
sgreenbury Jul 10, 2024
de1422e
Remove URL constants
sgreenbury Jul 10, 2024
e0c86d3
Rename variable
sgreenbury Jul 10, 2024
b7baea2
Change minimum version requirement
sgreenbury Jul 10, 2024
068920a
Fix deprecated warning
sgreenbury Jul 10, 2024
8dacb88
Remove obsolete class field
sgreenbury Jul 10, 2024
ad3e787
Allow cache to previously exist
sgreenbury Jul 10, 2024
3 changes: 3 additions & 0 deletions pyproject.toml
@@ -47,9 +47,12 @@ dependencies = [
"rdflib >=7.0.0", # Required to parse BEL TTL Metadata catalogue.
"icecream >=2.1.3", # General debugging tool
"python-slugify >=8.0.4", # Required for generating asset names from GBR Ordnance Survey OpenData Product names
"openpyxl",
"zipfile-deflate64",
"jcs >=0.2.1", # For generating IDs from class attributes
"beautifulsoup4 >=4.12.3", # For extracting catalogs from web pages
"openpyxl >=3.1.3", # For reading Excel files
"xlrd >=2.0.1", # For reading Excel files
"iso639-lang >=2.2.3", # For checking ISO639-3 language codes
"aiohttp >=3.9.5", # Async HTTP
]
6 changes: 4 additions & 2 deletions python/popgetter/assets/__init__.py
@@ -1,7 +1,9 @@
 from __future__ import annotations

-from . import bel, gb_nir, uk, us
+from . import bel, gb_nir, gb_sct, uk, us

-countries = [(mod, mod.__name__.split(".")[-1]) for mod in [bel, gb_nir, uk, us]]
+countries = [
+    (mod, mod.__name__.split(".")[-1]) for mod in [bel, gb_nir, uk, us, gb_sct]
+]

 __all__ = ["countries"]
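For readers of this change, a minimal sketch of what the `countries` list evaluates to once `gb_sct` is included. The modules here are hypothetical stand-ins built with `types.ModuleType`, and the `popgetter.assets.` prefix is assumed from the file path; only the `__name__` handling mirrors the diff.

```python
from types import ModuleType

# Hypothetical stand-ins for the real country modules: only __name__ matters here.
short_names = ["bel", "gb_nir", "uk", "us", "gb_sct"]
modules = [ModuleType(f"popgetter.assets.{name}") for name in short_names]

# Same expression as in the diff: pair each module with the last
# component of its dotted module name.
countries = [(mod, mod.__name__.split(".")[-1]) for mod in modules]

country_names = [name for _, name in countries]
print(country_names)
```

Each tuple pairs a module with its short name, so adding a country appears to need only the import and one more list entry.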
3 changes: 2 additions & 1 deletion python/popgetter/assets/gb_nir/__init__.py
@@ -594,7 +594,7 @@ def pivot_df(df: pd.DataFrame, end: str) -> tuple[list[str], pd.DataFrame]:
 # Ensure columns are string
 else:
 pivot.columns = [str(col).strip() for col in pivot.columns.to_numpy()]
-out_cols = [col.replace(var_type, "").strip() for col in pivot_cols]
+out_cols = [col.replace(end, "").strip() for col in pivot_cols]
 return out_cols, pivot

 # Pivot for codes and labels
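The one-line fix above makes `pivot_df` use its own `end` parameter instead of `var_type`, which is not in its signature. A minimal sketch of the corrected expression in isolation follows; the helper name `strip_suffix` and the sample column names are hypothetical and not taken from the Northern Ireland data.

```python
def strip_suffix(pivot_cols: list[str], end: str) -> list[str]:
    """Remove the `end` marker from each column name and trim whitespace."""
    return [col.replace(end, "").strip() for col in pivot_cols]


# Hypothetical column names carrying a trailing "Code" marker.
out_cols = strip_suffix(["Age Code", "Sex Code"], "Code")
print(out_cols)
```

Note that `str.replace` removes every occurrence of `end`, not only a trailing one, so this relies on the marker not appearing elsewhere in a column name.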
@@ -608,6 +608,7 @@ def pivot_df(df: pd.DataFrame, end: str) -> tuple[list[str], pd.DataFrame]:
 new_mmd = source_mmd.copy()
 new_mmd.parent_metric_id = source_mmd.source_metric_id
 new_mmd.metric_parquet_path = parquet_file_name
+# TODO: check this
 key_val = dict(zip(out_cols, metric_col.split(SEP), strict=True))

 def gen_hxltag(kv: dict[str, str]) -> str: