Merge pull request #165 from urbanbigdatacentre/dev-morphological_informality_model

Morphological model code and documentation
SebastianHafner authored Jan 17, 2025
2 parents 46526c8 + 65e47d4 commit 19581d8
Showing 14 changed files with 815 additions and 1 deletion.
45 changes: 44 additions & 1 deletion README.md
@@ -1,4 +1,47 @@
# IDEAMAPS Data Ecosystem: Models of Deprivation


## 📚 Introduction

The [IDEAMAPS Data Ecosystem project](https://www.ideamapsnetwork.org/project/ideamaps-data-ecosystem) is co-designing and developing a participatory data-modelling ecosystem that produces deprived-area maps routinely, accurately, and at scale across cities in lower- and middle-income countries, supporting multiple local stakeholders in their decision-making.

In this repository, we store several models developed within the project that address different [domains of deprivation](https://doi.org/10.1016/j.compenvurbsys.2022.101770), including morphological informality and barriers to healthcare. Additionally, we provide tools to deploy our models to new cities.


## 🌍 Available Models

The following is a list of deprivation models that have been developed within the project:

| Domain of Deprivation | City | Folder | Version | Documentation |
|:-------------------------:|:---------------:|:-----------------------------------------------------------------------------------------------------------------------:|:-------:|---------------|
| Morphological Informality | Nairobi (Kenya) | [Link](https://github.com/urbanbigdatacentre/ideamaps-models/tree/main/Sub-domains/MorphologicalInformality/Nairobi_v3) | V3 | |
| Morphological Informality | Lagos (Nigeria) | [Link](https://github.com/urbanbigdatacentre/ideamaps-models/tree/main/Sub-domains/MorphologicalInformality/Lagos_v3) | V3 | |
| Morphological Informality | Kano (Nigeria) | [Link](https://github.com/urbanbigdatacentre/ideamaps-models/tree/main/Sub-domains/MorphologicalInformality/Kano_v3) | V3 | |
| Barriers to Healthcare | Kano (Nigeria) | [Link](https://github.com/urbanbigdatacentre/ideamaps-models/tree/main/Sub-domains/BarriersHealthCareAccess/Kano_v1.1) | V1.1 | |


## ⚙️ Model Deployment


We also provide code to deploy our models to new cities. To do so, please follow the instructions available in the respective model directories.



## 🗺️ Model Validation


Our model outputs can also be validated using the [IDEAMAPS Data Ecosystem platform](https://www.ideamapsdataecosystem.org/). The validation data will be used to iteratively improve our models.



## ✏️ Contributing
We appreciate all contributions. Please refer to our Contributing Guidelines.


## 📝 References

If you find this work useful, please cite our IDEAMAPS Data Ecosystem umbrella paper:

```
```
77 changes: 77 additions & 0 deletions Sub-domains/MorphologicalInformality/Sourcecode/V3/README.md
@@ -0,0 +1,77 @@
# Deploying the Morphological Informality Model (V3)



This folder contains all required code to model morphological informality based on building footprint data.

We refer to our publication for a detailed description of the methodology: [preprint]().




## 🛠️ Setup


1. **Clone the repository**:
```
git clone https://github.com/urbanbigdatacentre/ideamaps-models.git
cd ideamaps-models/Sub-domains/MorphologicalInformality/Sourcecode/V3
```
2. **Create a virtual environment using Conda**:
```
conda create -n ideamaps-models python=3.10
conda activate ideamaps-models
```
3. **Install dependencies from the requirements.txt file** using pip:
```
pip install -r requirements.txt
```
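
If the environment resolved correctly, the geospatial stack should import cleanly. As a quick, optional check (geopandas is imported by the model scripts; we assume here that it is pinned in requirements.txt):
```
python -c "import geopandas; print(geopandas.__version__)"
```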
## 🏚️ Prepare Building Footprint Data

Our model requires building footprints as input data. There are several providers of open building footprint data; we recommend using data from the [Overture Maps Foundation](https://overturemaps.org/).
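
As one possible way to obtain the footprints, the `overturemaps` Python package (installed separately via `pip install overturemaps`) provides a download CLI. The command below is a sketch, not part of this repository, and the bounding box (roughly Nairobi) is an illustrative placeholder:
```
overturemaps download --bbox=36.65,-1.45,37.10,-1.15 -f geoparquet --type=building -o buildings.pq
```
Note that `aggregation.py` expects the footprints as a (Geo)Parquet file with a `uID` building identifier column, so you may need to add such a column after download.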
## ⚙️ Run Model

Follow these steps to obtain clusters of similar urban form types. A consolidated example run is sketched after the list.

1. **Create the basic urban form elements**
```
python geoelements.py -e *path to the file* -b *path to the building footprints file* -o *path to the output dir*
```
2. **Compute the building-level morphometrics**
```
python morphometrics.py -b *path to the building footprints file* -t *path to the tessellation file* -o *path to the output dir*
```
3. **Aggregate the morphometrics to the grid**
The morphometrics dir corresponds to the output dir used in step 2.
```
python aggregation.py -m *path to the morphometrics dir* -b *path to the building footprints file* -g *path to the grid file* -o *path to the output dir*
```
4. **Cluster the grid cells into urban form types**
```
python clustering.py -m *path to the morphometrics file* -o *path to the output dir*
```

The resulting urban form clusters can be linked to morphological informality.
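
Putting the four steps together, an end-to-end run might look like the sketch below. All file and directory names are hypothetical placeholders except `morphometrics_grid.pq`, which is the fixed name under which `aggregation.py` writes its grid output:
```
# Hypothetical paths; adjust to your own data and output locations
python geoelements.py -e aoi.gpkg -b buildings.pq -o outputs/
python morphometrics.py -b buildings.pq -t outputs/tessellation.pq -o outputs/metrics/
python aggregation.py -m outputs/metrics/ -b buildings.pq -g grid.gpkg -o outputs/
python clustering.py -m outputs/morphometrics_grid.pq -o outputs/
```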
## 📝 Reference
If you find this work useful, please cite:
```

```
94 changes: 94 additions & 0 deletions Sub-domains/MorphologicalInformality/Sourcecode/V3/aggregation.py
@@ -0,0 +1,94 @@
import geopandas as gpd
import pandas as pd
import numpy as np
from pathlib import Path

from parsers import aggregation_parser as argument_parser


if __name__ == '__main__':
    args = argument_parser().parse_known_args()[0]

    umm = gpd.read_parquet(args.building_file)
    umm = umm[['uID', 'geometry']]
    assert np.all(umm.is_valid)

    # Loading Urban Morphometrics (UMM): merge each metric onto the buildings by uID
    metrics = ['sdbAre', 'ssbElo', 'stbOri', 'stcOri', 'ssbCCD', 'sdcAre', 'sscERI', 'sicCAR', 'mtbAli', 'mtbNDi',
               'mtcWNe', 'mdcAre', 'ltcBuA', 'ltbIBD', 'ltcWRB']

    for metric in metrics:
        metric_values = pd.read_parquet(Path(args.morphometrics_dir) / f'{metric}.pq')
        umm = pd.merge(umm, metric_values, on='uID', how='inner')

    # Represent each building by its centroid for the spatial join with the grid
    # (note: centroids are computed in geographic coordinates, EPSG:4326)
    umm = gpd.GeoDataFrame(umm, geometry='geometry')
    umm = umm.to_crs("EPSG:4326")
    umm['centroid'] = umm.geometry.centroid
    umm = gpd.GeoDataFrame(umm, geometry='centroid').drop(columns='geometry')

    # Aggregation to the grid
    grid = gpd.read_file(args.grid_file)
    grid = grid[['geometry']]
    grid['grid_id'] = range(1, len(grid) + 1)  # unique sequential id for each grid cell
    grid = grid.to_crs("EPSG:4326")
    assert np.all(grid.is_valid)

    # Spatial join: attach each building centroid to the grid cell it intersects
    umm_grid = gpd.sjoin(grid, umm, how='inner', predicate='intersects')

    # Handle missing data: drop buildings with missing metric values
    if umm_grid.isnull().values.any():
        umm_grid = umm_grid.dropna()

    # Aggregate the building-level metrics per grid cell: medians for most metrics,
    # standard deviations for the orientation metrics, and the sum of the cell areas
    median_cols = ['sdcAre', 'ssbElo', 'ssbCCD', 'mtbAli', 'mtbNDi', 'ltcBuA', 'sdbAre', 'sscERI', 'sicCAR', 'mtcWNe',
                   'mdcAre', 'ltbIBD', 'ltcWRB']
    sd_cols = ['stbOri', 'stcOri']
    sum_cols = ['sdcAre']  # renamed from `sum` to avoid shadowing the built-in

    # Set the grid geometry as the active geometry
    umm_grid = umm_grid.set_geometry('geometry')

    # Group by 'grid_id' and calculate the median, std, and sum aggregates
    median_values = umm_grid.groupby('grid_id')[median_cols].median().add_prefix('md_')
    sd_values = umm_grid.groupby('grid_id')[sd_cols].std().fillna(0).add_prefix('sd_')
    sum_values = umm_grid.groupby('grid_id')[sum_cols].sum().add_prefix('sum_')

    building_counts = umm_grid.groupby('grid_id').size().rename('bcount')
    # Cells with a single building yield NaN standard deviations (filled with 0 above)
    single_building_grids = building_counts[building_counts == 1]

    # Combine the per-cell statistics into a single table
    merge_stats = pd.merge(median_values, sd_values, on='grid_id', how='inner')
    merge_stats = pd.merge(merge_stats, sum_values, on='grid_id', how='inner')
    merge_stats = pd.merge(merge_stats, building_counts, on='grid_id', how='inner')

    if grid.index.name != 'grid_id':
        grid = grid.set_index('grid_id')

    if merge_stats.index.name != 'grid_id':
        merge_stats = merge_stats.set_index('grid_id')

    # Attribute join of the aggregated statistics back onto the grid geometries
    df_stats = pd.merge(grid, merge_stats, on='grid_id', how='inner')
    gdf_stats = gpd.GeoDataFrame(df_stats, geometry='geometry', crs='EPSG:4326')

    # Drop any columns duplicated by the joins
    duplicate_columns = gdf_stats.columns[gdf_stats.columns.duplicated()]
    if len(duplicate_columns) > 0:
        print(f'Dropping duplicated columns: {list(duplicate_columns)}')
    gdf_stats = gdf_stats.loc[:, ~gdf_stats.columns.duplicated()]

    # Export the aggregated morphometrics grid to GeoParquet
    gdf_stats.to_parquet(Path(args.output_dir) / 'morphometrics_grid.pq')
57 changes: 57 additions & 0 deletions Sub-domains/MorphologicalInformality/Sourcecode/V3/clustering.py
@@ -0,0 +1,57 @@
import geopandas as gpd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler
from pathlib import Path

from parsers import clustering_parser as argument_parser

SEED = 7

if __name__ == '__main__':
    args = argument_parser().parse_known_args()[0]

    gdf = gpd.read_parquet(args.morphometrics_file)
    # TODO: only cluster cells with buildings

    # Grid-level metrics describing the Irregular Layout dimension
    morph_isl = ['md_ssbCCD', 'md_mtbAli', 'md_ltcBuA', 'md_mtcWNe', 'md_ltcWRB', 'sd_stbOri', 'sd_stcOri']

    # Grid-level metrics describing the Small, Dense Structures dimension
    morph_sds = ['md_sdcAre', 'md_ssbElo', 'md_mtbNDi', 'md_ltbIBD', 'md_ltcBuA', 'md_sdbAre', 'md_sscERI', 'md_sicCAR',
                 'md_mtcWNe', 'md_mdcAre', 'md_ltcWRB', 'sum_sdcAre']

    gdf_isl = gdf[morph_isl]
    gdf_sds = gdf[morph_sds]

    # Standardize the features by removing the mean and scaling to unit variance
    scaler = StandardScaler()
    data_isl = scaler.fit_transform(gdf_isl)
    data_sds = scaler.fit_transform(gdf_sds)

    # Elbow analysis for Irregular Layout:
    # sum of squared distances (inertia) for k in {6, 8, 10},
    # storing the cluster labels for each k
    ssd_isl = []
    for k in [6, 8, 10]:
        km = KMeans(n_clusters=k, random_state=SEED)
        km = km.fit(data_isl)
        ssd_isl.append(km.inertia_)
        gdf[f'isl_c{k}'] = km.labels_

    # Elbow analysis for Small, Dense Structures:
    # sum of squared distances (inertia) for k in {6, 8, 10},
    # storing the cluster labels for each k
    ssd_sds = []
    for k in [6, 8, 10]:
        km = KMeans(n_clusters=k, random_state=SEED)
        km = km.fit(data_sds)
        ssd_sds.append(km.inertia_)
        gdf[f'sds_c{k}'] = km.labels_

    gdf.to_parquet(Path(args.output_dir) / 'clustering.pq')
