Merge pull request #165 from urbanbigdatacentre/dev-morphological_informality_model

Morphological model code and documentation
SebastianHafner authored Jan 17, 2025
2 parents 46526c8 + 65e47d4 commit 19581d8
Showing 14 changed files with 815 additions and 1 deletion.
45 changes: 44 additions & 1 deletion README.md
@@ -1,4 +1,47 @@
# IDEAMAPS Data Ecosystem: Models of Deprivation


## 📚 Introduction

The [IDEAMAPS Data Ecosystem project](https://www.ideamapsnetwork.org/project/ideamaps-data-ecosystem) is co-designing and developing a participatory data-modelling ecosystem that produces deprived-area maps routinely, accurately, and at scale across cities in lower- and middle-income countries, supporting multiple local stakeholders in their decision-making.

In this repository, we store several models developed within the project that address different [domains of deprivation](https://doi.org/10.1016/j.compenvurbsys.2022.101770), including morphological informality and barriers to healthcare. Additionally, we provide tools to deploy our models to new cities.


## 🌍 Available Models

The following is a list of deprivation models that have been developed within the project:

| Domain of Deprivation | City | Folder | Version | Documentation |
|:-------------------------:|:---------------:|:-----------------------------------------------------------------------------------------------------------------------:|:-------:|---------------|
| Morphological Informality | Nairobi (Kenya) | [Link](https://github.com/urbanbigdatacentre/ideamaps-models/tree/main/Sub-domains/MorphologicalInformality/Nairobi_v3) | V3 | |
| Morphological Informality | Lagos (Nigeria) | [Link](https://github.com/urbanbigdatacentre/ideamaps-models/tree/main/Sub-domains/MorphologicalInformality/Lagos_v3) | V3 | |
| Morphological Informality | Kano (Nigeria) | [Link](https://github.com/urbanbigdatacentre/ideamaps-models/tree/main/Sub-domains/MorphologicalInformality/Kano_v3) | V3 | |
| Barriers to Healthcare | Kano (Nigeria) | [Link](https://github.com/urbanbigdatacentre/ideamaps-models/tree/main/Sub-domains/BarriersHealthCareAccess/Kano_v1.1) | V1.1 | |


## ⚙️ Model Deployment


We also provide code to deploy our models to new cities. To do so, please follow the instructions available in the respective model directories.



## 🗺️ Model Validation


Our model outputs can also be validated using the [IDEAMAPS Data Ecosystem platform](https://www.ideamapsdataecosystem.org/). The validation data will be used to iteratively improve our models.



## ✏️ Contributing
We appreciate all contributions. Please refer to our Contributing Guidelines.


## 📝 References

If you find this work useful, please cite our IDEAMAPS Data Ecosystem umbrella paper:

```
```
77 changes: 77 additions & 0 deletions Sub-domains/MorphologicalInformality/Sourcecode/V3/README.md
@@ -0,0 +1,77 @@
# Deploying the Morphological Informality Model (V3)



This folder contains all required code to model morphological informality based on building footprint data.

We refer to our publication for a detailed description of the methodology: [preprint]().




## 🛠️ Setup


1. **Clone the repository**:
```
git clone https://github.com/urbanbigdatacentre/ideamaps-models.git
cd ideamaps-models/Sub-domains/MorphologicalInformality/Sourcecode/V3
```
2. **Create a virtual environment using Conda**:
```
conda create -n ideamaps-models python=3.10
conda activate ideamaps-models
```
3. **Install dependencies from the requirements.txt file** using pip:
```
pip install -r requirements.txt
```
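
If the environment resolved correctly, the geospatial stack should import cleanly. As a quick, optional check (geopandas is imported by the model scripts; we assume here that it is pinned in requirements.txt):
```
python -c "import geopandas; print(geopandas.__version__)"
```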
## 🏚️ Prepare Building Footprint Data

Our model requires building footprints as input data. There are several providers of open building footprint data; we recommend using data from the [Overture Maps Foundation](https://overturemaps.org/).
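
As one possible way to obtain the footprints, the `overturemaps` Python package (installed separately via `pip install overturemaps`) provides a download CLI. The command below is a sketch, not part of this repository, and the bounding box (roughly Nairobi) is an illustrative placeholder:
```
overturemaps download --bbox=36.65,-1.45,37.10,-1.15 -f geoparquet --type=building -o buildings.pq
```
Note that `aggregation.py` expects the footprints as a (Geo)Parquet file with a `uID` building identifier column, so you may need to add such a column after download.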
## ⚙️ Run Model

Follow these steps to obtain clusters of similar urban form types. A consolidated example run is sketched after the list.

1. **Create the basic urban form elements**
```
python geoelements.py -e *path to the file* -b *path to the building footprints file* -o *path to the output dir*
```
2. **Compute the building-level morphometrics**
```
python morphometrics.py -b *path to the building footprints file* -t *path to the tessellation file* -o *path to the output dir*
```
3. **Aggregate the morphometrics to the grid**
The morphometrics dir corresponds to the output dir used in step 2.
```
python aggregation.py -m *path to the morphometrics dir* -b *path to the building footprints file* -g *path to the grid file* -o *path to the output dir*
```
4. **Cluster the grid cells into urban form types**
```
python clustering.py -m *path to the morphometrics file* -o *path to the output dir*
```

The resulting urban form clusters can be linked to morphological informality.
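
Putting the four steps together, an end-to-end run might look like the sketch below. All file and directory names are hypothetical placeholders except `morphometrics_grid.pq`, which is the fixed name under which `aggregation.py` writes its grid output:
```
# Hypothetical paths; adjust to your own data and output locations
python geoelements.py -e aoi.gpkg -b buildings.pq -o outputs/
python morphometrics.py -b buildings.pq -t outputs/tessellation.pq -o outputs/metrics/
python aggregation.py -m outputs/metrics/ -b buildings.pq -g grid.gpkg -o outputs/
python clustering.py -m outputs/morphometrics_grid.pq -o outputs/
```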
## 📝 Reference
If you find this work useful, please cite:
```

```
94 changes: 94 additions & 0 deletions Sub-domains/MorphologicalInformality/Sourcecode/V3/aggregation.py
@@ -0,0 +1,94 @@
import geopandas as gpd
import pandas as pd
import numpy as np
from pathlib import Path

from parsers import aggregation_parser as argument_parser


if __name__ == '__main__':
    args = argument_parser().parse_known_args()[0]

    umm = gpd.read_parquet(args.building_file)
    umm = umm[['uID', 'geometry']]
    assert np.all(umm.is_valid)

    # Loading Urban Morphometrics (UMM): merge each metric onto the buildings by uID
    metrics = ['sdbAre', 'ssbElo', 'stbOri', 'stcOri', 'ssbCCD', 'sdcAre', 'sscERI', 'sicCAR', 'mtbAli', 'mtbNDi',
               'mtcWNe', 'mdcAre', 'ltcBuA', 'ltbIBD', 'ltcWRB']

    for metric in metrics:
        metric_values = pd.read_parquet(Path(args.morphometrics_dir) / f'{metric}.pq')
        umm = pd.merge(umm, metric_values, on='uID', how='inner')

    # Represent each building by its centroid for the spatial join with the grid
    # (note: centroids are computed in geographic coordinates, EPSG:4326)
    umm = gpd.GeoDataFrame(umm, geometry='geometry')
    umm = umm.to_crs("EPSG:4326")
    umm['centroid'] = umm.geometry.centroid
    umm = gpd.GeoDataFrame(umm, geometry='centroid').drop(columns='geometry')

    # Aggregation to the grid
    grid = gpd.read_file(args.grid_file)
    grid = grid[['geometry']]
    grid['grid_id'] = range(1, len(grid) + 1)  # unique sequential id for each grid cell
    grid = grid.to_crs("EPSG:4326")
    assert np.all(grid.is_valid)

    # Spatial join: attach each building centroid to the grid cell it intersects
    umm_grid = gpd.sjoin(grid, umm, how='inner', predicate='intersects')

    # Handle missing data: drop buildings with missing metric values
    if umm_grid.isnull().values.any():
        umm_grid = umm_grid.dropna()

    # Aggregate the building-level metrics per grid cell: medians for most metrics,
    # standard deviations for the orientation metrics, and the sum of the cell areas
    median_cols = ['sdcAre', 'ssbElo', 'ssbCCD', 'mtbAli', 'mtbNDi', 'ltcBuA', 'sdbAre', 'sscERI', 'sicCAR', 'mtcWNe',
                   'mdcAre', 'ltbIBD', 'ltcWRB']
    sd_cols = ['stbOri', 'stcOri']
    sum_cols = ['sdcAre']  # renamed from `sum` to avoid shadowing the built-in

    # Set the grid geometry as the active geometry
    umm_grid = umm_grid.set_geometry('geometry')

    # Group by 'grid_id' and calculate the median, std, and sum aggregates
    median_values = umm_grid.groupby('grid_id')[median_cols].median().add_prefix('md_')
    sd_values = umm_grid.groupby('grid_id')[sd_cols].std().fillna(0).add_prefix('sd_')
    sum_values = umm_grid.groupby('grid_id')[sum_cols].sum().add_prefix('sum_')

    building_counts = umm_grid.groupby('grid_id').size().rename('bcount')
    # Cells with a single building yield NaN standard deviations (filled with 0 above)
    single_building_grids = building_counts[building_counts == 1]

    # Combine the per-cell statistics into a single table
    merge_stats = pd.merge(median_values, sd_values, on='grid_id', how='inner')
    merge_stats = pd.merge(merge_stats, sum_values, on='grid_id', how='inner')
    merge_stats = pd.merge(merge_stats, building_counts, on='grid_id', how='inner')

    if grid.index.name != 'grid_id':
        grid = grid.set_index('grid_id')

    if merge_stats.index.name != 'grid_id':
        merge_stats = merge_stats.set_index('grid_id')

    # Attribute join of the aggregated statistics back onto the grid geometries
    df_stats = pd.merge(grid, merge_stats, on='grid_id', how='inner')
    gdf_stats = gpd.GeoDataFrame(df_stats, geometry='geometry', crs='EPSG:4326')

    # Drop any columns duplicated by the joins
    duplicate_columns = gdf_stats.columns[gdf_stats.columns.duplicated()]
    if len(duplicate_columns) > 0:
        print(f'Dropping duplicated columns: {list(duplicate_columns)}')
    gdf_stats = gdf_stats.loc[:, ~gdf_stats.columns.duplicated()]

    # Export the aggregated morphometrics grid to GeoParquet
    gdf_stats.to_parquet(Path(args.output_dir) / 'morphometrics_grid.pq')
57 changes: 57 additions & 0 deletions Sub-domains/MorphologicalInformality/Sourcecode/V3/clustering.py
@@ -0,0 +1,57 @@
import geopandas as gpd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler
from pathlib import Path

from parsers import clustering_parser as argument_parser

SEED = 7

if __name__ == '__main__':
    args = argument_parser().parse_known_args()[0]

    gdf = gpd.read_parquet(args.morphometrics_file)
    # TODO: only cluster cells with buildings

    # Grid-level metrics describing the Irregular Layout dimension
    morph_isl = ['md_ssbCCD', 'md_mtbAli', 'md_ltcBuA', 'md_mtcWNe', 'md_ltcWRB', 'sd_stbOri', 'sd_stcOri']

    # Grid-level metrics describing the Small, Dense Structures dimension
    morph_sds = ['md_sdcAre', 'md_ssbElo', 'md_mtbNDi', 'md_ltbIBD', 'md_ltcBuA', 'md_sdbAre', 'md_sscERI', 'md_sicCAR',
                 'md_mtcWNe', 'md_mdcAre', 'md_ltcWRB', 'sum_sdcAre']

    gdf_isl = gdf[morph_isl]
    gdf_sds = gdf[morph_sds]

    # Standardize the features by removing the mean and scaling to unit variance
    scaler = StandardScaler()
    data_isl = scaler.fit_transform(gdf_isl)
    data_sds = scaler.fit_transform(gdf_sds)

    # Elbow analysis for Irregular Layout:
    # sum of squared distances (inertia) for k in {6, 8, 10},
    # storing the cluster labels for each k
    ssd_isl = []
    for k in [6, 8, 10]:
        km = KMeans(n_clusters=k, random_state=SEED)
        km = km.fit(data_isl)
        ssd_isl.append(km.inertia_)
        gdf[f'isl_c{k}'] = km.labels_

    # Elbow analysis for Small, Dense Structures:
    # sum of squared distances (inertia) for k in {6, 8, 10},
    # storing the cluster labels for each k
    ssd_sds = []
    for k in [6, 8, 10]:
        km = KMeans(n_clusters=k, random_state=SEED)
        km = km.fit(data_sds)
        ssd_sds.append(km.inertia_)
        gdf[f'sds_c{k}'] = km.labels_

    gdf.to_parquet(Path(args.output_dir) / 'clustering.pq')
