Implement API design with Postgres Backend (#8)
* Initial project structure for API

Create a FastAPI application skeleton with an initial description of the summary endpoint. Includes field validation for the AOI polygon.

* Add summary response with mocked data

* Add DuckDB data ingestion

* Adapt API to read from DuckDB

* Fix and clean up H3 generation functionality

* Add tests for the API

* Adapt API to store data in Postgres

Load a sample of NYC data for development. Modify the API to use Postgres. Add a visualization notebook using lonboard for quick QA.

* Add configuration variable for table name

* Add unit tests for db_utils and update existing API tests

Added app/utils/db_utils.py with utility functions for database operations, including get_available_fields and get_summaries. Modified app/routers/api.py to use these functions and updated tests/test_api.py to match the refactored API logic. Added tests/test_db_utils.py with unit tests for the new utilities, written with pytest and unittest.mock so they run without a real database connection.

* Fix environment variable definition error

Add a notebook visualization that includes the available fields endpoint. Fix a bug where environment variables were defined before being loaded from the env file.

* Update field validation to use geojson_pydantic

This required moving some types around for the AOI. Shapely is still used in h3_utils.py.

* Remove error handling on HTTPException

* Format with black and remove print statements

* Add pydantic settings

* Update db_utils tests to match the new summaries output, which reflects the DataFrame structure

* Add GitHub Actions CI for the space2stats API
zacdezgeo authored Jul 19, 2024
1 parent 2f219d9 commit 7b6f354
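The commit message mentions field validation for the AOI polygon, later moved to geojson_pydantic. The checks such a model performs can be sketched with the standard library alone, following the GeoJSON rules in RFC 7946 (closed linear rings of at least four positions, coordinates in [longitude, latitude] order). `validate_aoi_polygon` is a hypothetical helper for illustration; it is neither the project's code nor geojson_pydantic's API.

```python
from typing import Any


def validate_aoi_polygon(geometry: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical AOI check: raise ValueError unless `geometry` is a valid Polygon."""
    if geometry.get("type") != "Polygon":
        raise ValueError("AOI geometry must be of type 'Polygon'")
    rings = geometry.get("coordinates")
    if not isinstance(rings, list) or not rings:
        raise ValueError("Polygon must contain at least one linear ring")
    for ring in rings:
        # RFC 7946: a linear ring has four or more positions and is closed
        if len(ring) < 4:
            raise ValueError("Each linear ring needs at least 4 positions")
        if ring[0] != ring[-1]:
            raise ValueError("Each linear ring must be closed (first == last)")
        for position in ring:
            if len(position) < 2:
                raise ValueError("Each position needs a longitude and a latitude")
            lon, lat = position[0], position[1]
            # Positions are [longitude, latitude], so check the matching ranges
            if not (-180 <= lon <= 180 and -90 <= lat <= 90):
                raise ValueError(f"Position out of range: {position}")
    return geometry
```

Returning the geometry unchanged mirrors how pydantic-style validators hand back the validated value.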
Showing 25 changed files with 1,228 additions and 1 deletion.
35 changes: 35 additions & 0 deletions .github/workflows/test.yml
@@ -0,0 +1,35 @@
name: Run Tests

on: [push, pull_request]

jobs:
  test:
    runs-on: ubuntu-latest

    env:
      DB_HOST: localhost
      DB_PORT: 5432
      DB_NAME: mydatabase
      DB_USER: myuser
      DB_PASSWORD: mypassword
      DB_TABLE_NAME: space2stats

    steps:
      - name: Checkout code
        uses: actions/checkout@v2

      - name: Set up Python
        uses: actions/setup-python@v2
        with:
          python-version: 3.11

      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install -r space2stats_api/requirements.txt

      - name: Set PYTHONPATH
        run: echo "PYTHONPATH=$(pwd)/space2stats_api" >> $GITHUB_ENV

      - name: Run tests
        run: pytest space2stats_api/tests
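This workflow passes database settings to the tests only through environment variables and starts no Postgres service container, so the tests presumably rely on mocks, as the commit message says for tests/test_db_utils.py. A minimal sketch of how an application could read those same variables, using a standard-library dataclass as a hypothetical stand-in for the pydantic settings the commit adds:

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class DatabaseSettings:
    """Hypothetical stand-in for the project's pydantic settings class."""

    host: str
    port: int
    name: str
    user: str
    password: str
    table_name: str

    @classmethod
    def from_env(cls, env: Optional[Mapping[str, str]] = None) -> "DatabaseSettings":
        # Read the variable names the workflow exports; a missing one raises KeyError
        env = os.environ if env is None else env
        return cls(
            host=env["DB_HOST"],
            port=int(env["DB_PORT"]),
            name=env["DB_NAME"],
            user=env["DB_USER"],
            password=env["DB_PASSWORD"],
            table_name=env["DB_TABLE_NAME"],
        )

    def conninfo(self) -> str:
        """Build a libpq key=value string (values with spaces would need quoting)."""
        return (
            f"host={self.host} port={self.port} dbname={self.name} "
            f"user={self.user} password={self.password}"
        )
```

Accepting an explicit mapping keeps the class testable without touching the real process environment; `DatabaseSettings.from_env()` falls back to `os.environ`.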
9 changes: 8 additions & 1 deletion .gitignore
@@ -93,4 +93,11 @@ target/
_build/

# python-dotenv
.env
wb_aws.env
db.env

# data
*.parquet
*.duckdb
.pgdata
17 changes: 17 additions & 0 deletions docker-compose.yaml
@@ -0,0 +1,17 @@
version: '3'

services:
  database:
    image: ghcr.io/stac-utils/pgstac:v0.8.5
    environment:
      - POSTGRES_USER=username
      - POSTGRES_PASSWORD=password
      - POSTGRES_DB=postgis
      - PGUSER=username
      - PGPASSWORD=password
      - PGDATABASE=postgis
    ports:
      - "${MY_DOCKER_IP:-127.0.0.1}:5439:5432"
    command: postgres -N 500
    volumes:
      - ./.pgdata:/var/lib/postgresql/data
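In this compose file, container port 5432 is published on host port 5439, bound to `MY_DOCKER_IP` when that variable is set and non-empty and to 127.0.0.1 otherwise; this is why the loading scripts connect on port 5439. `postgres -N 500` raises `max_connections` to 500. The sketch below mirrors the `${VAR:-default}` interpolation rule in Python; `resolve_port_binding` is a hypothetical helper, not part of the project.

```python
import os
from typing import Mapping, Optional, Tuple


def resolve_port_binding(env: Optional[Mapping[str, str]] = None) -> Tuple[str, int, int]:
    """Mirror "${MY_DOCKER_IP:-127.0.0.1}:5439:5432" as (host_ip, host_port, container_port)."""
    env = os.environ if env is None else env
    # ":-" falls back to the default when the variable is unset *or* empty
    host_ip = env.get("MY_DOCKER_IP") or "127.0.0.1"
    return host_ip, 5439, 5432
```

Clients on the host would therefore use `resolve_port_binding()[0]` and port 5439, never the container-internal 5432.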
11 changes: 11 additions & 0 deletions notebooks/summary_data.csv
@@ -0,0 +1,11 @@
hex_id,fields
862a1008fffffff,"{'__index_level_0__': 2436230, 'ogc_fid': 11, 'sum_pop_f_0_2020': 3355.96850585938, 'sum_pop_f_10_2020': 12371.3955078125, 'sum_pop_f_15_2020': 15563.8896484375, 'sum_pop_f_1_2020': 12494.43359375, 'sum_pop_f_20_2020': 30224.130859375, 'sum_pop_f_25_2020': 42427.28125, 'sum_pop_f_30_2020': 34711.5625, 'sum_pop_f_35_2020': 25574.31640625, 'sum_pop_f_40_2020': 20973.458984375, 'sum_pop_f_45_2020': 18116.025390625, 'sum_pop_f_50_2020': 18691.546875, 'sum_pop_f_55_2020': 21246.267578125, 'sum_pop_f_5_2020': 12426.166015625, 'sum_pop_f_60_2020': 22672.314453125, 'sum_pop_f_65_2020': 20404.287109375, 'sum_pop_f_70_2020': 17031.431640625, 'sum_pop_f_75_2020': 11438.015625, 'sum_pop_f_80_2020': 17598.7109375, 'sum_pop_m_0_2020': 3499.36499023438, 'sum_pop_m_10_2020': 12757.32421875, 'sum_pop_m_15_2020': 14690.669921875, 'sum_pop_m_1_2020': 13028.8857421875, 'sum_pop_m_20_2020': 24383.478515625, 'sum_pop_m_25_2020': 36570.5, 'sum_pop_m_30_2020': 33656.46875, 'sum_pop_m_35_2020': 25711.21875, 'sum_pop_m_40_2020': 21449.7265625, 'sum_pop_m_45_2020': 18393.3671875, 'sum_pop_m_50_2020': 17531.7890625, 'sum_pop_m_55_2020': 18519.333984375, 'sum_pop_m_5_2020': 12790.00390625, 'sum_pop_m_60_2020': 17991.046875, 'sum_pop_m_65_2020': 15532.0927734375, 'sum_pop_m_70_2020': 12730.666015625, 'sum_pop_m_75_2020': 8303.8662109375, 'sum_pop_m_80_2020': 9250.68359375}"
862a100d7ffffff,"{'__index_level_0__': 2436238, 'ogc_fid': 19, 'sum_pop_f_0_2020': 3169.98120117188, 'sum_pop_f_10_2020': 11957.44140625, 'sum_pop_f_15_2020': 14855.185546875, 'sum_pop_f_1_2020': 11801.9931640625, 'sum_pop_f_20_2020': 27791.64453125, 'sum_pop_f_25_2020': 38212.46875, 'sum_pop_f_30_2020': 31441.3046875, 'sum_pop_f_35_2020': 23516.759765625, 'sum_pop_f_40_2020': 19418.1640625, 'sum_pop_f_45_2020': 16919.677734375, 'sum_pop_f_50_2020': 17537.130859375, 'sum_pop_f_55_2020': 19871.884765625, 'sum_pop_f_5_2020': 11981.1435546875, 'sum_pop_f_60_2020': 21095.8203125, 'sum_pop_f_65_2020': 18827.03125, 'sum_pop_f_70_2020': 15865.4326171875, 'sum_pop_f_75_2020': 10600.365234375, 'sum_pop_f_80_2020': 16381.328125, 'sum_pop_m_0_2020': 3306.43798828125, 'sum_pop_m_10_2020': 12306.724609375, 'sum_pop_m_15_2020': 14101.4326171875, 'sum_pop_m_1_2020': 12310.578125, 'sum_pop_m_20_2020': 22694.451171875, 'sum_pop_m_25_2020': 32823.91015625, 'sum_pop_m_30_2020': 30206.892578125, 'sum_pop_m_35_2020': 23337.458984375, 'sum_pop_m_40_2020': 19612.8359375, 'sum_pop_m_45_2020': 16889.306640625, 'sum_pop_m_50_2020': 16264.783203125, 'sum_pop_m_55_2020': 17167.7890625, 'sum_pop_m_5_2020': 12373.455078125, 'sum_pop_m_60_2020': 16715.095703125, 'sum_pop_m_65_2020': 14265.3515625, 'sum_pop_m_70_2020': 11801.837890625, 'sum_pop_m_75_2020': 7603.23095703125, 'sum_pop_m_80_2020': 8574.2255859375}"
862a100dfffffff,"{'__index_level_0__': 2436239, 'ogc_fid': 20, 'sum_pop_f_0_2020': 5147.3330078125, 'sum_pop_f_10_2020': 21324.5234375, 'sum_pop_f_15_2020': 22045.63671875, 'sum_pop_f_1_2020': 19163.771484375, 'sum_pop_f_20_2020': 27736.291015625, 'sum_pop_f_25_2020': 35799.5234375, 'sum_pop_f_30_2020': 33238.703125, 'sum_pop_f_35_2020': 27172.52734375, 'sum_pop_f_40_2020': 23106.755859375, 'sum_pop_f_45_2020': 21305.44921875, 'sum_pop_f_50_2020': 22005.82421875, 'sum_pop_f_55_2020': 23561.41796875, 'sum_pop_f_5_2020': 20879.88671875, 'sum_pop_f_60_2020': 23332.68359375, 'sum_pop_f_65_2020': 18850.859375, 'sum_pop_f_70_2020': 17307.85546875, 'sum_pop_f_75_2020': 11436.3251953125, 'sum_pop_f_80_2020': 17651.9609375, 'sum_pop_m_0_2020': 5364.07861328125, 'sum_pop_m_10_2020': 22061.70703125, 'sum_pop_m_15_2020': 22792.357421875, 'sum_pop_m_1_2020': 19971.615234375, 'sum_pop_m_20_2020': 27098.421875, 'sum_pop_m_25_2020': 32599.54296875, 'sum_pop_m_30_2020': 30635.79296875, 'sum_pop_m_35_2020': 24916.66796875, 'sum_pop_m_40_2020': 20631.275390625, 'sum_pop_m_45_2020': 18518.20703125, 'sum_pop_m_50_2020': 18673.708984375, 'sum_pop_m_55_2020': 19300.26953125, 'sum_pop_m_5_2020': 21773.6953125, 'sum_pop_m_60_2020': 18752.81640625, 'sum_pop_m_65_2020': 14246.966796875, 'sum_pop_m_70_2020': 12312.962890625, 'sum_pop_m_75_2020': 7566.3466796875, 'sum_pop_m_80_2020': 8843.8251953125}"
862a10707ffffff,"{'__index_level_0__': 2436390, 'ogc_fid': 49, 'sum_pop_f_0_2020': 1031.58679199219, 'sum_pop_f_10_2020': 3744.32397460938, 'sum_pop_f_15_2020': 3917.22143554688, 'sum_pop_f_1_2020': 3840.6474609375, 'sum_pop_f_20_2020': 5425.830078125, 'sum_pop_f_25_2020': 8483.66796875, 'sum_pop_f_30_2020': 8123.6982421875, 'sum_pop_f_35_2020': 6064.3916015625, 'sum_pop_f_40_2020': 4933.3759765625, 'sum_pop_f_45_2020': 4362.7236328125, 'sum_pop_f_50_2020': 4338.94580078125, 'sum_pop_f_55_2020': 4514.39990234375, 'sum_pop_f_5_2020': 3701.62890625, 'sum_pop_f_60_2020': 4312.689453125, 'sum_pop_f_65_2020': 3729.978515625, 'sum_pop_f_70_2020': 3232.61254882812, 'sum_pop_f_75_2020': 2183.283203125, 'sum_pop_f_80_2020': 3135.68408203125, 'sum_pop_m_0_2020': 1086.77282714844, 'sum_pop_m_10_2020': 3941.2705078125, 'sum_pop_m_15_2020': 4346.45556640625, 'sum_pop_m_1_2020': 4046.28857421875, 'sum_pop_m_20_2020': 5813.85107421875, 'sum_pop_m_25_2020': 8800.37109375, 'sum_pop_m_30_2020': 8521.666015625, 'sum_pop_m_35_2020': 6522.4853515625, 'sum_pop_m_40_2020': 5050.24072265625, 'sum_pop_m_45_2020': 4325.6865234375, 'sum_pop_m_50_2020': 4091.5224609375, 'sum_pop_m_55_2020': 4049.44775390625, 'sum_pop_m_5_2020': 3862.1474609375, 'sum_pop_m_60_2020': 3625.97485351562, 'sum_pop_m_65_2020': 2958.13623046875, 'sum_pop_m_70_2020': 2380.19506835938, 'sum_pop_m_75_2020': 1527.44067382812, 'sum_pop_m_80_2020': 1581.6474609375}"
862a1070fffffff,"{'__index_level_0__': 2436391, 'ogc_fid': 50, 'sum_pop_f_0_2020': 285.267578125, 'sum_pop_f_10_2020': 1234.03344726562, 'sum_pop_f_15_2020': 1272.86743164062, 'sum_pop_f_1_2020': 1062.06518554688, 'sum_pop_f_20_2020': 1501.95458984375, 'sum_pop_f_25_2020': 1971.18774414062, 'sum_pop_f_30_2020': 1970.33056640625, 'sum_pop_f_35_2020': 1658.791015625, 'sum_pop_f_40_2020': 1498.34887695312, 'sum_pop_f_45_2020': 1380.0244140625, 'sum_pop_f_50_2020': 1413.89428710938, 'sum_pop_f_55_2020': 1476.39672851562, 'sum_pop_f_5_2020': 1194.50146484375, 'sum_pop_f_60_2020': 1449.78100585938, 'sum_pop_f_65_2020': 1220.18408203125, 'sum_pop_f_70_2020': 1001.80200195312, 'sum_pop_f_75_2020': 692.449951171875, 'sum_pop_f_80_2020': 1021.06573486328, 'sum_pop_m_0_2020': 301.492889404297, 'sum_pop_m_10_2020': 1309.16052246094, 'sum_pop_m_15_2020': 1371.35522460938, 'sum_pop_m_1_2020': 1122.52270507812, 'sum_pop_m_20_2020': 1590.68212890625, 'sum_pop_m_25_2020': 2043.75939941406, 'sum_pop_m_30_2020': 1984.005859375, 'sum_pop_m_35_2020': 1687.57556152344, 'sum_pop_m_40_2020': 1456.80505371094, 'sum_pop_m_45_2020': 1328.02758789062, 'sum_pop_m_50_2020': 1317.73278808594, 'sum_pop_m_55_2020': 1337.02490234375, 'sum_pop_m_5_2020': 1246.01293945312, 'sum_pop_m_60_2020': 1250.81298828125, 'sum_pop_m_65_2020': 1011.77893066406, 'sum_pop_m_70_2020': 772.69921875, 'sum_pop_m_75_2020': 493.368988037109, 'sum_pop_m_80_2020': 513.383544921875}"
862a10727ffffff,"{'__index_level_0__': 2436394, 'ogc_fid': 53, 'sum_pop_f_0_2020': 1815.30310058594, 'sum_pop_f_10_2020': 6547.26708984375, 'sum_pop_f_15_2020': 7142.46240234375, 'sum_pop_f_1_2020': 6758.46142578125, 'sum_pop_f_20_2020': 10860.0166015625, 'sum_pop_f_25_2020': 16565.375, 'sum_pop_f_30_2020': 15229.623046875, 'sum_pop_f_35_2020': 11273.6337890625, 'sum_pop_f_40_2020': 9142.7578125, 'sum_pop_f_45_2020': 8026.2275390625, 'sum_pop_f_50_2020': 8047.564453125, 'sum_pop_f_55_2020': 8562.94921875, 'sum_pop_f_5_2020': 6503.41015625, 'sum_pop_f_60_2020': 8420.705078125, 'sum_pop_f_65_2020': 7366.7041015625, 'sum_pop_f_70_2020': 6323.734375, 'sum_pop_f_75_2020': 4252.55859375, 'sum_pop_f_80_2020': 6203.60986328125, 'sum_pop_m_0_2020': 1909.22351074219, 'sum_pop_m_10_2020': 6867.8671875, 'sum_pop_m_15_2020': 7672.0263671875, 'sum_pop_m_1_2020': 7108.4482421875, 'sum_pop_m_20_2020': 10829.1796875, 'sum_pop_m_25_2020': 16439.6875, 'sum_pop_m_30_2020': 15721.458984375, 'sum_pop_m_35_2020': 11972.9853515625, 'sum_pop_m_40_2020': 9380.048828125, 'sum_pop_m_45_2020': 8014.259765625, 'sum_pop_m_50_2020': 7586.525390625, 'sum_pop_m_55_2020': 7627.609375, 'sum_pop_m_5_2020': 6769.341796875, 'sum_pop_m_60_2020': 6963.20947265625, 'sum_pop_m_65_2020': 5769.19580078125, 'sum_pop_m_70_2020': 4668.7470703125, 'sum_pop_m_75_2020': 3006.267578125, 'sum_pop_m_80_2020': 3161.91650390625}"
862a1072fffffff,"{'__index_level_0__': 2436395, 'ogc_fid': 54, 'sum_pop_f_0_2020': 1863.88208007812, 'sum_pop_f_10_2020': 7031.26171875, 'sum_pop_f_15_2020': 8313.15625, 'sum_pop_f_1_2020': 6939.32373046875, 'sum_pop_f_20_2020': 14435.01171875, 'sum_pop_f_25_2020': 20009.3125, 'sum_pop_f_30_2020': 16851.1328125, 'sum_pop_f_35_2020': 12630.2119140625, 'sum_pop_f_40_2020': 10353.865234375, 'sum_pop_f_45_2020': 9068.0224609375, 'sum_pop_f_50_2020': 9348.8232421875, 'sum_pop_f_55_2020': 10468.044921875, 'sum_pop_f_5_2020': 7024.83740234375, 'sum_pop_f_60_2020': 10943.9521484375, 'sum_pop_f_65_2020': 9579.228515625, 'sum_pop_f_70_2020': 8196.7197265625, 'sum_pop_f_75_2020': 5452.208984375, 'sum_pop_f_80_2020': 8342.0283203125, 'sum_pop_m_0_2020': 1944.47668457031, 'sum_pop_m_10_2020': 7258.96826171875, 'sum_pop_m_15_2020': 8105.0302734375, 'sum_pop_m_1_2020': 7239.70361328125, 'sum_pop_m_20_2020': 12277.5703125, 'sum_pop_m_25_2020': 17537.927734375, 'sum_pop_m_30_2020': 16196.8681640625, 'sum_pop_m_35_2020': 12452.357421875, 'sum_pop_m_40_2020': 10254.236328125, 'sum_pop_m_45_2020': 8837.01171875, 'sum_pop_m_50_2020': 8528.3359375, 'sum_pop_m_55_2020': 8954.818359375, 'sum_pop_m_5_2020': 7267.75244140625, 'sum_pop_m_60_2020': 8682.8642578125, 'sum_pop_m_65_2020': 7253.91015625, 'sum_pop_m_70_2020': 6027.0146484375, 'sum_pop_m_75_2020': 3859.2119140625, 'sum_pop_m_80_2020': 4310.0849609375}"
862a10757ffffff,"{'__index_level_0__': 2436399, 'ogc_fid': 58, 'sum_pop_f_0_2020': 834.680480957031, 'sum_pop_f_10_2020': 3883.02319335938, 'sum_pop_f_15_2020': 3978.943359375, 'sum_pop_f_1_2020': 3107.55615234375, 'sum_pop_f_20_2020': 4433.38232421875, 'sum_pop_f_25_2020': 5126.501953125, 'sum_pop_f_30_2020': 5110.86669921875, 'sum_pop_f_35_2020': 4612.77783203125, 'sum_pop_f_40_2020': 4320.1806640625, 'sum_pop_f_45_2020': 4065.07666015625, 'sum_pop_f_50_2020': 4238.0830078125, 'sum_pop_f_55_2020': 4474.5029296875, 'sum_pop_f_5_2020': 3739.5732421875, 'sum_pop_f_60_2020': 4464.12255859375, 'sum_pop_f_65_2020': 3650.60009765625, 'sum_pop_f_70_2020': 3047.7119140625, 'sum_pop_f_75_2020': 2093.39038085938, 'sum_pop_f_80_2020': 3182.42626953125, 'sum_pop_m_0_2020': 878.235229492188, 'sum_pop_m_10_2020': 4092.75439453125, 'sum_pop_m_15_2020': 4156.81689453125, 'sum_pop_m_1_2020': 3269.85815429688, 'sum_pop_m_20_2020': 4496.69580078125, 'sum_pop_m_25_2020': 4984.02490234375, 'sum_pop_m_30_2020': 4775.9384765625, 'sum_pop_m_35_2020': 4325.20068359375, 'sum_pop_m_40_2020': 3944.751953125, 'sum_pop_m_45_2020': 3706.37255859375, 'sum_pop_m_50_2020': 3792.64916992188, 'sum_pop_m_55_2020': 3921.2607421875, 'sum_pop_m_5_2020': 3899.29052734375, 'sum_pop_m_60_2020': 3796.07446289062, 'sum_pop_m_65_2020': 2990.9345703125, 'sum_pop_m_70_2020': 2328.72338867188, 'sum_pop_m_75_2020': 1461.92907714844, 'sum_pop_m_80_2020': 1593.95361328125}"
862a10767ffffff,"{'__index_level_0__': 2436401, 'ogc_fid': 60, 'sum_pop_f_0_2020': 3148.86889648438, 'sum_pop_f_10_2020': 13023.1484375, 'sum_pop_f_15_2020': 13454.26953125, 'sum_pop_f_1_2020': 11723.392578125, 'sum_pop_f_20_2020': 16926.83203125, 'sum_pop_f_25_2020': 21846.34375, 'sum_pop_f_30_2020': 20248.33984375, 'sum_pop_f_35_2020': 16524.6953125, 'sum_pop_f_40_2020': 14018.3984375, 'sum_pop_f_45_2020': 12919.609375, 'sum_pop_f_50_2020': 13344.369140625, 'sum_pop_f_55_2020': 14300.04296875, 'sum_pop_f_5_2020': 12757.994140625, 'sum_pop_f_60_2020': 14169.640625, 'sum_pop_f_65_2020': 11414.6669921875, 'sum_pop_f_70_2020': 10501.298828125, 'sum_pop_f_75_2020': 6928.44677734375, 'sum_pop_f_80_2020': 10685.734375, 'sum_pop_m_0_2020': 3280.53686523438, 'sum_pop_m_10_2020': 13470.740234375, 'sum_pop_m_15_2020': 13894.583984375, 'sum_pop_m_1_2020': 12214.142578125, 'sum_pop_m_20_2020': 16504.916015625, 'sum_pop_m_25_2020': 19838.443359375, 'sum_pop_m_30_2020': 18610.52734375, 'sum_pop_m_35_2020': 15098.4765625, 'sum_pop_m_40_2020': 12466.5986328125, 'sum_pop_m_45_2020': 11185.37109375, 'sum_pop_m_50_2020': 11283.837890625, 'sum_pop_m_55_2020': 11677.3203125, 'sum_pop_m_5_2020': 13300.9453125, 'sum_pop_m_60_2020': 11366.205078125, 'sum_pop_m_65_2020': 8611.40625, 'sum_pop_m_70_2020': 7453.0439453125, 'sum_pop_m_75_2020': 4575.9609375, 'sum_pop_m_80_2020': 5342.28955078125}"
862a10777ffffff,"{'__index_level_0__': 2436403, 'ogc_fid': 62, 'sum_pop_f_0_2020': 3401.08959960938, 'sum_pop_f_10_2020': 14066.287109375, 'sum_pop_f_15_2020': 14531.94140625, 'sum_pop_f_1_2020': 12662.421875, 'sum_pop_f_20_2020': 18282.650390625, 'sum_pop_f_25_2020': 23596.2109375, 'sum_pop_f_30_2020': 21870.20703125, 'sum_pop_f_35_2020': 17848.302734375, 'sum_pop_f_40_2020': 15141.2568359375, 'sum_pop_f_45_2020': 13954.455078125, 'sum_pop_f_50_2020': 14413.2373046875, 'sum_pop_f_55_2020': 15445.458984375, 'sum_pop_f_5_2020': 13779.89453125, 'sum_pop_f_60_2020': 15304.611328125, 'sum_pop_f_65_2020': 12328.96875, 'sum_pop_f_70_2020': 11342.439453125, 'sum_pop_f_75_2020': 7483.4072265625, 'sum_pop_f_80_2020': 11541.6484375, 'sum_pop_m_0_2020': 3543.30419921875, 'sum_pop_m_10_2020': 14549.73046875, 'sum_pop_m_15_2020': 15007.5234375, 'sum_pop_m_1_2020': 13192.482421875, 'sum_pop_m_20_2020': 17826.94140625, 'sum_pop_m_25_2020': 21427.48046875, 'sum_pop_m_30_2020': 20101.208984375, 'sum_pop_m_35_2020': 16307.8466796875, 'sum_pop_m_40_2020': 13465.1591796875, 'sum_pop_m_45_2020': 12081.306640625, 'sum_pop_m_50_2020': 12187.66015625, 'sum_pop_m_55_2020': 12612.66015625, 'sum_pop_m_5_2020': 14366.3359375, 'sum_pop_m_60_2020': 12276.625, 'sum_pop_m_65_2020': 9301.1689453125, 'sum_pop_m_70_2020': 8050.02490234375, 'sum_pop_m_75_2020': 4942.490234375, 'sum_pop_m_80_2020': 5770.201171875}"
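Each row stores its statistics in a `fields` column that holds a Python dict repr with single-quoted keys rather than JSON, so `json.loads` would reject it. A sketch that reads the file with `csv` and `ast.literal_eval`; both function names are hypothetical:

```python
import ast
import csv
from typing import Dict, Iterable, Iterator, Tuple


def read_summary_rows(lines: Iterable[str]) -> Iterator[Tuple[str, Dict[str, float]]]:
    """Yield (hex_id, fields) for each row of summary_data.csv-style input."""
    for row in csv.DictReader(lines):
        # literal_eval parses the dict repr safely, without executing code
        yield row["hex_id"], ast.literal_eval(row["fields"])


def total_population(fields: Dict[str, float]) -> float:
    """Sum all sum_pop_* columns (every age band, both sexes)."""
    return sum(value for key, value in fields.items() if key.startswith("sum_pop_"))
```

A file object opened with `newline=""` can be passed straight to `read_summary_rows`, since `csv.DictReader` accepts any iterable of lines; keys such as `__index_level_0__` and `ogc_fid` are skipped by the `sum_pop_` prefix filter.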
621 changes: 621 additions & 0 deletions notebooks/visualize_nyc_sample.ipynb


11 changes: 11 additions & 0 deletions postgres/chunk_parquet.py
@@ -0,0 +1,11 @@
import os

import pandas as pd


df = pd.read_parquet('space2stats.parquet')
chunk_size = 100000  # Number of rows per chunk

# Create the output directory first; to_parquet does not create parent folders
os.makedirs('parquet_chunks', exist_ok=True)

for i in range(0, len(df), chunk_size):
    chunk = df.iloc[i:i + chunk_size]
    chunk.to_parquet(f'parquet_chunks/space2stats_part_{i // chunk_size}.parquet')

print("Parquet file split into smaller chunks.")
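The script writes part `i // chunk_size` for each slice starting at row `i`, and the final slice is simply shorter. The naming and boundaries can be checked without pandas; `chunk_plan` is a hypothetical helper that mirrors the loop above.

```python
from typing import List, Tuple


def chunk_plan(n_rows: int, chunk_size: int = 100_000) -> List[Tuple[int, int, str]]:
    """Return (start, stop, filename) for each chunk chunk_parquet.py would write."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    plan = []
    for start in range(0, n_rows, chunk_size):
        # iloc slicing clamps past the end the same way min() does here
        stop = min(start + chunk_size, n_rows)
        name = f"parquet_chunks/space2stats_part_{start // chunk_size}.parquet"
        plan.append((start, stop, name))
    return plan
```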
20 changes: 20 additions & 0 deletions postgres/download_parquet.sh
@@ -0,0 +1,20 @@
#!/bin/bash

# Load environment variables from wb_aws.env
source wb_aws.env

# S3 and file configuration
S3_BUCKET="wbg-geography01"
PARQUET_FILE="Space2Stats/parquet/GLOBAL/combined_population.parquet"
LOCAL_PARQUET_FILE="space2stats.parquet"

# PostgreSQL configuration
DB_HOST="${MY_DOCKER_IP:-127.0.0.1}"
DB_PORT=5439
DB_NAME="postgis"
DB_USER="username"
DB_PASSWORD="password"

# Download Parquet file from S3
echo "Downloading Parquet file from S3..."
aws s3 cp --quiet "s3://$S3_BUCKET/$PARQUET_FILE" "$LOCAL_PARQUET_FILE"
45 changes: 45 additions & 0 deletions postgres/load_nyc_sample.sh
@@ -0,0 +1,45 @@
#!/bin/bash

# Database connection details
DB_HOST="localhost"
DB_PORT="5439"
DB_NAME="postgis"
DB_USER="username"
DB_PASSWORD="password"

# Let psql authenticate without an interactive password prompt
export PGPASSWORD="$DB_PASSWORD"

# Path to the sample Parquet file
PARQUET_FILE="nyc_sample.parquet"

# Name of the target table
TABLE_NAME="space2stats_nyc_sample"

# Check if the table exists
TABLE_EXISTS=$(psql -h "$DB_HOST" -p "$DB_PORT" -d "$DB_NAME" -U "$DB_USER" -tAc "SELECT EXISTS (SELECT FROM information_schema.tables WHERE table_schema='public' AND table_name='$TABLE_NAME');")

echo "Importing $PARQUET_FILE..."

if [ "$TABLE_EXISTS" = "t" ]; then
    # Table exists, append data
    ogr2ogr -f "PostgreSQL" \
        PG:"host=$DB_HOST port=$DB_PORT dbname=$DB_NAME user=$DB_USER password=$DB_PASSWORD" \
        "$PARQUET_FILE" \
        -nln "$TABLE_NAME" \
        -append
    STATUS=$?
else
    # Table does not exist, create table and import data
    ogr2ogr -f "PostgreSQL" \
        PG:"host=$DB_HOST port=$DB_PORT dbname=$DB_NAME user=$DB_USER password=$DB_PASSWORD" \
        "$PARQUET_FILE" \
        -nln "$TABLE_NAME"
    STATUS=$?
fi

# Check ogr2ogr's own exit code rather than relying on $? after `fi`
if [ "$STATUS" -ne 0 ]; then
    echo "Failed to import $PARQUET_FILE"
    exit 1
fi

echo "Successfully imported $PARQUET_FILE"

echo "The Parquet file has been imported."
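One pitfall in these import scripts: `$?` read after `fi` reports the last command executed in whichever branch ran, so an assignment placed after `ogr2ogr` can hide a failed import. A Python sketch of the same step, assuming `ogr2ogr` is on the PATH; `build_ogr2ogr_cmd` and `import_parquet` are hypothetical names, and the flags are exactly those the script passes.

```python
import subprocess
from typing import List


def build_ogr2ogr_cmd(pg_conn: str, parquet_file: str, table: str, append: bool) -> List[str]:
    """Assemble the ogr2ogr invocation used by the shell script as an argv list."""
    cmd = ["ogr2ogr", "-f", "PostgreSQL", f"PG:{pg_conn}", parquet_file, "-nln", table]
    if append:
        cmd.append("-append")  # add rows to the existing table instead of creating it
    return cmd


def import_parquet(pg_conn: str, parquet_file: str, table: str, append: bool) -> None:
    # check=True raises CalledProcessError on a non-zero exit status, so a
    # failed import cannot be masked by a later statement
    subprocess.run(build_ogr2ogr_cmd(pg_conn, parquet_file, table, append), check=True)
```

Passing an argv list also sidesteps shell quoting of the space-separated `PG:` connection string. A call might look like `import_parquet("host=localhost port=5439 dbname=postgis user=username password=password", "nyc_sample.parquet", "space2stats_nyc_sample", append=False)`.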
49 changes: 49 additions & 0 deletions postgres/load_parquet_chunks.sh
@@ -0,0 +1,49 @@
#!/bin/bash

# Database connection details
DB_HOST="localhost"
DB_PORT="5439"
DB_NAME="postgis"
DB_USER="username"
DB_PASSWORD="password"

# Let psql authenticate without an interactive password prompt
export PGPASSWORD="$DB_PASSWORD"

# Directory containing the Parquet chunks
CHUNKS_DIR="parquet_chunks"

# Name of the target table
TABLE_NAME="space2stats"

# Flag to check if the table exists
TABLE_EXISTS=$(psql -h "$DB_HOST" -p "$DB_PORT" -d "$DB_NAME" -U "$DB_USER" -tAc "SELECT EXISTS (SELECT FROM information_schema.tables WHERE table_schema='public' AND table_name='$TABLE_NAME');")

# Loop through each Parquet file in the chunks directory
for PARQUET_FILE in "$CHUNKS_DIR"/*.parquet;
do
    echo "Importing $PARQUET_FILE..."

    if [ "$TABLE_EXISTS" = "t" ]; then
        # Table exists, append data
        ogr2ogr -f "PostgreSQL" \
            PG:"host=$DB_HOST port=$DB_PORT dbname=$DB_NAME user=$DB_USER password=$DB_PASSWORD" \
            "$PARQUET_FILE" \
            -nln "$TABLE_NAME" \
            -append
        STATUS=$?
    else
        # Table does not exist, create table and import data
        ogr2ogr -f "PostgreSQL" \
            PG:"host=$DB_HOST port=$DB_PORT dbname=$DB_NAME user=$DB_USER password=$DB_PASSWORD" \
            "$PARQUET_FILE" \
            -nln "$TABLE_NAME"
        STATUS=$?

        # Later chunks append to the table created by this import
        TABLE_EXISTS="t"
    fi

    # Check ogr2ogr's own exit code; $? after `fi` would reflect the
    # TABLE_EXISTS assignment instead whenever the else branch ran
    if [ "$STATUS" -ne 0 ]; then
        echo "Failed to import $PARQUET_FILE"
        exit 1
    fi

    echo "Successfully imported $PARQUET_FILE"
done

echo "All Parquet chunks have been imported."
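Bash expands `"$CHUNKS_DIR"/*.parquet` in sorted, typically lexicographic, order, so `space2stats_part_10.parquet` is imported before `space2stats_part_2.parquet`. That does not affect correctness, since only the first import creates the table and every later one appends, but numeric order makes progress easier to follow. A sketch of that import plan in Python; `part_number` and `plan_chunk_imports` are hypothetical helpers.

```python
import os
import re
from typing import Iterable, List, Tuple

# Matches the names written by chunk_parquet.py, e.g. space2stats_part_12.parquet
_PART_RE = re.compile(r"_part_(\d+)\.parquet$")


def part_number(path: str) -> int:
    """Extract N from .../space2stats_part_N.parquet."""
    match = _PART_RE.search(os.path.basename(path))
    if match is None:
        raise ValueError(f"Unexpected chunk name: {path}")
    return int(match.group(1))


def plan_chunk_imports(paths: Iterable[str], table_exists: bool) -> List[Tuple[str, bool]]:
    """Return (path, append) pairs in numeric part order."""
    plan = []
    for path in sorted(paths, key=part_number):
        plan.append((path, table_exists))
        table_exists = True  # after the first import, every later chunk appends
    return plan
```

The paths could come from `glob.glob("parquet_chunks/*.parquet")`, and each pair maps onto one `ogr2ogr` call with or without `-append`.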
29 changes: 29 additions & 0 deletions postgres/nyc_sample.py
@@ -0,0 +1,29 @@
import pandas as pd
import h3


# Load the full dataset
df = pd.read_parquet('space2stats.parquet')

# Define the bounding box for New York City (approximate values) as a GeoJSON polygon
nyc_polygon = {
    "type": "Polygon",
    "coordinates": [[
        [-74.259090, 40.477399],
        [-73.700272, 40.477399],
        [-73.700272, 40.917577],
        [-74.259090, 40.917577],
        [-74.259090, 40.477399]
    ]]
}

# Generate H3 indices for the bounding box using polyfill.
# h3.polyfill is the h3-py v3 API (v4 replaced it with polygon_to_cells);
# geo_json_conformant=True reads coordinates in GeoJSON [lng, lat] order.
resolution = 6
nyc_hexagons = h3.polyfill(nyc_polygon, resolution, geo_json_conformant=True)

# Filter the dataframe for New York City H3 indices
nyc_df = df[df['hex_id'].isin(nyc_hexagons)]

nyc_df.to_parquet('nyc_sample.parquet')

print("Filtered file for New York City.")
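The polygon above is an axis-aligned bounding box written in GeoJSON's [longitude, latitude] order and closed by repeating its first vertex. A small hypothetical helper can generate such a ring from min/max bounds, which avoids hand-typing the five vertices when sampling other areas:

```python
from typing import Dict, List


def bbox_to_geojson_polygon(
    min_lon: float, min_lat: float, max_lon: float, max_lat: float
) -> Dict[str, object]:
    """Build a closed, counterclockwise GeoJSON Polygon in [lon, lat] order."""
    if min_lon >= max_lon or min_lat >= max_lat:
        raise ValueError("min bounds must be smaller than max bounds")
    ring: List[List[float]] = [
        [min_lon, min_lat],
        [max_lon, min_lat],
        [max_lon, max_lat],
        [min_lon, max_lat],
        [min_lon, min_lat],  # repeat the first position to close the ring
    ]
    return {"type": "Polygon", "coordinates": [ring]}
```

Calling `bbox_to_geojson_polygon(-74.259090, 40.477399, -73.700272, 40.917577)` reproduces `nyc_polygon` exactly.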
Empty file added space2stats_api/__init__.py
13 changes: 13 additions & 0 deletions space2stats_api/app/main.py
@@ -0,0 +1,13 @@
from fastapi import FastAPI

from .routers import api


app = FastAPI()

app.include_router(api.router)


@app.get("/")
def read_root():
    return {"message": "Welcome to Space2Stats!"}