Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PostGIS Raster integration #112

Merged
merged 22 commits into from
Jan 19, 2023
Merged

Conversation

mngr777
Copy link
Contributor

@mngr777 mngr777 commented Dec 6, 2022

This new functionality is intended to simplify projecting raster data onto H3 grid.

Two main methods of processing raster data are implemented:

  • Calculate a centroid for each pixel, determine which H3 cell corresponds to the centroid, then group by H3 index.
  • ST_Clip the raster by H3 cell boundary for each cell intersecting the raster, process each partial raster using ST_SummaryStats or ST_ValueCount.

The second method becomes more efficient as number of pixels per cell increases. There are also fallback functions for cases when H3 cell is smaller than a pixel, but in practice it would porbably be better to increase raster resolution instead.

Continuous data example:

SELECT
    h3,
    (h3_raster_summary_stats_agg(stats)).*
FROM
    rasters,
    h3_raster_summary(rast, 8)
GROUP BY 1;

       h3        | count | sum  |        mean        |       stddev       | min | max 
-----------------+-------+------+--------------------+--------------------+-----+-----
 88194ad4b7fffff |   200 | 1156 |               5.78 | 2.6384844134464758 |   1 |  10
 88194e6d33fffff |   201 | 1162 |  5.781094527363184 |  2.639508622297715 |   1 |  10
 88194e6d5dfffff |   201 | 1066 |  5.303482587064677 | 2.6263945001831512 |   1 |  10
 <...>

Discrete data example (produces one JSON object per H3 index):

WITH
    summary AS (
        SELECT h3, val, h3_raster_class_summary_item_agg(summary) AS item
        FROM
            test_rasters_bm,
            h3_raster_class_summary(rast, 8)
        GROUP BY 1, 2),
    summary_total AS (
        SELECT h3, val, item, sum((item).count) OVER (PARTITION BY h3) AS total
        FROM summary)
SELECT
    h3,
    jsonb_object_agg(
        concat('class_', val::text),
        h3_raster_class_summary_item_to_jsonb(item) -- val, count, area
            || jsonb_build_object('fraction', (item).count / total)
        ORDER BY val
    ) AS summary
FROM summary_total
GROUP BY 1;

       h3        |                                summary
-----------------+--------------------------------------------------------------------------
 88194e6f3bfffff | {"class_1": {"area": 9913.41706341505, "count": 3, "value": 1, "fraction": 0.058823529411764705}, "class_2": {"area": 13217.889417886734, "count": 4, "value": 2, "fraction": 0.0784313725490196}, <...>}
 88194e6f37fffff | {"class_1": {"area": 46262.61296260357, "count": 14, "value": 1, "fraction": 0.09090909090909091}, "class_2": {"area": 82611.80886179209, "count": 25, "value": 2, "fraction": 0.16233766233766234}, <...>}
<...>

@mngr777 mngr777 force-pushed the h3-postgis-raster branch 2 times, most recently from f2749e2 to 7f381f8 Compare December 29, 2022 10:18
@mngr777
Copy link
Contributor Author

mngr777 commented Jan 9, 2023

Some points that may be worth discussing:

  • Approximating pixel/cell area:
    Area is not calculated for each pixel or cell separately.

    • Pixel area is always the area of (1, 1) pixel for given raster, h3_raster_class_summary_clip/centroids calculate area by multiplying this value by # of pixels.
    • Cell area is approximated as an area of H3 cell at the center of the raster. h3_raster_class_summary_subpixel uses this value for area.
  • Getting the list of cells intersecting the raster:
    polygonToCells returns only cells with centroids inside the polygon. These workarounds seem to work but are not based on some exact calculations.

    • Polygon is buffered by h3_get_hexagon_edge_length_avg(resolution, 'm') * 1.3
    • If raster SRID is not 4326, then the polygon is additionally buffered by max(pixel_height, pixel_width) to account for difference between SRSs.
  • Selecting the most efficient method:
    ST_Clip method is preferred when pixels-per-cell > 250 OR (cells-per-raster > 10000 / (pixels-per-cell - 80)).
    This condition was determined experimentally, the graph below shows how much faster (in %) is clip method than centroids method for given pixels-per-cell/cells-per-pixel.
    I still don't know why increasing cells-per-raster makes clip method preferable.

graph

@mngr777
Copy link
Contributor Author

mngr777 commented Jan 10, 2023

pgxn test fails because tests for released version don't create postgis_raster extension.

@mngr777 mngr777 marked this pull request as ready for review January 10, 2023 10:27
@mngr777
Copy link
Contributor Author

mngr777 commented Jan 10, 2023

Scripts used to make the graph:
bm.tar.gz

cd batch
./run.sh
./compile.sh > data.csv
./graph.py data.csv

docs/api.md Outdated Show resolved Hide resolved
docs/api.md Outdated


## Discrete raster data
Combining summary from multiple rasters into a single JSON object for each H3 index,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need a paragraph on why would you want this, and how you define a difference between Discrete/Continuous, and when to pick which.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added some examples for types of data and sample output.

Combining summary from multiple rasters into a single JSON object for each H3 index,
adding `fraction` value (fraction of H3 cell area for each value):
```
WITH
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This query frightens me as documentation, likely because it lacks sample output to grasp what it does.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added sample output and a description.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interested and innocent bystander: It would be good to have an example of this but where the return value is simpler, e.g. just the value at the H3 cell centre, or the return value is the mode value, rather than these complex objects.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@alpha-beta-soup
Thank you, that's a good idea. I've added an example of getting the most frequent value in each cell.
The other example shows an intended use case, and I also tried to show how to calculate a total # of pixels in each cell, since it's not immediately obvious.

h3_postgis/sql/install/40-rasters.sql Outdated Show resolved Hide resolved
CREATE OR REPLACE FUNCTION __h3_raster_pixel_area(rast raster)
RETURNS double precision
AS $$
SELECT ST_Area(ST_Transform(ST_PixelAsPolygon(rast, 1, 1), 4326)::geography);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we do something like ST_Area(ST_ConvexHull(rast)) / (height*width)? Then sum of areas should add up to real area.

Copy link
Contributor Author

@mngr777 mngr777 Jan 16, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think so, but maybe we can instead use the area of a pixel at ST_Centroid(ST_MinConvexHull(rast))?
The difference should be insignificant for a case where all pixels have data, but it would be a bit more precise in case when the pixels are clustered near the edge of the raster (e.g. all "build up" pixels are in the south).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Replaced with __h3_raster_polygon_pixel_area, area of pixel at ST_Centroid(ST_MinConvexHull(rast)).

h3_postgis/sql/install/40-rasters.sql Outdated Show resolved Hide resolved
resolution integer)
RETURNS h3index
AS $$
SELECT h3_lat_lng_to_cell(ST_Transform(ST_Centroid(poly), 4326), resolution);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is probably ok here but my paranoia says that ST_PointOnSurface provides better guarantees on pointing to the inside of the polygon if it has weird shape.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also think that in deck.gl there is an optimization boundary that you can copy single hex within the tile if it does not contain a pentagon or there is no boundary between base cells in the tile.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here poly is min. convex hull, so the center always lies within the polygon. But why would it be bad if the center was outside of weirdly shaped polygon? Assuming H3 cell area gradient is the same across the raster (not strictly true), the most representative cell would be near the center of mass of non-nodata pixels even if this cell is outside the polygon.

Can you please elaborate on how deck.gl optimization is relevant?

h3_postgis/sql/install/40-rasters.sql Outdated Show resolved Hide resolved
Copy link
Contributor

@Komzpa Komzpa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me.

Copy link
Owner

@zachasme zachasme left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me!

@zachasme
Copy link
Owner

Excellent work @mngr777! Are you ready to have this PR merged?

@mngr777
Copy link
Contributor Author

mngr777 commented Jan 19, 2023

@zachasme Yes, I think it's ready to be merged, thank you.

@zachasme
Copy link
Owner

I'll merge as soon as we get the conflicts resolved. Could I ask you to rebase on main? I tried to do it myself, but I think I'd need write access to your fork.

@zachasme zachasme merged commit fbbff24 into zachasme:main Jan 19, 2023
@Komzpa Komzpa deleted the h3-postgis-raster branch February 19, 2023 13:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants