Avoid deserializing entire parquet geometry just to determine type #898

Merged: 1 commit, May 26, 2024

msbarry (Contributor) commented May 26, 2024

Slight optimization: try to parse the geometry type from the beginning of a WKB- or WKT-encoded geometry without deserializing the whole thing. If that fails, fall back to the old behavior of deserializing the geometry and checking its type.
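
For illustration, here is a minimal sketch of the idea, not the code merged in this PR. It assumes standard WKB framing (one byte-order byte followed by a 4-byte type code, with EWKB flag bits or ISO Z/M offsets possibly mixed in) and a leading type keyword in WKT. The class name `GeometryTypeSniffer` and its methods are hypothetical; a `null` result means "could not tell, deserialize the geometry instead".

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Locale;

/** Hypothetical helper for illustration; not a planetiler class. */
final class GeometryTypeSniffer {

  private GeometryTypeSniffer() {}

  /** Returns the type name from a WKB header, or null if the caller should fall back to full parsing. */
  static String sniffWkb(byte[] wkb) {
    // Header layout: 1 byte-order byte, then a 4-byte unsigned type code.
    if (wkb == null || wkb.length < 5) {
      return null;
    }
    ByteOrder order;
    if (wkb[0] == 0) {
      order = ByteOrder.BIG_ENDIAN;
    } else if (wkb[0] == 1) {
      order = ByteOrder.LITTLE_ENDIAN;
    } else {
      return null;
    }
    long code = ByteBuffer.wrap(wkb, 1, 4).order(order).getInt() & 0xFFFFFFFFL;
    // EWKB keeps Z/M/SRID flags in the top three bits; mask them off.
    code &= 0x1FFFFFFFL;
    // ISO WKB adds 1000 (Z), 2000 (M) or 3000 (ZM) to the base code.
    if (code / 1000 > 3) {
      return null;
    }
    switch ((int) (code % 1000)) {
      case 1: return "Point";
      case 2: return "LineString";
      case 3: return "Polygon";
      case 4: return "MultiPoint";
      case 5: return "MultiLineString";
      case 6: return "MultiPolygon";
      case 7: return "GeometryCollection";
      default: return null;
    }
  }

  /** Returns the type name from the leading WKT keyword, or null if it is not recognized. */
  static String sniffWkt(String wkt) {
    if (wkt == null) {
      return null;
    }
    int i = 0;
    int n = wkt.length();
    while (i < n && Character.isWhitespace(wkt.charAt(i))) {
      i++;
    }
    int start = i;
    while (i < n && Character.isLetter(wkt.charAt(i))) {
      i++;
    }
    // Only the first word is inspected; coordinates are never read.
    switch (wkt.substring(start, i).toUpperCase(Locale.ROOT)) {
      case "POINT": return "Point";
      case "LINESTRING": return "LineString";
      case "POLYGON": return "Polygon";
      case "MULTIPOINT": return "MultiPoint";
      case "MULTILINESTRING": return "MultiLineString";
      case "MULTIPOLYGON": return "MultiPolygon";
      case "GEOMETRYCOLLECTION": return "GeometryCollection";
      default: return null;
    }
  }
}
```

Both methods read only a few leading bytes or characters, so the cost does not depend on how many coordinates the geometry has. Anything unexpected (a short buffer, an unknown byte-order marker, an unrecognized keyword) yields `null` rather than a guess, which keeps the slow path as the source of truth.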

Benchmark logs, in order: This Branch (c9455b8), then Base (f8e64a4)
0:01:08 DEB [archive] - Tile stats:
0:01:08 DEB [archive] - Biggest tiles (gzipped)
1. 14/4942/6092 (154k) https://onthegomap.github.io/planetiler-demo/#14.5/41.82864/-71.40015 (poi:83k)
2. 9/154/190 (149k) https://onthegomap.github.io/planetiler-demo/#9.5/41.77078/-71.36719 (landcover:85k)
3. 10/308/380 (138k) https://onthegomap.github.io/planetiler-demo/#10.5/41.90214/-71.54297 (landcover:66k)
4. 10/308/381 (136k) https://onthegomap.github.io/planetiler-demo/#10.5/41.63994/-71.54297 (landcover:72k)
5. 14/4941/6092 (111k) https://onthegomap.github.io/planetiler-demo/#14.5/41.82864/-71.42212 (poi:64k)
6. 14/4941/6093 (110k) https://onthegomap.github.io/planetiler-demo/#14.5/41.81227/-71.42212 (building:62k)
7. 14/4940/6092 (99k) https://onthegomap.github.io/planetiler-demo/#14.5/41.82864/-71.44409 (building:92k)
8. 11/616/762 (98k) https://onthegomap.github.io/planetiler-demo/#11.5/41.7057/-71.63086 (landcover:71k)
9. 14/4942/6091 (96k) https://onthegomap.github.io/planetiler-demo/#14.5/41.84501/-71.40015 (building:79k)
10. 11/616/761 (96k) https://onthegomap.github.io/planetiler-demo/#11.5/41.83679/-71.63086 (landcover:72k)
0:01:08 DEB [archive] - Max tile sizes
                      z0    z1    z2    z3    z4    z5    z6    z7    z8    z9   z10   z11   z12   z13   z14   all
           boundary  154   374   443   583   938   339   433   548   773  1.6k  2.1k  7.2k  6.4k  5.8k  4.5k  7.2k
              water 7.7k  3.7k  8.6k  5.5k  2.6k  5.1k   15k   18k   16k   26k   15k   13k   17k   15k   12k   26k
              place    0     0   441   441   441   639   712    1k  1.5k  3.1k  5.6k  3.3k  1.7k   795   936  5.6k
            landuse    0     0     0     0   548   694  1.6k  6.8k   17k   44k   59k   50k   38k   19k   12k   59k
     transportation    0     0     0     0   243   782  1.2k  5.9k    8k   24k   17k   19k   65k   48k   33k   65k
           waterway    0     0     0     0   111   118     0     0     0  3.1k  2.4k  2.1k  2.1k  4.9k  2.4k  4.9k
               park    0     0     0     0     0     0    1k  3.7k  9.7k   19k   13k  8.2k  4.3k  3.4k  4.4k   19k
transportation_name    0     0     0     0     0     0   369   464  1.2k  1.8k  5.4k  4.6k  3.9k  3.4k   18k   18k
          landcover    0     0     0     0     0     0     0  9.5k   29k   85k   72k   81k   53k   30k   24k   85k
      mountain_peak    0     0     0     0     0     0     0  1.1k  1.8k  3.4k  4.3k  2.8k  1.4k  1.4k   869  4.3k
         water_name    0     0     0     0     0     0     0     0     0   486   461   433   452  1.2k  1.5k  1.5k
    aerodrome_label    0     0     0     0     0     0     0     0     0     0   664   327   273   220   220   664
            aeroway    0     0     0     0     0     0     0     0     0     0  1.6k  2.1k    3k  3.4k  2.7k  3.4k
                poi    0     0     0     0     0     0     0     0     0     0     0     0   501   498   83k   83k
           building    0     0     0     0     0     0     0     0     0     0     0     0     0   59k   92k   92k
        housenumber    0     0     0     0     0     0     0     0     0     0     0     0     0     0   35k   35k
          full tile 7.9k    4k  9.5k  6.5k  3.7k    6k   20k   42k   85k  203k  185k  135k  114k  128k  244k  244k
            gzipped 6.2k  3.5k  7.1k  5.2k  3.1k  4.8k   14k   29k   60k  149k  138k   98k   83k   91k  154k  154k
0:01:08 DEB [archive] -    Max tile: 244k (gzipped: 154k)
0:01:08 DEB [archive] -    Avg tile: 5.4k (gzipped: 4k) using weighted average based on OSM traffic
0:01:08 DEB [archive] -     # tiles: 4,115,012
0:01:08 DEB [archive] -  # features: 5,484,250
0:01:08 INF [archive] - Finished in 18s cpu:1m6s avg:3.6
0:01:08 INF [archive] -   read    1x(3% 0.5s wait:16s)
0:01:08 INF [archive] -   encode  4x(56% 10s wait:2s)
0:01:08 INF [archive] -   write   1x(22% 4s wait:12s)
0:01:08 INF [archive] - Finished in 1m8s cpu:3m30s gc:1s avg:3.1
0:01:08 INF [archive] - FINISHED!
0:01:08 INF [archive] - 
0:01:08 INF [archive] - ----------------------------------------
0:01:08 INF [archive] - data errors:
0:01:08 INF [archive] - 	render_snap_fix_input	16,639
0:01:08 INF [archive] - 	osm_multipolygon_missing_way	389
0:01:08 INF [archive] - 	osm_boundary_missing_way	73
0:01:08 INF [archive] - 	merge_snap_fix_input	12
0:01:08 INF [archive] - 	osm_boundary_duplicate_member	2
0:01:08 INF [archive] - 	feature_centroid_if_convex_osm_invalid_multipolygon_empty_after_fix	2
0:01:08 INF [archive] - 	feature_polygon_osm_invalid_multipolygon_empty_after_fix	2
0:01:08 INF [archive] - 	omt_park_area_osm_invalid_multipolygon_empty_after_fix	1
0:01:08 INF [archive] - 	omt_fix_water_before_ne_intersect	1
0:01:08 INF [archive] - 	feature_point_on_surface_osm_invalid_multipolygon_empty_after_fix	1
0:01:08 INF [archive] - ----------------------------------------
0:01:08 INF [archive] - 	overall          1m8s cpu:3m30s gc:1s avg:3.1
0:01:08 INF [archive] - 	lake_centerlines 3s cpu:5s avg:2
0:01:08 INF [archive] - 	  read     1x(18% 0.5s done:2s)
0:01:08 INF [archive] - 	  process  4x(0% 0s done:2s)
0:01:08 INF [archive] - 	  write    1x(0% 0s done:2s)
0:01:08 INF [archive] - 	water_polygons   15s cpu:42s avg:2.8
0:01:08 INF [archive] - 	  read     1x(43% 6s done:7s)
0:01:08 INF [archive] - 	  process  4x(27% 4s wait:4s done:5s)
0:01:08 INF [archive] - 	  write    1x(4% 0.5s wait:10s done:5s)
0:01:08 INF [archive] - 	natural_earth    12s cpu:18s avg:1.5
0:01:08 INF [archive] - 	  read     1x(52% 6s done:5s)
0:01:08 INF [archive] - 	  process  4x(7% 0.8s wait:6s done:5s)
0:01:08 INF [archive] - 	  write    1x(0% 0s wait:6s done:5s)
0:01:08 INF [archive] - 	osm_pass1        2s cpu:7s avg:3.4
0:01:08 INF [archive] - 	  read     1x(2% 0s wait:2s)
0:01:08 INF [archive] - 	  parse    4x(34% 0.7s)
0:01:08 INF [archive] - 	  process  1x(69% 1s)
0:01:08 INF [archive] - 	osm_pass2        17s cpu:1m7s avg:3.9
0:01:08 INF [archive] - 	  read     1x(0% 0s wait:10s done:8s)
0:01:08 INF [archive] - 	  process  4x(75% 13s)
0:01:08 INF [archive] - 	  write    1x(2% 0.4s wait:17s)
0:01:08 INF [archive] - 	ne_lakes         0s cpu:0s avg:0
0:01:08 INF [archive] - 	boundaries       0s cpu:0s avg:2.4
0:01:08 INF [archive] - 	agg_stop         0s cpu:0s avg:0
0:01:08 INF [archive] - 	sort             1s cpu:4s avg:2.7
0:01:08 INF [archive] - 	  worker  1x(48% 0.7s)
0:01:08 INF [archive] - 	archive          18s cpu:1m6s avg:3.6
0:01:08 INF [archive] - 	  read    1x(3% 0.5s wait:16s)
0:01:08 INF [archive] - 	  encode  4x(56% 10s wait:2s)
0:01:08 INF [archive] - 	  write   1x(22% 4s wait:12s)
0:01:08 INF [archive] - ----------------------------------------
0:01:08 INF [archive] - 	archive	108MB
0:01:08 INF [archive] - 	features	281MB
-rw-r--r-- 1 runner docker 84M May 26 10:55 run.jar
0:01:02 DEB [archive] - Tile stats:
0:01:02 DEB [archive] - Biggest tiles (gzipped)
1. 14/4942/6092 (154k) https://onthegomap.github.io/planetiler-demo/#14.5/41.82864/-71.40015 (poi:83k)
2. 9/154/190 (149k) https://onthegomap.github.io/planetiler-demo/#9.5/41.77078/-71.36719 (landcover:85k)
3. 10/308/380 (138k) https://onthegomap.github.io/planetiler-demo/#10.5/41.90214/-71.54297 (landcover:66k)
4. 10/308/381 (136k) https://onthegomap.github.io/planetiler-demo/#10.5/41.63994/-71.54297 (landcover:72k)
5. 14/4941/6092 (111k) https://onthegomap.github.io/planetiler-demo/#14.5/41.82864/-71.42212 (poi:64k)
6. 14/4941/6093 (110k) https://onthegomap.github.io/planetiler-demo/#14.5/41.81227/-71.42212 (building:62k)
7. 14/4940/6092 (99k) https://onthegomap.github.io/planetiler-demo/#14.5/41.82864/-71.44409 (building:92k)
8. 11/616/762 (98k) https://onthegomap.github.io/planetiler-demo/#11.5/41.7057/-71.63086 (landcover:71k)
9. 14/4942/6091 (96k) https://onthegomap.github.io/planetiler-demo/#14.5/41.84501/-71.40015 (building:79k)
10. 11/616/761 (96k) https://onthegomap.github.io/planetiler-demo/#11.5/41.83679/-71.63086 (landcover:72k)
0:01:02 DEB [archive] - Max tile sizes
                      z0    z1    z2    z3    z4    z5    z6    z7    z8    z9   z10   z11   z12   z13   z14   all
           boundary  154   374   443   583   938   339   433   548   773  1.6k  2.1k  7.2k  6.4k  5.8k  4.5k  7.2k
              water 7.7k  3.7k  8.6k  5.5k  2.6k  5.1k   15k   18k   16k   26k   15k   13k   17k   15k   12k   26k
              place    0     0   441   441   441   639   712    1k  1.5k  3.1k  5.6k  3.3k  1.7k   795   936  5.6k
            landuse    0     0     0     0   548   694  1.6k  6.8k   17k   44k   59k   50k   38k   19k   12k   59k
     transportation    0     0     0     0   243   782  1.2k  5.9k    8k   24k   17k   19k   65k   48k   33k   65k
           waterway    0     0     0     0   111   118     0     0     0  3.1k  2.4k  2.1k  2.1k  4.9k  2.4k  4.9k
               park    0     0     0     0     0     0    1k  3.7k  9.7k   19k   13k  8.2k  4.3k  3.4k  4.4k   19k
transportation_name    0     0     0     0     0     0   369   464  1.2k  1.8k  5.4k  4.6k  3.9k  3.4k   18k   18k
          landcover    0     0     0     0     0     0     0  9.5k   29k   85k   72k   81k   53k   30k   24k   85k
      mountain_peak    0     0     0     0     0     0     0  1.1k  1.8k  3.4k  4.3k  2.8k  1.4k  1.4k   869  4.3k
         water_name    0     0     0     0     0     0     0     0     0   486   461   433   452  1.2k  1.5k  1.5k
    aerodrome_label    0     0     0     0     0     0     0     0     0     0   664   327   273   220   220   664
            aeroway    0     0     0     0     0     0     0     0     0     0  1.6k  2.1k    3k  3.4k  2.7k  3.4k
                poi    0     0     0     0     0     0     0     0     0     0     0     0   501   498   83k   83k
           building    0     0     0     0     0     0     0     0     0     0     0     0     0   59k   92k   92k
        housenumber    0     0     0     0     0     0     0     0     0     0     0     0     0     0   35k   35k
          full tile 7.9k    4k  9.5k  6.5k  3.7k    6k   20k   42k   85k  203k  185k  135k  114k  128k  244k  244k
            gzipped 6.2k  3.5k  7.1k  5.2k  3.1k  4.8k   14k   29k   60k  149k  138k   98k   83k   91k  154k  154k
0:01:02 DEB [archive] -    Max tile: 244k (gzipped: 154k)
0:01:02 DEB [archive] -    Avg tile: 5.4k (gzipped: 4k) using weighted average based on OSM traffic
0:01:02 DEB [archive] -     # tiles: 4,115,012
0:01:02 DEB [archive] -  # features: 5,484,250
0:01:02 INF [archive] - Finished in 18s cpu:1m6s avg:3.7
0:01:02 INF [archive] -   read    1x(3% 0.6s wait:16s done:1s)
0:01:02 INF [archive] -   encode  4x(55% 10s wait:2s done:1s)
0:01:02 INF [archive] -   write   1x(22% 4s wait:12s)
0:01:02 INF [archive] - Finished in 1m3s cpu:3m24s gc:1s avg:3.3
0:01:02 INF [archive] - FINISHED!
0:01:02 INF [archive] - 
0:01:02 INF [archive] - ----------------------------------------
0:01:02 INF [archive] - data errors:
0:01:02 INF [archive] - 	render_snap_fix_input	16,639
0:01:02 INF [archive] - 	osm_multipolygon_missing_way	389
0:01:02 INF [archive] - 	osm_boundary_missing_way	73
0:01:02 INF [archive] - 	merge_snap_fix_input	12
0:01:02 INF [archive] - 	osm_boundary_duplicate_member	2
0:01:02 INF [archive] - 	feature_centroid_if_convex_osm_invalid_multipolygon_empty_after_fix	2
0:01:02 INF [archive] - 	feature_polygon_osm_invalid_multipolygon_empty_after_fix	2
0:01:02 INF [archive] - 	omt_park_area_osm_invalid_multipolygon_empty_after_fix	1
0:01:02 INF [archive] - 	omt_fix_water_before_ne_intersect	1
0:01:02 INF [archive] - 	feature_point_on_surface_osm_invalid_multipolygon_empty_after_fix	1
0:01:02 INF [archive] - ----------------------------------------
0:01:02 INF [archive] - 	overall          1m3s cpu:3m24s gc:1s avg:3.3
0:01:02 INF [archive] - 	lake_centerlines 2s cpu:5s avg:2.3
0:01:02 INF [archive] - 	  read     1x(21% 0.5s done:2s)
0:01:02 INF [archive] - 	  process  4x(0% 0s done:2s)
0:01:02 INF [archive] - 	  write    1x(0% 0s done:2s)
0:01:02 INF [archive] - 	water_polygons   15s cpu:41s avg:2.8
0:01:02 INF [archive] - 	  read     1x(43% 6s done:7s)
0:01:02 INF [archive] - 	  process  4x(26% 4s wait:4s done:5s)
0:01:02 INF [archive] - 	  write    1x(4% 0.5s wait:10s done:5s)
0:01:02 INF [archive] - 	natural_earth    6s cpu:11s avg:1.9
0:01:02 INF [archive] - 	  read     1x(96% 6s)
0:01:02 INF [archive] - 	  process  4x(13% 0.8s wait:6s)
0:01:02 INF [archive] - 	  write    1x(0% 0s wait:6s)
0:01:02 INF [archive] - 	osm_pass1        2s cpu:6s avg:3.3
0:01:02 INF [archive] - 	  read     1x(2% 0s wait:2s)
0:01:02 INF [archive] - 	  parse    4x(34% 0.6s)
0:01:02 INF [archive] - 	  process  1x(67% 1s)
0:01:02 INF [archive] - 	osm_pass2        18s cpu:1m10s avg:3.9
0:01:02 INF [archive] - 	  read     1x(0% 0s wait:10s done:8s)
0:01:02 INF [archive] - 	  process  4x(74% 13s)
0:01:02 INF [archive] - 	  write    1x(2% 0.4s wait:17s)
0:01:02 INF [archive] - 	ne_lakes         0s cpu:0s avg:0
0:01:02 INF [archive] - 	boundaries       0s cpu:0s avg:1.4
0:01:02 INF [archive] - 	agg_stop         0s cpu:0s avg:0
0:01:02 INF [archive] - 	sort             1s cpu:3s avg:2.6
0:01:02 INF [archive] - 	  worker  1x(52% 0.7s)
0:01:02 INF [archive] - 	archive          18s cpu:1m6s avg:3.7
0:01:02 INF [archive] - 	  read    1x(3% 0.6s wait:16s done:1s)
0:01:02 INF [archive] - 	  encode  4x(55% 10s wait:2s done:1s)
0:01:02 INF [archive] - 	  write   1x(22% 4s wait:12s)
0:01:02 INF [archive] - ----------------------------------------
0:01:02 INF [archive] - 	archive	108MB
0:01:02 INF [archive] - 	features	281MB
-rw-r--r-- 1 runner docker 84M May 26 10:56 run.jar

Full logs: https://github.com/onthegomap/planetiler/actions/runs/9242696066

msbarry merged commit 9dbd5d3 into main on May 26, 2024
12 checks passed