Fix #96 #97

Open
wants to merge 1 commit into
base: master

atsyplenkov

Solves the issue of plotting invalid geometries.

@atsyplenkov
Author

#96

Owner

@paleolimbot left a comment

Thank you!

This looks good in principle... is checking validity and then simplifying actually faster, though? (i.e., should we just always use the topology-preserving simplify?)

@atsyplenkov
Author

Okay, so I might be going down a bit of a rabbit hole here, but I tried to check the time complexity of geos_is_valid, geos_simplify, and geos_simplify_preserve_topology. I just used some data that was easy to load, so my benchmarks aren’t fully reproducible, since I cannot share the data publicly. But I hope my point is clear as is.

I ran all three operations 5 times on polygons with different numbers of vertices, from 4 to around 600.

So, here’s the deal: over this range of polygon sizes, both geos_is_valid and geos_simplify look effectively constant-time, i.e. $O(1)$ in practice, which is awesome. On the other hand, geos_simplify_preserve_topology looks like $O(\log n)$ at best, but honestly, I think it’s more like $O(n)$. Neither $O(n)$ nor $O(\log n)$ is terrible, but I think it’s probably better to avoid calling it unnecessarily.
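
One way to act on this trade-off, sketched under assumptions: simplify everything with the cheap geos_simplify, check the result with geos_is_valid, and fall back to geos_simplify_preserve_topology only for the geometries that came out invalid. The helper name `simplify_with_fallback` and the two example polygons are hypothetical, and this is not necessarily what the PR itself implements.

```r
library(geos)

# Hypothetical helper (not part of geos or of this PR): simplify quickly,
# then redo only the invalid results with the topology-preserving variant.
# Assumes the input geometries themselves are valid.
simplify_with_fallback <- function(geom, tolerance) {
  out <- geos_simplify(geom, tolerance)
  # which() drops NA, so missing geometries are left untouched
  bad <- which(!geos_is_valid(out))
  if (length(bad) > 0) {
    out[bad] <- geos_simplify_preserve_topology(geom[bad], tolerance)
  }
  out
}

# Two small made-up polygons, for illustration only
polys <- as_geos_geometry(c(
  "POLYGON ((0 0, 10 0, 10 10, 0 10, 0 0))",
  "POLYGON ((0 0, 5 0.1, 10 0, 10 10, 0 10, 0 0))"
))
simplified <- simplify_with_fallback(polys, 1)
geos_is_valid(simplified)
```

With this shape, the slower call is only paid for the geometries that actually need it, at the cost of one extra validity check per geometry.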

library(sf)
#> Linking to GEOS 3.12.1, GDAL 3.8.4, PROJ 9.3.1; sf_use_s2() is TRUE
library(geos)
library(ggplot2)

# Load as geos_geometry
d2021_p <- "/raw/vector/poly_debris_2021/poly_debris_2021.shp"
d2021 <- sf::st_read(d2021_p) |> st_zm() |> as_geos_geometry()

# Sort polygons
d2021_sorted <- d2021[order(geos_num_coordinates(d2021))]

length(d2021_sorted)
#> [1] 1250
summary(geos_num_coordinates(d2021_sorted))
#>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
#>    4.00   39.00   58.00   78.01   95.00  595.00

# Benchmark
df <- list()

for (i in seq_along(d2021_sorted)) {
  df[[i]] <-
    bench::mark(
      geos_simplify = geos_simplify(d2021_sorted[i], 1),
      geos_simplify_preserve_topology = geos_simplify_preserve_topology(d2021_sorted[i], 1),
      geos_is_valid = geos_is_valid(d2021_sorted[i]),
      iterations = 5L,
      check = FALSE,
      time_unit = "ms"
    )
}

df |> 
  dplyr::bind_rows(.id = "id") |> 
  dplyr::mutate(
    n = sort(
      rep(
        geos::geos_num_coordinates(d2021_sorted),
        each = 3
      )
    )
  ) |>
  ggplot(
    aes(
      x = n,
      y = median,
      group = as.character(expression),
      color = as.character(expression)
    )
  ) +
  geom_point(alpha = 0.05) +
  geom_smooth(se = FALSE) +
  labs(x = "Number of coordinates", y = "Time (ms)", color = "") +
  theme_minimal() +
  theme(legend.position = "bottom")
#> `geom_smooth()` using method = 'gam' and formula = 'y ~ s(x, bs = "cs")'

Created on 2024-11-20 with reprex v2.1.1
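
Since the dataset above cannot be shared, a reproducible stand-in could be built from synthetic polygons with a controlled number of vertices. The sketch below is only an assumption about what such input might look like: `make_test_polygon` is a hypothetical helper, and the star-shaped rings it produces are much smoother than real debris polygons, so absolute timings may differ.

```r
library(geos)

# Hypothetical stand-in for the private dataset: a star-shaped ring with
# n distinct vertices (n >= 3), closed by repeating the first vertex.
make_test_polygon <- function(n) {
  theta <- seq(0, 2 * pi, length.out = n + 1)[seq_len(n)]
  # Radius stays positive and the angle increases monotonically,
  # so the ring does not self-intersect.
  r <- 100 + 10 * sin(7 * theta)
  xy <- sprintf("%.6f %.6f", r * cos(theta), r * sin(theta))
  paste0("POLYGON ((", paste(c(xy, xy[1]), collapse = ", "), "))")
}

sizes <- c(4L, 50L, 600L)
synthetic <- as_geos_geometry(vapply(sizes, make_test_polygon, character(1)))
geos_num_coordinates(synthetic)
```

A vector built this way over a wider range of `sizes` could replace `d2021_sorted` in the benchmark loop above.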
