turfpy.measurement.points_within_polygon and turfpy.measurement.boolean_point_in_polygon are very slow
#62
Comments
To be fair, my points have no geojson |
Thanks, @zsiegel92, for reporting this. We will look into it. Meanwhile, if you want to open a PR, feel free to do so. |
@zsiegel92 - Apart from finding a Point in a single Polygon, we currently also support multiple Polygons and MultiPolygons, but yes, we need to improve its performance. |
@sackh Thanks for this response! I'm glad you're looking into it. When I encounter a |
@zsiegel92 can you share the geojson for the points and polygon you tested with? I have made some progress, and I could use that geojson data to test it. |
@zsiegel92 I have merged the changes and tested with some huge data, and it looks good. I made a new release, v0.0.5; you might want to give it a try 😉 |
Hi @omanges I look forward to trying out the new implementation! I already have a parallel set of functions that uses this method rather than my workaround, so I should be able to test it out soon. This will simplify my codebase, as I will not have any reason to import |
@omanges I just ran a small and a large trial of my current use case and got the following output: Using
|
So: it looks like your update improved the performance of this method by a lot! It's still a bit slower than using the I noticed when I updated |
@zsiegel92 Actually, we added shapely a few versions earlier, but it is used for some other functions. I have one question: did you increase the chunk size parameter? I bet it will give you faster results than shapely :) Give it a try. |
@zsiegel92 By default, chunk_size is 1; can you try setting it to 9?
|
Definitely an improvement as I increase chunk size! I increased it between 1 and 9, and results improved. What is the maximum chunk size at which you expect to see an improvement?
|
Sorry that's tough to understand - the first test was using the Shapely variant I wrote, with average time 7.48s ( |
@omanges I ran a few trials overnight and noticed a roughly constant performance for |
@zsiegel92 the chunk_size parameter means the same as the chunksize argument of multiprocessing.Pool.map; please refer to this document: https://docs.python.org/release/2.6.6/library/multiprocessing.html#multiprocessing.pool.multiprocessing.Pool.map |
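For readers following the chunk size discussion, here is a minimal sketch of how a chunk size argument behaves in a pool's map call. It is not turfpy's implementation: in_bounding_box and filter_points are hypothetical names, the per-point test is a simple stand-in for boolean_point_in_polygon, and a ThreadPool (which exposes the same map(func, iterable, chunksize) method as multiprocessing.Pool) is assumed only so the sketch runs without a __main__ guard.

```python
from multiprocessing.pool import ThreadPool


def in_bounding_box(point):
    """Hypothetical stand-in for a per-point test such as boolean_point_in_polygon."""
    x, y = point
    return 0.0 <= x <= 1.0 and 0.0 <= y <= 1.0


def filter_points(points, chunksize=1, workers=4):
    """Keep the points that pass the test, evaluating them through a pool.

    chunksize controls how many points are handed to a worker as one task:
    larger chunks mean fewer tasks and less per-task overhead.
    """
    # ThreadPool subclasses multiprocessing.pool.Pool, so map() accepts
    # the same chunksize argument as multiprocessing.Pool.map.
    with ThreadPool(workers) as pool:
        flags = pool.map(in_bounding_box, points, chunksize=chunksize)
    # map() preserves input order, so flags line up with points.
    return [p for p, keep in zip(points, flags) if keep]
```

The result does not depend on the chunk size; only the scheduling overhead changes, which may be why timings improve as the value grows and then level off.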
@omanges Thanks! Now I've tested for |
@zsiegel92 I think further gains are possible by improving turfpy.measurement.boolean_point_in_polygon, after which it may run faster than shapely as well. |
And thank you very much for such a thorough investigation :) |
Hey there,

I noticed that points_within_polygon (and the for-each-point callback boolean_point_in_polygon) was fairly slow compared to the Shapely package's shapely.geometry.Polygon.contains method.

As a test, I used a featureCollection called points consisting of roughly 5,000 points (in the Columbus, Ohio metro area, FYI) and a geojson.Polygon called polygon (compatible with turfpy methods).

Method 1: turfpy.measurement.points_within_polygon

Calling turfpy.measurement.points_within_polygon(points,polygon) takes roughly 26 seconds.

Method 2: shapely.geometry.Polygon.contains

I wrote a method that:
- creates a shapely.geometry.Polygon called polygon from a geojson.Polygon and also creates a Python list of points of type shapely.geometry.Point,
- keeps each point for which polygon.contains(point) is True,
- collects the kept points into a geojson.featureCollection, which it returns.

This takes roughly 5 seconds.
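The Method 2 steps could look roughly like the sketch below. This is an illustration under assumptions, not the exact code from the original post: the function name points_within_polygon_shapely is hypothetical, the inputs are treated as plain GeoJSON-style mappings (objects from the geojson package behave like dicts), and polygon is assumed to be a Polygon geometry rather than a Feature.

```python
from shapely.geometry import Point, shape


def points_within_polygon_shapely(points, polygon):
    """Return a FeatureCollection of the point features that lie inside polygon.

    points  -- GeoJSON FeatureCollection (dict-like) of Point features
    polygon -- GeoJSON Polygon geometry (dict-like); assumed not to be a Feature
    """
    # Step 1: build the Shapely polygon and one Shapely Point per feature.
    shapely_polygon = shape(polygon)
    shapely_points = [
        Point(feature["geometry"]["coordinates"][:2])
        for feature in points["features"]
    ]
    # Step 2: keep the features whose point the polygon contains.
    # Note: contains() is False for points lying exactly on the boundary.
    kept = [
        feature
        for feature, point in zip(points["features"], shapely_points)
        if shapely_polygon.contains(point)
    ]
    # Step 3: wrap the surviving features in a new FeatureCollection.
    return {"type": "FeatureCollection", "features": kept}
```

The sketch returns the original feature objects rather than copies, so properties attached to each point are carried through unchanged.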
Conclusion
The two methods return the same number of points; furthermore, I plotted the points on a map in my browser (using Folium), and neither one is doing anything wrong.
Even with all the object creation (literally copying the list of points, taking up additional memory, too), Method 2 (Shapely) took <1/4 the time.
So - in case anyone intends to use this method, perhaps it can be reworked. For now, I'll post my "faster" method here.