-
Notifications
You must be signed in to change notification settings - Fork 24.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support ST_ENVELOPE and related ST_XMIN, etc. #116964
Conversation
Documentation preview: |
Pinging @elastic/es-analytical-engine (Team:Analytics) |
Hi @craigtaverner, I've created a changelog YAML for you. |
Hi @craigtaverner, I've updated the changelog YAML for you. |
This will allow wrapping longitude around the dateline. TODO: Write tests and use different Evaluator for the Geo version.
Asserting in particular that the geo-wrapping over the dateline works.
* Code that wishes to modify the behaviour of the visitor can implement the <code>PointVisitor</code> interface, | ||
* or extend the existing implementations. | ||
*/ | ||
public class SpatialEnvelopeVisitor implements GeometryVisitor<Boolean, RuntimeException> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This class is in lib/geo because it is needed by both ESQL
for the ST_ENVELOPE
function as well as in the compute engine for the ST_EXTENT
aggregation. It seems generic enough for libs/geo, but another choice would be to put it in the compute engine. I did not do that because there was no obvious place for generic spatial utils there. We could make one, of course.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this implementation might evolve quite a bit so let's leave it here for now
import static org.elasticsearch.xpack.esql.core.util.SpatialCoordinateTypes.UNSPECIFIED; | ||
|
||
@FunctionName("st_xmax") | ||
public class StXMaxTests extends AbstractScalarFunctionTestCase { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's a lot of copy paste between all 4 X/Y Min/Max tests. Although this is test code, it could easily be refactored out using an abstract test template.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I looked through the differences between two tests and the the number of lines that are actually the same is quite low. Mostly they are very similar, but not identical. I think any effort to reduce duplication will add complexity and make the tests harder to follow.
} | ||
|
||
@ConvertEvaluator(extraName = "FromWKB", warnExceptions = { IllegalArgumentException.class }) | ||
static double fromWellKnownBinary(BytesRef wkb) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This 4x copy paste with 2 line differences could easily be factored out, and the JIT should take care of any performance issues.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought about common code, where we pass in functions for the two unique calls (eg. getY and getYMax), but that creates much harder to read code, and saves only a few lines of code duplication, so I thought it was not worth it.
* The function `st_envelope` is defined in the <a href="https://www.ogc.org/standard/sfs/">OGC Simple Feature Access</a> standard. | ||
* Alternatively it is well described in PostGIS documentation at | ||
* <a href="https://postgis.net/docs/ST_ENVELOPE.html">PostGIS:ST_ENVELOPE</a>. | ||
* |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: remove empty line
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
/** | ||
* Determine the BBOX assuming the CRS is geographic (eg WGS84) and optionally wrapping the longitude around the dateline. | ||
*/ | ||
public static Optional<Rectangle> visit(Geometry geometry, boolean wrapLongitude) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Avoid overloading here: it isn't clear which
visit
refers to Cartesian and which to geo coordinates. - Replace
boolean
with an enum or an overload (between two geo shape versions) to avoid Boolean Blindness.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would use different names to specify what is geo and what is cartesian. no so fuzzy about the use of boolean here, enum is fine too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Used different names, by copying the version in ST_EXTENT PR. If we want to use an enum, we can make that change in that PR.
|
||
boolean isValid(); | ||
|
||
Rectangle getResult(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this can return null
, it should return Optional<Rectangle>
(Or at the very least be marked with @Nullable
, but Optional
is better)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It should never return null. I removed the one case where it does.
visitLongitude(maxX); | ||
} | ||
|
||
private void visitLongitude(double x) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While mathematically correct, I find this more confusing than it should be, since you're comparing both min and maxes with both min and maxes :)
|
||
@Override | ||
public Rectangle getResult() { | ||
if (Double.isInfinite(maxY)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For consistency's sake, consider using isValid
instead of this check.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed this check. The SpatialEnvelopeVisitor already calls isValid before this, and this check is an unnecessary secondary call.
} | ||
|
||
@Override | ||
public Rectangle getResult() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To allow this to be used outside of this class, consider extracting this to a static method, maybe in another class
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to consider quantization of the coordinates but I spoke to Craig offline and we will do it once we push things down to lucene,
* It also disallows invalid rectangles where minX > maxX. | ||
*/ | ||
public static class CartesianPointVisitor implements PointVisitor { | ||
private double minX = Double.POSITIVE_INFINITY; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These should be either be protected
or exposed via getters so they can be used by its clients, e.g., during intermediate serialization. For my money, I prefer getters, due to the whole "prefer composition to inheritance".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got this from the changes in ST_EXTENT PR.
Support ST_ENVELOPE and related ST_XMIN, etc. Based on the PostGIS equivalents: https://postgis.net/docs/ST_Envelope.html https://postgis.net/docs/ST_XMin.html https://postgis.net/docs/ST_XMax.html https://postgis.net/docs/ST_YMin.html https://postgis.net/docs/ST_YMax.html
* Support ST_ENVELOPE and related ST_XMIN, etc. (#116964) Support ST_ENVELOPE and related ST_XMIN, etc. Based on the PostGIS equivalents: https://postgis.net/docs/ST_Envelope.html https://postgis.net/docs/ST_XMin.html https://postgis.net/docs/ST_XMax.html https://postgis.net/docs/ST_YMin.html https://postgis.net/docs/ST_YMax.html * Fix off-by-one error reported in #118051
…lastic#118743) * Support ST_ENVELOPE and related ST_XMIN, etc. (elastic#116964) Support ST_ENVELOPE and related ST_XMIN, etc. Based on the PostGIS equivalents: https://postgis.net/docs/ST_Envelope.html https://postgis.net/docs/ST_XMin.html https://postgis.net/docs/ST_XMax.html https://postgis.net/docs/ST_YMin.html https://postgis.net/docs/ST_YMax.html * Fix off-by-one error reported in elastic#118051
Support ST_ENVELOPE and related ST_XMIN, etc.
Based on the PostGIS equivalents:
Using ST_XMIN and similar can be done directly to the geometry, or by first calculating the envelope so the same geometry does not need to be iterated through multiple times:
Note that when the incoming data is Geo (geo_point and geo_shape) the bounding box can wrap the dateline, in which case the final result will have
xMin > xMax
. This is the same as the default behaviour of the existinggeo_bounds
aggregation. However, in that aggregation it is possible to control this behaviour with thewrap_longitude
boolean flag. In the current work we do not yet support that flag, and always wrap the longitude for geographic data.Fixes #104875