Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support ST_ENVELOPE and related ST_XMIN, etc. #116964

Merged
merged 24 commits into from
Dec 4, 2024

Conversation

craigtaverner
Copy link
Contributor

@craigtaverner craigtaverner commented Nov 18, 2024

Support ST_ENVELOPE and related ST_XMIN, etc.

Based on the PostGIS equivalents:

Screenshot 2024-11-18 at 19 17 49

Using ST_XMIN and similar can be done directly to the geometry, or by first calculating the envelope so the same geometry does not need to be iterated through multiple times:

Screenshot 2024-11-18 at 19 20 54

Note that when the incoming data is Geo (geo_point and geo_shape) the bounding box can wrap the dateline, in which case the final result will have xMin > xMax. This is the same as the default behaviour of the existing geo_bounds aggregation. However, in that aggregation it is possible to control this behaviour with the wrap_longitude boolean flag. In the current work we do not yet support that flag, and always wrap the longitude for geographic data.

Fixes #104875

@craigtaverner craigtaverner added >feature :Analytics/Geo Indexing, search aggregations of geo points and shapes Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) :Analytics/ES|QL AKA ESQL labels Nov 18, 2024
Copy link
Contributor

Documentation preview:

@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-analytical-engine (Team:Analytics)

@elasticsearchmachine
Copy link
Collaborator

Hi @craigtaverner, I've created a changelog YAML for you.

@elasticsearchmachine
Copy link
Collaborator

Hi @craigtaverner, I've updated the changelog YAML for you.

* Code that wishes to modify the behaviour of the visitor can implement the <code>PointVisitor</code> interface,
* or extend the existing implementations.
*/
public class SpatialEnvelopeVisitor implements GeometryVisitor<Boolean, RuntimeException> {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This class is in lib/geo because it is needed by both ESQL for the ST_ENVELOPE function as well as in the compute engine for the ST_EXTENT aggregation. It seems generic enough for libs/geo, but another choice would be to put it in the compute engine. I did not do that because there was no obvious place for generic spatial utils there. We could make one, of course.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this implementation might evolve quite a bit so let's leave it here for now

import static org.elasticsearch.xpack.esql.core.util.SpatialCoordinateTypes.UNSPECIFIED;

@FunctionName("st_xmax")
public class StXMaxTests extends AbstractScalarFunctionTestCase {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a lot of copy paste between all 4 X/Y Min/Max tests. Although this is test code, it could easily be refactored out using an abstract test template.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I looked through the differences between two tests and the the number of lines that are actually the same is quite low. Mostly they are very similar, but not identical. I think any effort to reduce duplication will add complexity and make the tests harder to follow.

}

@ConvertEvaluator(extraName = "FromWKB", warnExceptions = { IllegalArgumentException.class })
static double fromWellKnownBinary(BytesRef wkb) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This 4x copy paste with 2 line differences could easily be factored out, and the JIT should take care of any performance issues.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought about common code, where we pass in functions for the two unique calls (eg. getY and getYMax), but that creates much harder to read code, and saves only a few lines of code duplication, so I thought it was not worth it.

* The function `st_envelope` is defined in the <a href="https://www.ogc.org/standard/sfs/">OGC Simple Feature Access</a> standard.
* Alternatively it is well described in PostGIS documentation at
* <a href="https://postgis.net/docs/ST_ENVELOPE.html">PostGIS:ST_ENVELOPE</a>.
*
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: remove empty line

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

/**
* Determine the BBOX assuming the CRS is geographic (eg WGS84) and optionally wrapping the longitude around the dateline.
*/
public static Optional<Rectangle> visit(Geometry geometry, boolean wrapLongitude) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Avoid overloading here: it isn't clear which visit refers to Cartesian and which to geo coordinates.
  2. Replace boolean with an enum or an overload (between two geo shape versions) to avoid Boolean Blindness.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would use different names to specify what is geo and what is cartesian. no so fuzzy about the use of boolean here, enum is fine too.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Used different names, by copying the version in ST_EXTENT PR. If we want to use an enum, we can make that change in that PR.


boolean isValid();

Rectangle getResult();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this can return null, it should return Optional<Rectangle> (Or at the very least be marked with @Nullable, but Optional is better)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should never return null. I removed the one case where it does.

visitLongitude(maxX);
}

private void visitLongitude(double x) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While mathematically correct, I find this more confusing than it should be, since you're comparing both min and maxes with both min and maxes :)


@Override
public Rectangle getResult() {
if (Double.isInfinite(maxY)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For consistency's sake, consider using isValid instead of this check.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed this check. The SpatialEnvelopeVisitor already calls isValid before this, and this check is an unnecessary secondary call.

}

@Override
public Rectangle getResult() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To allow this to be used outside of this class, consider extracting this to a static method, maybe in another class.

Copy link
Contributor

@iverase iverase left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to consider quantization of the coordinates but I spoke to Craig offline and we will do it once we push things down to lucene,

* It also disallows invalid rectangles where minX > maxX.
*/
public static class CartesianPointVisitor implements PointVisitor {
private double minX = Double.POSITIVE_INFINITY;
Copy link
Contributor

@GalLalouche GalLalouche Nov 27, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These should be either be protected or exposed via getters so they can be used by its clients, e.g., during intermediate serialization. For my money, I prefer getters, due to the whole "prefer composition to inheritance".

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got this from the changes in ST_EXTENT PR.

@craigtaverner craigtaverner added auto-backport Automatically create backport pull requests when merged v8.18.0 labels Dec 4, 2024
@craigtaverner craigtaverner merged commit c7e985c into elastic:main Dec 4, 2024
16 checks passed
@astefan astefan added backport pending and removed auto-backport Automatically create backport pull requests when merged labels Dec 5, 2024
elasticsearchmachine pushed a commit that referenced this pull request Dec 16, 2024
maxhniebergall pushed a commit to maxhniebergall/elasticsearch that referenced this pull request Dec 16, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Analytics/ES|QL AKA ESQL :Analytics/Geo Indexing, search aggregations of geo points and shapes backport pending >feature Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) v8.18.0 v9.0.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support ST_Envelope in ESQL
5 participants