Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ESQL Support loading points from source into WKB blocks #103698

Merged
merged 50 commits into from
Jan 8, 2024
Merged
Show file tree
Hide file tree
Changes from 16 commits
Commits
Show all changes
50 commits
Select commit Hold shift + click to select a range
261bf04
ESQL reading points from source
craigtaverner Dec 1, 2023
eb47224
Initial support for BytesRefBlock to support WKB for Geometries
craigtaverner Dec 22, 2023
081468e
Update docs/changelog/103698.yaml
craigtaverner Dec 22, 2023
17ae512
Fixed changelog files
craigtaverner Dec 22, 2023
4cbac42
Remove support for PointBlock
craigtaverner Dec 28, 2023
3b0c89d
Merge remote-tracking branch 'origin/main' into esql_points_from_sour…
craigtaverner Dec 28, 2023
895cad1
One change from code review
craigtaverner Dec 28, 2023
60985fe
Removed some leftovers from the PointBlock removal
craigtaverner Dec 29, 2023
35c5df0
Some code-review and TODO checks
craigtaverner Dec 29, 2023
eeeffda
Some code-review and TODO checks, and deal with Plan serialization
craigtaverner Dec 29, 2023
0487203
Fixed failing tests with point WKT rendering
craigtaverner Dec 29, 2023
cfc4341
Revert stab at implementing forStats for doc-values vs source
craigtaverner Dec 29, 2023
1e6e744
Disabled failing test
craigtaverner Dec 29, 2023
8c5638e
Use max precision when serializing points to XContent
craigtaverner Jan 2, 2024
b96933b
Removed unused class from PointBlock code
craigtaverner Jan 2, 2024
db2094b
Merge remote-tracking branch 'origin/main' into esql_points_from_sour…
craigtaverner Jan 2, 2024
22f631f
Simplifications from code review
craigtaverner Jan 2, 2024
1b41302
Remove intermediate WKT from GeoPointFieldMapper
craigtaverner Jan 3, 2024
4d9eed2
Refactor WKBTopNEncoder to use length prefix encoding
craigtaverner Jan 3, 2024
e0c869d
Fix failing test
craigtaverner Jan 3, 2024
1ead4ce
Remove WKBTopNEncoder since DefaultUnsortableTopNEncoder can do the job
craigtaverner Jan 3, 2024
d699483
Cleanup from code review
craigtaverner Jan 3, 2024
bf1ab97
Remove concrete SpatialPoint class
craigtaverner Jan 4, 2024
c62e5aa
Bring back SpatialPointTests
craigtaverner Jan 4, 2024
935bfd6
Reduce object creation in some tests
craigtaverner Jan 4, 2024
5bcbaca
Removed unused code
craigtaverner Jan 4, 2024
3e89588
Removed unused code from earlier versions
craigtaverner Jan 4, 2024
07e162a
Tests were testing cases that no longer apply
craigtaverner Jan 4, 2024
cb436f5
Tests were testing cases that no longer apply
craigtaverner Jan 4, 2024
378614f
We need to consider WKB in PlanNamedTypes mixed-cluster
craigtaverner Jan 4, 2024
a72d0c2
Removed SpatialPoint from PlanNamedTypes
craigtaverner Jan 4, 2024
572e60f
Removed one more artifact of using SpatialPoint in tests
craigtaverner Jan 4, 2024
f1a3600
Some work towards removing creating SpatialPoints
craigtaverner Jan 4, 2024
0f1a58b
Merge remote-tracking branch 'origin/main' into esql_points_from_sour…
craigtaverner Jan 5, 2024
d0f6a12
Updated GeoPointFieldMapperTests after merge from main
craigtaverner Jan 5, 2024
8dffba7
Get row-stride-reader test working
craigtaverner Jan 5, 2024
d1d4ce6
Geo flaky tests to work more reliably
craigtaverner Jan 5, 2024
084d08e
Removed defensive coding in plan serialization
craigtaverner Jan 5, 2024
c176db1
Simplify and ensure error is thrown on wrong type
craigtaverner Jan 5, 2024
4509eab
Do version checks on reading plan from PlanStreamInput
craigtaverner Jan 5, 2024
97f5589
Merge remote-tracking branch 'origin/main' into esql_points_from_sour…
craigtaverner Jan 5, 2024
d18e3a9
Mute failing test
craigtaverner Jan 5, 2024
a830c79
Use specific TransportVersions for point literal in query plans
craigtaverner Jan 5, 2024
a11ebe0
Fixed failing test after merge with main
craigtaverner Jan 5, 2024
b69889e
Fixed failing test after merge with main, with nullValue from source
craigtaverner Jan 8, 2024
1927ec6
Merge remote-tracking branch 'origin/main' into esql_points_from_sour…
craigtaverner Jan 8, 2024
bb74171
Fixed compile error from previous fix with nullValues
craigtaverner Jan 8, 2024
6500d83
It seems mixed cluster yaml tests don't work
craigtaverner Jan 8, 2024
82d95b6
Merge branch 'main' into esql_points_from_source_wkb
elasticmachine Jan 8, 2024
3fd2b18
Merge remote-tracking branch 'origin/main' into esql_points_from_sour…
craigtaverner Jan 8, 2024
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
5 changes: 5 additions & 0 deletions docs/changelog/103698.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,5 @@
pr: 103698
summary: Reading points from source to reduce precision loss
area: ES|QL
type: enhancement
issues: []
82 changes: 24 additions & 58 deletions server/src/main/java/org/elasticsearch/common/geo/GeoPoint.java
Original file line number Diff line number Diff line change
Expand Up @@ -25,12 +25,11 @@
import java.io.IOException;
import java.util.Locale;

public final class GeoPoint implements SpatialPoint, ToXContentFragment {
public final class GeoPoint extends SpatialPoint implements ToXContentFragment {

private double lat;
private double lon;

public GeoPoint() {}
public GeoPoint() {
super(0, 0);
}

/**
* Create a new Geopoint from a string. This String must either be a geohash
Expand All @@ -39,31 +38,31 @@ public GeoPoint() {}
* @param value String to create the point from
*/
public GeoPoint(String value) {
this(); // initialize to defaults (since super-class does not have empty constructor)
this.resetFromString(value);
}

public GeoPoint(double lat, double lon) {
this.lat = lat;
this.lon = lon;
super(lon, lat);
}

public GeoPoint(SpatialPoint template) {
this(template.getY(), template.getX());
super(template);
}

public GeoPoint reset(double lat, double lon) {
this.lat = lat;
this.lon = lon;
this.y = lat;
this.x = lon;
return this;
}

public GeoPoint resetLat(double lat) {
this.lat = lat;
this.y = lat;
return this;
}

public GeoPoint resetLon(double lon) {
this.lon = lon;
this.x = lon;
return this;
}

Expand Down Expand Up @@ -134,8 +133,8 @@ GeoPoint parseGeoHash(String geohash, EffectivePoint effectivePoint) {
}

public GeoPoint resetFromIndexHash(long hash) {
lon = Geohash.decodeLongitude(hash);
lat = Geohash.decodeLatitude(hash);
this.x = Geohash.decodeLongitude(hash);
this.y = Geohash.decodeLatitude(hash);
return this;
}

Expand All @@ -155,43 +154,33 @@ public GeoPoint resetFromGeoHash(long geohashLong) {
}

public double lat() {
return this.lat;
return this.y;
}

public double getLat() {
return this.lat;
return this.y;
}

public double lon() {
return this.lon;
return this.x;
}

public double getLon() {
return this.lon;
}

@Override
public double getX() {
return this.lon;
}

@Override
public double getY() {
return this.lat;
return this.x;
}

public String geohash() {
return Geohash.stringEncode(lon, lat);
return Geohash.stringEncode(x, y);
}

public String getGeohash() {
return Geohash.stringEncode(lon, lat);
return Geohash.stringEncode(x, y);
}

/** Return the point in Lucene encoded format used to stored points as doc values */
public long getEncoded() {
final int latitudeEncoded = GeoEncodingUtils.encodeLatitude(this.lat);
final int longitudeEncoded = GeoEncodingUtils.encodeLongitude(this.lon);
final int latitudeEncoded = GeoEncodingUtils.encodeLatitude(this.y);
final int longitudeEncoded = GeoEncodingUtils.encodeLongitude(this.x);
return (((long) latitudeEncoded) << 32) | (longitudeEncoded & 0xFFFFFFFFL);
}

Expand All @@ -205,33 +194,10 @@ private GeoPoint resetFromEncoded(int encodedLat, int encodedLon) {
return reset(GeoEncodingUtils.decodeLatitude(encodedLat), GeoEncodingUtils.decodeLongitude(encodedLon));
}

@Override
public boolean equals(Object o) {
if (this == o) return true;
if (o == null || getClass() != o.getClass()) return false;

GeoPoint geoPoint = (GeoPoint) o;

if (Double.compare(geoPoint.lat, lat) != 0) return false;
if (Double.compare(geoPoint.lon, lon) != 0) return false;

return true;
}

@Override
public int hashCode() {
int result;
long temp;
temp = lat != +0.0d ? Double.doubleToLongBits(lat) : 0L;
result = Long.hashCode(temp);
temp = lon != +0.0d ? Double.doubleToLongBits(lon) : 0L;
result = 31 * result + Long.hashCode(temp);
return result;
}

@Override
public String toString() {
return lat + ", " + lon;
// TODO: Should prefer to use super.toString() which uses WKT, but that has consequences
return y + ", " + x;
}

public static GeoPoint fromGeohash(String geohash) {
Expand All @@ -244,7 +210,7 @@ public static GeoPoint fromGeohash(long geohashLong) {

@Override
public XContentBuilder toXContent(XContentBuilder builder, Params params) throws IOException {
return builder.latlon(lat, lon);
return builder.latlon(y, x);
}

public static double assertZValue(final boolean ignoreZValue, double zValue) {
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -12,8 +12,65 @@
* To facilitate maximizing the use of common code between GeoPoint and projected CRS
* we introduced this ElasticPoint as an interface of commonality.
*/
public interface SpatialPoint {
double getX();
public class SpatialPoint implements Comparable<SpatialPoint> {

double getY();
protected double x;
protected double y;

public SpatialPoint(double x, double y) {
this.x = x;
this.y = y;
}

public SpatialPoint(SpatialPoint template) {
this(template.getX(), template.getY());
}

public final double getX() {
return x;
}

public final double getY() {
return y;
}

@Override
public int hashCode() {
return 31 * 31 * getClass().getSimpleName().hashCode() + 31 * Double.hashCode(x) + Double.hashCode(y);
}

@Override
public boolean equals(Object obj) {
if (this == obj) return true;
if (obj == null || getClass() != obj.getClass()) return false;
SpatialPoint point = (SpatialPoint) obj;
return (Double.compare(point.x, x) == 0) && Double.compare(point.y, y) == 0;
}

@Override
public String toString() {
return toWKT();
}

public String toWKT() {
// Code designed to mimic WellKnownText.toWKT, with much less stack depth and object creation
return "POINT (" + x + " " + y + ")";
}

@Override
public int compareTo(SpatialPoint other) {
if (this.getClass().equals(other.getClass())) {
double xd = this.getX() - other.getX();
double yd = this.getY() - other.getY();
return (xd == 0) ? comparison(yd) : comparison(xd);
} else {
// TODO: Rather separate based on CRS, but since we don't have that yet, we use class name
// The sort order here is unimportant and does not (yet) introduce BWC issues, so we are free to change it later with CRS
return this.getClass().getSimpleName().compareTo(other.getClass().getSimpleName());
}
}

private int comparison(double delta) {
return delta == 0 ? 0 : delta < 0 ? -1 : 1;
}
}
Original file line number Diff line number Diff line change
Expand Up @@ -87,16 +87,19 @@ private static XContentParser wrapObject(Object sourceMap) throws IOException {
public abstract static class AbstractGeometryFieldType<T> extends MappedFieldType {

protected final Parser<T> geometryParser;
private final T nullValue;

protected AbstractGeometryFieldType(
String name,
boolean indexed,
boolean stored,
boolean hasDocValues,
Parser<T> geometryParser,
T nullValue,
Map<String, String> meta
) {
super(name, indexed, stored, hasDocValues, TextSearchInfo.NONE, meta);
this.nullValue = nullValue;
this.geometryParser = geometryParser;
}

Expand Down Expand Up @@ -136,6 +139,14 @@ protected Object parseSourceValue(Object value) {
}
};
}

@Override
public BlockLoader blockLoader(BlockLoaderContext blContext) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we move this to AbstractPointGeometryFieldMapper and remove the override method in AbstractShapeGeometryFieldMapper?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no intermediate class common between Geo and Cartesian points, so that would require quite a bit of restructuring. Besides, the plan is to support geo_shape very soon, so we would just move back to this then anyway.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought AbstractPointGeometryFieldMapper is common for geo and cartesian points. I know it will change but we should keep PRs to the minimum please or it makes reviewing very difficult.

// Currently we can only load from source in ESQL
return new BlockSourceReader.GeometriesBlockLoader(
valueFetcher(blContext.sourcePaths(name()), nullValue, GeometryFormatterFactory.WKT)
iverase marked this conversation as resolved.
Show resolved Hide resolved
);
}
}

private final Explicit<Boolean> ignoreMalformed;
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -55,13 +55,19 @@ protected AbstractShapeGeometryFieldType(
Orientation orientation,
Map<String, String> meta
) {
super(name, isSearchable, isStored, hasDocValues, parser, meta);
super(name, isSearchable, isStored, hasDocValues, parser, null, meta);
this.orientation = orientation;
}

public Orientation orientation() {
return this.orientation;
}

@Override
public BlockLoader blockLoader(BlockLoaderContext blContext) {
// TODO: Support shapes in ESQL
return null;
}
}

protected Explicit<Boolean> coerce;
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -12,6 +12,7 @@
import org.apache.lucene.index.SortedDocValues;
import org.apache.lucene.index.SortedSetDocValues;
import org.apache.lucene.util.BytesRef;
import org.elasticsearch.common.geo.SpatialPoint;
import org.elasticsearch.core.Releasable;
import org.elasticsearch.search.fetch.StoredFieldsSpec;
import org.elasticsearch.search.lookup.Source;
Expand Down Expand Up @@ -348,6 +349,11 @@ interface BlockFactory {
*/
BytesRefBuilder bytesRefs(int expectedCount);

/**
* Build a builder to load {@link SpatialPoint}s backed by WKB in BytesRefBlock.
*/
BytesRefBuilder geometries(int expectedCount);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we remove this and use bytesRefs? If so, maybe we could reuse BlockSourceLoader.BytesRefsBlockLoader like Ignacio was saying. It seems more likely.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I've removed that now, but could not remove GeometriesBlockLoader because it uses Geometries which does:

((BlockLoader.BytesRefBuilder) builder).appendBytesRef(new BytesRef(wkb));

While the equivalent one in the BytesRefsBlockLoader does:

((BlockLoader.BytesRefBuilder) builder).appendBytesRef(toBytesRef(scratch, (String) v));

So the BytesRefs one assuming incoming data is a String object, while the Geometries one assumes it is a byte[]. The difference seems quite important to me.


/**
* Build a builder to load doubles as loaded from doc values.
* Doc values load doubles deduplicated and in sorted order.
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -12,9 +12,17 @@
import org.apache.lucene.index.SortedSetDocValues;
import org.apache.lucene.util.BytesRef;
import org.apache.lucene.util.UnicodeUtil;
import org.elasticsearch.common.geo.SpatialPoint;
import org.elasticsearch.geometry.Geometry;
import org.elasticsearch.geometry.Point;
import org.elasticsearch.geometry.utils.GeometryValidator;
import org.elasticsearch.geometry.utils.WellKnownBinary;
import org.elasticsearch.geometry.utils.WellKnownText;
import org.elasticsearch.search.fetch.StoredFieldsSpec;

import java.io.IOException;
import java.nio.ByteOrder;
import java.text.ParseException;
import java.util.ArrayList;
import java.util.List;

Expand Down Expand Up @@ -131,6 +139,24 @@ public RowStrideReader rowStrideReader(LeafReaderContext context) {
}
}

public static class GeometriesBlockLoader extends SourceBlockLoader {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It makes me feel uneasy, cannot we reuse the BytesRef reader as we are reading an array of bytes?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right now we need this to get to the code in the 'Geometries' BlockSourceReader, but as that can get simplified, perhaps this can too.

private final ValueFetcher fetcher;

public GeometriesBlockLoader(ValueFetcher fetcher) {
this.fetcher = fetcher;
}

@Override
public Builder builder(BlockFactory factory, int expectedCount) {
return factory.geometries(expectedCount);
}

@Override
public RowStrideReader rowStrideReader(LeafReaderContext context) {
return new Geometries(fetcher);
}
}

private static class BytesRefs extends BlockSourceReader {
BytesRef scratch = new BytesRef();

Expand All @@ -149,6 +175,41 @@ public String toString() {
}
}

private static class Geometries extends BlockSourceReader {

Geometries(ValueFetcher fetcher) {
super(fetcher);
}

@Override
protected void append(BlockLoader.Builder builder, Object v) {
if (v instanceof SpatialPoint point) {
BytesRef wkb = new BytesRef(WellKnownBinary.toWKB(new Point(point.getX(), point.getY()), ByteOrder.LITTLE_ENDIAN));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am very confused here, do we really generate points now? Do we really need this? the idea is that we only load WKB?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this was defensive coding, for the variety of possible loading we will see going forward. The source value fetcher was generating WKT (but I can change that to WKB), doc-values was generating longs, and stored fields would generate points themselves (I assumed, but have not written support for that yet).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lets remove defensive coding, I think we should only accept WKB here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

++ on keeping this list as small as possible. Accepting Object for append makes this whole difficult to understand, but that's how reading _source works at the moment and we ain't changing it here. So folks reading this have the easiest time I think we should only handle the actual things we expect to get. Is that just BytesRef, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed support for SpatialPoint

((BlockLoader.BytesRefBuilder) builder).appendBytesRef(wkb);
} else if (v instanceof String wkt) {
try {
iverase marked this conversation as resolved.
Show resolved Hide resolved
// TODO: figure out why this is not already happening in the GeoPointFieldMapper
Geometry geometry = WellKnownText.fromWKT(GeometryValidator.NOOP, false, wkt);
if (geometry instanceof Point point) {
BytesRef wkb = new BytesRef(WellKnownBinary.toWKB(point, ByteOrder.LITTLE_ENDIAN));
((BlockLoader.BytesRefBuilder) builder).appendBytesRef(wkb);
} else {
throw new IllegalArgumentException("Cannot convert geometry into point:: " + geometry.type());
}
} catch (IOException | ParseException e) {
throw new IllegalArgumentException("Failed to parse point geometry: " + e.getMessage(), e);
}
} else {
throw new IllegalArgumentException("Unsupported source type for point: " + v.getClass().getSimpleName());
}
}

@Override
public String toString() {
return "BlockSourceReader.Geometries";
}
}

public static class DoublesBlockLoader extends SourceBlockLoader {
private final ValueFetcher fetcher;

Expand Down
Loading