[Geo] Integrate Lucene's LatLonShape (BKD Backed GeoShapes) as default `geo_shape` indexing approach (#35320)

This commit exposes Lucene's LatLonShape field as the
default type in GeoShapeFieldMapper. To use the new
indexing approach, simply set "type" : "geo_shape" in
the mappings without setting any of the strategy, precision,
tree_levels, or distance_error_pct parameters. Note the
following when using the new indexing approach:

* geo_shape query does not support querying by 
MULTIPOINT.
* LINESTRING and MULTILINESTRING queries do not 
yet support WITHIN relation.
* CONTAINS relation is not yet supported.

The tree, precision, tree_levels, distance_error_pct,
and points_only parameters are deprecated.
nknize authored Dec 17, 2018
1 parent f1e1f93 commit 5bc7822
Showing 31 changed files with 2,635 additions and 1,233 deletions.
186 changes: 110 additions & 76 deletions docs/reference/mapping/types/geo-shape.asciidoc
@@ -21,48 +21,59 @@ type.
|=======================================================================
|Option |Description| Default

|`tree` |Name of the PrefixTree implementation to be used: `geohash` for
GeohashPrefixTree and `quadtree` for QuadPrefixTree.
| `geohash`

|`precision` |This parameter may be used instead of `tree_levels` to set
an appropriate value for the `tree_levels` parameter. The value
specifies the desired precision and Elasticsearch will calculate the
best tree_levels value to honor this precision. The value should be a
number followed by an optional distance unit. Valid distance units
include: `in`, `inch`, `yd`, `yard`, `mi`, `miles`, `km`, `kilometers`,
`m`,`meters`, `cm`,`centimeters`, `mm`, `millimeters`.
|`tree` |deprecated[6.6, PrefixTrees no longer used] Name of the PrefixTree
implementation to be used: `geohash` for GeohashPrefixTree and `quadtree`
for QuadPrefixTree. Note: This parameter is only relevant for `term` and
`recursive` strategies.
| `quadtree`

|`precision` |deprecated[6.6, PrefixTrees no longer used] This parameter may
be used instead of `tree_levels` to set an appropriate value for the
`tree_levels` parameter. The value specifies the desired precision and
Elasticsearch will calculate the best tree_levels value to honor this
precision. The value should be a number followed by an optional distance
unit. Valid distance units include: `in`, `inch`, `yd`, `yard`, `mi`,
`miles`, `km`, `kilometers`, `m`,`meters`, `cm`,`centimeters`, `mm`,
`millimeters`. Note: This parameter is only relevant for `term` and
`recursive` strategies.
| `50m`

|`tree_levels` |Maximum number of layers to be used by the PrefixTree.
This can be used to control the precision of shape representations and
therefore how many terms are indexed. Defaults to the default value of
the chosen PrefixTree implementation. Since this parameter requires a
certain level of understanding of the underlying implementation, users
may use the `precision` parameter instead. However, Elasticsearch only
uses the tree_levels parameter internally and this is what is returned
via the mapping API even if you use the precision parameter.
|`tree_levels` |deprecated[6.6, PrefixTrees no longer used] Maximum number
of layers to be used by the PrefixTree. This can be used to control the
precision of shape representations and therefore how many terms are
indexed. Defaults to the default value of the chosen PrefixTree
implementation. Since this parameter requires a certain level of
understanding of the underlying implementation, users may use the
`precision` parameter instead. However, Elasticsearch only uses the
tree_levels parameter internally and this is what is returned via the
mapping API even if you use the precision parameter. Note: This parameter
is only relevant for `term` and `recursive` strategies.
| various

|`strategy` |The strategy parameter defines the approach for how to
represent shapes at indexing and search time. It also influences the
capabilities available so it is recommended to let Elasticsearch set
this parameter automatically. There are two strategies available:
`recursive` and `term`. Term strategy supports point types only (the
`points_only` parameter will be automatically set to true) while
Recursive strategy supports all shape types. (IMPORTANT: see
<<prefix-trees, Prefix trees>> for more detailed information)
|`strategy` |deprecated[6.6, PrefixTrees no longer used] The strategy
parameter defines the approach for how to represent shapes at indexing
and search time. It also influences the capabilities available so it
is recommended to let Elasticsearch set this parameter automatically.
There are two strategies available: `recursive` and `term`.
Recursive and Term strategies are deprecated and will be removed in a
future version. While they are still available, the Term strategy
supports point types only (the `points_only` parameter will be
automatically set to true) while Recursive strategy supports all
shape types. (IMPORTANT: see <<prefix-trees, Prefix trees>> for more
detailed information about these strategies)
| `recursive`

|`distance_error_pct` |Used as a hint to the PrefixTree about how
precise it should be. Defaults to 0.025 (2.5%) with 0.5 as the maximum
supported value. PERFORMANCE NOTE: This value will default to 0 if a `precision` or
`tree_level` definition is explicitly defined. This guarantees spatial precision
at the level defined in the mapping. This can lead to significant memory usage
for high resolution shapes with low error (e.g., large shapes at 1m with < 0.001 error).
To improve indexing performance (at the cost of query accuracy) explicitly define
`tree_level` or `precision` along with a reasonable `distance_error_pct`, noting
that large shapes will have greater false positives.
|`distance_error_pct` |deprecated[6.6, PrefixTrees no longer used] Used as a
hint to the PrefixTree about how precise it should be. Defaults to 0.025 (2.5%)
with 0.5 as the maximum supported value. PERFORMANCE NOTE: This value will
default to 0 if a `precision` or `tree_levels` definition is explicitly defined.
This guarantees spatial precision at the level defined in the mapping. This can
lead to significant memory usage for high resolution shapes with low error
(e.g., large shapes at 1m with < 0.001 error). To improve indexing performance
(at the cost of query accuracy) explicitly define `tree_levels` or `precision`
along with a reasonable `distance_error_pct`, noting that large shapes will have
greater false positives. Note: This parameter is only relevant for `term` and
`recursive` strategies.
| `0.025`

|`orientation` |Optionally define how to interpret vertex order for
@@ -77,13 +88,13 @@ sets vertex order for the coordinate list of a geo_shape field but can be
overridden in each individual GeoJSON or WKT document.
| `ccw`

|`points_only` |Setting this option to `true` (defaults to `false`) configures
the `geo_shape` field type for point shapes only (NOTE: Multi-Points are not
yet supported). This optimizes index and search performance for the `geohash` and
`quadtree` when it is known that only points will be indexed. At present geo_shape
queries can not be executed on `geo_point` field types. This option bridges the gap
by improving point performance on a `geo_shape` field so that `geo_shape` queries are
optimal on a point only field.
|`points_only` |deprecated[6.6, PrefixTrees no longer used] Setting this option to
`true` (defaults to `false`) configures the `geo_shape` field type for point
shapes only (NOTE: Multi-Points are not yet supported). This optimizes index and
search performance for the `geohash` and `quadtree` when it is known that only points
will be indexed. At present geo_shape queries can not be executed on `geo_point`
field types. This option bridges the gap by improving point performance on a
`geo_shape` field so that `geo_shape` queries are optimal on a point only field.
| `false`

|`ignore_malformed` |If true, malformed GeoJSON or WKT shapes are ignored. If
@@ -100,16 +111,35 @@ and reject the whole document.

|=======================================================================


[[geoshape-indexing-approach]]
[float]
==== Indexing approach
GeoShape types are indexed by decomposing the shape into a triangular mesh and
indexing each triangle as a 7-dimensional point in a BKD tree. This provides
near perfect spatial resolution (down to 1e-7 decimal degree precision) since all
spatial relations are computed using an encoded vector representation of the
original shape instead of a raster-grid representation as used by the
<<prefix-trees>> indexing approach. Performance of the tessellator primarily
depends on the number of vertices that define the polygon/multi-polygon. While
this is the default indexing technique, prefix trees can still be used by setting
the `tree` or `strategy` parameters according to the appropriate
<<geo-shape-mapping-options>>. Note that these parameters are now deprecated
and will be removed in a future version.
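
The 1e-7 decimal degree figure can be illustrated with a small sketch. It assumes each coordinate is quantized onto a signed 32-bit integer spanning the full latitude or longitude range, which gives a step of roughly 4.2e-8 degrees for latitude and 8.4e-8 degrees for longitude. The class and method names below are hypothetical and written for illustration; this is not the Lucene source.

```java
// Sketch only: illustrates 32-bit coordinate quantization under the stated
// assumption; the names here are hypothetical, not Lucene's API.
final class GeoQuantizationSketch {
    // Assumption: each coordinate is scaled onto the full 32-bit integer range.
    static final double LAT_SCALE = (0x1L << 32) / 180.0;
    static final double LON_SCALE = (0x1L << 32) / 360.0;

    /** Quantizes a latitude in [-90, 90] to a signed 32-bit integer. */
    static int encodeLatitude(double lat) {
        if (Double.isNaN(lat) || lat < -90.0 || lat > 90.0) {
            throw new IllegalArgumentException("invalid latitude " + lat);
        }
        if (lat == 90.0) {
            lat = Math.nextDown(lat); // keep the upper bound inside the int range
        }
        return (int) Math.floor(lat * LAT_SCALE);
    }

    /** Quantizes a longitude in [-180, 180] to a signed 32-bit integer. */
    static int encodeLongitude(double lon) {
        if (Double.isNaN(lon) || lon < -180.0 || lon > 180.0) {
            throw new IllegalArgumentException("invalid longitude " + lon);
        }
        if (lon == 180.0) {
            lon = Math.nextDown(lon); // keep the upper bound inside the int range
        }
        return (int) Math.floor(lon * LON_SCALE);
    }

    /** Returns the lower edge of the quantization cell for the encoded value. */
    static double decodeLatitude(int encoded) {
        return encoded / LAT_SCALE;
    }

    /** Returns the lower edge of the quantization cell for the encoded value. */
    static double decodeLongitude(int encoded) {
        return encoded / LON_SCALE;
    }

    /** Size of one latitude quantization step, in decimal degrees. */
    static double latitudeStepDegrees() {
        return 1.0 / LAT_SCALE;
    }

    /** Size of one longitude quantization step, in decimal degrees. */
    static double longitudeStepDegrees() {
        return 1.0 / LON_SCALE;
    }
}
```

Under this assumption, a coordinate that is encoded and decoded again moves by less than one step, which is where the "down to 1e-7 decimal degree" bound comes from.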

[[prefix-trees]]
[float]
==== Prefix trees

To efficiently represent shapes in the index, Shapes are converted into
a series of hashes representing grid squares (commonly referred to as "rasters")
using implementations of a PrefixTree. The tree notion comes from the fact that
the PrefixTree uses multiple grid layers, each with an increasing level of
precision to represent the Earth. This can be thought of as increasing the level
of detail of a map or image at higher zoom levels.
deprecated[6.6, PrefixTrees no longer used] To efficiently represent shapes in
an inverted index, Shapes are converted into a series of hashes representing
grid squares (commonly referred to as "rasters") using implementations of a
PrefixTree. The tree notion comes from the fact that the PrefixTree uses multiple
grid layers, each with an increasing level of precision to represent the Earth.
This can be thought of as increasing the level of detail of a map or image at higher
zoom levels. Since this approach causes precision issues with indexed shapes, it has
been deprecated in favor of a vector indexing approach that indexes the shapes as a
triangular mesh (see <<geoshape-indexing-approach>>).

Multiple PrefixTree implementations are provided:

@@ -131,9 +161,10 @@ number of levels for the quad trees in Elasticsearch is 29; the default is 21.
[[spatial-strategy]]
[float]
===== Spatial strategies
The PrefixTree implementations rely on a SpatialStrategy for decomposing
the provided Shape(s) into approximated grid squares. Each strategy answers
the following:
deprecated[6.6, PrefixTrees no longer used] The indexing implementation
selected relies on a SpatialStrategy for choosing how to decompose the shapes
(either as grid squares or a tessellated triangular mesh). Each strategy
answers the following:

* What type of Shapes can be indexed?
* What types of Query Operations and Shapes can be used?
@@ -146,21 +177,21 @@ are provided:
|=======================================================================
|Strategy |Supported Shapes |Supported Queries |Multiple Shapes

|`recursive` |<<input-structure, All>> |`INTERSECTS`, `DISJOINT`, `WITHIN`, `CONTAINS` |Yes
|`recursive` |<<input-structure, All>> |`INTERSECTS`, `DISJOINT`, `WITHIN`, `CONTAINS` |Yes
|`term` |<<point, Points>> |`INTERSECTS` |Yes

|=======================================================================

[float]
===== Accuracy

Geo_shape does not provide 100% accuracy and depending on how it is configured
it may return some false positives for `INTERSECTS`, `WITHIN` and `CONTAINS`
queries, and some false negatives for `DISJOINT` queries. To mitigate this, it
is important to select an appropriate value for the tree_levels parameter and
to adjust expectations accordingly. For example, a point may be near the border
of a particular grid cell and may thus not match a query that only matches the
cell right next to it -- even though the shape is very close to the point.
`Recursive` and `Term` strategies do not provide 100% accuracy and, depending on
how they are configured, they may return some false positives for `INTERSECTS`,
`WITHIN` and `CONTAINS` queries, and some false negatives for `DISJOINT` queries.
To mitigate this, it is important to select an appropriate value for the tree_levels
parameter and to adjust expectations accordingly. For example, a point may be near
the border of a particular grid cell and may thus not match a query that only matches
the cell right next to it -- even though the shape is very close to the point.

[float]
===== Example
@@ -173,9 +204,7 @@ PUT /example
"doc": {
"properties": {
"location": {
"type": "geo_shape",
"tree": "quadtree",
"precision": "100m"
"type": "geo_shape"
}
}
}
@@ -185,22 +214,23 @@ PUT /example
// CONSOLE
// TESTSETUP

This mapping maps the location field to the geo_shape type using the
quad_tree implementation and a precision of 100m. Elasticsearch translates
this into a tree_levels setting of 20.
This mapping definition maps the location field to the geo_shape
type using the default vector implementation. It provides
approximately 1e-7 decimal degree precision.

[float]
===== Performance considerations
===== Performance considerations with Prefix Trees

Elasticsearch uses the paths in the prefix tree as terms in the index
and in queries. The higher the level is (and thus the precision), the
more terms are generated. Of course, calculating the terms, keeping them in
deprecated[6.6, PrefixTrees no longer used] With prefix trees,
Elasticsearch uses the paths in the tree as terms in the inverted index
and in queries. The higher the level (and thus the precision), the more
terms are generated. Of course, calculating the terms, keeping them in
memory, and storing them on disk all have a price. Especially with higher
tree levels, indices can become extremely large even with a modest
amount of data. Additionally, the size of the features also matters.
Big, complex polygons can take up a lot of space at higher tree levels.
Which setting is right depends on the use case. Generally one trades off
accuracy against index size and query performance.
tree levels, indices can become extremely large even with a modest amount
of data. Additionally, the size of the features also matters. Big, complex
polygons can take up a lot of space at higher tree levels. Which setting
is right depends on the use case. Generally one trades off accuracy against
index size and query performance.

The defaults in Elasticsearch for both implementations are a compromise
between index size and a reasonable level of precision of 50m at the
@@ -598,7 +628,10 @@ POST /example/doc
===== Circle

Elasticsearch supports a `circle` type, which consists of a center
point with a radius:
point with a radius. Note that this circle representation can only
be indexed when using the `recursive` Prefix Tree strategy. For
the default <<geoshape-indexing-approach>> circles should be approximated using
a `POLYGON`.

[source,js]
--------------------------------------------------
@@ -612,6 +645,7 @@ POST /example/doc
}
--------------------------------------------------
// CONSOLE
// TEST[skip:not supported in default]

Note: The inner `radius` field is required. If not specified, then
the units of the `radius` will default to `METERS`.
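
For the default indexing approach, one way to approximate a circle is to sample points at a fixed distance from the center and join them into a closed ring. The sketch below is an illustration under stated assumptions, not part of Elasticsearch: the class `CirclePolygonSketch`, the method `circleToWkt`, and the mean Earth radius constant are all hypothetical choices. It treats the Earth as a sphere and does not handle circles that cross the dateline or cover a pole.

```java
// Sketch only: approximates a circle as a WKT POLYGON on a spherical Earth.
// All names and the radius constant are assumptions made for illustration.
final class CirclePolygonSketch {
    // Assumption: mean Earth radius in meters (spherical model).
    static final double EARTH_RADIUS_METERS = 6371008.8;

    /**
     * Builds a closed, counterclockwise ring of {@code vertices} points that lie
     * {@code radiusMeters} away from the center, formatted as "lon lat" pairs.
     */
    static String circleToWkt(double centerLon, double centerLat,
                              double radiusMeters, int vertices) {
        if (vertices < 3) {
            throw new IllegalArgumentException("a polygon needs at least 3 vertices");
        }
        if (radiusMeters <= 0) {
            throw new IllegalArgumentException("radius must be positive");
        }
        double angular = radiusMeters / EARTH_RADIUS_METERS; // angular distance in radians
        double lat1 = Math.toRadians(centerLat);
        double lon1 = Math.toRadians(centerLon);

        StringBuilder wkt = new StringBuilder("POLYGON ((");
        String first = null;
        for (int i = 0; i < vertices; i++) {
            // Bearings are measured clockwise from north, so stepping through
            // negative bearings walks the ring counterclockwise.
            double bearing = -2.0 * Math.PI * i / vertices;
            double lat2 = Math.asin(Math.sin(lat1) * Math.cos(angular)
                    + Math.cos(lat1) * Math.sin(angular) * Math.cos(bearing));
            double lon2 = lon1 + Math.atan2(
                    Math.sin(bearing) * Math.sin(angular) * Math.cos(lat1),
                    Math.cos(angular) - Math.sin(lat1) * Math.sin(lat2));
            String vertex = String.format(java.util.Locale.ROOT, "%.7f %.7f",
                    Math.toDegrees(lon2), Math.toDegrees(lat2));
            if (first == null) {
                first = vertex;
            }
            wkt.append(vertex).append(", ");
        }
        // Repeat the first vertex to close the ring, as WKT polygons require.
        return wkt.append(first).append("))").toString();
    }
}
```

The vertex count trades accuracy against tessellation cost: more vertices follow the circle more closely but give the tessellator more work, in line with the note that its performance depends on the number of vertices.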
16 changes: 16 additions & 0 deletions docs/reference/migration/migrate_7_0/mappings.asciidoc
@@ -52,3 +52,19 @@ as a better alternative.

An error will now be thrown when unknown configuration options are provided
to similarities. Such unknown parameters were ignored before.

[float]
==== deprecated `geo_shape` Prefix Tree indexing

`geo_shape` types now default to using a vector indexing approach based on Lucene's new
`LatLonShape` field type. This indexes shapes as a triangular mesh instead of decomposing
them into individual grid cells. To index using legacy prefix trees, the `recursive` or `term`
strategy must be explicitly defined. Note that these strategies are now deprecated and will
be removed in a future version.

[float]
==== deprecated `geo_shape` parameters

The following type parameters are deprecated for the `geo_shape` field type: `tree`,
`precision`, `tree_levels`, `distance_error_pct`, `points_only`, and `strategy`. They
will be removed in a future version.
5 changes: 3 additions & 2 deletions docs/reference/query-dsl/geo-shape-query.asciidoc
@@ -7,7 +7,7 @@ Requires the <<geo-shape,`geo_shape` Mapping>>.

The `geo_shape` query uses the same grid square representation as the
`geo_shape` mapping to find documents that have a shape that intersects
with the query shape. It will also use the same PrefixTree configuration
with the query shape. It will also use the same Prefix Tree configuration
as defined for the field mapping.

The query supports two ways of defining the query shape, either by
@@ -157,7 +157,8 @@ has nothing in common with the query geometry.
* `WITHIN` - Return all documents whose `geo_shape` field
is within the query geometry.
* `CONTAINS` - Return all documents whose `geo_shape` field
contains the query geometry.
contains the query geometry. Note: this is only supported using the
`recursive` Prefix Tree Strategy deprecated[6.6]

[float]
==== Ignore Unmapped
@@ -19,6 +19,7 @@

package org.elasticsearch.common.geo;

import org.apache.lucene.document.LatLonShape.QueryRelation;
import org.elasticsearch.common.io.stream.StreamInput;
import org.elasticsearch.common.io.stream.StreamOutput;
import org.elasticsearch.common.io.stream.Writeable;
@@ -62,6 +63,17 @@ public static ShapeRelation getRelationByName(String name) {
return null;
}

/** Maps ShapeRelation to Lucene's LatLonShape.QueryRelation */
public QueryRelation getLuceneRelation() {
switch (this) {
case INTERSECTS: return QueryRelation.INTERSECTS;
case DISJOINT: return QueryRelation.DISJOINT;
case WITHIN: return QueryRelation.WITHIN;
default:
throw new IllegalArgumentException("ShapeRelation [" + this + "] not supported");
}
}

public String getRelationName() {
return relationName;
}
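
The `getLuceneRelation()` mapping in this hunk is why `CONTAINS` is not yet supported by the new indexing approach: only three relations have a Lucene counterpart. A self-contained sketch of that behavior follows. It uses a stand-in `QueryRelation` enum in place of Lucene's `LatLonShape.QueryRelation`, and the wrapper class and the `isSupported` helper are hypothetical additions for illustration.

```java
// Sketch only: stand-in enums that mirror the mapping shown in the hunk above.
final class RelationMappingSketch {
    // Stand-in for org.apache.lucene.document.LatLonShape.QueryRelation.
    enum QueryRelation { INTERSECTS, WITHIN, DISJOINT }

    enum ShapeRelation {
        INTERSECTS, DISJOINT, WITHIN, CONTAINS;

        /** Maps to the stand-in Lucene relation, rejecting unsupported ones. */
        QueryRelation getLuceneRelation() {
            switch (this) {
                case INTERSECTS: return QueryRelation.INTERSECTS;
                case DISJOINT: return QueryRelation.DISJOINT;
                case WITHIN: return QueryRelation.WITHIN;
                default:
                    throw new IllegalArgumentException("ShapeRelation [" + this + "] not supported");
            }
        }
    }

    /** Hypothetical helper: true when the relation has a Lucene counterpart. */
    static boolean isSupported(ShapeRelation relation) {
        try {
            relation.getLuceneRelation();
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }
}
```

With this sketch, `isSupported(ShapeRelation.CONTAINS)` returns `false` while the other three relations return `true`.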
@@ -197,9 +197,6 @@ public Object buildLucene() {
}
}

if (shapes.size() == 1) {
return shapes.get(0);
}
return shapes.toArray(new Object[shapes.size()]);
}
