You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
While Cloudant already offers a geospatial indexing module its implementation has one major drawback: It is not possible to combine geospatial indexing with fulltext indexing, i.e. either you query by geometry or you query by another criteria. Querying for both criteria at the same time is not possible.
Furthermore, the geospatial module is not open-sourced (yet?), i.e. it can only be used in the hosted Cloudant offer but not on premisses. While the abandoned branch 62936-geo-with-query indicates that there as been initial work to open-source the module its status is not clear.
Goal
The goal is to implement a minimal viable implementation featuring only the following basic functions:
Index fields of a document as LatLonPoint
Query field by distance (center and radius)
Query field by box
Query field by polygon
Combine one of the beforementioned queries with a standard Lucene query string
Implementation
After initial work as described in issue #4 in porting clouseau to Java 8 and Lucene 6.3.0 we now have all the necessary conditions to implement efficient spatial search using Lucene's LatLonPoint
and the corresponding functions
newBoxQuery()
newDistanceQuery()
newPolygonQuery()
nearest()
Two stages have to be distinguished: indexing and querying.
Indexing
Currently clouseau does not allow indexing of complex types like objects or arrays. A necessary step would be to allow arrays to be sent from dreyfus to clouseau. With some luck and thanks to scalang this involves little or tiny work in ClouseauTypeDecoder.
Once clouseau receives an array a very first implementation could only handle the case of an array with only two elements and index it as LatLonPoint, the first element being the latitude, the later the longitude.
Note that an array might also be indexed as a multi-dimensional DoublePoint. Eventually
this might become the default case with LatLonPoint being a special case and being forced by
an entry in the options map or vice versa.
In a third step complex types might be enabled as well, which would allow for GeoJSON to be send from dreyfus to clouseau. This however would involve further work and is certainly out of scope for this minimal proof of concept.
Note that most certainly most of this functionality can be implemented solely by modifications to clouseau.
Querying
In order to be able to query fields indexed as LatLonPoint another field has to be introduced in the querying endpoint of dreyfus. For the sake of simplicity I would call this field qdsl and let its syntax be inspired from the ElasticSearch Geo Queries syntax.
This reason to call it qdsl is that it might support the more generic Elasticsearch Query DSL at some point in the future.
As a simple mashup this could look like the following JSON.
Dreyfus would recognize the qdsl field and pass it on to clouseau. Clouseau would do the heavy lifting, parse the JSON, construct the corresponding Lucene Query and combine it by means of a BooleanQueryMUST with the query obtained from q.
Changes to dreyfus would be minimal while additions to clouseau are still moderate.
Conclusion
The necessary codechanges involved in bringing Lucene geospatial filtering to clouseau are moderate and rather straight forward. Different roads can be taken on how to index point fields and query the geospatial index, one is outlined in text above. Care was token to chose an approach with an expected minimal implementation effort while still providing most of the additional value introduced with LatLonPoint in Lucene 6.
This proposition should currently be seen as a draft and is open for comments and suggestions.
The text was updated successfully, but these errors were encountered:
dmunch
changed the title
Geospatial filtering based on Lucene 6
[Feature] Geospatial filtering based on Lucene 6
Dec 30, 2016
Motivation
Starting from version 6.0 Lucene features effecient geospatial indexing with k-d index trees.
https://www.elastic.co/blog/lucene-points-6.0
https://www.elastic.co/blog/apache-lucene-numeric-filters
While Cloudant already offers a geospatial indexing module its implementation has one major drawback: It is not possible to combine geospatial indexing with fulltext indexing, i.e. either you query by geometry or you query by another criteria. Querying for both criteria at the same time is not possible.
Furthermore, the geospatial module is not open-sourced (yet?), i.e. it can only be used in the hosted Cloudant offer but not on premisses. While the abandoned branch 62936-geo-with-query indicates that there as been initial work to open-source the module its status is not clear.
Goal
The goal is to implement a minimal viable implementation featuring only the following basic functions:
LatLonPoint
Implementation
After initial work as described in issue #4 in porting clouseau to Java 8 and Lucene 6.3.0 we now have all the necessary conditions to implement efficient spatial search using Lucene's
LatLonPoint
and the corresponding functions
newBoxQuery()
newDistanceQuery()
newPolygonQuery()
nearest()
Two stages have to be distinguished: indexing and querying.
Indexing
Currently clouseau does not allow indexing of complex types like objects or arrays. A necessary step would be to allow arrays to be sent from dreyfus to clouseau. With some luck and thanks to
scalang
this involves little or tiny work inClouseauTypeDecoder
.Once clouseau receives an array a very first implementation could only handle the case of an array with only two elements and index it as
LatLonPoint
, the first element being the latitude, the later the longitude.Note that an array might also be indexed as a multi-dimensional
DoublePoint
. Eventuallythis might become the default case with
LatLonPoint
being a special case and being forced byan entry in the options map or vice versa.
In a third step complex types might be enabled as well, which would allow for GeoJSON to be send from dreyfus to clouseau. This however would involve further work and is certainly out of scope for this minimal proof of concept.
Note that most certainly most of this functionality can be implemented solely by modifications to clouseau.
Querying
In order to be able to query fields indexed as
LatLonPoint
another field has to be introduced in the querying endpoint of dreyfus. For the sake of simplicity I would call this fieldqdsl
and let its syntax be inspired from the ElasticSearch Geo Queries syntax.This reason to call it
qdsl
is that it might support the more generic Elasticsearch Query DSL at some point in the future.As a simple mashup this could look like the following JSON.
Dreyfus would recognize the
qdsl
field and pass it on to clouseau. Clouseau would do the heavy lifting, parse the JSON, construct the corresponding LuceneQuery
and combine it by means of aBooleanQuery
MUST
with the query obtained fromq
.Changes to dreyfus would be minimal while additions to clouseau are still moderate.
Conclusion
The necessary codechanges involved in bringing Lucene geospatial filtering to clouseau are moderate and rather straight forward. Different roads can be taken on how to index point fields and query the geospatial index, one is outlined in text above. Care was token to chose an approach with an expected minimal implementation effort while still providing most of the additional value introduced with
LatLonPoint
in Lucene 6.This proposition should currently be seen as a draft and is open for comments and suggestions.
The text was updated successfully, but these errors were encountered: