[Meta] Percolate Query enhancements: Allow matching documents with subset of queries in Percolation Index #2818
Labels
enhancement
Enhancement or improvement to existing feature or request
Roadmap:Search
Project-wide roadmap label
Search
Search query, autocomplete ...etc
Problem
Percolate Query is a mechanism in OpenSearch which allows documents to be classified or tagged based on queries stored in an index. Percolate queries are especially beneficial in scenarios where a notification mechanism is required whenever a new document has been added to an index which matches user specified requirements. In OpenSearch, the Percolate query feature is loaded as a module.
Queries are stored in a special field of type `percolator`. This field must be part of a new index whose mapping contains all fields from the original index in addition to the percolator field.
The percolate query implementation allows queries belonging to different user-created indices to be stored together in a single index. Documents belonging to these user-created indices can then be matched against all the queries stored in this single percolation index. However, this approach also has a disadvantage.
Each individual document is matched against all the queries stored in the associated percolation index. This can hurt percolate query performance in the scenario described above, where documents need to be matched against only a subset of the stored queries. Percolate queries do not currently support matching documents by query group.
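For reference, the standard percolator setup described above can be sketched as the request bodies below. The field names `query` and `message` and the sample document are hypothetical; only the `percolator` field type and the `percolate` query clause come from OpenSearch itself.

```python
import json

# Mapping for a percolation index: one `percolator` field that holds the stored
# query, plus every field from the original index that stored queries reference.
# The field names "query" and "message" are illustrative assumptions.
percolation_index_mapping = {
    "mappings": {
        "properties": {
            "query": {"type": "percolator"},
            "message": {"type": "text"},
        }
    }
}

# A stored query is an ordinary document whose percolator field contains a query.
stored_query_doc = {"query": {"match": {"message": "error"}}}

# A percolate search supplies a document and asks which stored queries match it.
percolate_request = {
    "query": {
        "percolate": {
            "field": "query",
            "document": {"message": "disk error on node-1"},
        }
    }
}


def to_request_body(body: dict) -> str:
    """Serialize a request body to the JSON that would be sent over REST."""
    return json.dumps(body, indent=2)


print(to_request_body(percolate_request))
```

Note that this request carries no information about which group of stored queries the document should be checked against.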
Objective
This feature provides a mechanism which will allow a set of documents to be matched against a subset of queries stored in a percolation index.
It will let users store queries belonging to different indices together in a single percolation index and match them against documents efficiently. This in turn reduces maintenance effort: instead of managing the lifecycle of a separate percolation index for every index, users manage a single central percolation index.
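The intended behavior can be illustrated with a small in-memory model. This is only a sketch of the objective, not the proposed implementation: `StoredQuery`, the `group` key, and `percolate_subset` are hypothetical names, and stored queries are modeled as plain Python predicates rather than real percolator queries.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Document = Dict[str, object]


@dataclass
class StoredQuery:
    query_id: str
    group: str  # hypothetical grouping key, e.g. the source index name
    matches: Callable[[Document], bool]


def percolate_subset(
    queries: List[StoredQuery], docs: List[Document], group: str
) -> Dict[int, List[str]]:
    """Return {document position: [matching query ids]} using only queries in `group`."""
    # Narrow the candidate set first, so queries from other groups are never evaluated.
    candidates = [q for q in queries if q.group == group]
    return {
        i: [q.query_id for q in candidates if q.matches(doc)]
        for i, doc in enumerate(docs)
    }


queries = [
    StoredQuery("q1", "logs", lambda d: d.get("level") == "error"),
    StoredQuery("q2", "logs", lambda d: d.get("status") == 500),
    StoredQuery("q3", "metrics", lambda d: d.get("level") == "error"),
]
docs = [{"level": "error", "status": 200}, {"level": "info", "status": 500}]

result = percolate_subset(queries, docs, group="logs")
print(result)  # {0: ['q1'], 1: ['q2']}
```

Here `q3` would also match the first document, but it belongs to a different group and is therefore never evaluated.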
Design
A POC was done with the opensearch-project/alerting plugin to emulate the scenario described in the problem statement above. The scenario considered for this POC is Document Level Alerting.
Document Level Alerting can be defined in 2 steps. In the first step, documents created within a fixed time interval in a user-created index are matched against a set of queries. In the second step, these document-query pairs are evaluated against a trigger condition and, if the condition is met, an alert is triggered.
In this approach, as shown in the above diagram, there is only 1 central percolation index, `.opendistro-alerting-queries`.
Every time the alerting plugin receives a create monitor or update monitor call, it first pulls the latest mapping of the customer index and then runs a diff check to determine whether the mapping differs between the customer index and the central percolation index.
Based on the diff check results, fields are dynamically updated in the mapping of the central percolation index, and then the queries are inserted.
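The diff check could look roughly like the following. This is a minimal sketch under stated assumptions, not the alerting plugin's actual code: `missing_fields` is a hypothetical helper that operates on the `properties` section of two index mappings and only detects fields that are absent, not fields whose types conflict.

```python
from typing import Any, Dict

Properties = Dict[str, Dict[str, Any]]


def missing_fields(source_props: Properties, percolation_props: Properties) -> Properties:
    """Return field mappings present in the source index but absent from the percolation index."""
    missing: Properties = {}
    for name, definition in source_props.items():
        if name not in percolation_props:
            # The whole field (including any sub-fields) is new.
            missing[name] = definition
        elif "properties" in definition:
            # Object field exists on both sides: recurse to find new sub-fields.
            nested = missing_fields(
                definition["properties"],
                percolation_props[name].get("properties", {}),
            )
            if nested:
                missing[name] = {"properties": nested}
    return missing


source = {
    "message": {"type": "text"},
    "user": {"properties": {"id": {"type": "keyword"}, "name": {"type": "text"}}},
}
percolation = {
    "query": {"type": "percolator"},
    "user": {"properties": {"id": {"type": "keyword"}}},
}

update = missing_fields(source, percolation)
print(update)
```

The resulting dictionary contains only the additions, so it could be sent as the `properties` of a mapping update on the central percolation index before the monitor's queries are indexed.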
The documents are then matched against all the queries stored in the central percolation index `.opendistro-alerting-queries`. This is where matching documents against a subset of the queries stored in the percolation index becomes really helpful, because otherwise documents belonging to a particular index are matched against all the queries from different monitors.
A few in-memory data structures from Lucene were evaluated for this purpose.
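The two-step Document Level Alerting flow described above can be sketched in plain Python. Everything here is illustrative: queries are modeled as predicates, and the trigger condition is assumed to be "the document matched at least one of the named queries", which is a simplification and not necessarily how the alerting plugin expresses triggers.

```python
from typing import Callable, Dict, List, Set, Tuple

Document = Dict[str, object]
Query = Callable[[Document], bool]


def match_documents(
    docs: Dict[str, Document], queries: Dict[str, Query]
) -> List[Tuple[str, str]]:
    """Step 1: return (doc_id, query_id) pairs for every query a document satisfies."""
    return [
        (doc_id, query_id)
        for doc_id, doc in docs.items()
        for query_id, query in queries.items()
        if query(doc)
    ]


def evaluate_trigger(
    pairs: List[Tuple[str, str]], trigger_query_ids: Set[str]
) -> List[str]:
    """Step 2: return ids of documents that matched any query named by the trigger."""
    alerted: List[str] = []
    for doc_id, query_id in pairs:
        if query_id in trigger_query_ids and doc_id not in alerted:
            alerted.append(doc_id)
    return alerted


docs = {"d1": {"status": 500}, "d2": {"status": 200}}
queries = {
    "server_error": lambda d: d["status"] >= 500,
    "any_request": lambda d: True,
}

pairs = match_documents(docs, queries)
alerts = evaluate_trigger(pairs, {"server_error"})
print(pairs)
print(alerts)  # ['d1']
```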
The exact design on how the objective will be achieved is still a work in progress.
Tracking Issues