[Meta] Percolate Query enhancements: Allow matching documents with subset of queries in Percolation Index #2818

Open
sbcd90 opened this issue Apr 8, 2022 · 1 comment
Labels
enhancement Enhancement or improvement to existing feature or request Roadmap:Search Project-wide roadmap label Search Search query, autocomplete ...etc

Comments

@sbcd90
Contributor

sbcd90 commented Apr 8, 2022

Problem

Percolate query is an OpenSearch mechanism that classifies or tags documents based on queries stored in an index. It is especially useful when a notification is required whenever a new document matching user-specified criteria is added to an index. In OpenSearch, the percolate query feature is loaded as a module.

Queries are stored in a special field of type percolator. This field must be part of a separate index whose mapping contains all fields from the original index in addition to the percolator field.
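As a rough illustration of the mapping requirement above, the index body for a percolation index could be assembled like this. This is a minimal sketch: the helper name `build_percolator_mapping`, the field name `query`, and the sample source fields are assumptions made for the example and do not come from the POC.

```python
from copy import deepcopy


def build_percolator_mapping(source_properties, query_field="query"):
    """Return an index body holding the source fields plus a percolator field.

    `source_properties` is the "properties" section of the original index
    mapping. `query_field` is a hypothetical name for the percolator field.
    """
    if query_field in source_properties:
        raise ValueError(f"field {query_field!r} already exists in the source mapping")
    properties = deepcopy(source_properties)  # leave the caller's mapping untouched
    properties[query_field] = {"type": "percolator"}
    return {"mappings": {"properties": properties}}


# Made-up source mapping, for illustration only.
source_properties = {
    "message": {"type": "text"},
    "status_code": {"type": "integer"},
}
percolator_index_body = build_percolator_mapping(source_properties)
```

The resulting dictionary has the shape of a create-index request body; sending it to a cluster is left out of the sketch.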

Percolate query implementation allows storage of queries belonging to different user created indices together in a single index. Documents belonging to these user created indices can then be matched against all the queries stored in this single percolation index. This approach has a drawback, however.

Each individual document is matched against every query stored in the associated percolation index. This hurts percolate query performance in the scenario above, where documents need to be matched against only a subset of the stored queries. Percolate queries do not support matching documents by query group today.

Objective

This feature provides a mechanism to match a set of documents against a subset of the queries stored in a percolation index.

It will let users store queries belonging to different indices together in a single percolation index and match them against documents efficiently. This in turn reduces maintenance effort: instead of managing the lifecycle of one percolation index per source index, users manage a single central percolation index.

Design

A POC was done with the opensearch-project/alerting plugin to emulate the scenario described in the problem statement above.

The scenario considered for this POC is Document Level Alerting, which works in two steps. In the first step, documents created within a fixed time interval in a user-created index are matched against a set of queries. In the second step, the resulting document-query pairs are evaluated against a trigger condition; if the condition is met, an alert is triggered.
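The two steps can be sketched in plain Python. This is only a toy model under stated assumptions: stored queries are represented by Python predicates instead of real percolator queries, the documents are assumed to be the ones already collected for the time interval, and the names `StoredQuery`, `match_documents`, and `evaluate_trigger` are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Tuple

Document = Dict[str, Any]
Pair = Tuple[int, str]  # (position of the document in the batch, query id)


@dataclass
class StoredQuery:
    query_id: str
    matches: Callable[[Document], bool]  # stand-in for a real percolator query


def match_documents(docs: List[Document], queries: List[StoredQuery]) -> List[Pair]:
    """Step 1: pair every document with each query it matches."""
    return [
        (position, query.query_id)
        for position, doc in enumerate(docs)
        for query in queries
        if query.matches(doc)
    ]


def evaluate_trigger(pairs: List[Pair], condition: Callable[[List[Pair]], bool]) -> bool:
    """Step 2: decide whether the document-query pairs should raise an alert."""
    return condition(pairs)


# Documents assumed to have been created within the monitored interval.
docs = [{"status_code": 500}, {"status_code": 200}, {"status_code": 503}]
queries = [StoredQuery("server_error", lambda d: d["status_code"] >= 500)]

pairs = match_documents(docs, queries)
# Hypothetical trigger condition: alert when at least two pairs were found.
alert = evaluate_trigger(pairs, lambda found: len(found) >= 2)
```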

[Diagram: Screen Shot 2022-03-28 at 2 29 51 PM; image not included]

In this approach, as shown in the diagram above, there is only one central percolation index, .opendistro-alerting-queries.

Every time the alerting plugin receives a create monitor or update monitor call, it first pulls the latest mapping of the customer index and then runs a diff check between the mapping of the customer index and that of the central percolation index.

Based on the diff results, the mapping of the central percolation index is updated dynamically with the affected fields, and then the queries are inserted.
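A diff check of this kind might look roughly like the following. It is a sketch, not the plugin's code: both helper names are hypothetical, only top-level fields are compared, and the issue does not say how a field that exists in both mappings with different definitions is handled, so the sketch merely reports such fields.

```python
def missing_fields(source_properties, percolator_properties):
    """Fields present in the customer index mapping but absent from the
    central percolation index mapping; these would need to be added."""
    return {
        name: spec
        for name, spec in source_properties.items()
        if name not in percolator_properties
    }


def conflicting_fields(source_properties, percolator_properties):
    """Names of fields defined in both mappings with different definitions.
    How to resolve them is not covered here (assumption: report only)."""
    return sorted(
        name
        for name, spec in source_properties.items()
        if name in percolator_properties and percolator_properties[name] != spec
    )


# Made-up mappings, for illustration only.
customer_properties = {
    "message": {"type": "text"},
    "status_code": {"type": "integer"},
    "host": {"type": "keyword"},
}
central_properties = {
    "query": {"type": "percolator"},
    "message": {"type": "text"},
    "status_code": {"type": "keyword"},
}

to_add = missing_fields(customer_properties, central_properties)
conflicts = conflicting_fields(customer_properties, central_properties)
```

Here `to_add` holds the `host` field and `conflicts` lists `status_code`, since its type differs between the two mappings.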

The documents are then matched against all the queries stored in the central percolation index .opendistro-alerting-queries. This is where matching documents against a subset of the stored queries becomes really helpful: without it, documents belonging to a particular index are matched against all the queries from different monitors.

A few in-memory data structures from Lucene were evaluated for this purpose.

The exact design of how the objective will be achieved is still a work in progress.

Tracking Issues

@sbcd90 sbcd90 added enhancement Enhancement or improvement to existing feature or request untriaged labels Apr 8, 2022
@getsaurabh02
Member

Thanks for providing the details @sbcd90. A few clarifications:

Percolate query implementation allows storage of queries belonging to different user created indices together in a single index.

So today the PercolateQueryBuilder offers the ability to provide the field (which contains the percolator query) and documents (a binary blob containing the document to percolate) to run the percolate search on the percolation_index that holds the query field. Are we proposing another layer of abstraction, such that all unrelated query documents can still be part of the same percolation_index on disk, while PercolateQueryBuilder adds a level of filtering that first selects the related query documents in memory (based on some field-value) before running the percolate search?
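To make the question concrete, the field-value filtering could be expressed at the request level roughly as below. This is a sketch under assumptions: it relies on a percolate clause being combinable with other clauses inside a bool query, the group field `monitor_id` is a hypothetical keyword field stored next to each query, and nothing here shows whether such a request restricts the candidate queries in memory the way the question describes.

```python
def build_filtered_percolate_query(document, group_field, group_value, query_field="query"):
    """Build a search body that percolates `document` only against stored
    queries whose `group_field` equals `group_value`.

    `group_field` is a hypothetical metadata field indexed with each query.
    """
    return {
        "query": {
            "bool": {
                "filter": [
                    {"term": {group_field: group_value}},
                    {"percolate": {"field": query_field, "document": document}},
                ]
            }
        }
    }


search_body = build_filtered_percolate_query(
    document={"message": "disk failure on node-3", "status_code": 500},
    group_field="monitor_id",  # assumption: not a field named in this issue
    group_value="monitor-a",
)
```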

It would be good if you could provide a more detailed proposal of what this abstraction layer will look like. I assume it will be something like a PercolateManager that can group the set of percolate queries by field-values and optionally use those groups to match against the incoming documents.
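A toy version of such a PercolateManager, to show the kind of interface meant here. Everything in it is hypothetical: the class is not an existing OpenSearch API, stored queries are modelled as Python predicates, and the `evaluations` counter exists only to make the saving from grouping visible.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, List, Optional

Document = Dict[str, Any]


class PercolateManager:
    """Hypothetical in-memory grouping of stored queries by a field value."""

    def __init__(self) -> None:
        # group value -> list of (query id, predicate standing in for a query)
        self._groups = defaultdict(list)
        self.evaluations = 0  # number of queries actually evaluated so far

    def add_query(self, group: str, query_id: str,
                  predicate: Callable[[Document], bool]) -> None:
        self._groups[group].append((query_id, predicate))

    def percolate(self, document: Document, group: Optional[str] = None) -> List[str]:
        """Match against one group if given, otherwise against every query."""
        if group is None:
            candidates = [entry for entries in self._groups.values() for entry in entries]
        else:
            candidates = self._groups.get(group, [])
        matched = []
        for query_id, predicate in candidates:
            self.evaluations += 1
            if predicate(document):
                matched.append(query_id)
        return matched


manager = PercolateManager()
manager.add_query("monitor-a", "a-errors", lambda d: d.get("status_code", 0) >= 500)
manager.add_query("monitor-a", "a-slow", lambda d: d.get("latency_ms", 0) > 1000)
manager.add_query("monitor-b", "b-login", lambda d: d.get("event") == "login_failed")

doc = {"status_code": 503, "latency_ms": 40}

all_matches = manager.percolate(doc)            # evaluates all three queries
evaluated_all = manager.evaluations
group_matches = manager.percolate(doc, group="monitor-a")  # evaluates two queries
evaluated_group = manager.evaluations - evaluated_all
```

Both calls return the same match for this document, but the grouped call evaluates two queries instead of three.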

@xuezhou25 xuezhou25 changed the title Meta] Percolate Query enhancements: Allow matching documents with subset of queries in Percolation Index [Meta] Percolate Query enhancements: Allow matching documents with subset of queries in Percolation Index Apr 12, 2022
@andrross andrross added the Search Search query, autocomplete ...etc label May 31, 2024
@andrross andrross added Roadmap:Search Project-wide roadmap label and removed Indexing & Search labels May 31, 2024
@github-project-automation github-project-automation bot moved this to Planned work items in OpenSearch Roadmap May 31, 2024
@getsaurabh02 getsaurabh02 moved this from 🆕 New to Later (6 months plus) in Search Project Board Aug 15, 2024