[Meta] Percolate Query enhancements: Allow matching documents with subset of queries in Percolation Index #2818

sbcd90 · 2022-04-08T06:53:53Z

Problem

The percolate query is an OpenSearch mechanism that classifies or tags documents based on queries stored in an index. It is especially useful when a notification is required whenever a new document matching user-specified requirements is added to an index. In OpenSearch, the percolate query feature is loaded as a module.

Queries are stored in a special field of type percolator. This field should be part of a new index whose mapping contains all fields of the original index in addition to the percolator field.
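As a minimal sketch of the setup described above, the request bodies can be written as plain Python dictionaries. The `percolator` field type and the `percolate` query with its `field` and `document` parameters are part of the query DSL. The helper names, the field name `query`, and the sample source fields are assumptions chosen only for illustration.

```python
def percolator_index_body(source_properties, query_field="query"):
    """Build a create-index body: the source index's field mappings plus one percolator field."""
    properties = dict(source_properties)  # copy so the caller's mapping is left untouched
    properties[query_field] = {"type": "percolator"}
    return {"mappings": {"properties": properties}}


def percolate_search_body(document, query_field="query"):
    """Build a search body that percolates one document against the stored queries."""
    return {"query": {"percolate": {"field": query_field, "document": document}}}


# Hypothetical source index with two fields.
source_properties = {"message": {"type": "text"}, "status": {"type": "keyword"}}
index_body = percolator_index_body(source_properties)

# A stored query is an ordinary document whose percolator field holds a query.
stored_query_doc = {"query": {"match": {"message": "error"}}}

# Search body sent to the percolation index for one incoming document.
search_body = percolate_search_body({"message": "disk error on node-1", "status": "open"})
```

These bodies would be sent to the create-index, index-document, and search APIs respectively. No client library is assumed here.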

The percolate query implementation allows queries belonging to different user-created indices to be stored together in a single index. Documents from these indices can then be matched against all the queries stored in this single percolation index. This approach has a disadvantage, however.

Each document is matched against every query stored in the associated percolation index. This can hurt percolate query performance in the scenario above, where documents need to be matched against only a subset of the stored queries. Percolate queries do not support matching documents by query group today.

Objective

This feature provides a mechanism that allows a set of documents to be matched against a subset of the queries stored in a percolation index.
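To make the objective concrete, here is a toy in-memory sketch. It is not the proposed design (which is still open) and does not use any OpenSearch or Lucene API. The class `GroupedQueryStore`, the group keys, and the use of Python predicates as stand-ins for stored queries are all assumptions for illustration.

```python
from collections import defaultdict


class GroupedQueryStore:
    """Toy stand-in for a percolation index whose queries carry a group key."""

    def __init__(self):
        self._by_group = defaultdict(list)  # group key -> [(query_id, predicate)]

    def add(self, group, query_id, predicate):
        self._by_group[group].append((query_id, predicate))

    def match(self, document, group=None):
        """Return ids of matching queries; with a group, only that group's queries are evaluated."""
        if group is None:
            candidates = [q for qs in self._by_group.values() for q in qs]
        else:
            candidates = self._by_group.get(group, [])
        return [query_id for query_id, predicate in candidates if predicate(document)]


store = GroupedQueryStore()
store.add("logs", "q1", lambda d: "error" in d.get("message", ""))
store.add("orders", "q2", lambda d: d.get("total", 0) > 100)
store.add("orders", "q3", lambda d: "error" in d.get("message", ""))

doc = {"message": "disk error"}
all_matches = store.match(doc)                 # evaluates q1, q2 and q3
logs_matches = store.match(doc, group="logs")  # evaluates q1 only
```

Without a group, every stored query is evaluated. With a group, queries that belong to other indices are never touched, which is the saving this issue is after.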

It will allow users to store queries belonging to different indices together in a single percolation index and match them against documents efficiently. This in turn reduces maintenance effort: instead of managing the lifecycle of one percolation index per source index, users manage a single central percolation index.

Design

A POC was done with the opensearch-project/alerting plugin to emulate the scenario described in the problem statement above.

The scenario considered for this POC is Document Level Alerting, which can be described in two steps. In the first step, documents created within a fixed time interval in a user-created index are matched against a set of queries. In the second step, the resulting document-query pairs are evaluated against a trigger condition, and if the condition is met, an alert is triggered.
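The two steps can be sketched as follows. This is only an illustration of the flow described above, not the alerting plugin's code. The function names, the representation of queries as Python predicates, and the form of the trigger condition are assumptions.

```python
def match_documents(documents, queries):
    """Step 1: return (doc_id, query_id) pairs for every query a document satisfies."""
    return [
        (doc_id, query_id)
        for doc_id, doc in documents.items()
        for query_id, predicate in queries.items()
        if predicate(doc)
    ]


def should_alert(pairs, trigger):
    """Step 2: evaluate the trigger condition over the matched document-query pairs."""
    return trigger(pairs)


# Hypothetical documents created within one monitor interval.
documents = {"d1": {"status": 500}, "d2": {"status": 200}}
queries = {"server_error": lambda d: d["status"] >= 500}

pairs = match_documents(documents, queries)
# Hypothetical trigger: alert if at least one document matched any query.
alert = should_alert(pairs, lambda matched: len(matched) > 0)
```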

In this approach, as shown in the above diagram, there is only one central percolation index, .opendistro-alerting-queries.

Every time the alerting plugin receives a create monitor or update monitor call, it first pulls the latest mapping of the customer index and then runs a diff check to determine whether the mapping differs between the customer index and the central percolation index.

Based on the diff check results, fields are dynamically updated in the mapping of the central percolation index, and then the queries are inserted.

The documents are then matched against all the queries stored in the central percolation index .opendistro-alerting-queries. This is where matching documents against a subset of the stored queries becomes really helpful, because otherwise documents belonging to a particular index are matched against all the queries from different monitors.
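A minimal sketch of such a diff check, assuming flat (non-nested) field mappings represented as dictionaries. The function names and sample mappings are hypothetical, and the conflict check is an extra illustration that the POC description does not mention.

```python
def missing_fields(source_properties, target_properties):
    """Return source field mappings that the target mapping does not have yet."""
    return {
        name: mapping
        for name, mapping in source_properties.items()
        if name not in target_properties
    }


def conflicting_fields(source_properties, target_properties):
    """Return names of fields present in both mappings but defined differently."""
    return sorted(
        name
        for name, mapping in source_properties.items()
        if name in target_properties and target_properties[name] != mapping
    )


# Hypothetical customer index mapping and current central percolation index mapping.
customer = {
    "message": {"type": "text"},
    "status": {"type": "keyword"},
    "code": {"type": "integer"},
}
central = {
    "query": {"type": "percolator"},
    "message": {"type": "text"},
    "code": {"type": "keyword"},
}

to_add = missing_fields(customer, central)         # fields to add to the central mapping
conflicts = conflicting_fields(customer, central)  # same name, different definition
```

The fields in `to_add` are the ones that would be added to the central mapping before inserting queries. Fields with the same name but different types across customer indices, as in `conflicts`, would need separate handling.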

A few in-memory data structures from Lucene were evaluated for this purpose.

The exact design for achieving the objective is still a work in progress.

Tracking Issues

Percolate Query implementation in Document Level Alerting

getsaurabh02 · 2022-04-11T23:18:59Z

Thanks for providing the details @sbcd90. A few clarifications:

Percolate query implementation allows storage of queries belonging to different user created indices together in a single index.

So today the PercolateQueryBuilder offers the ability to provide the field (which contains the percolator query) and the documents (binary blob containing the document to percolate) to run the percolate search on the percolation_index that holds the query field. Are we proposing another layer of abstraction, such that all unrelated query documents can still be part of the same percolation_index on disk, while PercolateQueryBuilder allows a level of filtering that picks out the related query documents (based on some field value) in memory before running the percolate search?

It would be good if you could provide a more detailed proposal on what this abstraction layer will look like. I assume it will be something like a PercolateManager that offers the capability to group the set of percolate queries by field value, and optionally use those groups to match against incoming documents.

sbcd90 added the enhancement and untriaged labels Apr 8, 2022

ryanbogan added Indexing & Search and removed untriaged labels Apr 12, 2022

xuezhou25 changed the title ~~Meta] Percolate Query enhancements: Allow matching documents with subset of queries in Percolation Index~~ [Meta] Percolate Query enhancements: Allow matching documents with subset of queries in Percolation Index Apr 12, 2022

sandeshkr419 mentioned this issue Nov 2, 2023

[BUG] Handling heap usage exceed error opensearch-project/security-analytics#711

Closed

andrross added the Search label May 31, 2024

github-project-automation bot added this to Search Project Board May 31, 2024

github-project-automation bot moved this to 🆕 New in Search Project Board May 31, 2024

andrross added the Roadmap:Search label and removed the Indexing & Search label May 31, 2024

Pallavi-AWS added this to OpenSearch Roadmap May 31, 2024

github-project-automation bot moved this to Planned work items in OpenSearch Roadmap May 31, 2024

getsaurabh02 moved this from 🆕 New to Later (6 months plus) in Search Project Board Aug 15, 2024
