[FEATURE] PPL LOOKUP Functionality #2651

Open
brijos opened this issue May 3, 2024 · 4 comments
Labels
enhancement New feature or request

Comments


brijos commented May 3, 2024

Is your feature request related to a problem?
OpenSearch users want an easy way to enrich data stored in OpenSearch and in external data sources with content from an OpenSearch index. This is common in security analytics scenarios, where one wants to enrich data with IP reputation lists, vulnerability databases, or threat feeds.

What solution would you like?
Look up a field value from another log group and use it to convert to a user-friendly name or error code.

  • The lookup feature should support OpenSearch indexes as data sources for lookup tables
  • Users should be able to perform a static lookup using an OpenSearch index and external data sources based on the Spark integration
  • Users should be able to perform a static lookup using a user-generated CSV (such as an org unit mapping or GeoIP from MaxMind) and an OpenSearch index
  • Users should be able to perform a static lookup using a user-generated CSV (such as an org unit mapping or GeoIP from MaxMind) and an external data source
  • Admins can use Index State Management to control how long the reference lookup index is available
  • Include helpful error messages

*** Out of Scope ***

  • Defining new lookup data sources beyond listed types
  • Automatic field mapping between events and lookups

What alternatives have you considered?
Performing joins using SQL

Do you have any additional context?
None.

brijos added the enhancement (New feature or request) and untriaged labels May 3, 2024
Member

dblock commented Jun 24, 2024

Catch All Triage - 1 2 3 4 5 6

Member

YANG-DB commented Jul 2, 2024

@brijos please take a look at this existing PPL correlation API


salyh commented Jul 16, 2024

PPL Lookup Design Proposal

The proposed design and syntax, as implemented so far in PR 2698, are described below.

Design

The lookup command can be implemented as a simple search for documents in the lookup index. For every row in a search result, a search with the given match fields is performed. If a single document is found, the fields and values of the lookup document are copied to the current row of the search result. If no document is found, a no-op is performed. If multiple documents are found, an error is thrown. The implementation is mainly done in core/src/main/java/org/opensearch/sql/analysis/Analyzer.java and opensearch/src/main/java/org/opensearch/sql/opensearch/storage/OpenSearchIndex.java. The "lookup search" is performed as both a term query and a match query, so both cases are covered: a field that is not analyzed, and a field that is analyzed according to its mapping.
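
To make the three outcomes described above concrete (no match, one match, several matches), here is a minimal in-memory sketch in Python. It is not the code from PR 2698: the names lookup_rows and find_matches are hypothetical, and plain equality stands in for the term and match queries that the real implementation sends to OpenSearch.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def find_matches(lookup_docs: List[Row], row: Row, match_fields: Dict[str, str]) -> List[Row]:
    """Return the lookup documents whose match fields all equal the row's values.

    match_fields maps a field name in the lookup index to the field name
    in the current search result (hypothetical representation).
    """
    return [
        doc
        for doc in lookup_docs
        if all(
            local in row and lookup in doc and doc[lookup] == row[local]
            for lookup, local in match_fields.items()
        )
    ]


def lookup_rows(rows: List[Row], lookup_docs: List[Row], match_fields: Dict[str, str]) -> List[Row]:
    """Enrich every row with the single matching lookup document, if any."""
    enriched = []
    for row in rows:
        matches = find_matches(lookup_docs, row, match_fields)
        if len(matches) > 1:
            # More than one lookup document: fail, as the proposal describes.
            raise ValueError(f"multiple lookup documents match row {row!r}")
        out = dict(row)  # copy, so the input row is left untouched
        if matches:
            # Exactly one document: copy its fields and values onto the row.
            out.update(matches[0])
        # No document: no-op, the row passes through unchanged.
        enriched.append(out)
    return enriched
```

For example, calling lookup_rows with logins rows, users documents, and {"uid": "id"} as match_fields enriches each login whose id equals a user's uid and leaves the other logins as they were.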

The Spark PPL Lookup command implementation is done in a separate PR in the opensearch-spark repo: PR 407. There we can (and need to) implement it as a join.
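
As a rough illustration of what "lookup as a join" means, the following Python sketch performs a left outer join through a hash index on the match fields. It is not taken from PR 407 and uses no Spark API; left_join_lookup is a hypothetical name. Note one assumption in particular: a plain left outer join emits one output row per matching lookup document, and how the Spark implementation treats multiple matches is not stated in this thread.

```python
from collections import defaultdict
from typing import Any, Dict, List

Row = Dict[str, Any]


def left_join_lookup(rows: List[Row], lookup_docs: List[Row], match_fields: Dict[str, str]) -> List[Row]:
    """Left outer join of rows against lookup_docs on the match fields.

    match_fields maps a lookup-index field name to the local field name.
    Match values must be hashable because they are used as dictionary keys.
    """
    lookup_names = list(match_fields)
    local_names = [match_fields[name] for name in lookup_names]

    # Build side: index the lookup documents by their match-field values.
    index = defaultdict(list)
    for doc in lookup_docs:
        if all(name in doc for name in lookup_names):
            index[tuple(doc[name] for name in lookup_names)].append(doc)

    # Probe side: every source row is kept, matched rows are enriched.
    joined = []
    for row in rows:
        matches = []
        if all(name in row for name in local_names):
            matches = index.get(tuple(row[name] for name in local_names), [])
        if not matches:
            joined.append(dict(row))  # unmatched rows survive a left join
        for doc in matches:
            joined.append({**row, **doc})  # lookup values win on name clashes
    return joined
```

Building the index once and probing it per row avoids issuing one search per row, which is the main practical difference from the search-based design described for the OpenSearch side.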

Syntax

lookup <lookup index> <lookup field> [AS <local lookup field>] [<lookup field> [AS <local lookup field>]]… [appendonly=true|false] [<source field> [AS <local field>]]...

lookup is the name of the lookup operation, and it is expected to change to something else. Normally we would use lookup, but this name already seems to be used elsewhere in the AST.

<lookup index> is the name of the lookup index (mandatory).

Then we need at least one <lookup field>, which is a field in the lookup index that is matched against a local field (in the current search) to find the lookup document. When there is no lookup document, we do nothing; if there is more than one, we fail with an error.

If more than one <lookup field> is provided, all of them must match (as of now we run a term and a match query for each field value).

If the field has a different name in the current search result, use <local lookup field> to map it.

appendonly is false by default and indicates whether the values we copy over to the search result from the lookup document should overwrite existing values. If appendonly is true, we do not overwrite existing values.

<source field> are the fields that should be copied. If no such fields are given, all fields are copied. If a field should have a different name than in the lookup document, use [AS <local field>].
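
The copy rules for appendonly and <source field> can be sketched as follows. This is an illustration only, not code from the PR: copy_lookup_fields is a hypothetical helper, "existing value" is assumed to mean that the field name is already present in the row, and source fields missing from the lookup document are assumed to be skipped.

```python
from typing import Any, Dict, Optional

Row = Dict[str, Any]


def copy_lookup_fields(
    row: Row,
    doc: Row,
    copy_fields: Optional[Dict[str, str]] = None,
    appendonly: bool = False,
) -> Row:
    """Copy fields from a lookup document onto a row.

    copy_fields maps a lookup-document field to its local name
    (the AS clause). None means: copy all fields under their own names.
    """
    if copy_fields is None:
        copy_fields = {name: name for name in doc}
    out = dict(row)  # do not mutate the caller's row
    for source_name, local_name in copy_fields.items():
        if source_name not in doc:
            continue  # assumption: absent source fields are skipped
        if appendonly and local_name in out:
            continue  # appendonly=true: never overwrite an existing value
        out[local_name] = doc[source_name]
    return out
```

With a row {"id": 1, "phone": "000"} and copy_fields {"phone": "phone", "department": "thedepartment"}, the default appendonly=False replaces phone with the lookup value, while appendonly=True keeps "000" and only adds thedepartment.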

Examples:

{"query":"source=logins | lookup users uid AS id appendonly=true"}
{"query":"source=logins | lookup users uid,name phone,department AS thedepartment"}

@Gokul-Radhakrishnan

+1 for this feature
