Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Search Pipelines] Add request-scoped state shared between processors #9405

Merged
merged 10 commits into from
Dec 5, 2023

Conversation

msfroh
Copy link
Collaborator

@msfroh msfroh commented Aug 16, 2023

To handle cases where multiple search pipeline processors need to share information, we will allocate a Map<String, Object> for the lifetime of the request and pass it to each processor to get/set values.

Description

To handle cases where multiple search pipeline processors need to share information, we will allocate a Map<String, Object> for the lifetime of the request and pass it to each processor to get/set values.

Related Issues

#6722

Check List

  • New functionality includes testing.
    • All tests pass
  • New functionality has been documented.
    • New functionality has javadoc added
  • Commits are signed per the DCO using --signoff
  • Commit changes are listed out in CHANGELOG.md file (See: Changelog)

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@github-actions
Copy link
Contributor

Gradle Check (Jenkins) Run Completed with:

@opensearch-trigger-bot
Copy link
Contributor

Compatibility status:

Checks if related components are compatible with change f3957ea

Incompatible components

Incompatible components: [https://github.com/opensearch-project/index-management.git, https://github.com/opensearch-project/security-analytics.git, https://github.com/opensearch-project/asynchronous-search.git]

Skipped components

Compatible components

Compatible components: [https://github.com/opensearch-project/geospatial.git, https://github.com/opensearch-project/security.git, https://github.com/opensearch-project/notifications.git, https://github.com/opensearch-project/neural-search.git, https://github.com/opensearch-project/sql.git, https://github.com/opensearch-project/job-scheduler.git, https://github.com/opensearch-project/opensearch-oci-object-storage.git, https://github.com/opensearch-project/observability.git, https://github.com/opensearch-project/k-nn.git, https://github.com/opensearch-project/alerting.git, https://github.com/opensearch-project/cross-cluster-replication.git, https://github.com/opensearch-project/anomaly-detection.git, https://github.com/opensearch-project/performance-analyzer.git, https://github.com/opensearch-project/ml-commons.git, https://github.com/opensearch-project/common-utils.git, https://github.com/opensearch-project/performance-analyzer-rca.git, https://github.com/opensearch-project/reporting.git]

@github-actions
Copy link
Contributor

Gradle Check (Jenkins) Run Completed with:

  • RESULT: UNSTABLE ❕
  • TEST FAILURES:
      1 org.opensearch.repositories.azure.AzureBlobContainerRetriesTests.testWriteLargeBlob

@codecov
Copy link

codecov bot commented Aug 17, 2023

Codecov Report

Attention: 53 lines in your changes are missing coverage. Please review.

Comparison is base (f7f3500) 71.37% compared to head (9ce4ca8) 71.28%.

Files Patch % Lines
...rch/pipeline/common/CollapseResponseProcessor.java 71.42% 7 Missing and 5 partials ⚠️
...ch/pipeline/common/helpers/SearchResponseUtil.java 69.69% 8 Missing and 2 partials ⚠️
...earch/search/pipeline/common/SearchRequestMap.java 80.00% 6 Missing and 3 partials ⚠️
...search/pipeline/common/ScriptRequestProcessor.java 40.00% 6 Missing ⚠️
...ch/pipeline/common/OversampleRequestProcessor.java 73.68% 3 Missing and 2 partials ⚠️
...pipeline/common/TruncateHitsResponseProcessor.java 80.76% 3 Missing and 2 partials ⚠️
...rg/opensearch/search/pipeline/common/BasicMap.java 90.90% 2 Missing ⚠️
...eline/common/SearchPipelineCommonModulePlugin.java 0.00% 1 Missing ⚠️
...h/search/pipeline/common/helpers/ContextUtils.java 75.00% 0 Missing and 1 partial ⚠️
...earch/pipeline/StatefulSearchRequestProcessor.java 0.00% 1 Missing ⚠️
... and 1 more
Additional details and impacted files
@@             Coverage Diff              @@
##               main    #9405      +/-   ##
============================================
- Coverage     71.37%   71.28%   -0.09%     
+ Complexity    59113    59030      -83     
============================================
  Files          4893     4903      +10     
  Lines        277831   277979     +148     
  Branches      40367    40382      +15     
============================================
- Hits         198288   198146     -142     
- Misses        63042    63329     +287     
- Partials      16501    16504       +3     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@austintlee
Copy link
Contributor

@msfroh This looks really good. Let me know if there's anything I can do to help get this over the line. I can help with writing unit tests, e.g.

@opensearch-trigger-bot
Copy link
Contributor

Compatibility status:

Checks if related components are compatible with change 60d272b

Incompatible components

Incompatible components: [https://github.com/opensearch-project/index-management.git, https://github.com/opensearch-project/asynchronous-search.git, https://github.com/opensearch-project/notifications.git, https://github.com/opensearch-project/security-analytics.git]

Skipped components

Compatible components

Compatible components: [https://github.com/opensearch-project/security.git, https://github.com/opensearch-project/alerting.git, https://github.com/opensearch-project/anomaly-detection.git, https://github.com/opensearch-project/job-scheduler.git, https://github.com/opensearch-project/sql.git, https://github.com/opensearch-project/common-utils.git, https://github.com/opensearch-project/observability.git, https://github.com/opensearch-project/k-nn.git, https://github.com/opensearch-project/reporting.git, https://github.com/opensearch-project/cross-cluster-replication.git, https://github.com/opensearch-project/geospatial.git, https://github.com/opensearch-project/performance-analyzer.git, https://github.com/opensearch-project/neural-search.git, https://github.com/opensearch-project/ml-commons.git, https://github.com/opensearch-project/performance-analyzer-rca.git, https://github.com/opensearch-project/opensearch-oci-object-storage.git]

@github-actions
Copy link
Contributor

Gradle Check (Jenkins) Run Completed with:

@github-actions
Copy link
Contributor

Gradle Check (Jenkins) Run Completed with:

@github-actions
Copy link
Contributor

Gradle Check (Jenkins) Run Completed with:

@github-actions
Copy link
Contributor

Gradle Check (Jenkins) Run Completed with:

  • RESULT: UNSTABLE ❕
  • TEST FAILURES:
      3 org.opensearch.search.SearchTimeoutIT.testSimpleTimeout {p0={"search.concurrent_segment_search.enabled":"true"}}
      1 org.opensearch.search.SearchTimeoutIT.testSimpleTimeout {p0={"search.concurrent_segment_search.enabled":"false"}}
      1 org.opensearch.http.SearchRestCancellationIT.testAutomaticCancellationMultiSearchDuringFetchPhase

Copy link
Contributor

github-actions bot commented Dec 5, 2023

❌ Gradle check result for 7b049b0: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

To handle cases where multiple search pipeline processors need to share
information, we will allocate a Map<String, Object> for the lifetime of
the request and pass it to each processor to get/set values.

Signed-off-by: Michael Froh <froh@amazon.com>
Added "context_prefix" convention to scope variables to avoid
collisions.

Let script processor have access to the request context.

Added more unit tests.

Signed-off-by: Michael Froh <froh@amazon.com>
After realizing that we just need to keep the first hit for each group
(since results are already sorted by the sort criteria), I think
CollapseResponseProcessor might be worth including.

Combining it with the oversample + truncate processors, it can provide a
workaround for the lack of support for collapse + rescore.

Signed-off-by: Michael Froh <froh@amazon.com>
Signed-off-by: Michael Froh <froh@amazon.com>
Signed-off-by: Michael Froh <froh@amazon.com>
Signed-off-by: Michael Froh <froh@amazon.com>
* Use default size in OversampleRequestProcessor if not specified.
* Pass context to SearchPhaseResultsProcessor too.
* Wrap context in its own class, in case we decide to add other fields
  to it in future.

Signed-off-by: Michael Froh <froh@amazon.com>
Signed-off-by: Michael Froh <froh@amazon.com>
Signed-off-by: Michael Froh <froh@amazon.com>
Signed-off-by: Michael Froh <froh@amazon.com>
Copy link
Contributor

github-actions bot commented Dec 5, 2023

❕ Gradle check result for 9ce4ca8: UNSTABLE

  • TEST FAILURES:
      1 org.opensearch.search.SearchWeightedRoutingIT.testMultiGetWithNetworkDisruption_FailOpenEnabled

Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure.

@msfroh msfroh merged commit c204585 into opensearch-project:main Dec 5, 2023
29 checks passed
@reta reta added backport 2.x Backport to 2.x branch v3.0.0 Issues and PRs related to version 3.0.0 labels Dec 5, 2023
opensearch-trigger-bot bot pushed a commit that referenced this pull request Dec 5, 2023
…#9405)

To handle cases where multiple search pipeline processors need to share
information, we will allocate a context holder for the lifetime of
the request and pass it to each processor to get/set values.

To explain this behavior and benefit from it, this change also introduces
three new processors:
1. The "oversample" request processor that increases "size", storing the
original size in the context.
2. The "truncate" response processor that discards results after some
number, by default using the original size before oversampling.
3. The "collapse" response processor offers similar behavior to a collapse
query, discarding results that have a field value in common with a
higher-scoring result.

Signed-off-by: Michael Froh <froh@amazon.com>
(cherry picked from commit c204585)
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
@austintlee
Copy link
Contributor

Awesome!

@navneet1v
Copy link
Contributor

This is awesome. an awaited feature. :D

reta pushed a commit that referenced this pull request Dec 5, 2023
…#9405) (#11473)

To handle cases where multiple search pipeline processors need to share
information, we will allocate a context holder for the lifetime of
the request and pass it to each processor to get/set values.

To explain this behavior and benefit from it, this change also introduces
three new processors:
1. The "oversample" request processor that increases "size", storing the
original size in the context.
2. The "truncate" response processor that discards results after some
number, by default using the original size before oversampling.
3. The "collapse" response processor offers similar behavior to a collapse
query, discarding results that have a field value in common with a
higher-scoring result.


(cherry picked from commit c204585)

Signed-off-by: Michael Froh <froh@amazon.com>
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
gaobinlong pushed a commit to gaobinlong/OpenSearch that referenced this pull request Dec 7, 2023
…opensearch-project#9405)

To handle cases where multiple search pipeline processors need to share
information, we will allocate a context holder for the lifetime of
the request and pass it to each processor to get/set values.

To explain this behavior and benefit from it, this change also introduces
three new processors:
1. The "oversample" request processor that increases "size", storing the
original size in the context.
2. The "truncate" response processor that discards results after some
number, by default using the original size before oversampling.
3. The "collapse" response processor offers similar behavior to a collapse
query, discarding results that have a field value in common with a 
higher-scoring result.

Signed-off-by: Michael Froh <froh@amazon.com>
deshsidd pushed a commit to deshsidd/OpenSearch that referenced this pull request Dec 11, 2023
…opensearch-project#9405)

To handle cases where multiple search pipeline processors need to share
information, we will allocate a context holder for the lifetime of
the request and pass it to each processor to get/set values.

To explain this behavior and benefit from it, this change also introduces
three new processors:
1. The "oversample" request processor that increases "size", storing the
original size in the context.
2. The "truncate" response processor that discards results after some
number, by default using the original size before oversampling.
3. The "collapse" response processor offers similar behavior to a collapse
query, discarding results that have a field value in common with a 
higher-scoring result.

Signed-off-by: Michael Froh <froh@amazon.com>
rayshrey pushed a commit to rayshrey/OpenSearch that referenced this pull request Mar 18, 2024
…opensearch-project#9405)

To handle cases where multiple search pipeline processors need to share
information, we will allocate a context holder for the lifetime of
the request and pass it to each processor to get/set values.

To explain this behavior and benefit from it, this change also introduces
three new processors:
1. The "oversample" request processor that increases "size", storing the
original size in the context.
2. The "truncate" response processor that discards results after some
number, by default using the original size before oversampling.
3. The "collapse" response processor offers similar behavior to a collapse
query, discarding results that have a field value in common with a 
higher-scoring result.

Signed-off-by: Michael Froh <froh@amazon.com>
shiv0408 pushed a commit to Gaurav614/OpenSearch that referenced this pull request Apr 25, 2024
…opensearch-project#9405)

To handle cases where multiple search pipeline processors need to share
information, we will allocate a context holder for the lifetime of
the request and pass it to each processor to get/set values.

To explain this behavior and benefit from it, this change also introduces
three new processors:
1. The "oversample" request processor that increases "size", storing the
original size in the context.
2. The "truncate" response processor that discards results after some
number, by default using the original size before oversampling.
3. The "collapse" response processor offers similar behavior to a collapse
query, discarding results that have a field value in common with a
higher-scoring result.

Signed-off-by: Michael Froh <froh@amazon.com>
Signed-off-by: Shivansh Arora <hishiv@amazon.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport 2.x Backport to 2.x branch v2.12.0 Issues and PRs related to version 2.12.0 v3.0.0 Issues and PRs related to version 3.0.0
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

8 participants