Fix memory leak issue in ReorganizingLongHash #11953

Merged (1 commit) on Jan 19, 2024


@neetikasinghal neetikasinghal commented Jan 19, 2024

Description

I was able to identify the root cause of the memory leak.

While the test executes the index search, a GlobalOrdinalsStringTermsAggregator is created, and its constructor initializes a collection strategy.

The collection strategy initialization flow is as follows:

new RemapGlobalOrds() -> LongKeyedBucketOrds.build() -> new FromSingle(bigArrays) -> new ReorganizingLongHash(bigArrays) -> ReorganizingLongHash constructor

ReorganizingLongHash's constructor allocates two big arrays whose memory is accounted for by the CircuitBreaker.

In the happy path, GlobalOrdinalsStringTermsAggregator is constructed successfully, its collectionStrategy is initialized, and the two arrays allocated in ReorganizingLongHash's constructor are accounted for by the CircuitBreaker. If a CircuitBreakingException is later hit on any other code path, SearchContext.close() is called, which in turn calls close on GlobalOrdinalsStringTermsAggregator. Because collectionStrategy is not null, close is also called on ReorganizingLongHash, which releases its arrays and returns their bytes to the CircuitBreaker, so there is no memory leak.
However, when the CircuitBreakingException is thrown while the keys array is being allocated in ReorganizingLongHash's constructor, the collection strategy is never assigned and remains null. ReorganizingLongHash's close is therefore never called, so the table array that the constructor had already allocated is never closed. Its bytes are never released from the CircuitBreaker's accounting, which is the memory leak.
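
To make the failure mode concrete, here is a minimal Java sketch of the scenario described above. All names (CountingBreaker, LeakyHash, LeakyOwner, init) are hypothetical stand-ins, not OpenSearch classes, and the breaker is reduced to a byte counter with a limit. The owner uses an init method instead of its constructor only to keep the example short.

```java
// Hypothetical model of breaker accounting; none of these are OpenSearch classes.
final class CountingBreaker {
    private final long limit;
    private long used;

    CountingBreaker(long limit) {
        this.limit = limit;
    }

    // Reserve bytes, or throw if the limit would be exceeded.
    void reserve(long bytes) {
        if (used + bytes > limit) {
            throw new IllegalStateException("breaker tripped");
        }
        used += bytes;
    }

    void release(long bytes) {
        used -= bytes;
    }

    long used() {
        return used;
    }
}

// A hash whose constructor reserves two arrays and does NOT clean up on failure.
final class LeakyHash implements AutoCloseable {
    private final CountingBreaker breaker;
    private final long tableBytes;
    private final long keysBytes;

    LeakyHash(CountingBreaker breaker, long tableBytes, long keysBytes) {
        this.breaker = breaker;
        this.tableBytes = tableBytes;
        this.keysBytes = keysBytes;
        breaker.reserve(tableBytes); // first array: succeeds
        breaker.reserve(keysBytes);  // second array: may trip the breaker
    }

    @Override
    public void close() {
        breaker.release(tableBytes + keysBytes);
    }
}

// Owner that mirrors the aggregator: its field stays null if construction throws.
final class LeakyOwner implements AutoCloseable {
    private LeakyHash strategy;

    void init(CountingBreaker breaker, long tableBytes, long keysBytes) {
        strategy = new LeakyHash(breaker, tableBytes, keysBytes);
    }

    @Override
    public void close() {
        if (strategy != null) {
            strategy.close(); // skipped when the constructor threw
        }
    }
}
```

With a limit of 100 bytes, calling init(breaker, 80, 40) trips the breaker on the second reservation. Closing the owner afterwards releases nothing, so the 80 bytes reserved for the first array stay on the breaker.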

To fix this, ReorganizingLongHash's constructor needs to call close explicitly when an exception is encountered. This PR (#11953) makes that change.
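
The general shape of such a fix can be sketched as follows. This is an illustration under stated assumptions, not the code merged in this PR: ToyBreaker, ToyArray, and SafeHash are hypothetical names, and the breaker is modeled as a plain byte counter.

```java
// Hypothetical stand-in for a circuit breaker: tracks reserved bytes against a limit.
final class ToyBreaker {
    private final long limit;
    private long used;

    ToyBreaker(long limit) {
        this.limit = limit;
    }

    void reserve(long bytes) {
        if (used + bytes > limit) {
            throw new IllegalStateException("breaker tripped");
        }
        used += bytes;
    }

    void release(long bytes) {
        used -= bytes;
    }

    long used() {
        return used;
    }
}

// Hypothetical stand-in for a breaker-accounted array.
final class ToyArray implements AutoCloseable {
    private final ToyBreaker breaker;
    private final long bytes;
    private boolean closed;

    ToyArray(ToyBreaker breaker, long bytes) {
        breaker.reserve(bytes); // throws before anything is reserved if over the limit
        this.breaker = breaker;
        this.bytes = bytes;
    }

    @Override
    public void close() {
        if (!closed) {
            closed = true;
            breaker.release(bytes);
        }
    }
}

// Constructor releases whatever it already allocated before rethrowing.
final class SafeHash implements AutoCloseable {
    private ToyArray table;
    private ToyArray keys;

    SafeHash(ToyBreaker breaker, long tableBytes, long keysBytes) {
        try {
            table = new ToyArray(breaker, tableBytes);
            keys = new ToyArray(breaker, keysBytes);
        } catch (RuntimeException e) {
            close(); // releases table if keys failed; null fields are skipped
            throw e;
        }
    }

    @Override
    public void close() {
        if (table != null) {
            table.close();
        }
        if (keys != null) {
            keys.close();
        }
    }
}
```

With a limit of 100 bytes, new SafeHash(breaker, 80, 40) still throws, but the breaker's used count returns to 0 because the first array is closed before the exception propagates. The caller no longer needs a reference to the half-built object to avoid the leak.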

Related Issues

Resolves #10154

Check List

  • New functionality includes testing.
    • All tests pass
  • New functionality has been documented.
    • New functionality has javadoc added
  • Failing checks are inspected and point to the corresponding known issue(s) (See: Troubleshooting Failing Builds)
  • Commits are signed per the DCO using --signoff
  • Commit changes are listed out in CHANGELOG.md file (See: Changelog)
  • Public documentation issue/PR created

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.


github-actions bot commented Jan 19, 2024

Compatibility status:

Checks if related components are compatible with change 8d49908

Incompatible components

Incompatible components: [https://github.com/opensearch-project/observability.git, https://github.com/opensearch-project/custom-codecs.git, https://github.com/opensearch-project/cross-cluster-replication.git, https://github.com/opensearch-project/asynchronous-search.git, https://github.com/opensearch-project/reporting.git, https://github.com/opensearch-project/sql.git, https://github.com/opensearch-project/performance-analyzer.git, https://github.com/opensearch-project/performance-analyzer-rca.git]

Skipped components

Compatible components

Compatible components: [https://github.com/opensearch-project/security-analytics.git, https://github.com/opensearch-project/geospatial.git, https://github.com/opensearch-project/notifications.git, https://github.com/opensearch-project/opensearch-oci-object-storage.git, https://github.com/opensearch-project/job-scheduler.git, https://github.com/opensearch-project/neural-search.git, https://github.com/opensearch-project/security.git, https://github.com/opensearch-project/ml-commons.git, https://github.com/opensearch-project/common-utils.git, https://github.com/opensearch-project/anomaly-detection.git, https://github.com/opensearch-project/k-nn.git, https://github.com/opensearch-project/index-management.git, https://github.com/opensearch-project/alerting.git]

Signed-off-by: Neetika Singhal <neetiks@amazon.com>

❌ Gradle check result for bb99414: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?


❌ Gradle check result for 8d49908: FAILURE


❌ Gradle check result for 780b2d2: FAILURE


@neetikasinghal

❌ Gradle check result for 780b2d2: FAILURE


#10006 - flaky
#11849 - flaky


@deshsidd deshsidd left a comment


Good catch!


✅ Gradle check result for 8d49908: SUCCESS


codecov bot commented Jan 19, 2024

Codecov Report

Attention: 2 lines in your changes are missing coverage. Please review.

Comparison is base (e265355) 71.41% compared to head (8d49908) 71.45%.

Files Patch % Lines
...g/opensearch/common/util/ReorganizingLongHash.java 60.00% 1 Missing and 1 partial ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##               main   #11953      +/-   ##
============================================
+ Coverage     71.41%   71.45%   +0.03%     
+ Complexity    59397    59388       -9     
============================================
  Files          4923     4923              
  Lines        279212   279214       +2     
  Branches      40595    40596       +1     
============================================
+ Hits         199408   199515     +107     
+ Misses        63223    63111     -112     
- Partials      16581    16588       +7     


@reta reta merged commit 84f303b into opensearch-project:main Jan 19, 2024
31 of 32 checks passed
@reta reta added the backport 2.x Backport to 2.x branch label Jan 19, 2024
@opensearch-trigger-bot

The backport to 2.x failed:

The process '/usr/bin/git' failed with exit code 128

To backport manually, run these commands in your terminal:

# Navigate to the root of your repository
cd $(git rev-parse --show-toplevel)
# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add ../.worktrees/OpenSearch/backport-2.x 2.x
# Navigate to the new working tree
pushd ../.worktrees/OpenSearch/backport-2.x
# Create a new branch
git switch --create backport/backport-11953-to-2.x
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 84f303b6069893cf9350556a4c316153ba526901
# Push it to GitHub
git push --set-upstream origin backport/backport-11953-to-2.x
# Go back to the original working tree
popd
# Delete the working tree
git worktree remove ../.worktrees/OpenSearch/backport-2.x

Then, create a pull request where the base branch is 2.x and the compare/head branch is backport/backport-11953-to-2.x.


reta commented Jan 19, 2024

@neetikasinghal could you please backport to 2.x manually? thank you

neetikasinghal added a commit to neetikasinghal/OpenSearch that referenced this pull request Jan 19, 2024
Signed-off-by: Neetika Singhal <neetiks@amazon.com>
(cherry picked from commit 84f303b)
@neetikasinghal neetikasinghal deleted the flaky-cardi-final-2 branch January 19, 2024 22:41
reta pushed a commit that referenced this pull request Jan 20, 2024
Signed-off-by: Neetika Singhal <neetiks@amazon.com>
(cherry picked from commit 84f303b)
peteralfonsi pushed a commit to peteralfonsi/OpenSearch that referenced this pull request Mar 1, 2024
Signed-off-by: Neetika Singhal <neetiks@amazon.com>
rayshrey pushed a commit to rayshrey/OpenSearch that referenced this pull request Mar 18, 2024
Signed-off-by: Neetika Singhal <neetiks@amazon.com>
shiv0408 pushed a commit to Gaurav614/OpenSearch that referenced this pull request Apr 25, 2024
Signed-off-by: Neetika Singhal <neetiks@amazon.com>
Signed-off-by: Shivansh Arora <hishiv@amazon.com>
Labels
  • backport 2.x: Backport to 2.x branch
  • backport-failed
  • bug: Something isn't working
  • flaky-test: Random test failure that succeeds on second run
Development

Successfully merging this pull request may close these issues.

[BUG] Flaky org.opensearch.search.aggregations.metrics.CardinalityWithRequestBreakerIT test
4 participants