fix: Fix memory bloat caused by holding too many unclosed ArrowReaderIterators #929

Merged
merged 1 commit into from
Sep 10, 2024

Conversation

Kontinuation
Member

Which issue does this PR close?

Closes #927.

Rationale for this change

Please refer to the comments of #927 for details.

What changes are included in this PR?

  • Holds at most one unclosed ArrowReaderIterator in CometBlockStoreShuffleReader.
  • (Optional) Automatically closes the ArrowReaderIterator when it reaches the end of the stream.
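
To illustrate the first point, here is a minimal sketch of the "at most one unclosed reader" pattern. It is not the code from this PR: `FakeReader`, `OpenReaders`, and `SingleOpenReader.readAll` are hypothetical stand-ins for ArrowReaderIterator and the shuffle reader, and the returned cleanup function stands in for a task completion listener. The sketch relies on `Iterator.flatMap` being lazy, so the next reader is only created once the previous one is exhausted.

```scala
// Hypothetical bookkeeping: tracks how many readers are open at the same time.
object OpenReaders {
  var open = 0
  var maxOpen = 0
}

// Hypothetical stand-in for ArrowReaderIterator: a closeable iterator over one stream.
final class FakeReader(data: Seq[Int]) extends Iterator[Int] {
  private val underlying = data.iterator
  private var closed = false
  OpenReaders.open += 1
  OpenReaders.maxOpen = math.max(OpenReaders.maxOpen, OpenReaders.open)

  override def hasNext: Boolean = !closed && underlying.hasNext
  override def next(): Int = underlying.next()

  // Idempotent close: releases the reader exactly once.
  def close(): Unit = if (!closed) {
    closed = true
    OpenReaders.open -= 1
  }
}

object SingleOpenReader {
  // Returns the flattened record iterator plus a cleanup function that
  // closes the last reader (in a real task this would run at task completion).
  def readAll(streams: Iterator[Seq[Int]]): (Iterator[Int], () => Unit) = {
    var current: FakeReader = null
    val records = streams.flatMap { stream =>
      // Close the previous reader before opening the next one,
      // so at most one reader is open at any time.
      if (current != null) current.close()
      current = new FakeReader(stream)
      current
    }
    val cleanup = () => if (current != null) current.close()
    (records, cleanup)
  }
}
```

With this shape, only one cleanup callback is registered per task, regardless of how many input streams are read.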

How are these changes tested?

It is pretty hard to add tests for this fix, so we tested it manually and are relying on existing tests to make sure that it does not break anything.

}

batch = nextBatch()
if (batch.isEmpty) {
  close()
Member Author

This is because I am paranoid and want to close the iterator early. I can revert changes to this file if you don't like it :)
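
For context, the early-close behavior discussed here can be sketched as follows. This is an illustrative example under stated assumptions, not the actual ArrowReaderIterator: `SelfClosingBatches` is a hypothetical class, batches are plain strings, and `closeCount` exists only so the behavior can be observed.

```scala
// Hypothetical iterator that closes itself as soon as the stream is exhausted.
final class SelfClosingBatches(batches: List[String]) extends Iterator[String] {
  private var remaining = batches
  private var batch: Option[String] = None
  private var closed = false
  var closeCount = 0 // illustration only: counts how often resources were released

  // Pulls the next batch from the underlying source, or None at end of stream.
  private def nextBatch(): Option[String] = remaining match {
    case head :: tail =>
      remaining = tail
      Some(head)
    case Nil => None
  }

  override def hasNext: Boolean =
    if (closed) false
    else if (batch.isDefined) true
    else {
      batch = nextBatch()
      // End of stream: release resources right away instead of waiting
      // for an external caller to invoke close().
      if (batch.isEmpty) close()
      batch.isDefined
    }

  override def next(): String = {
    if (!hasNext) throw new NoSuchElementException("no more batches")
    val current = batch.get
    batch = None
    current
  }

  // Idempotent, so a later explicit close() is harmless.
  def close(): Unit = if (!closed) {
    closed = true
    closeCount += 1
  }
}
```

Because `close()` is idempotent in this sketch, closing early at end of stream does not conflict with a later explicit close from the caller.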

Comment on lines +91 to +101
var currentReadIterator: ArrowReaderIterator = null

// Closes last read iterator after the task is finished.
// We need to close read iterator during iterating input streams,
// instead of one callback per read iterator. Otherwise if there are too many
// read iterators, it may blow up the call stack and cause OOM.
context.addTaskCompletionListener[Unit] { _ =>
  if (currentReadIterator != null) {
    currentReadIterator.close()
  }
}
Member

Based on the issue description, this is the major fix, right?

Member Author

Yes.

@Kontinuation Kontinuation marked this pull request as ready for review September 9, 2024 15:27
@viirya
Member

viirya commented Sep 9, 2024

This looks good to me.

@Kontinuation Could you rebase this? I would like to make sure this change passes CI after the latest CI change on the main branch. Thank you.

@Kontinuation
Member Author

Rebased. One of the tests failed because it could not download dependencies from Maven Central; the failure should be transient.

@codecov-commenter

Codecov Report

Attention: Patch coverage is 0% with 15 lines in your changes missing coverage. Please review.

Project coverage is 54.87%. Comparing base (033fe6f) to head (86aee79).
Report is 15 commits behind head on main.

Files with missing lines                                Patch %   Lines
.../comet/execution/shuffle/ArrowReaderIterator.scala   0.00%     11 Missing ⚠️
...ecution/shuffle/CometBlockStoreShuffleReader.scala   0.00%     4 Missing ⚠️
Additional details and impacted files
@@              Coverage Diff              @@
##               main     #929       +/-   ##
=============================================
+ Coverage     34.03%   54.87%   +20.83%     
+ Complexity      883      853       -30     
=============================================
  Files           113      109        -4     
  Lines         43170    10707    -32463     
  Branches       9516     2053     -7463     
=============================================
- Hits          14693     5875     -8818     
+ Misses        25471     3798    -21673     
+ Partials       3006     1034     -1972     


@viirya
Member

viirya commented Sep 10, 2024

Thanks. I re-triggered the test. Usually it will pass after that.

@viirya viirya merged commit c905f40 into apache:main Sep 10, 2024
74 checks passed
@viirya
Member

viirya commented Sep 10, 2024

Thanks @Kontinuation

Successfully merging this pull request may close these issues.

Excessive memory usage when running TPC-H Query 21 on a large cluster
3 participants