
Ensure that underlying resources of the parquet reader are closed #11418

Merged

Conversation

findinpath
Contributor

Description

Release the underlying resources after reading the number of records from the Parquet file.

Is this change a fix, improvement, new feature, refactoring, or other?

Fix.

Is this a change to the core query engine, a connector, client library, or the SPI interfaces? (be specific)

Delta Lake connector.

How would you describe this change to a non-technical end user or system administrator?

Before this change, the underlying AWS client library accumulated "leased" connections after each UPDATE statement performed via the Delta Lake connector, which looked like a connection leak.
This change makes sure that the underlying resources used for reading the number of records from the Parquet file
via the AWS S3 API are closed. In this manner, the leased connection is released.
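The shape of the fix can be sketched as follows. This is an illustrative standalone example, not Trino's actual classes: `RecordReader`, `readRecordCount`, and `countRecords` are hypothetical stand-ins showing how try-with-resources guarantees the reader (and the S3 connection it leases) is closed once the record count has been read.

```java
public class ReaderCloseSketch
{
    // Hypothetical stand-in for a Parquet reader holding a leased S3 connection
    static class RecordReader
            implements AutoCloseable
    {
        boolean closed;

        long readRecordCount()
        {
            return 42L; // e.g. read from the Parquet footer metadata
        }

        @Override
        public void close()
        {
            closed = true; // releases the leased connection back to the pool
        }
    }

    static long countRecords(RecordReader reader)
    {
        // try-with-resources closes the reader even if reading throws,
        // instead of leaving the connection leased after the count is read
        try (RecordReader r = reader) {
            return r.readRecordCount();
        }
    }

    public static void main(String[] args)
    {
        RecordReader reader = new RecordReader();
        long count = countRecords(reader);
        if (count != 42L || !reader.closed) {
            throw new AssertionError("reader not closed or wrong count");
        }
        System.out.println("closed=" + reader.closed + " count=" + count);
    }
}
```

Without the try-with-resources block, each query that only needed the record count would leave its reader, and the connection behind it, open.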

Related issues, pull requests, and links

Documentation

(x) No documentation is needed.
( ) Sufficient documentation is included in this PR.
( ) Documentation PR is available with #prnumber.
( ) Documentation issue #issuenumber is filed, and can be handled later.

Release notes

(x) No release notes entries required.
( ) Release notes entries required with the following suggested text:

# Section
* Fix some things. ({issue}`issuenumber`)


@jirassimok jirassimok left a comment


Looks good to me. The only thing I'd change is that if we're changing a line that's this long, I'd prefer we also find a way to wrap it a bit (maybe just split the line after open()).


@findepi findepi left a comment


Is this the only one?

perhaps we could set the s3 max open connections setting in the aws smoke test to some low value -- reproducing the problem, ensuring the fix, and guarding against regressions

@findinpath
Contributor Author

Is this the only one?

I looked through the source code for usages of the filesystem create ... / open ... operations that deal with streams and found that they all use the try-with-resources pattern.

perhaps we could set the s3 max open connections setting in the aws smoke test to some low value -- reproducing the problem, ensuring the fix, and guarding against regressions

@findepi I will modify the aws smoke test to take your suggestion into consideration. Thanks for the suggestion.

Use a low value for the property `hive.s3.max-connections`
in the AWS smoke tests in order to ensure that there
are no AWS http connection leaks and guard against
eventual regressions.
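A smoke-test configuration along these lines might look like the following sketch; `hive.s3.max-connections` is the property named in the commit message, while the exact value is illustrative (any value low enough that a few leaked connections would exhaust the pool and fail the test would do):

```properties
# Illustrative smoke-test override: keep the S3 connection pool tiny so a
# leaked "leased" connection quickly exhausts the pool and fails the test.
hive.s3.max-connections=2
```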
@findinpath findinpath force-pushed the delta-lake-parquet-reader-connection-leak branch from 50cb61a to c49c213 Compare March 11, 2022 10:19
@findepi findepi merged commit 637d235 into trinodb:master Mar 11, 2022
@findepi findepi mentioned this pull request Mar 11, 2022
@github-actions github-actions bot added this to the 374 milestone Mar 11, 2022

4 participants