Heap memory leak problem when ledger replication failed #2794
@eolivelli PTAL
@nicoloboschi @RaulGracia @Ghatage PTAL if you have time, please.

@gaozhangmin Which memory is leaked? Direct memory?

Heap memory.
I've tested this PR in my test cluster; it solves the heap memory leak problem.
+1 nice catch
I'd like to point out that the dead objects are kept alive by `AbstractZkLedgerManager`.
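To illustrate the reviewer's point, here is a minimal Java sketch of how a long-lived manager that stores listeners per ledger id keeps abandoned handles reachable. `ListenerRetention`, `LedgerManager`, and `Handle` are hypothetical names; this is not the actual `AbstractZkLedgerManager` implementation. The only assumption is that a handle registers a listener when it is created and removes it in `close()`.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical, simplified model of a long-lived ledger manager that keeps
// metadata listeners per ledger id. NOT BookKeeper's AbstractZkLedgerManager;
// it only illustrates why an unclosed handle stays reachable.
public class ListenerRetention {
    static final class LedgerManager {
        final Map<Long, Set<Object>> listeners = new ConcurrentHashMap<>();

        void register(long ledgerId, Object listener) {
            listeners.computeIfAbsent(ledgerId, id -> ConcurrentHashMap.newKeySet()).add(listener);
        }

        void unregister(long ledgerId, Object listener) {
            // Returning null from the remapping function removes the map entry.
            listeners.computeIfPresent(ledgerId, (id, set) -> {
                set.remove(listener);
                return set.isEmpty() ? null : set;
            });
        }

        int liveListeners() {
            return listeners.values().stream().mapToInt(Set::size).sum();
        }
    }

    // Hypothetical handle: registers itself when created, unregisters only in close().
    static final class Handle {
        final long ledgerId;
        final LedgerManager manager;
        final byte[] state = new byte[64 * 1024]; // stands in for per-handle heap usage

        Handle(long ledgerId, LedgerManager manager) {
            this.ledgerId = ledgerId;
            this.manager = manager;
            manager.register(ledgerId, this);
        }

        void close() {
            manager.unregister(ledgerId, this);
        }
    }

    public static void main(String[] args) {
        LedgerManager manager = new LedgerManager();
        for (long id = 0; id < 10; id++) {
            new Handle(id, manager); // caller drops the handle without calling close()
        }
        // Every abandoned handle, including its 64 KiB of state, is still
        // strongly reachable from the manager, so the GC cannot reclaim it.
        System.out.println(manager.liveListeners()); // prints 10
    }
}
```

Because the manager outlives every handle, nothing but an explicit `close()` breaks the reference chain; dropping the local variable is not enough.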
LGTM, with one nit
bookkeeper-server/src/main/java/org/apache/bookkeeper/client/LedgerOpenOp.java
Branch updated from 0ea9789 to f745744.
@dlg99 PTAL
@gaozhangmin Looks good! Thank you.
LGTM. Would it make sense to add a specific test to validate the scenario in which memory was getting leaked (i.e., that ledger handles are now closed in the situation where they were previously left open)?
@RaulGracia I tried, but it's hard to verify if the
BP-44: USE metrics

A proposal for improving BookKeeper metrics so that operators can employ the USE method for diagnosing performance issues.

Reviewers: Henry Saputra <hsaputra@apache.org>, Andrey Yegorov <None>, Enrico Olivelli <eolivelli@gmail.com>

This closes #2835 from Vanlightly/BP-44-use-metrics and squashes the following commits:

- 8d9baab [Jack Vanlightly] Added link to USE method and listed each term of USE
- 5a0f67d [Jack Vanlightly] BP-44 USE metrics
- a9b576d [Yunze Xu] Release semaphore when addEntry accepts the same entries (#2832)
- 148bf22 [Yun Tang] Ensure to release cache during KeyValueStorageRocksDB#closec (#2821)
- 4dc4260 [gaozhangmin] Heap memory leak problem when ledger replication failed (#2794)
- a522fa3 [Raúl Gracia] Issue 2815: Upgrade to log4j2 to get rid of CVE-2019-17571 (#2816)
- 0465052 [Nicolò Boschi] Upgrade httpclient from 4.5.5 to 4.5.13 (#2793)
- 594a056 [Raúl Gracia] Issue 2795: Bookkeeper upgrade using Bookie ID may fail due to cookie mismatch (#2796)
- 354cf37 [Raúl Gracia] Upgraded dependencies with CVEs (#2792)
- e413c70 [Raúl Gracia] Issue 2728: Entry Log GC may get blocked when using entryLogPerLedgerEnabled option (#2779)
- 883231e [pradeepbn] Building bookkeeper with gradle on java11
### Motivation

In our production environment a memory leak kept occurring, and some ledgers could not be replicated successfully. The cause: when `openLedgerNoRecovery` fails with `BKNotEnoughBookiesException`, the `LedgerHandle` is not closed properly, so it leaks heap memory.

https://github.com/apache/bookkeeper/blob/c7236adc3cb659e65ae5ce53b7156569d7f50ebd/bookkeeper-server/src/main/java/org/apache/bookkeeper/replication/ReplicationWorker.java#L364-L424

### Changes

Close the `LedgerHandle` when `openComplete` is called with an exception.

(cherry picked from commit 4dc4260)
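The change can be sketched as a small, self-contained model. It does not use BookKeeper's API: `OpenFailureCleanup`, `Handle`, `REGISTERED`, and this simplified `openComplete(rc, handle, ...)` are hypothetical stand-ins, and a non-zero `rc` here simply means "the open failed" (for example, a `BKNotEnoughBookiesException` raised after the handle was constructed). The sketch assumes, following the review discussion, that a handle registers itself with a long-lived registry when created and unregisters only in `close()`; under that assumption, skipping `close()` on the error path leaves every failed attempt reachable.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.BiConsumer;

// Hypothetical model of "close the handle when the open completes with an error".
// Names and signatures are illustrative only, not BookKeeper's real API.
public class OpenFailureCleanup {
    // Stand-in for the ledger manager's long-lived listener registry.
    static final Set<Handle> REGISTERED = ConcurrentHashMap.newKeySet();

    static final class Handle {
        Handle() { REGISTERED.add(this); }        // registers a metadata listener
        void close() { REGISTERED.remove(this); } // unregisters it
    }

    // Stand-in for an open-completion callback: rc == 0 means success in this sketch.
    static void openComplete(int rc, Handle handle, boolean closeOnError,
                             BiConsumer<Integer, Handle> userCallback) {
        if (rc != 0) {
            if (closeOnError && handle != null) {
                handle.close(); // the fix: release the half-opened handle
            }
            userCallback.accept(rc, null); // the caller never receives the handle
            return;
        }
        userCallback.accept(rc, handle);
    }

    // Simulates repeated failed opens and returns how many handles stay registered.
    static int leakedAfterFailedOpens(int attempts, boolean closeOnError) {
        REGISTERED.clear();
        for (int i = 0; i < attempts; i++) {
            Handle h = new Handle();
            openComplete(-1, h, closeOnError, (rc, lh) -> { /* replication skipped */ });
        }
        return REGISTERED.size();
    }

    public static void main(String[] args) {
        System.out.println("without fix: " + leakedAfterFailedOpens(100, false)); // 100
        System.out.println("with fix:    " + leakedAfterFailedOpens(100, true));  // 0
    }
}
```

Closing inside the completion path, rather than relying on the caller, matters because on failure the caller only receives `null` and therefore has no way to release the handle itself.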
### Motivation

In order to complete the migration to Gradle, we must build all the subprojects.

### Changes

- Enabled `sh` integration tests with Gradle, located in `tests/scripts/src/test/bash/gradle`
- Added these modules to the build:
  - `bookkeeper-http:servlet-http-server`
  - `metadata-drivers:etcd`
  - `tests:backward-compat:*`
  - `tests:shaded:*`
  - `stream:bk-grpc-name-resolver`
- The DL shading process is now performed (before, it didn't build any jar)
- Groovy tests (`tests:backward-compat:*`) are now triggered by the build/tests itself; with Maven there is a "runner" project (`tests/integration-tests-base-groovy`), which is useless in Gradle, so it is skipped

### Test

- Both `bin/bookkeper standalone` and `bin/bookkeper_gradle standalone` work locally
- Tests are passing locally

Master Issue: #2849

Reviewers: Henry Saputra <hsaputra@apache.org>, Prashant Kumar <None>

This closes #2850 from nicoloboschi/fix/2849/gradle and squashes the following commits:

- 00b49f4 [Nicolò Boschi] Fix common_gradle.sh regex
- bd739fd [Nicolò Boschi] fix sh tests
- 43230ba [Nicolò Boschi] revert sh files. Avoid to modify maven files, create gradle versions to faciltate migration
- d1f95e4 [Nicolò Boschi] fix shaded deps
- bcab40d [Nicolò Boschi] fix build
- 5fd0341 [Nicolò Boschi] fix build
- 0082e0e [Nicolò Boschi] fix build
- 2c32ac1 [Nicolò Boschi] fixes
- 3bc0b26 [Nicolò Boschi] bookkeeper-server-shaded-tests
- ba89132 [Nicolò Boschi] shaded tests
- 6d39e33 [Nicolò Boschi] sh tests
- e0032bc [Nicolò Boschi] actually run arquillian groovy tests
- 08dcc39 [Nicolò Boschi] backwards
- 2361f79 [Nicolò Boschi] hierarchical-ledger-manager
- 8388e11 [Nicolò Boschi] current-server-old-clients
- 6a24344 [Nicolò Boschi] bc-non-fips
- 2faca01 [Nicolò Boschi] bk-grpc-name-resolver
- 991bc11 [Nicolò Boschi] servlet-http-server
- 675ef7b [Nicolò Boschi] etcd
- b1d5e14 [ZhangJian He] A empty implement in EtcdLedgerManagerFactory to let the project can compile (#2845)
- bd5c50b [shustsud] Add error handling to readLedgerMetadata in over-replicated ledger GC (#2844)
- 746f9f6 [Matteo Merli] Remove direct ZK access for Auditor (#2842)
- 4117200 [ZhangJian He] the compare should be >= instead of > (#2782)
- 14ef56f [Prashant Kumar] BookieId can not be cast to BookieSocketAddress (#2843)
- e10f3fe [ZhangJian He] Forget to close preAllocator log on shutdown (#2819)
- 53954ca [shustsud] Add ensemble check to over-replicated ledger GC (#2813)
- 919fdd3 [Prashant Kumar] Issue:2840 Create bookie shellscript for gradle (#2841)
- 031d168 [gaozhangmin] fix-npe-when-pulsar-ZkBookieRackAffinityMapping-getBookieAddressResolver (#2788)
- 3dd671c [Prashant Kumar] Migrate bookkeepr-server:test to gradle run unit tests excepts org.apache.bookkeeper.bookie. org.apache.bookkeeper.client org.apache.bookkeeper.replication org.apache.bookkeeper.tls. (#2812)
- f6903b8 [Jack Vanlightly] BP-44 USE metrics
- a4afaa4 [Matteo Merli] Eliminate direct ZK access in ScanAndCompareGarbageCollector (#2833)
- a9b576d [Yunze Xu] Release semaphore when addEntry accepts the same entries (#2832)
- 148bf22 [Yun Tang] Ensure to release cache during KeyValueStorageRocksDB#closec (#2821)
- 4dc4260 [gaozhangmin] Heap memory leak problem when ledger replication failed (#2794)
- a522fa3 [Raúl Gracia] Issue 2815: Upgrade to log4j2 to get rid of CVE-2019-17571 (#2816)
- 0465052 [Nicolò Boschi] Upgrade httpclient from 4.5.5 to 4.5.13 (#2793)
- 594a056 [Raúl Gracia] Issue 2795: Bookkeeper upgrade using Bookie ID may fail due to cookie mismatch (#2796)
- 354cf37 [Raúl Gracia] Upgraded dependencies with CVEs (#2792)
- e413c70 [Raúl Gracia] Issue 2728: Entry Log GC may get blocked when using entryLogPerLedgerEnabled option (#2779)
- 883231e [pradeepbn] Building bookkeeper with gradle on java11