Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix-npe-when-pulsar-ZkBookieRackAffinityMapping-getBookieAddressResolver #2788

Conversation

gaozhangmin
Copy link
Contributor

@gaozhangmin gaozhangmin commented Sep 8, 2021

Error log:

16:21:20.140 [main] ERROR org.apache.bookkeeper.client.RackawareEnsemblePlacementPolicyImpl - Failed to initialize DNS Resolver org.apache.pulsar.zookeeper.ZkBookieRackAffinityMapping, used default subnet resolver : java.lang.RuntimeException: java.lang.NullPointerException java.lang.NullPointerException

BookieAddressResolver should be set before ((Configurable) dnsResolver).setConf(conf);

It will throw npe. when pulsar ZkBookieRackAffinityMapping invoke getBookieAddressResolver

@gaozhangmin
Copy link
Contributor Author

@hangc0276 PTAL

Copy link
Contributor

@eolivelli eolivelli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

Copy link
Contributor

@nicoloboschi nicoloboschi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@gaozhangmin
Copy link
Contributor Author

pulsar also has a related issue apache/pulsar#11947
@eolivelli @lhotari @nicoloboschi PTAL

Copy link
Member

@lhotari lhotari left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@lhotari
Copy link
Member

lhotari commented Sep 8, 2021

@gaozhangmin Can you add the stacktrace containing the NPE to the description? That could help find this PR if someone searches by stacktrace elements.

@gaozhangmin
Copy link
Contributor Author

@gaozhangmin Can you add the stacktrace containing the NPE to the description? That could help find this PR if someone searches by stacktrace elements.

there no stacktrace in log , only java.lang.nullpointexception

@lhotari
Copy link
Member

lhotari commented Sep 8, 2021

there no stacktrace in log , only java.lang.nullpointexception

The log line with that could also help locate this issue when someone with the same problem searches for it. Please share.

@gaozhangmin
Copy link
Contributor Author

there no stacktrace in log , only java.lang.nullpointexception

The log line with that could also help locate this issue when someone with the same problem searches for it. Please share.

added

@lhotari
Copy link
Member

lhotari commented Sep 8, 2021

Thanks for providing the log message @gaozhangmin . It looks like the logging is broken since the provided sample log message didn't include the stacktrace. I'd propose to fix it in the same PR since it's closely related to the change that is being made.

This looks invalid:
LOG.error("Failed to initialize DNS Resolver {}, used default subnet resolver : {} {}", dnsResolverName, re, re.getMessage());

I wonder why it's not simply
LOG.error("Failed to initialize DNS Resolver {}, using default subnet resolver.", dnsResolverName, re); ?

@gaozhangmin
Copy link
Contributor Author

Thanks for providing the log message @gaozhangmin . It looks like the logging is broken since the provided sample log message didn't include the stacktrace. I'd propose to fix it in the same PR since it's closely related to the change that is being made.

This looks invalid:
LOG.error("Failed to initialize DNS Resolver {}, used default subnet resolver : {} {}", dnsResolverName, re, re.getMessage());

I wonder why it's not simply
LOG.error("Failed to initialize DNS Resolver {}, using default subnet resolver.", dnsResolverName, re); ?

changed

Copy link
Member

@lhotari lhotari left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Contributor

@dlg99 dlg99 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can I ask you to:

  • configure dnsResolver where it is created/before passing it to downstream components instead of setting config to the instance passed here (this can have side-effects, I don't remember if we have to worry about thread safety in that setter etc.) See my comment on the code.
    Overall I'd assume that component should get a usable parameter value instead of a half-baked one, so patching the configuration there smells like a hack instead of a fix.
    In case that's not possible for some reason I'd love to understand why (to be fair, i didn't look at the code that deep to remember all nuances).

  • add unit-tests

  • check other policies (region-aware/zone-aware) to make sure these problems don't exist there.

i.e. in ZoneawareEnsemblePlacementPolicyImpl I see

        DNSToSwitchMapping actualDNSResolver;
        if (optionalDnsResolver.isPresent()) {
            actualDNSResolver = optionalDnsResolver.get();
        } else {

I assume that we'll be fine if we pass properly configured resolver there.
But if there is an issue that prevents us from doing so, both policies will have similar problems.

@eolivelli
Copy link
Contributor

Canbyou add a test cases that shows the problem?
This way it will be clear what we are fixing and that you fixed it.
I would prefer to refactor interfaces or pluggable components in order to not introduce API changes, in that case it would be harder to upgrade BK

@gaozhangmin
Copy link
Contributor Author

I would prefer to refactor interfaces or pluggable components in order to not introduce API changes, in that case it would be harder to upgrade BK

This need pulsar org.apache.pulsar.zookeeper.ZkBookieRackAffinityMapping class, I do not know how to test this.

@eolivelli
Copy link
Contributor

eolivelli commented Sep 9, 2021

Sorry I meant:

I would NOT prefer to refactor interfaces or pluggable components in order to not introduce API changes, in that case it would be harder to upgrade BK

So this change in this current form works for me.
That said, adding a simple test case that reproduces the NPE will be better, otherwise we could break ZkBookieRackAffinityMapping again

If you have the error on ZkBookieRackAffinityMapping you could try to use the bookieAddressResolver in the way that ZkBookieRackAffinityMapping does, but in a dummy implementation

@gaozhangmin
Copy link
Contributor Author

Sorry I meant:

I would NOT prefer to refactor interfaces or pluggable components in order to not introduce API changes, in that case it would be harder to upgrade BK

So this change in this current form works for me.
That said, adding a simple test case that reproduces the NPE will be better, otherwise we could break ZkBookieRackAffinityMapping again

If you have the error on ZkBookieRackAffinityMapping you could try to use the bookieAddressResolver in the way that ZkBookieRackAffinityMapping does, but in a dummy implementation

Unit test added, PTAL

Copy link
Contributor

@eolivelli eolivelli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe we are on our way.

Is it possible to add some test cases, or enhance some existing test cases, in order to cover the changes you made ?

@gaozhangmin
Copy link
Contributor Author

I believe we are on our way.

Is it possible to add some test cases, or enhance some existing test cases, in order to cover the changes you made ?

I think the unit test TestRackawareEnsemblePlacementPolicy already covered this change.

like repp.initialize(conf, Optional.<DNSToSwitchMapping>empty(), timer, DISABLE_ALL, NullStatsLogger.INSTANCE, BookieSocketAddress.LEGACY_BOOKIEID_RESOLVER);

the dnsResolver is null, will reach the condition !optionalDnsResolver.isPresent()

Copy link
Contributor

@dlg99 dlg99 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks awesome!
Thank you for going extra mile with the tests.

@gaozhangmin
Copy link
Contributor Author

any problem with this pr?

Copy link
Contributor

@eolivelli eolivelli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lgtm

@eolivelli
Copy link
Contributor

We will merge as soon as CI passes.

@gaozhangmin
Copy link
Contributor Author

The failed tests seems coming from timeouts. it's not related with this change.

@gaozhangmin gaozhangmin force-pushed the fix-npe-when-pulsar-ZkBookieRackAffinityMapping-getBookieAddressResolver branch from 9f2bddb to 497c9db Compare September 24, 2021 02:22
@gaozhangmin
Copy link
Contributor Author

@eolivelli Please approve running workflows

@gaozhangmin
Copy link
Contributor Author

@dlg99 Please help approve runn workflows

@gaozhangmin
Copy link
Contributor Author

@eolivelli the failed tests come from timeouts, please rerun failed tests

@gaozhangmin
Copy link
Contributor Author

@eolivelli CI passes. please merge this pr

@gaozhangmin
Copy link
Contributor Author

@eolivelli

@zymap zymap merged commit 031d168 into apache:master Oct 21, 2021
zymap pushed a commit that referenced this pull request Oct 26, 2021
…ver (#2788)

Error log:

`16:21:20.140 [main] ERROR org.apache.bookkeeper.client.RackawareEnsemblePlacementPolicyImpl - Failed to initialize DNS Resolver org.apache.pulsar.zookeeper.ZkBookieRackAffinityMapping, used default subnet resolver : java.lang.RuntimeException: java.lang.NullPointerException java.lang.NullPointerException`

`BookieAddressResolver` should be set before  `((Configurable) dnsResolver).setConf(conf);`

It will throw npe. when pulsar `ZkBookieRackAffinityMapping` invoke getBookieAddressResolver

(cherry picked from commit 031d168)
hsaputra pushed a commit that referenced this pull request Oct 29, 2021
### Motivation

In order to complete migration to Gradle we must build all the subprojects.

### Changes

- Enabled `sh` integration tests with gradle, located in `tests/scripts/src/test/bash/gradle`
- Added these modules to the build
    - `bookkeeper-http:servlet-http-server` 
    - `metadata-drivers:etcd`
    - `tests:backward-compat:*`
    - `tests:shaded:*`
    - `stream:bk-grpc-name-resolver`
- DL shading process is now performed (before it didn't build any jar)
- Groovy tests (`tests:backward-compat:*`) now are triggered by the build/tests itself; with Maven, there is a "runner" project (`tests/integration-tests-base-groovy`); in Gradle is useless so it is skipped


### Test

- Both `bin/bookkeper standalone` and `bin/bookkeper_gradle standalone` work locally
- Tests are passing locally  

Master Issue: #2849



Reviewers: Henry Saputra <hsaputra@apache.org>, Prashant Kumar <None>

This closes #2850 from nicoloboschi/fix/2849/gradle and squashes the following commits:

00b49f4 [Nicolò Boschi] Fix common_gradle.sh regex
bd739fd [Nicolò Boschi] fix sh tests
43230ba [Nicolò Boschi] revert sh files. Avoid to modify maven files, create gradle versions to faciltate migration
d1f95e4 [Nicolò Boschi] fix shaded deps
bcab40d [Nicolò Boschi] fix build
5fd0341 [Nicolò Boschi] fix build
0082e0e [Nicolò Boschi] fix build
2c32ac1 [Nicolò Boschi] fixes
3bc0b26 [Nicolò Boschi] bookkeeper-server-shaded-tests
ba89132 [Nicolò Boschi] shaded tests
6d39e33 [Nicolò Boschi] sh tests
e0032bc [Nicolò Boschi] actually run arquillian groovy tests
08dcc39 [Nicolò Boschi] backwards
2361f79 [Nicolò Boschi] hierarchical-ledger-manager
8388e11 [Nicolò Boschi] current-server-old-clients
6a24344 [Nicolò Boschi] bc-non-fips
2faca01 [Nicolò Boschi] bk-grpc-name-resolver
991bc11 [Nicolò Boschi] servlet-http-server
675ef7b [Nicolò Boschi] etcd
b1d5e14 [ZhangJian He] A empty implement in EtcdLedgerManagerFactory to let the project can compile (#2845)
bd5c50b [shustsud] Add error handling to readLedgerMetadata in over-replicated ledger GC (#2844)
746f9f6 [Matteo Merli] Remove direct ZK access for Auditor (#2842)
4117200 [ZhangJian He] the compare should be >= instead of > (#2782)
14ef56f [Prashant Kumar] BookieId can not be cast to BookieSocketAddress (#2843)
e10f3fe [ZhangJian He] Forget to close preAllocator log on shutdown (#2819)
53954ca [shustsud] Add ensemble check to over-replicated ledger GC (#2813)
919fdd3 [Prashant Kumar] Issue:2840 Create bookie shellscript for gradle (#2841)
031d168 [gaozhangmin] fix-npe-when-pulsar-ZkBookieRackAffinityMapping-getBookieAddressResolver (#2788)
3dd671c [Prashant Kumar] Migrate bookkeepr-server:test to gradle run unit tests excepts org.apache.bookkeeper.bookie. org.apache.bookkeeper.client org.apache.bookkeeper.replication org.apache.bookkeeper.tls. (#2812)
f6903b8 [Jack Vanlightly] BP-44 USE metrics
a4afaa4 [Matteo Merli] Eliminate direct ZK access in ScanAndCompareGarbageCollector (#2833)
a9b576d [Yunze Xu] Release semaphore when addEntry accepts the same entries (#2832)
148bf22 [Yun Tang] Ensure to release cache during KeyValueStorageRocksDB#closec (#2821)
4dc4260 [gaozhangmin] Heap memory leak problem when ledger replication failed (#2794)
a522fa3 [Raúl Gracia] Issue 2815: Upgrade to log4j2 to get rid of CVE-2019-17571 (#2816)
0465052 [Nicolò Boschi] Upgrade httpclient from 4.5.5 to 4.5.13 (#2793)
594a056 [Raúl Gracia] Issue 2795: Bookkeeper upgrade using Bookie ID may fail due to cookie mismatch (#2796)
354cf37 [Raúl Gracia] Upgraded dependencies with CVEs (#2792)
e413c70 [Raúl Gracia] Issue 2728: Entry Log GC may get blocked when using entryLogPerLedgerEnabled option (#2779)
883231e [pradeepbn] Building bookkeeper with gradle on java11
Ghatage pushed a commit to sijie/bookkeeper that referenced this pull request Jul 12, 2024
…ver (apache#2788)

Error log:

`16:21:20.140 [main] ERROR org.apache.bookkeeper.client.RackawareEnsemblePlacementPolicyImpl - Failed to initialize DNS Resolver org.apache.pulsar.zookeeper.ZkBookieRackAffinityMapping, used default subnet resolver : java.lang.RuntimeException: java.lang.NullPointerException java.lang.NullPointerException`

`BookieAddressResolver` should be set before  `((Configurable) dnsResolver).setConf(conf);`  

It will throw npe. when pulsar `ZkBookieRackAffinityMapping` invoke getBookieAddressResolver
Ghatage pushed a commit to sijie/bookkeeper that referenced this pull request Jul 12, 2024
### Motivation

In order to complete migration to Gradle we must build all the subprojects.

### Changes

- Enabled `sh` integration tests with gradle, located in `tests/scripts/src/test/bash/gradle`
- Added these modules to the build
    - `bookkeeper-http:servlet-http-server` 
    - `metadata-drivers:etcd`
    - `tests:backward-compat:*`
    - `tests:shaded:*`
    - `stream:bk-grpc-name-resolver`
- DL shading process is now performed (before it didn't build any jar)
- Groovy tests (`tests:backward-compat:*`) now are triggered by the build/tests itself; with Maven, there is a "runner" project (`tests/integration-tests-base-groovy`); in Gradle is useless so it is skipped


### Test

- Both `bin/bookkeper standalone` and `bin/bookkeper_gradle standalone` work locally
- Tests are passing locally  

Master Issue: apache#2849



Reviewers: Henry Saputra <hsaputra@apache.org>, Prashant Kumar <None>

This closes apache#2850 from nicoloboschi/fix/2849/gradle and squashes the following commits:

00b49f4 [Nicolò Boschi] Fix common_gradle.sh regex
bd739fd [Nicolò Boschi] fix sh tests
43230ba [Nicolò Boschi] revert sh files. Avoid to modify maven files, create gradle versions to faciltate migration
d1f95e4 [Nicolò Boschi] fix shaded deps
bcab40d [Nicolò Boschi] fix build
5fd0341 [Nicolò Boschi] fix build
0082e0e [Nicolò Boschi] fix build
2c32ac1 [Nicolò Boschi] fixes
3bc0b26 [Nicolò Boschi] bookkeeper-server-shaded-tests
ba89132 [Nicolò Boschi] shaded tests
6d39e33 [Nicolò Boschi] sh tests
e0032bc [Nicolò Boschi] actually run arquillian groovy tests
08dcc39 [Nicolò Boschi] backwards
2361f79 [Nicolò Boschi] hierarchical-ledger-manager
8388e11 [Nicolò Boschi] current-server-old-clients
6a24344 [Nicolò Boschi] bc-non-fips
2faca01 [Nicolò Boschi] bk-grpc-name-resolver
991bc11 [Nicolò Boschi] servlet-http-server
675ef7b [Nicolò Boschi] etcd
b1d5e14 [ZhangJian He] A empty implement in EtcdLedgerManagerFactory to let the project can compile (apache#2845)
bd5c50b [shustsud] Add error handling to readLedgerMetadata in over-replicated ledger GC (apache#2844)
746f9f6 [Matteo Merli] Remove direct ZK access for Auditor (apache#2842)
4117200 [ZhangJian He] the compare should be >= instead of > (apache#2782)
14ef56f [Prashant Kumar] BookieId can not be cast to BookieSocketAddress (apache#2843)
e10f3fe [ZhangJian He] Forget to close preAllocator log on shutdown (apache#2819)
53954ca [shustsud] Add ensemble check to over-replicated ledger GC (apache#2813)
919fdd3 [Prashant Kumar] Issue:2840 Create bookie shellscript for gradle (apache#2841)
031d168 [gaozhangmin] fix-npe-when-pulsar-ZkBookieRackAffinityMapping-getBookieAddressResolver (apache#2788)
3dd671c [Prashant Kumar] Migrate bookkeepr-server:test to gradle run unit tests excepts org.apache.bookkeeper.bookie. org.apache.bookkeeper.client org.apache.bookkeeper.replication org.apache.bookkeeper.tls. (apache#2812)
f6903b8 [Jack Vanlightly] BP-44 USE metrics
a4afaa4 [Matteo Merli] Eliminate direct ZK access in ScanAndCompareGarbageCollector (apache#2833)
a9b576d [Yunze Xu] Release semaphore when addEntry accepts the same entries (apache#2832)
148bf22 [Yun Tang] Ensure to release cache during KeyValueStorageRocksDB#closec (apache#2821)
4dc4260 [gaozhangmin] Heap memory leak problem when ledger replication failed (apache#2794)
a522fa3 [Raúl Gracia] Issue 2815: Upgrade to log4j2 to get rid of CVE-2019-17571 (apache#2816)
0465052 [Nicolò Boschi] Upgrade httpclient from 4.5.5 to 4.5.13 (apache#2793)
594a056 [Raúl Gracia] Issue 2795: Bookkeeper upgrade using Bookie ID may fail due to cookie mismatch (apache#2796)
354cf37 [Raúl Gracia] Upgraded dependencies with CVEs (apache#2792)
e413c70 [Raúl Gracia] Issue 2728: Entry Log GC may get blocked when using entryLogPerLedgerEnabled option (apache#2779)
883231e [pradeepbn] Building bookkeeper with gradle on java11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants