
[improve][broker] Register the broker to metadata store without version id compare #23298

Conversation

@BewareMyPower (Contributor) commented Sep 12, 2024

Motivation

The following failure has been observed many times in our test environment after restarting the only broker, in the extensible load manager's broker registry service:

2024-09-12T14:00:20,557+0000 [main] ERROR org.apache.pulsar.broker.loadbalance.extensions.ExtensibleLoadManagerImpl - Failed to start the extensible load balance and close broker registry org.apache.pulsar.broker.loadbalance.extensions.BrokerRegistryImpl@6d6c4775.
org.apache.pulsar.broker.PulsarServerException: org.apache.pulsar.metadata.api.MetadataStoreException$LockBusyException: Resource at /loadbalance/brokers/<broker-id>:8080 is already locked
        at org.apache.pulsar.broker.loadbalance.extensions.BrokerRegistryImpl.start(BrokerRegistryImpl.java:112) ~
        at org.apache.pulsar.broker.loadbalance.extensions.ExtensibleLoadManagerImpl.start(ExtensibleLoadManagerImpl.java:370) 
        at org.apache.pulsar.broker.loadbalance.extensions.ExtensibleLoadManagerWrapper.start(ExtensibleLoadManagerWrapper.java:50) 
        at org.apache.pulsar.broker.PulsarService.startLoadManagementService(PulsarService.java:1426) 
        at org.apache.pulsar.broker.PulsarService.start(PulsarService.java:986) 
        ...
Caused by: org.apache.pulsar.metadata.api.MetadataStoreException$LockBusyException: Resource at /loadbalance/brokers/<broker-id>:8080 is already locked
        at org.apache.pulsar.metadata.api.MetadataStoreException.unwrap(MetadataStoreException.java:186) ~
        at org.apache.pulsar.broker.loadbalance.extensions.BrokerRegistryImpl.register(BrokerRegistryImpl.java:131) ~
        at org.apache.pulsar.broker.loadbalance.extensions.BrokerRegistryImpl.start(BrokerRegistryImpl.java:110) ~
        ... 16 more

The "Resource xxx is already locked" is usually caused by a BadVersionException. This issue can happen when:

  1. The broker crashed (e.g. it was killed by kill -9 or its container crashed), so it never deleted its node from the metadata store.
  2. The broker restarted, but the node still existed, so it failed to acquire the lock.
  3. The error can only be recovered from by restarting the broker after the metadata store has deleted that node (either manually by operators, or automatically once the session timeout is detected).

Actually, this is not a good use case for LockManager. The broker registry only does the following:

  1. Puts its own broker's lookup data under /loadbalance/brokers/<broker-id> when starting (in the main thread).
  2. Deletes its own broker id node from /loadbalance/brokers when closing.
  3. Queries any broker's lookup data from /loadbalance/brokers/<broker-id>.
  4. Lists all child nodes of /loadbalance/brokers.

Currently, steps 1 and 2 are performed via the distributed lock implemented by LockManager. However, there should be no race condition, because the broker id uniquely identifies a broker: the same broker never starts or closes concurrently. In addition, it is meaningless to pass an explicit version to the metadata store here.

Steps 3 and 4 simply wrap MetadataCache's methods. They are never actually related to anything like a "lock", even though the wrappers are named listLocks and readLock.
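
To make this concrete, here is a rough sketch of the four operations expressed directly against the MetadataStore and MetadataCache APIs, with no lock and no version compare. The store, cache, brokerId, lookupData and serialize names are illustrative rather than the exact PR code, and the real registration also creates the node as ephemeral (options omitted here):

    // 1. Register: unconditional put; Optional.empty() means "no expected version".
    CompletableFuture<Stat> register =
            store.put("/loadbalance/brokers/" + brokerId, serialize(lookupData), Optional.empty());

    // 2. Unregister: unconditional delete of the broker's own node.
    CompletableFuture<Void> unregister =
            store.delete("/loadbalance/brokers/" + brokerId, Optional.empty());

    // 3. Lookup: a plain cached read of any broker's data.
    CompletableFuture<Optional<BrokerLookupData>> lookup =
            cache.get("/loadbalance/brokers/" + brokerId);

    // 4. List: enumerate all registered brokers.
    CompletableFuture<List<String>> brokers = cache.getChildren("/loadbalance/brokers");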

Modifications

  • Add a put method to MetadataCache, which just calls MetadataStore#put with Optional.empty() under the hood (see the sketch after this list).
  • Replace LockManager with MetadataCache in BrokerRegistryImpl.
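
A minimal sketch of what the new put method could look like; this is an assumption for illustration, and the actual method added by the PR may differ in name and signature (e.g. it may also accept create options for ephemeral nodes):

    // Inside a MetadataCache implementation: write unconditionally by passing
    // Optional.empty() as the expected version, so no version compare happens.
    public CompletableFuture<Void> put(String path, T value) {
        final byte[] payload = serialize(value); // hypothetical helper using the cache's serializer
        return store.put(path, payload, Optional.empty())
                .thenAccept(stat -> { }); // drop the Stat; callers only need completion
    }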

Documentation

  • doc
  • doc-required
  • doc-not-needed
  • doc-complete

Matching PR in forked repository

PR in forked repository:

@github-actions github-actions bot added the doc-not-needed label Sep 12, 2024
@BewareMyPower BewareMyPower self-assigned this Sep 12, 2024
@BewareMyPower BewareMyPower added the type/bug label Sep 12, 2024
@BewareMyPower BewareMyPower added this to the 4.0.0 milestone Sep 12, 2024
@BewareMyPower BewareMyPower added type/enhancement and removed type/bug labels Sep 12, 2024
@codelipenghui (Contributor) commented:

Currently, steps 1 and 2 are performed via the distributed lock implemented by LockManager. However, there should be no race condition, because the broker id uniquely identifies a broker: the same broker never starts or closes concurrently. In addition, it is meaningless to pass an explicit version to the metadata store here.

@BewareMyPower But incorrect user configuration could result in more than one broker using the same broker id, especially in an on-prem environment. And the comment #23297 (comment) raises the same concern about the lock issue. It looks like something caused the lock not to be released even though the broker unregistered itself from the metadata store.

@lhotari (Member) commented Sep 12, 2024

@BewareMyPower But incorrect user configuration could result in more than one broker using the same broker id, especially in an on-prem environment.

If multiple brokers have the same broker id, the configuration would have multiple other problems. Wouldn't it be a valid requirement that the admin of the Pulsar system ensures that broker ids are unique?

And the comment #23297 (comment) raises the same concern about the lock issue. It looks like something caused the lock not to be released even though the broker unregistered itself from the metadata store.

Yes, that seems to be a real issue that would need to be addressed in any case.

@BewareMyPower (Author) commented:

But incorrect user configuration could result in more than one broker using the same broker id, especially in an on-prem environment.

I agree with Lari that we should not guarantee specific behavior under a wrong configuration. If we assume invalid configurations can happen, what about two brokers configured with different metadata stores? How could you prevent the broker with the wrong metadata URL from starting?

@BewareMyPower (Author) commented Sep 13, 2024

From another perspective, this PR resolves the case where the broker exits unexpectedly (e.g. due to a crash), just like the incident we hit in production:

Sep 12, 2024 @ 07:00:10.089	#  SIGSEGV (0xb) at pc=0x00000000000049c6, pid=1, tid=454

The JVM process exited with a SIGSEGV signal and no cleanup was done; even the metadata store client was not closed. When the broker restarted, it failed with LockBusyException. Only after the metadata store server detected the client's session timeout and deleted the node could the broker be restarted successfully.

When programming, we should handle exceptions rather than errors.

  • An unexpected crash, or any other unexpected logic that skips the unregister step, is an "exception": it can be recovered from by ignoring the version compare, which is what this PR does.
  • Misconfiguration at startup is an "error": it cannot be recovered from until operators notice it and fix the configuration.

Another solution would be to add retry logic to BrokerRegistryImpl#register. Here is a simple example (without using Backoff or checking the exception type):

    // Retry acquiring the lock until the metadata store session timeout elapses,
    // sleeping briefly between attempts.
    final var deadline = System.currentTimeMillis() + pulsar.getConfig().getMetadataStoreSessionTimeoutMillis();
    Throwable throwable = null;
    while (true) {
        if (System.currentTimeMillis() > deadline) {
            throw MetadataStoreException.unwrap(throwable == null
                    ? new TimeoutException("Failed to register") : throwable);
        }
        try {
            this.brokerLookupDataLock = brokerLookupDataLockManager
                    .acquireLock(keyPath(brokerId), brokerLookupData)
                    .get(conf.getMetadataStoreOperationTimeoutSeconds(), TimeUnit.SECONDS);
            this.state = State.Registered;
            break;
        } catch (InterruptedException | ExecutionException | TimeoutException e) {
            throwable = e; // remember the last failure for the final error report
            try {
                Thread.sleep(100); // back off briefly before retrying
            } catch (InterruptedException ex) {
                throw MetadataStoreException.unwrap(ex);
            }
        }
    }

@BewareMyPower (Author) commented:

I thought about it again: retry is not a good solution. If the broker crashed, its registered data could still exist in the metadata store, so this "dead" broker could be selected as the owner. However, even after it restarted, it could not serve any requests until the metadata store session timed out. So many topics could be unavailable for a long time.

Therefore, it does not make sense to keep the current flaky logic just to fail fast for a misconfiguration case. @codelipenghui

@BewareMyPower BewareMyPower marked this pull request as draft September 13, 2024 04:05
@BewareMyPower BewareMyPower marked this pull request as ready for review September 13, 2024 06:34
@codelipenghui (Contributor) commented:

I'm not saying we should handle incorrect configuration from the user side, but what is the expected behavior if a user configures an incorrect broker ID (i.e. the same broker ID)? As I understand it, only the first broker should register successfully, and all the other brokers with the same broker ID should fail?

@lhotari (Member) left a review comment:

LGTM

@lhotari (Member) commented Sep 13, 2024

I'm not saying we should handle incorrect configuration from the user side, but what is the expected behavior if a user configures an incorrect broker ID (i.e. the same broker ID)? As I understand it, only the first broker should register successfully, and all the other brokers with the same broker ID should fail?

One of the benefits of this change is that in the case of a broker crash, the broker can recover immediately; without this change, the session of the crashed instance blocks the startup of a new instance. Detecting duplicate broker ids in a Pulsar cluster configuration is a different matter.

@BewareMyPower Perhaps the description could be updated to include the actual reason why we want to make this change: so that a crashed instance can recover immediately, without needing to wait until the session of the crashed instance expires?

@Demogorgon314 (Member) left a review comment:

Do we need to apply the same change for ModularLoadManager?

@BewareMyPower (Author) commented:

Perhaps the description could be updated to include the actual reason why we want to make this change,

@lhotari Updated.

but what is the expected behavior if a user configures an incorrect broker ID (i.e. the same broker ID)? As I understand it, only the first broker should register successfully, and all the other brokers with the same broker ID should fail?

@codelipenghui I'd call it undefined behavior. Kafka behaves the same way: only one broker will succeed if multiple brokers configure the same broker.id. However, this behavior is not documented; the documentation only suggests generating broker ids automatically via ZooKeeper:

To avoid conflicts between ZooKeeper generated broker id's and user configured broker id's, generated broker ids start from reserved.broker.max.id + 1.

Actually, in the real world, two brokers would only have the same broker id if they were configured with the same advertisedAddress and webServicePort. Even if neither of them fails after this PR's change, users can still easily discover that the service doesn't work, because the advertised address is wrong.

It would be good if users could quickly learn that they've configured the same broker id for different brokers. However, IMO, this misconfiguration issue exists in theory but hardly in practice.

@BewareMyPower (Author) commented:

Do we need to apply the same change for ModularLoadManager?

@Demogorgon314 +1 from me

@BewareMyPower BewareMyPower marked this pull request as draft September 13, 2024 08:52
@BewareMyPower BewareMyPower marked this pull request as ready for review September 13, 2024 09:15
@BewareMyPower (Author) commented:

I swallowed the exception in BrokerRegistryImpl#unregister().

BTW, I think synchronous close operations for components should never throw exceptions; otherwise, the subsequent close operations will be skipped. Just log errors for unexpected exceptions and warnings for expected ones, as in the sketch below.
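
For example, such a close method might look like the following hypothetical sketch (not the actual PR code):

    // Never propagate exceptions from a synchronous close, so that the
    // close operations of subsequent components still run.
    @Override
    public void close() {
        try {
            unregister();
        } catch (MetadataStoreException e) {
            // Expected during shutdown (e.g. the session already expired): warn only.
            log.warn("Failed to unregister the broker: {}", e.getMessage());
        } catch (Exception e) {
            // Unexpected: keep the full stack trace, but still do not rethrow.
            log.error("Unexpected error when closing the broker registry", e);
        }
    }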

@codecov-commenter commented Sep 13, 2024

Codecov Report

Attention: Patch coverage is 62.50000% with 9 lines in your changes missing coverage. Please review.

Project coverage is 74.56%. Comparing base (bbc6224) to head (acf0321).
Report is 579 commits behind head on master.

Files with missing lines | Patch % | Lines
.../pulsar/metadata/cache/impl/MetadataCacheImpl.java | 40.00% | 4 Missing and 2 partials ⚠️
...ker/loadbalance/extensions/BrokerRegistryImpl.java | 78.57% | 2 Missing and 1 partial ⚠️

@@             Coverage Diff              @@
##             master   #23298      +/-   ##
============================================
+ Coverage     73.57%   74.56%   +0.99%     
- Complexity    32624    34308    +1684     
============================================
  Files          1877     1927      +50     
  Lines        139502   145040    +5538     
  Branches      15299    15860     +561     
============================================
+ Hits         102638   108150    +5512     
+ Misses        28908    28617     -291     
- Partials       7956     8273     +317     
Flag | Coverage Δ
inttests | 27.94% <41.66%> (+3.36%) ⬆️
systests | 24.69% <4.16%> (+0.37%) ⬆️
unittests | 73.90% <62.50%> (+1.05%) ⬆️

Flags with carried forward coverage won't be shown.

Files with missing lines | Coverage Δ
...ker/loadbalance/extensions/BrokerRegistryImpl.java | 82.07% <78.57%> (-2.69%) ⬇️
.../pulsar/metadata/cache/impl/MetadataCacheImpl.java | 85.49% <40.00%> (-3.77%) ⬇️

... and 567 files with indirect coverage changes

@BewareMyPower BewareMyPower merged commit 13c19b5 into apache:master Sep 13, 2024
51 checks passed
@BewareMyPower BewareMyPower deleted the bewaremypower/broker-registry-improve branch September 13, 2024 11:07
BewareMyPower added a commit that referenced this pull request Sep 20, 2024
Labels
cherry-picked/branch-3.3 · doc-not-needed · release/3.3.1 · release/3.3.2 · type/enhancement