Metadata information is not cleaned when broker exits abnormally #23889
Comments
The metadata should get deleted automatically when the session expires. When a broker crashes, a new broker instance would not be able to start until the session expires and the ephemeral metadata entries expire. There's Line 142 in 66d1bb0
What metadata store implementation are you using? (What version, deployment type?)
@lhotari Thank you for your clear and accurate response; I found it very helpful. I will try the latest version of Oxia later. In this case, I am using Pulsar version 4.0.2, with ZooKeeper as the metadata storage service. I have identified the critical issue: when a broker registers information with the metadata storage service, it sets the
The verification method is as follows:
@Test
public void zookeeperEphemeralKeys() throws Exception {
    final String key1 = newKey();
    final String key2 = newKey();
    @Cleanup MetadataStoreExtended store = MetadataStoreExtended.create(zks.getConnectionString(), MetadataStoreConfig.builder().build());
    store.put(key1, "value-1".getBytes(), Optional.of(-1L), EnumSet.of(CreateOption.Ephemeral)).join();
    store.put(key2, "value-1".getBytes(), Optional.empty(), EnumSet.of(CreateOption.Ephemeral)).join();
    store.close();
    @Cleanup MetadataStoreExtended store2 = MetadataStoreExtended.create(zks.getConnectionString(), MetadataStoreConfig.builder().build());
    assertFalse(store2.exists(key1).join());
    // This check will not pass
    assertFalse(store2.exists(key2).join());
    store2.close();
}
I have made the following modifications.
Good observations, @Joforde. Thanks for sharing. Are you using the extensible load manager? It seems that the Pulsar default load manager implementation ( Line 978 in 325c6a5
What |
@Joforde I guess the intention of the #23298 change was to overwrite any existing node and to create a new ephemeral node if one doesn't exist. Addressing the issue would require adding tests and support for all metadata store implementations, it seems.
The loadManagerClassName I have set is org.apache.pulsar.broker.loadbalance.extensions.ExtensibleLoadManagerImpl.
This is anti-intuitive IMO. The |
Don't we need to pass opPut.getOptions() here, so that the original options (CreateOption.Ephemeral) are preserved when the set operation fails? put(opPut.getPath(), opPut.getData(), Optional.of(-1L), opPut.getOptions()).thenAccept(
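To illustrate the suggestion above, here is a minimal sketch of a put that first tries to update an existing node and falls back to a create when the node is missing. It is a toy in-memory model, not Pulsar's actual metadata store code: `ToyStore`, `setIfExists`, and the `forwardOptions` flag are hypothetical names, and `CreateOption` is a local copy of the option name used in this thread so that the example compiles on its own. The flag stands for whether the fallback passes the caller's options along.

```java
import java.util.EnumSet;
import java.util.HashMap;
import java.util.Map;

public class PutFallbackSketch {
    /** Local copy of the option name used in the thread, so the sketch compiles on its own. */
    enum CreateOption { Ephemeral }

    /** Hypothetical in-memory store; it only records the options each node was created with. */
    static final class ToyStore {
        private final Map<String, EnumSet<CreateOption>> nodes = new HashMap<>();

        /** Models the "set" step: succeeds only when the node already exists. */
        boolean setIfExists(String path) {
            return nodes.containsKey(path);
        }

        /** Models the "create" step: the creation options decide the node's lifetime. */
        void create(String path, EnumSet<CreateOption> options) {
            nodes.put(path, EnumSet.copyOf(options));
        }

        /** Put without an expected version: try set, fall back to create if the node is missing. */
        void put(String path, EnumSet<CreateOption> options, boolean forwardOptions) {
            if (setIfExists(path)) {
                return; // existing node updated in place
            }
            // Fallback create. Dropping the options here yields a persistent node.
            create(path, forwardOptions ? options : EnumSet.noneOf(CreateOption.class));
        }

        boolean isEphemeral(String path) {
            return nodes.get(path).contains(CreateOption.Ephemeral);
        }
    }

    /** Puts a new key with the Ephemeral option and reports whether the node ended up ephemeral. */
    static boolean ephemeralAfterPut(boolean forwardOptions) {
        ToyStore store = new ToyStore();
        store.put("/key", EnumSet.of(CreateOption.Ephemeral), forwardOptions);
        return store.isEphemeral("/key");
    }

    public static void main(String[] args) {
        System.out.println("options dropped:   ephemeral=" + ephemeralAfterPut(false));
        System.out.println("options forwarded: ephemeral=" + ephemeralAfterPut(true));
    }
}
```

In this model the node is ephemeral only when the fallback forwards the options, which matches the behavior the test with key2 appears to expose.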
Good catch @heesung-sn! Would you like to submit a PR for addressing this bug? |
Raised a PR: #23902
@Joforde |
Search before asking
Motivation
When a broker starts, it registers its metadata with the metadata service (such as ZooKeeper or etcd) under the /loadbalance/brokers directory. When the broker exits gracefully, it actively calls the unregister method to remove its own metadata.
pulsar/pulsar-broker/src/main/java/org/apache/pulsar/broker/loadbalance/extensions/BrokerRegistryImpl.java
Lines 134 to 143 in 66d1bb0
pulsar/pulsar-broker/src/main/java/org/apache/pulsar/broker/loadbalance/extensions/BrokerRegistryImpl.java
Lines 172 to 178 in 66d1bb0
However, if the broker is forced to exit due to issues like hardware failure, network problems, or being terminated with kill -9, it does not call the unregister method to delete its metadata. As a result, the metadata for brokers that are no longer accessible remains in the system.
Currently, Pulsar retrieves all active brokers by fetching all child nodes under the metadata service's /loadbalance/brokers path. This can lead to offline brokers being considered active, and bundles may be assigned to brokers that are not accessible, causing new namespace clients to experience read and write failures.
pulsar/pulsar-broker/src/main/java/org/apache/pulsar/broker/loadbalance/extensions/BrokerRegistryImpl.java
Lines 197 to 201 in 66d1bb0
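The failure mode described above can be sketched with a toy model. This is not Pulsar or ZooKeeper code: `ToyStore`, `expireSession`, and the `PERSISTENT` marker are hypothetical, and the broker names are made up. It assumes that an entry tied to a session disappears when that session expires, while an entry created without a session stays until someone deletes it explicitly.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class EphemeralRegistrySketch {
    /** Marker for entries that are not tied to any session (hypothetical convention). */
    static final long PERSISTENT = -1L;

    /** Toy in-memory stand-in for a metadata store; not a Pulsar or ZooKeeper API. */
    static final class ToyStore {
        // path -> id of the owning session, or PERSISTENT
        private final Map<String, Long> owners = new TreeMap<>();

        void put(String path, long ownerSession) {
            owners.put(path, ownerSession);
        }

        /** Drops every entry owned by the expired session, as an ephemeral node would be. */
        void expireSession(long sessionId) {
            owners.values().removeIf(owner -> owner == sessionId);
        }

        /** Lists the child names directly recorded under the given parent path. */
        List<String> children(String parent) {
            String prefix = parent + "/";
            List<String> names = new ArrayList<>();
            for (String path : owners.keySet()) {
                if (path.startsWith(prefix)) {
                    names.add(path.substring(prefix.length()));
                }
            }
            return names;
        }
    }

    /** Two brokers crash without unregistering; returns the brokers still listed afterwards. */
    static List<String> brokersAfterCrash() {
        ToyStore store = new ToyStore();
        store.put("/loadbalance/brokers/broker-a:8080", 1L);         // tied to session 1
        store.put("/loadbalance/brokers/broker-b:8080", PERSISTENT); // not tied to any session
        // Both broker processes are gone, so both of their sessions eventually expire.
        store.expireSession(1L);
        store.expireSession(2L);
        return store.children("/loadbalance/brokers");
    }

    public static void main(String[] args) {
        System.out.println(brokersAfterCrash()); // prints [broker-b:8080]
    }
}
```

Only the entry that was not tied to a session survives, so a listing of /loadbalance/brokers in this model still reports a broker that no longer exists.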
Solution
I am not yet certain how to resolve this issue. I am currently reviewing the relevant code and drafting a solution document. If you have any suggestions, please feel free to leave a comment.
Alternatives
No response
Anything else?
No response
Are you willing to submit a PR?