[PIP 79][common,broker,client] Change PartitionedTopicStats to support partial partitioned producer #10534
@equanz Could you elaborate on what the problem is? If each partitioned producer creates producers on a subset of partitions, how do the stats become incorrect?
@Vanlightly Of course. I think the main issue is the stats collection logic:
pulsar/pulsar-common/src/main/java/org/apache/pulsar/common/policies/data/stats/TopicStatsImpl.java
Lines 184 to 193 in 3dd9ec5
It causes these issues.
@equanz Let me know if I understand correctly.
We publish aggregated stats, where we aggregate by partitioned producer, and optionally we also publish stats per partition with no aggregation.
So the issue is that the stats collection code currently expects each partition to have the same number of producers, and uses that as a way of aggregating the stats for each partitioned producer. So we need another way of aggregating by partitioned producer. Moreover, it looks to me like the current strategy is already broken, seeing as the producers won't always match (neither in number nor in order), such as when producers come and go.
Using the producer name as an aggregation key is not good, because if the producer name is not set by the user, then each internal producer will have a different producer name (generated by the broker).
So the fix is to add another producer identity (producerStatsKey) that is shared across all producers of the same partitioned producer, and then aggregate on that. The producerStatsKey is set by the broker to ensure it is globally unique. The partitioned producer creates a single producer when the partitioned producer is started. Then, when additional internal producers for other partitions are required, they are created with the same producerStatsKey as the first producer. That way they all share the same key.
Is that summary correct?
Additional question:
I think so too. Therefore, not only partial producer stats, but also total producer stats will be calculated correctly by this feature.
Yes.
We can choose the strategy described above, but I think it will break some existing behavior (currently, the internal producers' names are not the same; if we use the same producerName across the partitioned topic, they will be aggregated). If the change described above is approved by the community, I'll change the strategy.
I'll try to implement this feature based on the comment above (use the producerName for aggregation and use the same strategy of starting a single producer).
Closing this PR because #12401 has been merged.
Master Issue: https://github.com/apache/pulsar/wiki/PIP-79%3A-Reduce-redundant-producers-from-partitioned-producer
Motivation
Please see the PIP document.
This is one part of the implementation. Before reviewing it, please check #10279. After #10279 is merged, I will rebase onto the top of the master branch.
Modifications
For backward compatibility, I introduce a new producer property, producerStatsKey. If this property is the same, the publisher stats are associated with the same producer in the partitioned producer stats.
I considered using producerName instead of introducing producerStatsKey, but I think we should not use it, because producerName is currently configured either by the system (different for each partition) or by the user (the same across all partitions). Therefore, if we used producerName as the association key, we would have to either change the system behavior (which I think is a breaking change) or tell users that "you should set producerName yourself if you want to associate publisher stats per partitioned producer".
If you have any suggestions, please comment on this PR.
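To illustrate the trade-off described above, here is a minimal sketch of aggregating per-partition publisher stats by an association key. It is not the code in this PR: the Publisher class is a hypothetical, simplified stand-in for the real publisher stats type, and the names ("gen-0-0", "k-A", and so on) are made up. It assumes one producer that is attached to two partitions and one that is attached to only a single partition.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

class PublisherStatsAggregationSketch {

    // Hypothetical, simplified stand-in for one per-partition publisher stats entry.
    public static final class Publisher {
        public final String producerName;      // system-generated here, so it differs per partition
        public final String producerStatsKey;  // shared by all internal producers of one partitioned producer
        public final double msgRateIn;

        public Publisher(String producerName, String producerStatsKey, double msgRateIn) {
            this.producerName = producerName;
            this.producerStatsKey = producerStatsKey;
            this.msgRateIn = msgRateIn;
        }
    }

    // Sums msgRateIn over all per-partition entries that map to the same association key.
    // Unlike positional matching, this does not need every partition to hold the same
    // number of producers in the same order.
    public static Map<String, Double> aggregate(List<List<Publisher>> partitions,
                                                Function<Publisher, String> keyOf) {
        Map<String, Double> totals = new LinkedHashMap<>();
        for (List<Publisher> partition : partitions) {
            for (Publisher p : partition) {
                totals.merge(keyOf.apply(p), p.msgRateIn, Double::sum);
            }
        }
        return totals;
    }

    // Partition 0 has producers A and B; partition 1 has only producer A.
    public static List<List<Publisher>> sample() {
        return Arrays.asList(
                Arrays.asList(new Publisher("gen-0-0", "k-A", 10.0),
                              new Publisher("gen-0-1", "k-B", 1.0)),
                Arrays.asList(new Publisher("gen-1-0", "k-A", 5.0)));
    }

    public static void main(String[] args) {
        // Keyed by producerStatsKey: one entry per partitioned producer.
        System.out.println(aggregate(sample(), p -> p.producerStatsKey));
        // prints {k-A=15.0, k-B=1.0}

        // Keyed by system-generated producerName: nothing is merged.
        System.out.println(aggregate(sample(), p -> p.producerName));
        // prints {gen-0-0=10.0, gen-0-1=1.0, gen-1-0=5.0}
    }
}
```

With a shared key, the producer that spans two partitions collapses into a single entry. With system-generated names, every internal producer stays separate unless the user sets producerName explicitly.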
Verifying this change
This change added tests and can be verified as follows:
Does this pull request potentially affect one of the following parts:
If yes was chosen, please highlight the changes
pulsar/pulsar-common/src/main/java/org/apache/pulsar/common/policies/data/PublisherStats.java
Line 49 in c0a4845
pulsar/pulsar-common/src/main/proto/PulsarApi.proto
Line 259 in c0a4845
pulsar/pulsar-common/src/main/proto/PulsarApi.proto
Line 486 in c0a4845
pulsar/pulsar-common/src/main/proto/PulsarApi.proto
Line 636 in c0a4845
Documentation