Playing with the producer name as a way of adjusting message deduplication? #23529
Replies: 2 comments 8 replies
-
I don't think that global deduplication across partitions would make much sense with Pulsar. Perhaps there's another way to solve your use case. Please describe your use case.
Are you routing each message on the producer side to a single partition by the message key (this is enabled by default when messages contain keys)? Routing to a single partition by message key would most likely solve your deduplication problem in the first place, as long as the other conditions for deduplication are fulfilled. Please check the docs about message deduplication.
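To illustrate the routing point, here is a minimal sketch of key-based partition selection. It is a toy model, not the Pulsar client: `choose_partition` is a hypothetical helper, and `zlib.crc32` is only a stand-in for whatever hash the real client uses. The property that matters is that the same key always maps to the same partition, so a resend of a keyed message reaches the partition that already saw it.

```python
import zlib


def choose_partition(key: str, num_partitions: int) -> int:
    """Map a message key to a partition index.

    Stand-in hash for illustration only; the real client's hash differs.
    """
    return zlib.crc32(key.encode("utf-8")) % num_partitions


# Every message with the same key is assigned the same partition.
keys = ["order-1", "order-2", "order-1", "order-3", "order-1"]
assignments = [(key, choose_partition(key, 4)) for key in keys]

for key, partition in assignments:
    print(f"{key} -> partition {partition}")
```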
-
I have streams of messages that belong to the same topic but are categorized into groups by what I call a "partition key", meaning that messages within each "logical partition" should have their own set of sequence IDs. I need to somehow express this in Pulsar, and I figured that using the producer name for that purpose would do the trick.
Yes, but that's not sufficient, because the number of physical partitions in a given Pulsar topic might not (and in fact often does
My problem would have been solved if Pulsar tracked sequence IDs per message key, but it doesn't. I can simulate that functionality by using these partition/group keys in the producer name, though; what would be the problem with that? Thanks.
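A minimal sketch of the situation described above, assuming the broker keeps the highest accepted sequence ID per producer name and drops anything at or below it. `PartitionDedup` and the producer names are hypothetical; this models only the bookkeeping on one partition, not the Pulsar broker or client.

```python
class PartitionDedup:
    """Toy model of the deduplication state of a single partition."""

    def __init__(self):
        # producer name -> highest sequence ID accepted so far
        self.last_seq = {}

    def publish(self, producer_name, sequence_id):
        """Return True if the message is accepted, False if dropped as a duplicate."""
        if sequence_id <= self.last_seq.get(producer_name, -1):
            return False
        self.last_seq[producer_name] = sequence_id
        return True


# Two logical groups, each with its own sequence IDs, landing on one partition.
# The last entry is a genuine resend of ("group-a", 1).
messages = [("group-a", 0), ("group-a", 1), ("group-b", 0), ("group-b", 1), ("group-a", 1)]

# One shared producer name: group-b's messages look like duplicates.
shared = PartitionDedup()
shared_results = [shared.publish("my-producer", seq) for _, seq in messages]

# Producer name derived from the group key: only the real resend is dropped.
per_group = PartitionDedup()
per_group_results = [per_group.publish(f"my-producer-{group}", seq) for group, seq in messages]

print(shared_results)     # [True, True, False, False, False]
print(per_group_results)  # [True, True, True, True, False]
```

The sketch says nothing about the operational cost of running one producer per group on a real cluster, which is the open part of the question.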
-
Hey there; it seems like message deduplication doesn't work across the partitions of a given topic (meaning that each topic partition has its own <producer name, sequence ID> tuple); this is undesirable in most cases, I would say.
If the messages that your publisher publishes have partition keys, would it be a sound approach to use the partition key as part of the producer name, so as to effectively give each "logical" partition its own sequence ID?
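The per-partition behaviour described above can be sketched as follows. This is a toy model under the assumption that each partition tracks its own highest sequence ID per producer name; `TopicDedup` is a hypothetical name and nothing here calls Pulsar.

```python
class TopicDedup:
    """Toy model: each partition keeps independent deduplication state."""

    def __init__(self, num_partitions):
        # One dict per partition: producer name -> highest sequence ID accepted
        self.last_seq = [{} for _ in range(num_partitions)]

    def publish(self, partition, producer_name, sequence_id):
        """Return True if accepted, False if dropped as a duplicate."""
        state = self.last_seq[partition]
        if sequence_id <= state.get(producer_name, -1):
            return False
        state[producer_name] = sequence_id
        return True


topic = TopicDedup(num_partitions=2)

first = topic.publish(0, "producer-1", 7)         # accepted
resend_same = topic.publish(0, "producer-1", 7)   # dropped: partition 0 already saw ID 7
resend_other = topic.publish(1, "producer-1", 7)  # accepted: partition 1 has no record of it

print(first, resend_same, resend_other)  # True False True
```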