[fix][client] Orphan producer when concurrently calling producer closing and reconnection #23853
The PR title is currently very confusing when it's "Fix closed producers were reverted mistakenly".
Sorry, I have corrected the title.
/pulsarbot rerun-failure-checks
The title remains hard to understand. I used the technique described here to let an LLM (Claude) suggest a title. The suggestion based on the PR context is "Fix race condition between producer reconnection and closing that causes orphaned producers". Here's the full example of what the LLM suggested: https://gist.github.com/lhotari/e63521b8a5694c5a928f740cd5d46331 .
@poorbarcode Please also update the description where it contains the very misleading sentence "closed producers were reverted mistakenly".
Modified.
/LGTM
Nice catch!
pulsar-client/src/test/java/org/apache/pulsar/client/impl/ProducerImplTest.java
pulsar-client/src/main/java/org/apache/pulsar/client/impl/ConnectionHandler.java
ProducerCloseTest.testProducerCloseCallback fails. Is that flakiness or a real failure?
ProducerCloseTest.testProducerCloseCallback failed again. @poorbarcode are you able to fix the issue?
pulsar-client/src/main/java/org/apache/pulsar/client/impl/ProducerImpl.java
Fixed. I forgot to revert the changes introduced by #23761.
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@             Coverage Diff              @@
##             master   #23853      +/-   ##
============================================
+ Coverage     73.57%   74.18%   +0.61%
+ Complexity    32624    32206     -418
============================================
  Files          1877     1853      -24
  Lines        139502   143601    +4099
  Branches      15299    16307    +1008
============================================
+ Hits         102638   106533    +3895
+ Misses        28908    28663     -245
- Partials       7956     8405     +449
Flags with carried forward coverage won't be shown.
LGTM. Good work @poorbarcode
…ing and reconnection (apache#23853)
…ing and reconnection (apache#23853) (cherry picked from commit 56adefa) (cherry picked from commit aea4900)
Motivation
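background 1: producer's reconnection[1]
1. Set `connection` to `null`.
2. If the state is `closing | closed`, skip the reconnection.
3. Otherwise, set the state to `connecting` and reconnect.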
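The reconnection steps above can be sketched as follows. This is a simplified, hypothetical model: the `State` enum, `ReconnectSketch` and its fields are illustrative assumptions, not Pulsar's real `ConnectionHandler` code. It only shows that the state check and the state update are two separate steps.

```java
// Hypothetical, simplified model of the reconnection path; not Pulsar's real ConnectionHandler.
enum State { READY, CONNECTING, CLOSING, CLOSED }

class ReconnectSketch {
    volatile State state = State.READY;
    volatile Object connection = new Object(); // stands in for the broker connection

    // The three steps run one after another, with nothing making them atomic.
    boolean reconnectLater() {
        connection = null;                                   // 1. drop the connection
        if (state == State.CLOSING || state == State.CLOSED) {
            return false;                                    // 2. closing | closed: skip
        }
        state = State.CONNECTING;                            // 3. mark as connecting
        return true;                                         //    (the actual reconnect would follow)
    }

    public static void main(String[] args) {
        ReconnectSketch ready = new ReconnectSketch();
        System.out.println(ready.reconnectLater() + " " + ready.state);   // true CONNECTING
        ReconnectSketch closed = new ReconnectSketch();
        closed.state = State.CLOSED;
        System.out.println(closed.reconnectLater() + " " + closed.state); // false CLOSED
    }
}
```

Nothing in this model stops another thread from changing `state` between step 2 and step 3; that window is where the issues below occur.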
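background 2: steps of producer closing[2]
- If `connection` is null: the state is set to `closed` right away.
- If `connection` is present: the producer is closed through the connection, and the state is then set to `closed`.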
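A similarly hypothetical sketch of the closing steps. `CloseSketch` and `closeCommandSent` are illustrative names, and the broker round trip in the connected branch is an assumption reduced to a flag.

```java
// Hypothetical, simplified model of the close path; not Pulsar's real ProducerImpl code.
enum State { READY, CONNECTING, CLOSING, CLOSED }

class CloseSketch {
    volatile State state = State.READY;
    volatile Object connection;      // null means "not connected"
    boolean closeCommandSent;        // records which branch was taken

    CloseSketch(Object connection) {
        this.connection = connection;
    }

    void closeAsync() {
        if (state == State.CLOSING || state == State.CLOSED) {
            return;                  // already closing or closed
        }
        state = State.CLOSING;
        if (connection == null) {
            state = State.CLOSED;    // no connection: close locally right away
            return;
        }
        closeCommandSent = true;     // assumed broker round trip, reduced to a flag
        state = State.CLOSED;        // assumed: set once the broker has answered
    }

    public static void main(String[] args) {
        CloseSketch offline = new CloseSketch(null);
        offline.closeAsync();
        System.out.println(offline.state + " " + offline.closeCommandSent); // CLOSED false
        CloseSketch online = new CloseSketch(new Object());
        online.closeAsync();
        System.out.println(online.state + " " + online.closeCommandSent);   // CLOSED true
    }
}
```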
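Issue 1: resending messages encountered a recycled pending message
The reconnection and the producer closing interleave as follows:
1. reconnection: set `connection` to `null`.
2. reconnection: check whether the state is `closing or closed`; it is not, so the reconnection continues.
3. close producer: set the state to `closing`.
4. close producer: `connection` is `null` now, so the producer is closed right away and its pending messages are released.
5. reconnection: set the state to `connecting` and reconnect.
After reconnecting, the producer resends its pending messages and encounters a pending message that has already been recycled.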
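The interleaving above can be replayed deterministically on a single thread. In this hypothetical sketch, `PendingMsg.recycled` stands in for returning an entry to its pool, and the sketch assumes the resend path can still reach the entries released by the close path (it simply keeps them in the list). None of these names are Pulsar's real API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model; these names are not Pulsar's real API.
enum State { READY, CONNECTING, CLOSING, CLOSED }

// Stand-in for a pooled pending-send entry.
class PendingMsg {
    boolean recycled; // true once the entry has been handed back to its pool
}

class Issue1Replay {
    State state = State.READY;
    Object connection = new Object();
    final List<PendingMsg> pending = new ArrayList<>();

    // Close path without a connection: fail and recycle every pending send.
    // Simplification: entries stay in the list so the resend step can still reach them.
    void closeWithoutConnection() {
        for (PendingMsg m : pending) {
            m.recycled = true;
        }
        state = State.CLOSED;
    }

    // Resend path after reconnecting: counts entries that were already recycled.
    int resendPending() {
        int recycledSeen = 0;
        for (PendingMsg m : pending) {
            if (m.recycled) {
                recycledSeen++;
            }
        }
        return recycledSeen;
    }

    // Replays the racy interleaving on one thread so the result is deterministic.
    static int replay(int pendingCount) {
        Issue1Replay p = new Issue1Replay();
        for (int i = 0; i < pendingCount; i++) {
            p.pending.add(new PendingMsg());
        }
        p.connection = null;                                         // 1. reconnection: connection -> null
        boolean mayReconnect =
                p.state != State.CLOSING && p.state != State.CLOSED; // 2. reconnection: check passes
        p.state = State.CLOSING;                                     // 3. close: state -> closing
        if (p.connection == null) {
            p.closeWithoutConnection();                              // 4. close: connection is null now
        }
        if (!mayReconnect) {
            return 0;
        }
        p.state = State.CONNECTING;                                  // 5. reconnection: state -> connecting
        return p.resendPending();                                    // resend runs into recycled entries
    }

    public static void main(String[] args) {
        System.out.println(replay(3)); // 3
    }
}
```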
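Issue 2: closed producers were set to `connecting` mistakenly. The steps to reproduce the issue are as follows:
1. reconnection: set `connection` to `null`.
2. reconnection: check whether the state is `closing or closed`; it is not, so the reconnection continues.
3. close producer: set the state to `closing`.
4. close producer: `connection` is `null` now, so the producer is set to `closed` right away.
5. reconnection: set the state to `connecting`, overwriting `closed`.
The closed producer then reconnects and becomes an orphan producer.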
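Replaying the same interleaving and tracking only the state shows how `closed` gets overwritten. This is a hypothetical single-threaded replay; `Issue2Replay` is an illustrative name, not part of the Pulsar client.

```java
// Hypothetical, simplified model; not Pulsar's real classes.
enum State { READY, CONNECTING, CLOSING, CLOSED }

class Issue2Replay {
    State state = State.READY;
    Object connection = new Object();

    // Replays the racy interleaving on one thread and returns the final state.
    static State replay() {
        Issue2Replay p = new Issue2Replay();
        p.connection = null;                                         // 1. reconnection: connection -> null
        boolean mayReconnect =
                p.state != State.CLOSING && p.state != State.CLOSED; // 2. reconnection: check passes
        p.state = State.CLOSING;                                     // 3. close: state -> closing
        if (p.connection == null) {
            p.state = State.CLOSED;                                  // 4. close: no connection, closed at once
        }
        if (mayReconnect) {
            p.state = State.CONNECTING;                              // 5. reconnection: overwrites CLOSED
        }
        return p.state;
    }

    public static void main(String[] args) {
        System.out.println(replay()); // CONNECTING
    }
}
```

The result is a producer that the application already considers closed but that is about to reconnect.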
You can reproduce the issue with the test `testConcurrencyReconnectAndClose`.
Logs that encountered issue 2
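One way to close the window is to make the reconnection's "not `closing`/`closed`" check and its switch to `connecting` a single atomic step, which is what the Modifications below describe. The sketch assumes the state lives in an `AtomicReference<State>` updated with a compare-and-set loop; it is an illustration of the technique, not the code merged in this PR.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch of an atomic check-and-set; not the code merged in this PR.
enum State { READY, CONNECTING, CLOSING, CLOSED }

class AtomicStateSketch {
    final AtomicReference<State> state = new AtomicReference<>(State.READY);
    volatile Object connection = new Object();

    // Reconnection: the "not closing/closed" check and the switch to CONNECTING happen
    // in one compare-and-set, so a concurrent close cannot slip in between them.
    boolean tryMarkConnecting() {
        while (true) {
            State current = state.get();
            if (current == State.CLOSING || current == State.CLOSED) {
                return false;
            }
            if (state.compareAndSet(current, State.CONNECTING)) {
                return true;
            }
        }
    }

    // Close: move to CLOSING unless already closing/closed; without a connection,
    // finish immediately. (With a connection, the broker round trip is omitted here.)
    void close() {
        State previous = state.getAndUpdate(
                s -> (s == State.CLOSING || s == State.CLOSED) ? s : State.CLOSING);
        if (previous == State.CLOSING || previous == State.CLOSED) {
            return;
        }
        if (connection == null) {
            state.set(State.CLOSED);
        }
    }

    // Replays both possible orders of the race on one thread.
    static State replay(boolean closeFirst) {
        AtomicStateSketch p = new AtomicStateSketch();
        p.connection = null;           // reconnection: drop the connection
        if (closeFirst) {
            p.close();                 // close wins: CLOSED
            p.tryMarkConnecting();     // refused, state stays CLOSED
        } else {
            p.tryMarkConnecting();     // reconnection wins: CONNECTING
            p.close();                 // close still takes over: CLOSING -> CLOSED
        }
        return p.state.get();
    }

    public static void main(String[] args) {
        System.out.println(replay(true) + " " + replay(false)); // CLOSED CLOSED
    }
}
```

In both orders the producer ends up `closed`. A real client would also need to re-check the state once the new connection is ready, which this sketch does not model.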
Modifications
- Check and set the `state` atomically when calling `reconnection`, see https://github.com/apache/pulsar/compare/master...poorbarcode:fix/producer_race_condition?expand=1#diff-bcd53f63180847515f1fe1d5b00deb218d023cbfe9cbfade19b44c2babd734ffR194-L196
- Revert the change introduced by [Fix][Client] Fix pending message not complete when closeAsync #23761 that fails all pending sends (`producer.pendingMessages`) when `producer.connection` is null, so that the pending messages can still be sent once the producer reconnects successfully.
Documentation
doc
doc-required
doc-not-needed
doc-complete
Matching PR in forked repository
PR in forked repository: x