ReactorNettyClient stuck on cancelled Conversation if that Conversation has more than 256 rows (size of reactor.bufferSize.small) #661
Comments
Thanks a lot for chasing this issue down. Since you invested about 80% of the effort that is required to fix the issue, do you want to submit a pull request to clear out the cancelled conversations?
I've never worked with the Reactor library (Mono, Flux) before. But I found that it's not easy to track down the source of a cancellation: an error in a parallel zip function, an ordinary cancel, or a cancellation from Mono.from(fluxPublisher).
will fire println with first emit
no println "fire"
@alexeykurshakov these cancellations have reasonable explanations. A couple examples:
For inspiration regarding test cases, perhaps you can use my examples with mocks. This was part of the investigation into whether r2dbc-pool is responsible for the connection leaks in r2dbc/r2dbc-pool#198 (comment).
Why do you discard cancellation with Operators.discardOnCancel, and what should .doOnDiscard(ReferenceCounted.class, ReferenceCountUtil::release) do?
Sounds like it should work, but it doesn't 🤣. In the issue's example, badThread never consumes data and sends the cancel signal after real data has already been fed to ReactorNettyClient, so those messages end up saved in its internal buffer. In that example the discard happens too late.
I can provide a timeline of what happened. And then we'll figure out how to fix it.
Hello! We've been hit by a similar issue this past week during some load testing. I have attached a stacktrace. We also saw a few Netty LEAK errors (stacktrace).
@alexeykurshakov Have you managed to conduct further investigation?
@travispeloton Could you provide more details about your testing environment? I don't clearly understand what you mean by "It usually only happens on one server instance".
@agorbachenko unfortunately no. We put a temporary workaround in the project and are moving to jOOQ instead.
@alexeykurshakov we haven't seen the issue again. For "It usually only happens on one server instance": we run multiple k8s pods, and it was observed on a single pod each of the 3 times we saw it. "not only a single query": in my case there is one type of query that currently dominates traffic.
Bug Report
Versions
Current Behavior
When a query is zipped in parallel with some other function that fails, and that query returns more than 256 rows, you can end up with no real consumer because the chain was cancelled, while data still arrives from the database and is saved to ReactorNettyClient.buffer.
When this happens, any other attempt to get data from the database will fail, because ReactorNettyClient.BackendMessageSubscriber.tryDrainLoop never calls drainLoop: the stuck conversation has no demand.
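To make the failure mode easier to picture, here is a toy model of head-of-line blocking in plain Java. It is not the driver's code: the class `StuckConversationModel`, its methods, and the `TERMINAL` marker are all hypothetical, and the model assumes (this is my reading, not something confirmed in the thread) that a cancelled conversation signals a single initial batch of demand equal to the small buffer size and then never requests again.

```java
import java.util.ArrayDeque;
import java.util.Queue;

/** Toy model of head-of-line blocking; NOT the r2dbc-postgresql implementation. */
final class StuckConversationModel {

    /** Hypothetical conversation: remaining demand plus a count of delivered messages. */
    static final class Conversation {
        long demand;        // messages the subscriber is still willing to accept
        int delivered;      // messages handed to the subscriber so far
        boolean completed;  // set once the terminal message has been delivered

        Conversation(long initialDemand) {
            this.demand = initialDemand;
        }
    }

    // Stand-in for the end-of-response message (an assumption of this model).
    static final String TERMINAL = "ReadyForQuery";

    private final Queue<String> buffer = new ArrayDeque<>();
    private final Queue<Conversation> conversations = new ArrayDeque<>();

    void register(Conversation conversation) {
        conversations.add(conversation);
    }

    /** Buffers one backend message, then tries to drain. */
    void onMessage(String message) {
        buffer.add(message);
        drain();
    }

    /** Delivers buffered messages to the head conversation only while it has demand. */
    private void drain() {
        while (!buffer.isEmpty()) {
            Conversation head = conversations.peek();
            if (head == null || head.demand == 0) {
                return; // nothing moves until the head conversation requests more
            }
            String message = buffer.poll();
            head.demand--;
            head.delivered++;
            if (TERMINAL.equals(message)) {
                head.completed = true;
                conversations.poll(); // the next conversation becomes the head
            }
        }
    }

    /**
     * Simulates a cancelled query returning {@code rows} rows, followed by a healthy
     * one-row query. Returns true if the second query completes.
     */
    static boolean secondQueryCompletes(int rows, int prefetch) {
        StuckConversationModel client = new StuckConversationModel();
        Conversation cancelled = new Conversation(prefetch);     // never requests again
        Conversation healthy = new Conversation(Long.MAX_VALUE); // unbounded demand
        client.register(cancelled);
        client.register(healthy);
        for (int i = 0; i < rows; i++) {
            client.onMessage("DataRow");
        }
        client.onMessage(TERMINAL);  // end of the cancelled query's response
        client.onMessage("DataRow"); // the healthy query's single row
        client.onMessage(TERMINAL);  // end of the healthy query's response
        return healthy.completed;
    }

    public static void main(String[] args) {
        System.out.println(secondQueryCompletes(300, 256)); // prints false
        System.out.println(secondQueryCompletes(300, 350)); // prints true
    }
}
```

In this model, 300 rows against an initial demand of 256 leaves the cancelled conversation at the head of the queue with zero demand, so the healthy query behind it never sees its messages. Under the same assumption, a demand of 350 lets the whole response drain and the queue moves on.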
This can be reproduced using https://github.com/agorbachenko/r2dbc-connection-leak-demo
If you increase the system property "reactor.bufferSize.small" to 350, the attached example starts working.
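For anyone trying that workaround, the sketch below shows one way to set the property. The class `SmallBufferSizeSketch` is hypothetical, and `resolve()` only mirrors how I understand Reactor to derive its small buffer size (default 256, floor of 16), so treat it as an illustration rather than the library's API. As far as I know the value is read once into a static constant, so the property has to be set before any Reactor class is loaded, for example with the JVM flag `-Dreactor.bufferSize.small=350`.

```java
/** Illustration only: mirrors how the small buffer size is assumed to be derived. */
final class SmallBufferSizeSketch {

    static final String PROPERTY = "reactor.bufferSize.small";

    /** Reads the system property with a default of 256 and a lower bound of 16. */
    static int resolve() {
        return Math.max(16, Integer.parseInt(System.getProperty(PROPERTY, "256")));
    }

    public static void main(String[] args) {
        // Set the property programmatically; in a real application this must happen
        // before the first Reactor class is touched, or use the -D flag instead.
        System.setProperty(PROPERTY, "350");
        System.out.println(resolve()); // prints 350
    }
}
```

This only raises the threshold at which the problem appears; a query returning more rows than the new value would presumably hit the same stuck conversation.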