Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Channel cache leak when no answers from broker for pending confirms #2640

Closed
artembilan opened this issue Feb 28, 2024 Discussed in #2637 · 0 comments
Closed

Channel cache leak when no answers from broker for pending confirms #2640

artembilan opened this issue Feb 28, 2024 Discussed in #2637 · 0 comments

Comments

@artembilan
Copy link
Member

Discussed in #2637

Originally posted by arn-rio February 26, 2024
Hello,

We experienced some issues using correlated Rabbit confirm publishers.
Rabbit server was unstable for a while. Once restored, we were unable to publish new confirmed messages to it (the max number of channel on connection was reached and the existing channels were ignored).
The solution was to restart the service (we could force close the connection : but it only worked when channel cache size wasn't set in CachingConnectionFactory)

We investigated and found that :

  • when a message is published but not confirmed, assigned channel is not used for an other message. (see spring amqp documentation )
  • so CachingConnectionFactory creates new channels, and then number of channel in connection increases.
  • Finally, there is no channel available on the connection : max number of channels reached
    when a new channelCacheSize and channelCheckoutTimeout are set in CachingConnectionFactory, we got 'No available channels' error, otherwise, 'The channelMax limit is reached'.

We call RabbitTemplate.getUnconfirmed() in a periodic task : the correlation data are removed, but unfortunatly, the related channel is not freed and not available for new publish.

You can observe and reproduce the issue as follows :

  • a periodic task that publishes a confirmed message on Rabbit every 20 ms
  • a cron job that calls RabbitTemplate.getUnconfirmed()
  • on a local Rabbit server, block then unblock connection by updating memory watermark
	//block connections 
	rabbitmqctl set_vm_memory_high_watermark <low threshold>
	//unblock
	rabbitmqctl set_vm_memory_high_watermark <high threshold>

Is there a better way to manage not confirmed messages? How can we free the channels when confirmation is not received before a timeout?

Thanks

@artembilan artembilan added this to the 3.1.3 milestone Feb 28, 2024
spring-builds pushed a commit that referenced this issue Feb 28, 2024
Fixes: #2640

Rabbit server was unstable for a while.
Once restored, we were unable to publish new confirmed messages to it
(the max number of channel on connection was reached and the existing channels were ignored).
Essentially `PublisherCallbackChannel` instances ara waiting for acks on their confirms
which never going to happen.
Therefore, these channels are not closed and cache state is not reset.

* Fix `CachingConnectionFactory.CachedChannelInvocationHandler.returnToCache()`
to schedule `waitForConfirms()` in the separate thread.
If `TimeoutException` happens, perform `physicalClose()` to avoid any possible
memory leaks
* Adjust `RabbitTemplatePublisherCallbacksIntegrationTests.testPublisherConfirmNotReceived()`
to ensure that "unconfirmed" channel is closed and `CachingConnectionFactory` can produce
a new channel

(cherry picked from commit 9a8d741)
artembilan added a commit that referenced this issue Mar 13, 2024
The `publisherCallbackChannel.waitForConfirms()` may fail with a `ShutdownSignalException`
indicating that channel is closed on the broker.
So, treat it as a `TimeoutException` and proceed with a `physicalClose()` logic

**Auto-cherry-pick to `3.0.x`**

Related to #2640
spring-builds pushed a commit that referenced this issue Mar 13, 2024
The `publisherCallbackChannel.waitForConfirms()` may fail with a `ShutdownSignalException`
indicating that channel is closed on the broker.
So, treat it as a `TimeoutException` and proceed with a `physicalClose()` logic

Related to #2640

(cherry picked from commit 22d22b9)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants