Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RabbitMQ] Messages are sometimes not getting ACKed #202

Open
msrb opened this issue Jul 24, 2020 · 2 comments
Open

[RabbitMQ] Messages are sometimes not getting ACKed #202

msrb opened this issue Jul 24, 2020 · 2 comments

Comments

@msrb
Copy link
Member

msrb commented Jul 24, 2020

It looks like messages are sometimes not getting acked when using RabbitMQ provider. There are suspicious error messages in Jenkins log:

Jul 24, 2020 8:26:52 AM WARNING com.redhat.jenkins.plugins.ci.messaging.RabbitMQMessagingWorker receive

JMS exception raised, going to re-subscribe for job 'RabbitMQ_Copr_Heartbeat'.
com.rabbitmq.client.AlreadyClosedException: channel is already closed due to channel error; protocol method: #method<channel.close>(reply-code=406, reply-text=PRECONDITION_FAILED - unknown delivery tag 147816, class-id=60, method-id=80)
	at com.rabbitmq.client.impl.AMQChannel.ensureIsOpen(AMQChannel.java:258)
	at com.rabbitmq.client.impl.AMQChannel.transmit(AMQChannel.java:427)
	at com.rabbitmq.client.impl.AMQChannel.transmit(AMQChannel.java:421)
	at com.rabbitmq.client.impl.recovery.RecoveryAwareChannelN.basicAck(RecoveryAwareChannelN.java:93)
	at com.rabbitmq.client.impl.recovery.AutorecoveringChannel.basicAck(AutorecoveringChannel.java:428)
	at com.redhat.jenkins.plugins.ci.messaging.RabbitMQMessagingWorker.receive(RabbitMQMessagingWorker.java:181)
	at com.redhat.jenkins.plugins.ci.threads.CITriggerThread.run(CITriggerThread.java:73)

Jul 24, 2020 8:26:52 AM WARNING com.redhat.jenkins.plugins.ci.messaging.RabbitMQMessagingWorker unsubscribe

Exception occurred when closing channel: channel is already closed due to channel error; protocol method: #method<channel.close>(reply-code=406, reply-text=PRECONDITION_FAILED - unknown delivery tag 147816, class-id=60, method-id=80)

Jul 24, 2020 8:26:52 AM INFO com.redhat.jenkins.plugins.ci.messaging.RabbitMQMessagingWorker subscribe

Subscribing job 'RabbitMQ_Copr_Heartbeat' to org.fedoraproject.prod.copr.build.start topic.

Jul 24, 2020 8:26:52 AM INFO com.redhat.jenkins.plugins.ci.messaging.RabbitMQMessagingWorker subscribe

Successfully subscribed job 'RabbitMQ_Copr_Heartbeat' to topic 'org.fedoraproject.prod.copr.build.start'.

This happens on 2 different Jenkins instances. The first is running in CentOS infrastructure, on OpenShift, and the second is deployed on AWS EKS.

Note Jenkins seems to be triggering builds on messages just fine. The problem is that messages that don't get acked will be re-delivered by broker after some time and Jenkins will thus trigger new builds again.

This problem was discovered by Fedora monitoring: https://pagure.io/fedora-ci/general/issue/125.

@pypingou
Copy link

pypingou commented Jul 30, 2020

Considering https://www.mariuszwojcik.com/how-to-choose-prefetch-count-value-for-rabbitmq/ , fedora-infra/fedora-messaging#224 and the fact that the prefetch value is set to 0, I now wonder if the issue we've seen in the ticket above could not be linked to that configuration. Maybe setting the prefetch value to something other than 0 would mitigate this

@jeremycline
Copy link

Considering https://www.mariuszwojcik.com/how-to-choose-prefetch-count-value-for-rabbitmq/ , fedora-infra/fedora-messaging#224 and the fact that the prefetch value is set to 0, I know wonder if the issue we've seen in the ticket above could not be linked to that configuration. Maybe setting the prefetch value to something other than 0 would mitigate this

I don't see how they are related. The error is pretty clear, the channel is closed because the delivery tag (147816) is unknown. Delivery tags are how acks/nacks identify the message they are acking/nacking, are server-assigned, and are scoped to a channel. This, at first glance, looks to be a client bug. For example, if it's receiving the message on one channel and sending an ack for that message using a different channel, that would result in this error.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants