Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

reject-publish ignored by Dead-Letter quorum queues #8495

Open
leadenmoth opened this issue Jun 7, 2023 · 6 comments
Open

reject-publish ignored by Dead-Letter quorum queues #8495

leadenmoth opened this issue Jun 7, 2023 · 6 comments
Labels

Comments

@leadenmoth
Copy link

leadenmoth commented Jun 7, 2023

Describe the bug

RabbitMQ 3.11.11, Erlang 25.3, deployed in AKS via cluster-operator 2.2.0

Setting max-length: <int> and overflow: reject-publish policy on a quorum dead letter queue leads to queue continuously filling beyond max-length (I stopped the test at 25000 messages with max-length:10). According to the docs,

When a quorum queue reaches the max-length limit and reject-publish is configured it notifies each publishing channel who from thereon will reject all messages back to the client.

My understanding is that pushing a dead message from main queue to dead letter exchange and then to bound dead letter queue is done entirely on the server side?.. If so, my best guess is that whatever does that publishing (some internal publisher?..) ignores the reject-publish notification. Setting drop-head (well, unsetting reject-publish and letting it default to drop-head) instead works perfectly.

Reproduction steps

  1. Set up normal and DL exchanges, set up normal and DL queues of type quorum, set queue-exchange bindings, set x-dead-letter-exchange, x-dead-letter-routing-key and x-delivery-limit on normal queue
  2. Set up a max-length and reject-publish policy on the DL queue
  3. Publish significantly more messages than max-length to normal queue
  4. Consume and reject messages from normal queue continuously until x-delivery-limit is reached and the messages start moving to DL queue
  5. Observe that all messages are moved to DL queue, regardless of max-length
    ...

Expected behavior

Messages to-be-dead-lettered above the max-length limit should be discarded when reject-publish is set

Additional context

Context: we have a legacy system prone to causing message pile-ups and going above the high RAM watermark for good (less so after 3.8->3.11 upgrade, due to all the improvements to quorum behavior). Most queues are quorum and have DL counterparts (also quorum) with a single dead letter exchange and routing-key -> binding routing. Message pileups eventually end up in corresponding dead letter queues, still occupying RAM (or HDD, since the 3.11 upgrade). In an effort to free up resources I was trying to limit the size of DL queues. Since there's no process for dealing with DL messages at all (again, legacy system, someone decided having DL would be good, but never got around to using it), the knee-jerk reaction would be to just kill the DL queues. I thought I'd instead keep them for human review with just a few messages, but, crucially, the earliest messages that failed. Sure, failure could be logged elsewhere (and it is), but I thought having all the RabbitMQ fields available for investigation would be nice. My worry was that reject-publish on a DL queue would cause some weird requeueing loop. Instead, it's just ignored.

@leadenmoth leadenmoth added the bug label Jun 7, 2023
@ansd
Copy link
Member

ansd commented Jun 7, 2023

At-least-once dead lettering will respect reject-publish of the dead letter target quorum queue.
However, you'll need to take care to not cause message pileup in the source quorum queue. (In your case the messages already pile up in the source quorum queue.)

In an effort to free up resources I was trying to limit the size of DL queues.

Instead of dead lettering to a quorum queue, could you dead letter to a Stream (possible since #7846)?

@leadenmoth
Copy link
Author

leadenmoth commented Jun 7, 2023

I don't think anywhere in the documentation it says that at-most-once dead lettering does not respect reject-publish. In fact, the at-least-once doc you linked mentions reject-publish as one of the reasons at-most-once may fail to dead-letter a message

Re: streams, I won't be able to justify the extra development work for our almost-non-existent dead lettering use case, but I'll look into it when we begin overhauling this part of the system. Basically, I'm the only one fighting to keep any dead lettering in the current implementation :P

@michaelklishin
Copy link
Member

Dead lettering does not really follow most of the code paths related to publishing, so this can be one of the examples where it's a mere oversight.

@ansd
Copy link
Member

ansd commented Jun 7, 2023

I added your feature request to #8261 (comment)

I don't think anywhere in the documentation it says that at-most-once dead lettering does not respect reject-publish.

You're right, we should document that specific limitation.

In fact, the at-least-once doc you linked mentions reject-publish as one of the reasons at-most-once may fail to dead-letter a message

Yes, good point 🙂 Although this still applies to dead letter target classic queues.

I'm just trying to help you with an immediate workaround. You could

  • Switch your target dead letter quorum queue overflow behaviour to drop-head, or
  • Use a classic queue as target dead letter queue with overflow behaviour either drop-head or reject-publish, or
  • Use a Stream as target dead letter queue (configuring x-max-age or x-max-length-bytes).

@leadenmoth
Copy link
Author

leadenmoth commented Jun 7, 2023

Thanks @ansd - I was going to go with drop-head and leave it at that for the current increment. For our use case it's a nice-to-have, but since it seemed like a bug or a gap in docs, I decided to report it. Depending on how long it takes until we start redesigning the system, I might switch DL to classic and try again - those queues, the way we use them, don't need to be quorum, but it's also a sizeable piece of work to just recreate them as classic due to how our code is layered and per-environment change management happens, and 3.11 upgrade reduced the pain points so that it's harder to justify further changes :D

@ansd
Copy link
Member

ansd commented Jun 7, 2023

Many thanks for reporting this issue @leadenmoth!
To be honest, until you opened this issue, I wasn't aware that quorum queues behave like that.

@rabbitmq rabbitmq deleted a comment from amills-vibeirl Jan 21, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants