Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Prometheus metric for messages dropped by MQTT QoS 0 Queue #9080

Merged
merged 2 commits into from
Aug 16, 2023

Conversation

ansd
Copy link
Member

@ansd ansd commented Aug 15, 2023

Why:

A RabbitMQ operator should be able to see whether RabbitMQ drops MQTT
QoS 0 messages due to overload protection. It's an indication that an
MQTT subscriber does not consume fast enough.

How:

Use Prometheus global counters.

There are 2 valid solutions:

  1. Introduce a new metric called messages_dropped specifically for the
    rabbitmq_mqtt_qos0_queue type. This would work in a similar fashion
    how streams extends the per protocol global counters, but requires
    extending the per protocol & queue type global counters for the MQTT
    QoS queue type. The emitted metrics would look as follows:
# TYPE rabbitmq_global_messages_dropped_total counter
# HELP rabbitmq_global_messages_dropped_total Total number of messages dropped due to mqtt.mailbox_soft_limit exceeded
rabbitmq_global_messages_dropped_total{protocol="mqtt310",queue_type="rabbit_mqtt_qos0_queue"} 0
rabbitmq_global_messages_dropped_total{protocol="mqtt311",queue_type="rabbit_mqtt_qos0_queue"} 0
rabbitmq_global_messages_dropped_total{protocol="mqtt50",queue_type="rabbit_mqtt_qos0_queue"} 0
  1. Reuse the existing metric rabbitmq_global_messages_dead_lettered_maxlen_total

This commit decides to go for the 2nd approach because:
a) there is no need to add a new metric. Even though dead lettering is not supported
for the MQTT QoS 0 queue type, this metric maps nicely to
what happens: The queue drop messages since itx max length
(mqtt.mailbox_soft_limit) is exceeded with overflow behaviour
drop-head. Furtheremore the label dead_letter_strategy="disabled" tells
that dead lettering is not taking place from this queue type.

b) this metric allows to support dead lettering for the MQTT QoS 0 queue
type in the future.

The new dead lettering metrics look as follows:

# TYPE rabbitmq_global_messages_dead_lettered_maxlen_total counter
# HELP rabbitmq_global_messages_dead_lettered_maxlen_total Total number of messages dead-lettered due to overflow drop-head or reject-publish-dlx
rabbitmq_global_messages_dead_lettered_maxlen_total{queue_type="rabbit_classic_queue",dead_letter_strategy="at_most_once"} 0
rabbitmq_global_messages_dead_lettered_maxlen_total{queue_type="rabbit_classic_queue",dead_letter_strategy="disabled"} 0
rabbitmq_global_messages_dead_lettered_maxlen_total{queue_type="rabbit_mqtt_qos0_queue",dead_letter_strategy="disabled"} 0
rabbitmq_global_messages_dead_lettered_maxlen_total{queue_type="rabbit_quorum_queue",dead_letter_strategy="at_most_once"} 0
rabbitmq_global_messages_dead_lettered_maxlen_total{queue_type="rabbit_quorum_queue",dead_letter_strategy="disabled"} 0

# TYPE rabbitmq_global_messages_dead_lettered_expired_total counter
# HELP rabbitmq_global_messages_dead_lettered_expired_total Total number of messages dead-lettered due to message TTL exceeded
rabbitmq_global_messages_dead_lettered_expired_total{queue_type="rabbit_classic_queue",dead_letter_strategy="at_most_once"} 0
rabbitmq_global_messages_dead_lettered_expired_total{queue_type="rabbit_classic_queue",dead_letter_strategy="disabled"} 0
rabbitmq_global_messages_dead_lettered_expired_total{queue_type="rabbit_quorum_queue",dead_letter_strategy="at_least_once"} 0
rabbitmq_global_messages_dead_lettered_expired_total{queue_type="rabbit_quorum_queue",dead_letter_strategy="at_most_once"} 0
rabbitmq_global_messages_dead_lettered_expired_total{queue_type="rabbit_quorum_queue",dead_letter_strategy="disabled"} 0

# TYPE rabbitmq_global_messages_dead_lettered_rejected_total counter
# HELP rabbitmq_global_messages_dead_lettered_rejected_total Total number of messages dead-lettered due to basic.reject or basic.nack
rabbitmq_global_messages_dead_lettered_rejected_total{queue_type="rabbit_classic_queue",dead_letter_strategy="at_most_once"} 0
rabbitmq_global_messages_dead_lettered_rejected_total{queue_type="rabbit_classic_queue",dead_letter_strategy="disabled"} 0
rabbitmq_global_messages_dead_lettered_rejected_total{queue_type="rabbit_quorum_queue",dead_letter_strategy="at_least_once"} 0
rabbitmq_global_messages_dead_lettered_rejected_total{queue_type="rabbit_quorum_queue",dead_letter_strategy="at_most_once"} 0
rabbitmq_global_messages_dead_lettered_rejected_total{queue_type="rabbit_quorum_queue",dead_letter_strategy="disabled"} 0

# TYPE rabbitmq_global_messages_dead_lettered_delivery_limit_total counter
# HELP rabbitmq_global_messages_dead_lettered_delivery_limit_total Total number of messages dead-lettered due to delivery-limit exceeded
rabbitmq_global_messages_dead_lettered_delivery_limit_total{queue_type="rabbit_quorum_queue",dead_letter_strategy="at_least_once"} 0
rabbitmq_global_messages_dead_lettered_delivery_limit_total{queue_type="rabbit_quorum_queue",dead_letter_strategy="at_most_once"} 0
rabbitmq_global_messages_dead_lettered_delivery_limit_total{queue_type="rabbit_quorum_queue",dead_letter_strategy="disabled"} 0

# TYPE rabbitmq_global_messages_dead_lettered_confirmed_total counter
# HELP rabbitmq_global_messages_dead_lettered_confirmed_total Total number of messages dead-lettered and confirmed by target queues
rabbitmq_global_messages_dead_lettered_confirmed_total{queue_type="rabbit_quorum_queue",dead_letter_strategy="at_least_once"} 0

@ansd ansd force-pushed the mqtt-qos0-queue-metrics branch 3 times, most recently from c5082ac to efccc69 Compare August 15, 2023 13:57
Why:
A RabbitMQ operator should be able to see whether RabbitMQ drops MQTT
QoS 0 messages due to overload protection. It's an indication that an
MQTT subscriber does not consume fast enough.

How:
Use Prometheus global counters.

There are 2 valid solutions:
1. Introduce a new metric called messages_dropped specifically for the
   rabbitmq_mqtt_qos0_queue type. This would work in a similar fashion
   how streams extends the per protocol global counters, but requires
   extending the per protocol & queue type global counters for the MQTT
   QoS queue type. The emitted metrics would look as follows:
```
rabbitmq_global_messages_dropped_total{protocol="mqtt310",queue_type="rabbit_mqtt_qos0_queue"} 0
rabbitmq_global_messages_dropped_total{protocol="mqtt311",queue_type="rabbit_mqtt_qos0_queue"} 0
rabbitmq_global_messages_dropped_total{protocol="mqtt50",queue_type="rabbit_mqtt_qos0_queue"} 0
```
2. Reuse the existing metric rabbitmq_global_messages_dead_lettered_maxlen_total

This commit decides to go for the 2nd approach because:
a) there is no need to add a new metric. Even though dead lettering is not supported
for the MQTT QoS 0 queue type, this metric maps nicely to
what happens: The queue drop messages since itx max length
(mqtt.mailbox_soft_limit) is exceeded with overflow behaviour
drop-head. Furtheremore the label `dead_letter_strategy="disabled"` tells
that dead lettering is not taking place from this queue type.

b) this metric allows to support dead lettering for the MQTT QoS 0 queue
type in the future.

The new dead lettering metrics look as follows:
```
rabbitmq_global_messages_dead_lettered_maxlen_total{queue_type="rabbit_classic_queue",dead_letter_strategy="at_most_once"} 0
rabbitmq_global_messages_dead_lettered_maxlen_total{queue_type="rabbit_classic_queue",dead_letter_strategy="disabled"} 0
rabbitmq_global_messages_dead_lettered_maxlen_total{queue_type="rabbit_mqtt_qos0_queue",dead_letter_strategy="disabled"} 0
rabbitmq_global_messages_dead_lettered_maxlen_total{queue_type="rabbit_quorum_queue",dead_letter_strategy="at_most_once"} 0
rabbitmq_global_messages_dead_lettered_maxlen_total{queue_type="rabbit_quorum_queue",dead_letter_strategy="disabled"} 0

rabbitmq_global_messages_dead_lettered_expired_total{queue_type="rabbit_classic_queue",dead_letter_strategy="at_most_once"} 0
rabbitmq_global_messages_dead_lettered_expired_total{queue_type="rabbit_classic_queue",dead_letter_strategy="disabled"} 0
rabbitmq_global_messages_dead_lettered_expired_total{queue_type="rabbit_quorum_queue",dead_letter_strategy="at_least_once"} 0
rabbitmq_global_messages_dead_lettered_expired_total{queue_type="rabbit_quorum_queue",dead_letter_strategy="at_most_once"} 0
rabbitmq_global_messages_dead_lettered_expired_total{queue_type="rabbit_quorum_queue",dead_letter_strategy="disabled"} 0

rabbitmq_global_messages_dead_lettered_rejected_total{queue_type="rabbit_classic_queue",dead_letter_strategy="at_most_once"} 0
rabbitmq_global_messages_dead_lettered_rejected_total{queue_type="rabbit_classic_queue",dead_letter_strategy="disabled"} 0
rabbitmq_global_messages_dead_lettered_rejected_total{queue_type="rabbit_quorum_queue",dead_letter_strategy="at_least_once"} 0
rabbitmq_global_messages_dead_lettered_rejected_total{queue_type="rabbit_quorum_queue",dead_letter_strategy="at_most_once"} 0
rabbitmq_global_messages_dead_lettered_rejected_total{queue_type="rabbit_quorum_queue",dead_letter_strategy="disabled"} 0

rabbitmq_global_messages_dead_lettered_delivery_limit_total{queue_type="rabbit_quorum_queue",dead_letter_strategy="at_least_once"} 0
rabbitmq_global_messages_dead_lettered_delivery_limit_total{queue_type="rabbit_quorum_queue",dead_letter_strategy="at_most_once"} 0
rabbitmq_global_messages_dead_lettered_delivery_limit_total{queue_type="rabbit_quorum_queue",dead_letter_strategy="disabled"} 0

rabbitmq_global_messages_dead_lettered_confirmed_total{queue_type="rabbit_quorum_queue",dead_letter_strategy="at_least_once"} 0
```
@ansd ansd force-pushed the mqtt-qos0-queue-metrics branch from efccc69 to 0f5fe8f Compare August 15, 2023 14:06
ansd added a commit to rabbitmq/rabbitmq-website that referenced this pull request Aug 15, 2023
The test case was flaky:
```
*** CT Error Notification 2023-08-15 14:25:51.016 ***🔗
v5_SUITE:at_most_once_dead_letter_detect_cycle failed on line 871
Reason: {test_case_failed,Received unexpected message: {publish,#{client_pid => <0.227.0>,dup => false,
                                        packet_id => 1,
                                        payload =>
                                            <<"at_most_once_dead_letter_detect_cycle">>,
                                        properties =>
                                            #{'Subscription-Identifier' => 10},
                                        qos => 1,retain => false,
                                        topic => <<"a/b">>,
                                        via => #Port<0.76>}}}
```
@ansd ansd marked this pull request as ready for review August 15, 2023 15:35
@michaelklishin michaelklishin added this to the 3.12.3 milestone Aug 16, 2023
Copy link
Member

@michaelklishin michaelklishin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that piggybacking on the existing metric makes more sense.

@michaelklishin michaelklishin merged commit a6c5820 into main Aug 16, 2023
@michaelklishin michaelklishin deleted the mqtt-qos0-queue-metrics branch August 16, 2023 04:29
ansd added a commit that referenced this pull request Aug 16, 2023
follow up from #9080

Call rabbit_global_counters:init/1 just once.
ansd added a commit that referenced this pull request Aug 16, 2023
follow up from #9080

Call rabbit_global_counters:init/1 just once.
ansd added a commit to rabbitmq/rabbitmq-website that referenced this pull request Aug 16, 2023
mergify bot pushed a commit that referenced this pull request Aug 16, 2023
follow up from #9080

Call rabbit_global_counters:init/1 just once.

(cherry picked from commit 939540f)

# Conflicts:
#	deps/rabbitmq_mqtt/src/rabbit_mqtt.erl
rabbitmq-ci pushed a commit to rabbitmq/rabbitmq-website-next that referenced this pull request Aug 16, 2023
ansd added a commit that referenced this pull request Aug 16, 2023
follow up from #9080

Call rabbit_global_counters:init/1 just once.

(cherry picked from commit 939540f)

# Conflicts:
#	deps/rabbitmq_mqtt/src/rabbit_mqtt.erl
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants