PIP-271: Add broker health check status into prometheus metrics #20147 #20389
Comments
Few comments before you continue with implementation:
The healthcheck attribute is just another attribute that is refreshed by a regular task, the same as other tasks such as backlog-check, stats-check, and inactive-topic. Also, this PIP doesn't introduce any new service or implementation; it only calls the existing admin API at a regular interval to refresh the attribute. So it follows the existing task pattern, and I don't think it requires a new service class here.
VOTE has passed and this PIP has been approved.
Motivation:
Broker metrics currently have nothing that indicates the health of the broker (that is, whether the broker is active). It would be useful if the Prometheus broker metrics used for monitoring also showed broker health, so that Prometheus can scrape the broker state automatically. We therefore need a metric that captures broker health.
Goals:
This PIP adds support to include the broker health status in the broker operability metrics.
Sample:
When we hit the "/metrics" endpoint, part of the output looks like the sample below. Notice the "pulsar_health" metric, which is added as a result of this PIP. A status of "1" means the broker is active, "0" means inactive, and "-1" means unknown.
Approach:
A new metric called "brk_health" is added to BrokerOperabilityMetrics. The BrokerService updates this metric at a fixed rate.
We schedule a periodic health check job at a fixed rate in the BrokerService. This job updates the broker health check metric in the BrokerOperabilityMetrics stats at the frequency configured in the broker configs.
No new API is needed, as we already have a "healthCheck" API in the Admin module that provides the necessary functionality. However, we don't make a REST call to this API because that could be costly. Instead, we add a helper function, "internalRunHealthCheck", in the Admin module that piggybacks on the module's existing functionality.
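The periodic update described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the actual Pulsar code: the class `HealthMetricSketch`, its methods, and the `Supplier` standing in for the in-process health check helper are all hypothetical names. Only the status values (1, 0, -1) come from the PIP text.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

/** Hypothetical sketch; names here are illustrative and not taken from Pulsar. */
final class HealthMetricSketch {
    static final int UNKNOWN = -1;  // no check has completed yet
    static final int INACTIVE = 0;  // last check failed
    static final int ACTIVE = 1;    // last check succeeded

    private final AtomicInteger status = new AtomicInteger(UNKNOWN);
    // Stand-in for an in-process health check call (assumption, not a Pulsar API).
    private final Supplier<CompletableFuture<Void>> healthCheck;

    HealthMetricSketch(Supplier<CompletableFuture<Void>> healthCheck) {
        this.healthCheck = healthCheck;
    }

    /** Runs one check; the returned future completes after the status is updated. */
    CompletableFuture<Void> runOnce() {
        CompletableFuture<Void> result;
        try {
            result = healthCheck.get();
        } catch (RuntimeException e) {
            // Treat a synchronous failure the same as a failed future.
            result = CompletableFuture.failedFuture(e);
        }
        return result.<Void>handle((ignored, error) -> {
            status.set(error == null ? ACTIVE : INACTIVE);
            return null;
        });
    }

    /** Schedules the check at a fixed rate; periodSeconds must be positive. */
    void start(ScheduledExecutorService executor, long periodSeconds) {
        executor.scheduleAtFixedRate(this::runOnce, periodSeconds, periodSeconds, TimeUnit.SECONDS);
    }

    /** Value a metrics reporter would expose on the next scrape. */
    int status() {
        return status.get();
    }

    public static void main(String[] args) {
        HealthMetricSketch sketch =
                new HealthMetricSketch(() -> CompletableFuture.completedFuture(null));
        sketch.runOnce().join();
        System.out.println("health status = " + sketch.status());
    }
}
```

Because the check is invoked in-process and its result is cached in a single integer, a metrics scrape only reads the last recorded value and never triggers a health check itself.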
Configuration Changes:
This PIP adds the "healthCheckMetricsUpdateTimeInSeconds" config, which dynamically switches the broker health check metric on or off and also sets how often the metric is updated. The default value is "-1", which disables the metric.
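One way the dynamic on/off behavior could work is to cancel and reschedule the update task whenever the config value changes. The sketch below assumes exactly that; `HealthCheckMetricToggle` and `onConfigChange` are hypothetical names, and treating every non-positive value (not only -1) as "disabled" is an assumption of this sketch.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch of reacting to a dynamic config change; not Pulsar code. */
final class HealthCheckMetricToggle {
    private final ScheduledExecutorService executor;
    private final Runnable updateTask;
    private ScheduledFuture<?> scheduled; // null while the metric is disabled

    HealthCheckMetricToggle(ScheduledExecutorService executor, Runnable updateTask) {
        this.executor = executor;
        this.updateTask = updateTask;
    }

    /** Applies a new value of healthCheckMetricsUpdateTimeInSeconds. */
    synchronized void onConfigChange(int updateTimeInSeconds) {
        // Always drop the previous schedule so the new period takes effect.
        if (scheduled != null) {
            scheduled.cancel(false);
            scheduled = null;
        }
        // Assumption: any non-positive value disables the metric.
        if (updateTimeInSeconds > 0) {
            scheduled = executor.scheduleAtFixedRate(
                    updateTask, updateTimeInSeconds, updateTimeInSeconds, TimeUnit.SECONDS);
        }
    }

    synchronized boolean isEnabled() {
        return scheduled != null;
    }

    public static void main(String[] args) {
        ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();
        HealthCheckMetricToggle toggle = new HealthCheckMetricToggle(executor, () -> { });
        toggle.onConfigChange(-1); // default: metric stays off
        toggle.onConfigChange(30); // switch on with a 30-second period
        System.out.println("enabled = " + toggle.isEnabled());
        executor.shutdownNow();
    }
}
```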