Is your feature request related to a problem? Please describe.
A customer asked about how configuration changes are recorded / made visible to users. Configuration changes are logged at Info level: "configuration change was detected and the configuration was reloaded".
Customer: Could we get that somehow exposed via prometheus and/or otel metrics? It’s kinda odd to be flying blind here: We don’t run the refinery on info in prod (default is warn) and the endpoint is also not publicly exposed/easy to query for most folks. So I don’t know which version of the config the refinery has loaded and whether it picked up any changes.
me: The easy answer here would be a new metric gauge called something like config_checksum where we convert the config hash into a number and store it in the metric. It would change whenever the config file contents changed, and would be identical across all the items in your fleet if they all had the same config. Would that meet your needs?
Customer: yep - that’d be perfect 👍
Describe the solution you'd like
Config hashes are MD5 hashes (so that people can calculate them on the command line with the md5 tool to verify). Probably the right strategy is to take the last 4 hex digits of the MD5, convert them from hex to decimal, and store that number in a gauge metric. So if the MD5 hash of the config is 7f1237f7db723f4e874a7a8269081a77, we would convert 1a77 from hex to decimal, and the value of the metric would be 6775.
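To make the arithmetic concrete, here is a minimal sketch of the conversion in Python (Refinery itself is not written in Python, and the function names `checksum_from_hex` and `config_checksum` are hypothetical, chosen only for illustration). It assumes the hash is computed over the raw bytes of the config file, which is what the md5 command line tool would do.

```python
import hashlib


def checksum_from_hex(full_hash: str) -> int:
    """Convert the last 4 hex digits of an MD5 hex digest to a decimal number."""
    return int(full_hash[-4:], 16)


def config_checksum(config_bytes: bytes) -> tuple[str, int]:
    """Return (full MD5 hex digest, decimal checksum) for raw config contents.

    Hypothetical helper: assumes the hash covers the file's raw bytes.
    """
    full_hash = hashlib.md5(config_bytes).hexdigest()
    return full_hash, checksum_from_hex(full_hash)


if __name__ == "__main__":
    # The example hash from this issue: 0x1a77 == 6775
    print(checksum_from_hex("7f1237f7db723f4e874a7a8269081a77"))
```

Because only 4 hex digits are kept, the gauge value always falls between 0 and 65535, so two different configs can occasionally share a checksum. The full hash in the log message is what disambiguates them.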
We also actually need two metrics -- config_checksum and rules_checksum.
Just for belt-and-suspenders, we should also increase the log message's priority to warn, and include both the full hash value and this decimal checksum in the log message. This will allow people to correlate the logs and metrics.
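As a rough sketch of what such a log line could look like, the Python snippet below emits a warn-level message carrying both values. The message wording, the logger name, and the `format_reload_message` and `log_reload` helpers are all assumptions made for illustration, not Refinery's actual log format.

```python
import logging

# Hypothetical logger name; not Refinery's real logger.
logger = logging.getLogger("refinery_sketch")


def format_reload_message(kind: str, full_hash: str) -> str:
    """Build a reload message with the full hash and the decimal checksum.

    `kind` is a label such as "config" or "rules" (one per metric).
    """
    checksum = int(full_hash[-4:], 16)  # same value the gauge would report
    return f"{kind} change detected and reloaded: hash={full_hash} checksum={checksum}"


def log_reload(kind: str, full_hash: str) -> None:
    """Log the reload at warn level so it is visible with the default level."""
    logger.warning(format_reload_message(kind, full_hash))


if __name__ == "__main__":
    logging.basicConfig(level=logging.WARNING)
    log_reload("config", "7f1237f7db723f4e874a7a8269081a77")
```

With both values in one line, someone who sees the gauge change to 6775 can search the logs for `checksum=6775` and recover the full hash to compare against the md5 tool's output.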
Describe alternatives you've considered
We could just improve the log, but this is easy and fits with the way a lot of people like to manage their clusters.
## Which problem is this PR solving?
To provide better visibility into the configuration currently used by Refinery,
this PR introduces two metrics, `config_hash` and `rule_config_hash`, for
keeping track of the configuration.
## Short description of the changes
- Change the config change log level from `info` to `warn`
- Include the full config hash value in the config change log
- Store the decimal value of the last 4 hex digits of each config hash
value as a metric
#967