Store config hash in a metric so that it's visible at all times #967

kentquirk · 2024-01-09T15:30:31Z

Is your feature request related to a problem? Please describe.

A customer asked about how configuration changes are recorded / made visible to users. Configuration changes are logged at Info level: "configuration change was detected and the configuration was reloaded".

Customer: Could we get that somehow exposed via prometheus and/or otel metrics? It’s kinda odd to be flying blind here: We don’t run the refinery on info in prod (default is warn) and the endpoint is also not publicly exposed/easy to query for most folks. So I don’t know which version of the config the refinery has loaded and whether it picked up any changes.

me: The easy answer here would be a new metric gauge called something like config_checksum where we convert the config hash into a number and store it in the metric. It would change whenever the config file contents changed, and would be identical across all the items in your fleet if they all had the same config. Would that meet your needs?

Customer: yep - that’d be perfect 👍

Describe the solution you'd like

Config hashes are MD5 hashes (so that people can calculate them on the command line with the md5 tool to verify). Probably the right strategy is to take the last 4 hex digits of the MD5, convert them from hex to decimal, and store that number in a gauge metric. So if the MD5 hash of the config is 7f1237f7db723f4e874a7a8269081a77, we would convert 1a77 from hex to decimal, and the value of the metric would be 6775.
We also actually need two metrics -- config_checksum and rules_checksum.
Just for belt-and-suspenders, we should also increase the log message's priority to warn, and include both the full hash value and this decimal checksum in the log message. This will allow people to correlate the logs and metrics.
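
The checksum derivation described above can be sketched roughly as follows. This is only an illustration in Python, not Refinery's actual implementation; the function names `checksum_from_hex_digest` and `config_checksum` are hypothetical, and it assumes the hash is taken over the raw bytes of the config file.

```python
import hashlib
from typing import Tuple


def checksum_from_hex_digest(hex_digest: str) -> int:
    """Convert the last 4 hex digits of a hash string to a decimal number."""
    return int(hex_digest[-4:], 16)


def config_checksum(config_bytes: bytes) -> Tuple[str, int]:
    """Return (full MD5 hex digest, decimal checksum) for the given config bytes.

    Assumption: the hash covers the raw file contents, so it matches what
    the command-line md5 tool would print for the same file.
    """
    full_hash = hashlib.md5(config_bytes).hexdigest()
    return full_hash, checksum_from_hex_digest(full_hash)


# The example hash from this issue: the last 4 hex digits are "1a77".
print(checksum_from_hex_digest("7f1237f7db723f4e874a7a8269081a77"))  # 6775
```

Because only 4 hex digits are kept, the gauge can take 65536 distinct values, so two different configs can occasionally produce the same checksum. Logging the full hash next to the decimal value covers that case.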

Describe alternatives you've considered

We could just improve the log, but this is easy and fits with the way a lot of people like to manage their clusters.


## Which problem is this PR solving?

To provide better visibility of the current configuration used in Refinery, this PR introduces two metrics, `config_hash` and `rule_config_hash`, for keeping track of the configuration.

## Short description of the changes

- change the config change log from `info` level to `warn`
- include the full config hash value in the config change log
- store the decimal value of the last 4 digits of the config hash as metrics

#967

kentquirk added the `type: enhancement` label Jan 9, 2024

kentquirk added this to the v2.4 milestone Jan 9, 2024

fchikwekwe self-assigned this Jan 31, 2024

fchikwekwe removed their assignment Feb 20, 2024

fchikwekwe modified the milestones: v2.4, v2.5 Feb 21, 2024

MikeGoldsmith modified the milestones: vNEXT, 2.6 Mar 13, 2024

VinozzZ modified the milestones: v2.6, v2.7 Jun 14, 2024

VinozzZ self-assigned this Jun 18, 2024

VinozzZ mentioned this issue Jun 18, 2024

feat: track config hash on config reload #1212

Merged

VinozzZ closed this as completed Jun 20, 2024
