Metric collector and exporter #487

Merged
merged 10 commits into from
Dec 30, 2019
jcmoraisjr
Owner

@jcmoraisjr jcmoraisjr commented Dec 22, 2019

Add a starting version of a metric collector and exporter

  • metric raw type and interface
  • controller processing time (parsing, writing cfg, reloading, ...)
  • haproxy processing time (based on Idle_pct)
  • haproxy admin socket response time (show info, set server, ...)
  • cumulative updates (noop, dynamic, with reload)
  • successful haproxy reload
  • certificate expiration correlated with hostname
  • doc

Initial features:

* Metrics interface - for mocks
* processing time based on Idle_pct - run show info via admin socket every 500ms, can be disabled via a command-line option
* haproxy response time histogram
Two new metrics for the update/resync/reload process:

* A counter that tracks updates to the config or to the cluster as three distinct time series: noop (configurations match), dynamic (update without reload), and full (update with reload)
* A gauge that drops to zero (0) if haproxy config validation or a haproxy reload fails
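
A rough sketch of how these two metrics could sit behind a mockable interface. All names here (`Metrics`, `UpdateKind`, `IncUpdate`, `SetReloadOK`, `mockMetrics`) are hypothetical and only illustrate the idea; the real exporter would back them with a metrics library rather than plain fields.

```go
package main

import "fmt"

// UpdateKind names the three time series of the update counter.
type UpdateKind string

const (
	UpdateNoop    UpdateKind = "noop"    // configurations match
	UpdateDynamic UpdateKind = "dynamic" // update without reload
	UpdateFull    UpdateKind = "full"    // update with reload
)

// Metrics is a hypothetical interface, kept small so tests can mock it.
type Metrics interface {
	IncUpdate(kind UpdateKind)
	SetReloadOK(ok bool)
}

// mockMetrics is an in-memory implementation suitable for tests.
type mockMetrics struct {
	updates  map[UpdateKind]int
	reloadOK float64
}

func newMockMetrics() *mockMetrics {
	return &mockMetrics{updates: make(map[UpdateKind]int)}
}

// IncUpdate increments the counter of the given time series.
func (m *mockMetrics) IncUpdate(kind UpdateKind) { m.updates[kind]++ }

// SetReloadOK sets the gauge to 1 on success and 0 on failure.
func (m *mockMetrics) SetReloadOK(ok bool) {
	if ok {
		m.reloadOK = 1
	} else {
		m.reloadOK = 0
	}
}

func main() {
	mock := newMockMetrics()
	var metrics Metrics = mock
	metrics.IncUpdate(UpdateDynamic)
	metrics.IncUpdate(UpdateFull)
	metrics.SetReloadOK(false) // a failed validation or reload
	fmt.Println(mock.updates[UpdateDynamic], mock.updates[UpdateFull], mock.reloadOK) // prints 1 1 0
}
```
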
A new lightweight collector and metric that correlates hostnames with the NotAfter field of their x509 certificates, expressed in Unix epoch time.
Add the set server command to the haproxy_response_time histogram. Set server is used by dynamic updates.

The approach is to pass a callback to HAProxyCommand(), so the func itself measures the elapsed time and registers the metric. The callers, in this case instance and dynupdate, need to hold an instance of the metrics interface, whose `HAProxy<xxx>ResponseTime()` func should be used as the callback implementation.
Two new counters (sum and count) for cumulative time of controller tasks. The timer, already used to slice the time of an update event, receives a callback used to notify the elapsed time.
Rotate config currently takes the config object in curConfig, copies it to oldConfig, and assigns nil to the curConfig reference. The oldConfig reference has two meanings:

1) it is compared with a new curConfig in order to perform dynamic updates
2) if oldConfig != nil, the controller has already started haproxy

Meaning 2) is used by `CalcIdleMetric()`, which cannot connect to the admin socket during its early executions on big configurations. Such configurations take a few seconds to start haproxy for the first time.
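
The rotation and the started check described above can be sketched like this. The `instance` and `config` types, their fields other than curConfig and oldConfig, and the `calcIdleMetric` signature are assumptions for illustration only.

```go
package main

import (
	"errors"
	"fmt"
)

// config is a placeholder for the real configuration object.
type config struct{ hosts []string }

type instance struct {
	curConfig *config
	oldConfig *config
}

// rotate moves the current config to oldConfig and clears curConfig.
func (i *instance) rotate() {
	i.oldConfig = i.curConfig
	i.curConfig = nil
}

// started reports whether haproxy was already started, using meaning 2)
// of the oldConfig reference.
func (i *instance) started() bool { return i.oldConfig != nil }

// calcIdleMetric skips the admin socket read until haproxy is running.
func (i *instance) calcIdleMetric(read func() error) error {
	if !i.started() {
		return nil // too early, the socket is not available yet
	}
	return read()
}

func main() {
	inst := &instance{}
	read := func() error { return errors.New("socket read attempted") }
	fmt.Println(inst.calcIdleMetric(read)) // prints <nil>, the read is skipped

	inst.curConfig = &config{hosts: []string{"app.example.com"}}
	inst.rotate()
	fmt.Println(inst.calcIdleMetric(read)) // prints socket read attempted
}
```
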
Timer labels are currently used only in the config update: ingress parsing, config writing, template validation, haproxy reload. The labels show the partial duration of a wider event.

Some of these labels were incomplete (e.g. `reload` what?) and not properly aligned with common practices of metric exporters.

Add the ability to configure the time.Duration between two consecutive Idle_pct readings. Zero disables the reading and the metric.
Calling updateSuccessful() is pointless if the configuration wasn't parsed: it would set the flag to true/ok after a successful partial/dynamic update. Change it to update this status only if the configuration was parsed, either by just validating it or by reloading haproxy.
@jcmoraisjr jcmoraisjr merged commit 0e21993 into master Dec 30, 2019
@jcmoraisjr jcmoraisjr deleted the jm-metrics branch December 30, 2019 19:20