Prometheus exporter: make sure all parse requirements are met (#193)
* script to validate metrics at runtime (see the sketch after this list)

* typo

* check trailing newline needs to be done before splitlines

* make sure stream trails with newline

* label value can be empty

* fix mistake in label regex

* include empty keys, to make sure label set is consistent

* fix export options, to avoid duplicate labels

* properly parse boolean parameters

* avoid metric name conflict

* fix return value when nothing is scraped

* drop using lib alias

* typo in plugin params

* check for duplicate metatags, since telegraf complains about this as well

* ugly temporary solution against duplicate metatags

* temporary fix to duplicate node labels, until fixed in Aggregator plugin

* resolve conflicting names with system_node.yaml, to prevent label inconsistency

* harvest yml changes
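
The parse requirements named in the list above (stream must end with a newline, label values may be empty, label regex) boil down to checks like the following. This is a minimal Go sketch of what such a runtime validator could look like; the actual script in this commit may differ, and the regex here is an illustration, not the one that was fixed:

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// metricLine matches `name{label="value",...} 123`.
// Note the `[^"]*`: a label value may be empty.
var metricLine = regexp.MustCompile(`^\w+(\{(\w+="[^"]*",?)*\})? \S+$`)

func validate(body string) error {
	// the trailing-newline check must happen before splitting into lines,
	// since splitting discards that information
	if !strings.HasSuffix(body, "\n") {
		return fmt.Errorf("stream does not end with a newline")
	}
	for _, line := range strings.Split(strings.TrimRight(body, "\n"), "\n") {
		if line == "" || strings.HasPrefix(line, "#") {
			continue // skip comments and empty lines
		}
		if !metricLine.MatchString(line) {
			return fmt.Errorf("cannot parse line: %q", line)
		}
	}
	return nil
}

func main() {
	ok := "node_labels{node=\"nodeA\",warnings=\"\"} 1.0\n"
	fmt.Println(validate(ok)) // <nil>
}
```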

deb/rpm harvest.example changes

Handle special characters in passwords

This change only addresses passwords in Pollers and Defaults. The bigger
refactor is to use HarvestConfig throughout the codebase, but that was too
big a change to make right now, since it touches a lot more code.

When that change is made, the code in conf.LoadConfig can be removed.
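
For context, the special-character problem is a YAML quoting issue: sequences such as ` #` and `: ` change how the value is parsed unless the password is quoted. A minimal sketch (the `poller` struct below is illustrative, not the actual HarvestConfig type):

```go
package main

import (
	"fmt"

	"gopkg.in/yaml.v3"
)

type poller struct {
	Addr     string `yaml:"addr"`
	Username string `yaml:"username"`
	Password string `yaml:"password"`
}

func main() {
	doc := []byte(`
addr: 10.0.1.1
username: harvest
password: "p@ss #w: rd"
`)
	var p poller
	if err := yaml.Unmarshal(doc, &p); err != nil {
		panic(err)
	}
	// unquoted, ` #` would start a comment and `: ` would break the key/value parse
	fmt.Printf("%q\n", p.Password) // "p@ss #w: rd"
}
```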

fix remaining merge

Enable GitHub code scanning

Remove extra fmt workflow action

Remove redundant Slack section and polish

Add Dev team to clabot

Add license check and GitHub action

add zerolog pretty print for console

InsecureSkipVerify with basicauth

Correct httpd logging pattern

Replace snake case with camel

Fix mistyped package

Shelf purges instances too soon

Fixes #75

update clabot

allow user-defined URL for the InfluxDB server

update conf tests; move allow_addrs_regex, which is not an InfluxDB parameter

auth test cases

Change triage label

Replace CCLA.pdf with online link to CCLA

Remove CONTRIBUTING_CCLA.pdf

make the structure of the collector docs uniform; add an explanation of metric collection/calculation

add known issue on WSL

update toc

add rename example, remove tabs disliked by markdown

removed allow_addrs_regex, not a parameter

tab to space

tab to space

remove redundant TOC; spelling

typos in docs

support/hacks for workload objects

templates for 4 workload objects

re-add previously removed disk counters

chrishenzie has signed the CCLA

Make vendored copy of dependencies

handle panic in collector

Allow insecure Grafana TLS connections

`harvest/grafana` should not rewrite https connections into http

Fixes #111

enable caller for zerolog

Remove buildmode=plugin

Add support for cluster simulator
WIP: Implement Caddy-style plugins for collectors
Fix go vet warnings in node.go

enable stacktrace during errors

InfluxDB exporter should pass url unchanged

Thanks to @steverweber for the suggestion
Fixes #63

Add unique prom ports and export type

add checks to doctor

Prometheus dashboards don't load when exemplar = true

Fixes #96

Don't run harvest as root on RHEL/Deb

See also #122

Improve harvest start behavior

Two cases are improved here:
1) Harvest detects when there is a stale pidfile and correctly restarts the poller process. A stale pidfile is when the pidfile exists in `/var/run/harvest` but there is no running process associated with that pid.

2) Harvest no longer suggests killing an already running poller when you try to start it. This is a no-op.

Fixes #123
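
For illustration, a stale-pidfile check typically reads the pid and sends signal 0 to see whether the process still exists. A minimal sketch, assuming the `/var/run/harvest` layout described above (the helper name is made up):

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
	"syscall"
)

// isStale reports whether the pidfile points at a process that no longer exists.
func isStale(pidfile string) (bool, error) {
	b, err := os.ReadFile(pidfile)
	if err != nil {
		return false, err
	}
	pid, err := strconv.Atoi(strings.TrimSpace(string(b)))
	if err != nil {
		return true, nil // unreadable contents: treat as stale
	}
	proc, _ := os.FindProcess(pid) // always succeeds on Unix
	// signal 0 performs error checking only: an error means no such process
	if err := proc.Signal(syscall.Signal(0)); err != nil {
		return true, nil
	}
	return false, nil
}

func main() {
	stale, err := isStale("/var/run/harvest/poller.pid")
	fmt.Println(stale, err)
}
```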

stop renamed pollers

resolved review comments for stopping pollers in case of rename

Addressed review comments. Fixes #20

Restore ZapiPerf workload support changes

add missing tag for labels pseudometric

cache ZAPI counters to distinguish them from our own metrics

Update needs triage label

rpm/deb bugs. Fixes #50, Fixes #129

Auth_style should not be redacted

Run workflows on release branch

Remove unused graphite_leaves

PrometheusPort should be int

Trim absolute file system paths

Add -trimpath to go build so errors and stacktraces print
with module path@version instead of this

{"level":"info","Poller":"infinity","collector":"ZapiPerf:WAFLAggr","caller":"/var/jenkins_home/workspace/BuildHarvestArtifacts/harvest/cmd/poller/collector/collector.go:318","time":"2021-06-11T13:40:03-04:00","message":"recovered from standby mode, back to normal schedule"}

correct ghost poller kill

Sridevi has signed CCLA

Update README.md

Added upgrade steps to the README
Removed overly specific links from the installation steps
Updated the overall format

Polish README.md

Reduce redundant information
Make tar.gz example copy-pasteable

Fix panic in unix.go

When a poller in harvest.yml is changed while a unix collector is running, it panics

Fixes #160

Remove pidfiles

- Improve poller detection by injecting IS_HARVEST into the exec-ed process's environment.
- Simplify management code and improve accuracy
- Remove /var/run logic from RPM and Deb
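
A minimal sketch of the environment-variable approach: the manager tags each exec-ed poller so harvest processes can later be recognized without a pidfile (on Linux, for example, by reading `/proc/<pid>/environ`). Helper names and the variable's value are assumptions:

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
)

func spawnPoller(bin string, args ...string) (*exec.Cmd, error) {
	cmd := exec.Command(bin, args...)
	// inherit our environment and tag the child so it can be recognized later
	cmd.Env = append(os.Environ(), "IS_HARVEST=true")
	return cmd, cmd.Start()
}

func main() {
	cmd, err := spawnPoller("/bin/sleep", "60")
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println("started poller with pid", cmd.Process.Pid)
}
```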

script to validate metrics at runtime

typo

update changelog

update support md

update readme

run ghost kill poller during harvest start

Store reason as a label for disk.yaml so that disk status is correctly reported

Fixes #182

check trailing newline needs to be done before splitlines

make sure stream trails with newline

label value can be empty

fix mistake in label regex

include empty keys, to make sure label set is consistent

fix export options, to avoid duplicate labels

properly parse boolean parameters

avoid metric name conflict

fix return value when nothing is scraped

drop using lib alias

typo in plugin params

Correct Grafana Cluster Dashboard typo, plus other similar typos

port range changes

resolved merge commits

port range review comments

Encapsulate port mapping

port range changes

Reduce the amount of time and attempts spent spinning for status checks

Makes a big difference on a Mac when the process is not found:
(not) starting 27 pollers drops from 19.5 seconds to 1.9 seconds
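
The speedup comes from bounding the retry loop instead of spinning on a long budget when the process is already gone. A sketch of the idea, with made-up attempt counts and intervals:

```go
package main

import (
	"fmt"
	"syscall"
	"time"
)

// waitAlive polls for the process a bounded number of times.
// Unix-only: signal 0 checks process existence without signaling it.
func waitAlive(pid int, attempts int, interval time.Duration) bool {
	for i := 0; i < attempts; i++ {
		if err := syscall.Kill(pid, 0); err == nil {
			return true
		}
		time.Sleep(interval)
	}
	return false
}

func main() {
	// 5 attempts x 20ms is roughly 100ms worst case per missing poller,
	// instead of a multi-second budget
	fmt.Println(waitAlive(99999, 5, 20*time.Millisecond))
}
```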

Add README on how to set up per-poller systemd services.

Add generate systemd subcommand
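
Assuming the new subcommand writes a unit file to stdout (its exact flags and output are not shown in this commit, and the unit name below is made up), usage would look something like:

```sh
bin/harvest generate systemd | sudo tee /etc/systemd/system/poller_jamaica.service
```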

check for duplicate metatags, since telegraf complains about this as well

ugly temporary solution against duplicate metatags

temporary fix to duplicate node labels, until fixed in Aggregator plugin

resolve conflicting names with system_node.yaml, to prevent label inconsistency

shelf dashboard: add override option for shelf field

Node Dashboard Bugs

Co-authored-by: rahulg2 <rahul.gupta@netapp.com>
Vachagan Gratian and rahulguptajss authored Jun 22, 2021
1 parent 223dc22 commit aacb3f6
Showing 13 changed files with 364 additions and 42 deletions.
47 changes: 45 additions & 2 deletions cmd/exporters/prometheus/httpd.go
@@ -116,15 +116,26 @@ func (me *Prometheus) ServeMetrics(w http.ResponseWriter, r *http.Request) {
if md, err := e.Render(e.Metadata); err == nil {
data = append(data, md...)
}*/
}
*/

if me.addMetaTags {
data = filterMetaTags(data)
}

w.WriteHeader(200)
w.Header().Set("content-type", "text/plain")
_, err := w.Write(bytes.Join(data, []byte("\n")))
if err != nil {
me.Logger.Error().Stack().Err(err).Msg("error")
me.Logger.Error().Stack().Err(err).Msg("write metrics")
}

// make sure stream ends with newline
if _, err = w.Write([]byte("\n")); err != nil {
me.Logger.Error().Stack().Err(err).Msg("write ending newline")
}

// update metadata
me.Metadata.Reset()
err = me.Metadata.LazySetValueInt64("time", "http", time.Since(start).Microseconds())
if err != nil {
@@ -136,6 +147,38 @@ func (me *Prometheus) ServeMetrics(w http.ResponseWriter, r *http.Request) {
}
}

// filterMetaTags removes duplicate TYPE/HELP tags in the metrics
// Note: this is a workaround, normally Render() will only add
// one TYPE/HELP for each metric type, however since some metric
// types (e.g. metadata_collector_count) are submitted from multiple
// collectors, we end up with duplicates in the final batch delivered
// over HTTP.
func filterMetaTags(metrics [][]byte) [][]byte {

filtered := make([][]byte, 0)

metricsWithTags := make(map[string]bool)

for i, m := range metrics {
if bytes.HasPrefix(m, []byte("# ")) {
if fields := strings.Fields(string(m)); len(fields) > 3 {
name := fields[2]
if !metricsWithTags[name] {
metricsWithTags[name] = true
filtered = append(filtered, m)
if i+1 < len(metrics) {
filtered = append(filtered, metrics[i+1])
i++
}
}
}
} else {
filtered = append(filtered, m)
}
}
return filtered
}

// ServeInfo provides a human-friendly overview of metric types and source collectors
// this is done in a very inefficient way, by "reverse engineering" the metrics.
// That's probably ok, since we don't expect this to be called often.
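
To see what the new filterMetaTags buys us, here is a self-contained run of the function (copied verbatim from the diff above) against a batch where two collectors submit the same meta tags; the metric samples are made up:

```go
package main

import (
	"bytes"
	"fmt"
	"strings"
)

// copied verbatim from the diff above
func filterMetaTags(metrics [][]byte) [][]byte {
	filtered := make([][]byte, 0)
	metricsWithTags := make(map[string]bool)
	for i, m := range metrics {
		if bytes.HasPrefix(m, []byte("# ")) {
			if fields := strings.Fields(string(m)); len(fields) > 3 {
				name := fields[2]
				if !metricsWithTags[name] {
					metricsWithTags[name] = true
					filtered = append(filtered, m)
					if i+1 < len(metrics) {
						filtered = append(filtered, metrics[i+1])
						i++
					}
				}
			}
		} else {
			filtered = append(filtered, m)
		}
	}
	return filtered
}

func main() {
	in := [][]byte{
		[]byte("# HELP metadata_collector_count number of metrics collected"),
		[]byte("# TYPE metadata_collector_count gauge"),
		[]byte(`metadata_collector_count{collector="Zapi"} 12`),
		// a second collector submits the same HELP/TYPE pair again
		[]byte("# HELP metadata_collector_count number of metrics collected"),
		[]byte("# TYPE metadata_collector_count gauge"),
		[]byte(`metadata_collector_count{collector="ZapiPerf"} 7`),
	}
	for _, m := range filterMetaTags(in) {
		fmt.Println(string(m))
	}
	// output keeps one HELP/TYPE pair and both samples
}
```

Note that the `i++` inside the range loop does not actually skip the next element; the companion HELP/TYPE line is instead dropped on its own iteration, because it carries the same metric name at `fields[2]`.
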
37 changes: 27 additions & 10 deletions cmd/exporters/prometheus/prometheus.go
@@ -274,7 +274,7 @@ func (me *Prometheus) render(data *matrix.Matrix) ([][]byte, error) {
tagged *set.Set
labels_to_include, keys_to_include, global_labels []string
prefix string
include_all_labels bool
err error
)

rendered = make([][]byte, 0)
@@ -296,10 +296,19 @@ func (me *Prometheus) render(data *matrix.Matrix) ([][]byte, error) {
me.Logger.Debug().Msgf("requested keys_labels : %v", keys_to_include)
}

if options.GetChildContentS("include_all_labels") == "true" {
include_all_labels = true
} else {
include_all_labels = false
include_all_labels := false
require_instance_keys := true

if x := options.GetChildContentS("include_all_labels"); x != "" {
if include_all_labels, err = strconv.ParseBool(x); err != nil {
me.Logger.Error().Stack().Err(err).Msg("parameter: include_all_labels")
}
}

if x := options.GetChildContentS("require_instance_keys"); x != "" {
if require_instance_keys, err = strconv.ParseBool(x); err != nil {
me.Logger.Error().Stack().Err(err).Msg("parameter: require_instance_keys")
}
}

prefix = me.globalPrefix + data.Object
@@ -318,18 +327,26 @@ func (me *Prometheus) render(data *matrix.Matrix) ([][]byte, error) {
me.Logger.Trace().Msgf("rendering instance [%s] (%v)", key, instance.GetLabels())

instance_keys := make([]string, len(global_labels))
instance_labels := make([]string, 0)
copy(instance_keys, global_labels)
instance_keys_ok := false
instance_labels := make([]string, 0)

if include_all_labels {
for label, value := range instance.GetLabels().Map() {
instance_keys = append(instance_keys, fmt.Sprintf("%s=\"%s\"", label, value))
// temporary fix for the rarely happening duplicate labels
// known case is: ZapiPerf -> 7mode -> disk.yaml
// actual cause is the Aggregator plugin, which is adding node as
// instance label (even though it's already a global label for 7modes)
if !data.GetGlobalLabels().Has(label) {
instance_keys = append(instance_keys, fmt.Sprintf("%s=\"%s\"", label, value))
}
}
} else {
for _, key := range keys_to_include {
value := instance.GetLabel(key)
if value != "" {
instance_keys = append(instance_keys, fmt.Sprintf("%s=\"%s\"", key, value))
instance_keys = append(instance_keys, fmt.Sprintf("%s=\"%s\"", key, value))
if !instance_keys_ok && value != "" {
instance_keys_ok = true
}
me.Logger.Trace().Msgf("++ key [%s] (%s) found=%v", key, value, value != "")
}
@@ -341,7 +358,7 @@ func (me *Prometheus) render(data *matrix.Matrix) ([][]byte, error) {
}

// @TODO, probably be strict, and require all keys to be present
if len(instance_keys) == 0 && options.GetChildContentS("require_instance_keys") != "False" {
if !instance_keys_ok && require_instance_keys {
me.Logger.Trace().Msgf("skip instance, no keys parsed (%v) (%v)", instance_keys, instance_labels)
continue
}
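
The switch to strconv.ParseBool matters because the old string comparison (`!= "False"`) treated every other spelling, including `false`, as true. ParseBool accepts only the standard spellings and rejects everything else:

```go
package main

import (
	"fmt"
	"strconv"
)

func main() {
	for _, s := range []string{"true", "True", "1", "false", "False", "0", "yes"} {
		v, err := strconv.ParseBool(s)
		fmt.Printf("%-7q -> %v (err: %v)\n", s, v, err)
	}
	// only "1", "t", "T", "TRUE", "true", "True" and their false
	// counterparts parse; "yes" returns an error
}
```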
