Refactor state_* metricsets to share response from endpoint #25640

Merged
merged 17 commits into from
May 18, 2021
16 changes: 11 additions & 5 deletions metricbeat/helper/prometheus/prometheus.go
@@ -44,6 +44,8 @@ type Prometheus interface {

GetProcessedMetrics(mapping *MetricsMapping) ([]common.MapStr, error)

ProcessMetrics(families []*dto.MetricFamily, mapping *MetricsMapping) ([]common.MapStr, error)

ReportProcessedMetrics(mapping *MetricsMapping, r mb.ReporterV2) error
}

@@ -139,11 +141,7 @@ type MetricsMapping struct {
ExtraFields map[string]string
}

func (p *prometheus) GetProcessedMetrics(mapping *MetricsMapping) ([]common.MapStr, error) {
families, err := p.GetFamilies()
if err != nil {
return nil, err
}
func (p *prometheus) ProcessMetrics(families []*dto.MetricFamily, mapping *MetricsMapping) ([]common.MapStr, error) {

eventsMap := map[string]common.MapStr{}
infoMetrics := []*infoMetricData{}
@@ -260,6 +258,14 @@ func (p *prometheus) GetProcessedMetrics(mapping *MetricsMapping) ([]common.MapS
return events, nil
}

func (p *prometheus) GetProcessedMetrics(mapping *MetricsMapping) ([]common.MapStr, error) {
families, err := p.GetFamilies()
if err != nil {
return nil, err
}
return p.ProcessMetrics(families, mapping)
}

// infoMetricData keeps data about an infoMetric
type infoMetricData struct {
Labels common.MapStr
98 changes: 98 additions & 0 deletions metricbeat/module/kubernetes/kubernetes.go
@@ -0,0 +1,98 @@
// Licensed to Elasticsearch B.V. under one or more contributor
// license agreements. See the NOTICE file distributed with
// this work for additional information regarding copyright
// ownership. Elasticsearch B.V. licenses this file to you under
// the Apache License, Version 2.0 (the "License"); you may
// not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing,
// software distributed under the License is distributed on an
// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
// KIND, either express or implied. See the License for the
// specific language governing permissions and limitations
// under the License.

package kubernetes

import (
"github.com/elastic/beats/v7/libbeat/logp"
"sync"
"time"
"fmt"

dto "github.com/prometheus/client_model/go"

p "github.com/elastic/beats/v7/metricbeat/helper/prometheus"
"github.com/elastic/beats/v7/metricbeat/mb"
)

func init() {
// Register the ModuleFactory function for the "kubernetes" module.
if err := mb.Registry.AddModule("kubernetes", ModuleBuilder()); err != nil {
panic(err)
}
}

type Module interface {
mb.Module
GetSharedFamilies(prometheus p.Prometheus, ms string) ([]*dto.MetricFamily, error)
}

type familiesCache struct {
sharedFamilies []*dto.MetricFamily
lastFetchErr error
lastFetchTimestamp time.Time
setter string
}

type cacheMap map[string]*familiesCache

type module struct {
mb.BaseModule
lock sync.Mutex

fCache cacheMap
logger *logp.Logger
}

func ModuleBuilder() func(base mb.BaseModule) (mb.Module, error) {
sharedFamiliesCache := make(cacheMap)
return func(base mb.BaseModule) (mb.Module, error) {
hash := fmt.Sprintf("%s%s", base.Config().Period, base.Config().Hosts)
sharedFamiliesCache[hash] = &familiesCache{}
Member

These entries will never be removed, which can be a leak if Metricbeat is used to monitor dynamically created clusters. I guess this is only a corner case, so we can leave it for now.

Member Author (@ChrsMark, May 17, 2021)

I will leave a comment about this in the code so that we have a good pointer if an issue arises in the future. One thing we could do to tackle this (an off-the-top-of-my-head suggestion) is to have a module-level method that figures out which entries to remove and that is called from the metricset's Close().
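
One way the cleanup idea above could look is sketched below. It assumes a reference count per cache entry, which is not part of the PR: each module would take a reference when it is built and drop it when its metricsets close, and the entry is deleted once nobody uses it. The names `sharedCache`, `cacheEntry`, `acquire`, `release`, and `size` are hypothetical, and the families payload is reduced to a string slice so the sketch has no external dependencies.

```go
package main

import "sync"

// cacheEntry is a stand-in for the PR's familiesCache.
// refs is a hypothetical field counting the live users of the entry.
type cacheEntry struct {
	families []string
	refs     int
}

// sharedCache owns both the map and the mutex that protects it.
type sharedCache struct {
	mu      sync.Mutex
	entries map[string]*cacheEntry
}

func newSharedCache() *sharedCache {
	return &sharedCache{entries: make(map[string]*cacheEntry)}
}

// acquire returns the entry for key, creating it when missing,
// and records one more user of that entry.
func (c *sharedCache) acquire(key string) *cacheEntry {
	c.mu.Lock()
	defer c.mu.Unlock()
	e, ok := c.entries[key]
	if !ok {
		e = &cacheEntry{}
		c.entries[key] = e
	}
	e.refs++
	return e
}

// release drops one user of the entry and deletes the entry once
// no user is left. Releasing an unknown key is a no-op.
func (c *sharedCache) release(key string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	e, ok := c.entries[key]
	if !ok {
		return
	}
	e.refs--
	if e.refs <= 0 {
		delete(c.entries, key)
	}
}

// size reports how many entries are currently cached.
func (c *sharedCache) size() int {
	c.mu.Lock()
	defer c.mu.Unlock()
	return len(c.entries)
}
```

Under this assumption, a metricset's Close() would call `release` with the same key its module used in `acquire`, so entries for clusters that are no longer monitored would disappear instead of accumulating.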

Member

This map is written every time a module is created. As it is now, I see two possible problems:

  • There can be race conditions (and panics) if several metricsets are created at the same time (not sure if that is possible), or if a metricset calls GetSharedFamilies while another metricset with the same hosts is being created (I guess this can happen with bad luck and/or with a low metricbeat.max_start_delay).
  • If a metricset is created after another one has already filled the cache, the cache will be reset. This is not a big problem, but it could easily be solved by checking whether the cache entry exists.

I think reads and writes on this map should also be thread safe. Ideally, we should also check whether there is already an entry in the cache for a given key before overwriting it here.

m := module{
BaseModule: base,
logger : logp.NewLogger(fmt.Sprintf("debug (%s)", hash)),
fCache: sharedFamiliesCache,
}
m.logger.Warn("Building module now with ", base.Config())
return &m, nil
}
}

func (m *module) GetSharedFamilies(prometheus p.Prometheus, ms string) ([]*dto.MetricFamily, error) {
m.lock.Lock()
defer m.lock.Unlock()
Member

This lock is in the module, but the cache is shared between all modules. The cache should have its own lock, and be the same for all modules/metricsets using it.

Member Author

Nice catch, thanks!


now := time.Now()
hash := fmt.Sprintf("%s%s", m.BaseModule.Config().Period, m.BaseModule.Config().Hosts)
Member

Please use a common function to generate the hash; it is also calculated in the function returned by ModuleBuilder().

And/or consider initializing m.fCache[hash] here, so that the hash does not need to be calculated when initializing the module.

Member

I don't think the period needs to be part of the hash key; it is OK if metricsets with the same hosts but different periods share the families.

Member Author

I agree, I will remove it.
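
The outcome of this thread can be sketched as a single helper that both ModuleBuilder() and GetSharedFamilies would call. The name `cacheKey` is hypothetical, and sorting the hosts before joining them is an extra assumption of this sketch (it makes the key independent of the order in the configuration); the PR itself formats the hosts with fmt.Sprintf. The period is deliberately left out of the key.

```go
package main

import (
	"sort"
	"strings"
)

// cacheKey builds the shared-cache key from the configured hosts only.
// The period is not part of the key, so metricsets with the same hosts
// but different periods share one entry.
// Sorting is an assumption of this sketch, not something the PR does.
func cacheKey(hosts []string) string {
	// Copy before sorting so the caller's slice is left untouched.
	sorted := append([]string(nil), hosts...)
	sort.Strings(sorted)
	return strings.Join(sorted, ",")
}
```

Having one function for the key removes the risk of the two call sites drifting apart, which is the concern raised in the first comment of this thread.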

fCache := m.fCache[hash]

if ms != fCache.setter {
m.logger.Warn("DIFF[ms!=cacheSetter]: ", ms, " != ", fCache.setter)
}
Member

Is this only to report that the metricset getting the families is different from the metricset that stored them? I'm not sure it is needed; in any case, log it at the debug level.

Member Author

It will be removed.


if fCache.lastFetchTimestamp.IsZero() || now.Sub(fCache.lastFetchTimestamp) > m.Config().Period {
m.logger.Warn("FETCH families for ms: ", ms, ". Last setter was ", fCache.setter)
fCache.sharedFamilies, fCache.lastFetchErr = prometheus.GetFamilies()
fCache.lastFetchTimestamp = now
fCache.setter = ms
} else {
m.logger.Warn("REUSE families for ms: ", ms, ". Last setter was ", fCache.setter)
Member

Debug?

Member Author

Yep, I will remove them completely.

}

return fCache.sharedFamilies, fCache.lastFetchErr
}
16 changes: 14 additions & 2 deletions metricbeat/module/kubernetes/state_container/state_container.go
@@ -18,6 +18,7 @@
package state_container

import (
"fmt"
"strings"

"github.com/pkg/errors"
@@ -26,6 +27,7 @@ import (
p "github.com/elastic/beats/v7/metricbeat/helper/prometheus"
"github.com/elastic/beats/v7/metricbeat/mb"
"github.com/elastic/beats/v7/metricbeat/mb/parse"
k8smod "github.com/elastic/beats/v7/metricbeat/module/kubernetes"
"github.com/elastic/beats/v7/metricbeat/module/kubernetes/util"
)

@@ -89,20 +91,26 @@ type MetricSet struct {
mb.BaseMetricSet
prometheus p.Prometheus
enricher util.Enricher
mod k8smod.Module
}

// New create a new instance of the MetricSet
// Part of new is also setting up the configuration by processing additional
// Part of newF is also setting up the configuration by processing additional
Member

Typo?

Suggested change
// Part of newF is also setting up the configuration by processing additional
// Part of new is also setting up the configuration by processing additional

// configuration entries if needed.
func New(base mb.BaseMetricSet) (mb.MetricSet, error) {
prometheus, err := p.NewPrometheusClient(base)
if err != nil {
return nil, err
}
mod, ok := base.Module().(k8smod.Module)
if !ok {
return nil, fmt.Errorf("must be child of kubernetes module")
}
return &MetricSet{
BaseMetricSet: base,
prometheus: prometheus,
enricher: util.NewContainerMetadataEnricher(base, false),
mod: mod,
}, nil
}

@@ -112,7 +120,11 @@ func New(base mb.BaseMetricSet) (mb.MetricSet, error) {
func (m *MetricSet) Fetch(reporter mb.ReporterV2) error {
m.enricher.Start()

events, err := m.prometheus.GetProcessedMetrics(mapping)
families, err := m.mod.GetSharedFamilies(m.prometheus, "state_container")
if err != nil {
return errors.Wrap(err, "error getting families")
}
events, err := m.prometheus.ProcessMetrics(families, mapping)
if err != nil {
return errors.Wrap(err, "error getting event")
}
17 changes: 15 additions & 2 deletions metricbeat/module/kubernetes/state_pod/state_pod.go
@@ -18,11 +18,13 @@
package state_pod

import (
"fmt"
"github.com/elastic/beats/v7/libbeat/common"
"github.com/elastic/beats/v7/libbeat/common/kubernetes"
p "github.com/elastic/beats/v7/metricbeat/helper/prometheus"
"github.com/elastic/beats/v7/metricbeat/mb"
"github.com/elastic/beats/v7/metricbeat/mb/parse"
k8smod "github.com/elastic/beats/v7/metricbeat/module/kubernetes"
"github.com/elastic/beats/v7/metricbeat/module/kubernetes/util"
)

@@ -72,6 +74,7 @@ type MetricSet struct {
mb.BaseMetricSet
prometheus p.Prometheus
enricher util.Enricher
mod k8smod.Module
}

// New create a new instance of the MetricSet
@@ -82,11 +85,15 @@ func New(base mb.BaseMetricSet) (mb.MetricSet, error) {
if err != nil {
return nil, err
}

mod, ok := base.Module().(k8smod.Module)
if !ok {
return nil, fmt.Errorf("must be child of kubernetes module")
}
return &MetricSet{
BaseMetricSet: base,
prometheus: prometheus,
enricher: util.NewResourceMetadataEnricher(base, &kubernetes.Pod{}, false),
mod: mod,
}, nil
}

@@ -96,7 +103,13 @@ func New(base mb.BaseMetricSet) (mb.MetricSet, error) {
func (m *MetricSet) Fetch(reporter mb.ReporterV2) {
m.enricher.Start()

events, err := m.prometheus.GetProcessedMetrics(mapping)
families, err := m.mod.GetSharedFamilies(m.prometheus, "state_pod")
if err != nil {
m.Logger().Error(err)
reporter.Error(err)
return
}
events, err := m.prometheus.ProcessMetrics(families, mapping)
if err != nil {
m.Logger().Error(err)
reporter.Error(err)