Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add secondary source of modules to Metricbeat to read light modules #12465

Merged
merged 40 commits into from
Jun 24, 2019

Conversation

jsoriano
Copy link
Member

@jsoriano jsoriano commented Jun 6, 2019

Implements #12270.

Modify metricbeat modules registry to include an optional secondary modules source.
A modules registry uses the secondary source as a fallback when it cannot provide
the configuration for an specific metricset. Semantics of current methods are
maintained.

An implementation for a secondary source is provided. This implementation loads
modules from disk that depend on traditional registered modules. Once loaded,
a "registration" object can be obtained, what includes a metricset factory built
with the specific overrides needed to instantiate the final metricset.

Beats based on metricbeat can opt-in for this feature by adding
WithLightModules() to its beat.Creator.
This option is added to the licensed metricbeat, for that a new root command
is instantiated.

Changes in packaging for these new modules is pending.

Some notes on this implementation:

  • I considered to implement Add support for Metricbeat modules based on existing modules #12270 alternatively as part of the builders in builder.go. This would have simplified some things, but then it would have been hardly implemented as an opt-in feature. Also the overrides needed in the registration factory would have been more distributed along the codebase. And the implementation would have been more based on global functions and objects. Having it in the builder, but not in the registry can be also tricky for other parts of the code, as they would be able to build metricsets that are not visible otherwise. An additional layer could be needed in any case to give a unified view for the registry and the builders.
  • Modules are always read from disk. We can easily implement other strategies as caching or preloading as the reading code is contained in one method. But I'd leave this discussion for the future if reading from disk at runtime is a problem at some moment, as this change may have other implications.
  • In the future we could also refactor the registry so the "default" one for registered modules is also a modules source, and to get modules and metricsets we iterate over all the available sources.

@jsoriano jsoriano requested a review from a team as a code owner June 6, 2019 15:40
@jsoriano jsoriano self-assigned this Jun 6, 2019
@jsoriano jsoriano mentioned this pull request Jun 6, 2019
3 tasks
@exekias
Copy link
Contributor

exekias commented Jun 7, 2019

It seems one test is failing, it could be related to this change

@exekias
Copy link
Contributor

exekias commented Jun 14, 2019

I guess it's safe to merge this before packaging because it doesn't change the current behavior until we actually ship modules?

Copy link
Contributor

@sayden sayden left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall is ok. There are things that I would do in a different way but let's start with something now and we'll see in the future when new needs arise, then we can refactor more things 😉

Copy link
Contributor

@exekias exekias left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jsoriano at some point we may want to revisit docs on how to create a module? I guess we can wait until we have used it for a few modules and consider it stable enough

@jsoriano
Copy link
Member Author

@jsoriano at some point we may want to revisit docs on how to create a module? I guess we can wait until we have used it for a few modules and consider it stable enough

Yes, let's wait to have some module before that.

@jsoriano
Copy link
Member Author

Failing tests in travis are related, I can reproduce them.

@kaiyan-sheng
Copy link
Contributor

@jsoriano at some point we may want to revisit docs on how to create a module? I guess we can wait until we have used it for a few modules and consider it stable enough

Yes, let's wait to have some module before that.

I would love to try this on aws module 😬

@@ -27,7 +27,8 @@ import (
"github.com/elastic/beats/libbeat/cmd"
)

func buildModulesManager(beat *beat.Beat) (cmd.ModulesManager, error) {
// BuildModulesManager adds support for modules management to a beat
func BuildModulesManager(beat *beat.Beat) (cmd.ModulesManager, error) {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: naming. cmd.ModulesManager is somewhat generic. Consider naming the constructor based on what is going to be constructed/initialized. For example NewLightweightModulesManager?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only change made here is to make this method public so it can be used from x-pack, cmd.ModulesManager is the generic manager for modules subcommand.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I noticed this symbol becoming public here. But with it becoming public, the descriptive nature of the names used is even more important (a.k.a naming is hard).

metricbeat/cmd/test/modules.go Outdated Show resolved Hide resolved
MetricSets(module string) ([]string, error)
DefaultMetricSets(module string) ([]string, error)
HasMetricSet(module, name string) bool
MetricSetRegistration(r *Register, module, name string) (MetricSetRegistration, error)
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Disclaimer: I don't expect this to be changed now or anytime soon. It's just that the interfaces already in place + the addition of the new functionality 'feel' funny.

I know this interface very much reflects how the Registry looks like, but it somewhat feels funny, as it forces to have me as user of the interface make the link between module and metricsets. Logically metricsets are part of one module only. Makes me wonder how difficult it would be to have interfaces like:

type ModulesSource interface {
    Modules() ([]string, error)
    Module(name string) (ModuleDefinition, error)
}

type ModuleDefinition struct { // could also be an interface...
    Name string
    Metricsets []MetricsetDefinition
}

type MetricsetDefinition struct {
    Name string
    Default bool
    Create ...
}

Also the fact that we have a circular dependency when initializing the registries, while maintaining order when creating actual modules cries for a more compositional pattern like this (no need for awkward setters):


type ModuleRegistry interface {
    ...
}

type CompositeModuleRegistry struct {
  primary, secondary ModuleRegistry
}

type BuiltinModuleRegistry struct {
}

type LightweightModuleRegistry struct {
}

with us constructing the registry like this:

primary := NewBuiltinModuleRegsitry(...)
secondary := NewLightweightModuleRegistry(primary, ...)
registry := &CompositeModuleRegistry{
  primary: primary,
  secondary: secondary,
}

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More thoughts about this, but +1 to don't change it now. We can do more backwards compatible refactors later and I'd like to have this functionality soon.

I also think that the ModulesSource interface feels funny, but it ended this way from a simpler one after trying to keep current registry semantics and after some feedback on previous versions. I think that to simplify it we'd also need to make deeper changes in the Registry and the Builder methods.

And yeah, registry configuration with this setter looks tricky, but I decided to do it this way taking into consideration that:

  • The registry is global and has to be available on import time so metricsets can be registered on init().
  • The secondary sources may need additional configuration as paths, that could be unavailable till metricbeat is running.

Ideally I think that we should decouple module/metricsets registration from the sources of modules/metricsets for the builder, and make the registry just another module source by implementing some common interface.

var Registry = NewRegister()
var ModuleBuilder = NewBuilder()
...
ModuleBuilder.AddSource(Registry)
...
ModuleBuilder.AddSource(NewLightModulesSource(path))
...
ModuleBuilder.AddSource(SomeOtherSource())

Then when building modules and metricsets we iterate over the list of available sources.

metricbeat/mb/lightmetricset.go Outdated Show resolved Hide resolved
func (m *LightMetricSet) baseModule(from Module) (*BaseModule, error) {
baseModule := BaseModule{
name: m.Module,
config: from.Config(),
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if this is a save operation. Are we guaranteed to always get a copy when calling Config? In ModuleConfig the fields Hosts, MetricSets, and Query are not values, but shared references. The values can be replaced by new instances, but must not be modified (e.g. operations like config.Query["abc"] = X are forbidden).

We call 'UnpackConfigwith targetconfig` a few lines down from here. Could this call overwrite entries in Hosts, MetricSets and Query?

What is the expected result after UnpackConfig if Hosts/MetricSets/Query was not empty?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It worked as expected on my tests, but yes, it doesn't look safe, I will try to get a better copy of this module config.

We call 'UnpackConfigwith targetconfig` a few lines down from here. Could this call overwrite entries in Hosts, MetricSets and Query?

What is the expected result after UnpackConfig if Hosts/MetricSets/Query was not empty?

Hosts and MetricSets shouldn't be changed here, query could be, in this case the map should be merged. It may require more tests to cover these cases.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was not safe, and this was actually not needed as we copy from the raw config later. I have extended the tests to check that original base module is not modified, and that we apply the overrides in the case of query.

if !exists {
return MetricSetRegistration{}, fmt.Errorf("metricset '%s/%s' is not registered, metricset not found", module, name)
// Fallback to secondary source if module is not registered
if source := r.secondarySource; source != nil && source.HasMetricSet(module, name) {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: prefer to fail early. The 'good' path return should be the last return:

secondary := r.secondarySource
if secondary == nil || !secondary.HasMetricSet(module, name) {
  return MetricSetRegistration{}, fmt.Errorf(...)
}

reg, err := secondary.MetricSetRegistration(r, module, name)
if err != nil {
  return ..., errors.Wrapf(...)
}

return reg, nil

Question: Just trying to reduce the size of an interface, but do we need to have 'HasMetricSet' on the secondary one? Shouldn't 'MetricSetRegistration' error if the metricset does not exist? :)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: prefer to fail early. The 'good' path return should be the last return:

I usually agree with this, but in this case there were lots of repeated lines with the same module-not-found error, so I decided to leave it as the final case. I don't have a strong preference, but for this case I think it is better like it is now.

Question: Just trying to reduce the size of an interface, but do we need to have 'HasMetricSet' on the secondary one?

The source.MetricSetRegistration can return an error also if it fails to load a module, originally I made this method to return (MetricSetRegistration, bool, error), and I used the bool to know if the metricset exist, but after some feedback I separated this in two methods, one to check if the module exist, and the other one to create the registration from it.

Shouldn't 'MetricSetRegistration' error if the metricset does not exist? :)

It does, in the registry :)

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:)

I guess this is what you get for using fmt.Errorf and Wrapf for reporting errors, instead of having proper error types that can also signal the actual cause. In general I'm reluctant to add methods to an interface, but +1 for not writing (MetricSetRegistration, bool, error).

x-pack/metricbeat/cmd/root.go Outdated Show resolved Hide resolved
x-pack/metricbeat/mb/lightmodules_test.go Show resolved Hide resolved
@urso
Copy link

urso commented Jun 20, 2019

Not a big issue assuming that we test all our LWMs via CI, but it would be nice if there is a check on startup, testing if the modules config files can actually be parsed.

@jsoriano
Copy link
Member Author

@urso I have addressed most of your comments, thanks for them! you were right about your fears on accidentaly overwriting base module configs. I have left further refactors and the startup checker for future discussions.
Could you please take another look?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants