Improve event normalization performance #22781

newly12 · 2020-11-28T10:31:12Z

What does this PR do?

Improve event normalization performance

Why is it important?

generalize/normalize is widely used in beats

Checklist

My code follows the style guidelines of this project
~~I have commented my code, particularly in hard-to-understand areas~~
~~I have made corresponding changes to the documentation~~
~~I have made corresponding change to the default configuration files~~
I have added tests that prove my fix is effective or that my feature works
I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.

Author's Checklist

[ ]

How to test this PR locally

Benchmark

Related issues

Use cases

Screenshots

Logs

Before

BenchmarkNormalization-8   	 1329153	       920 ns/op	    1176 B/op	      13 allocs/op

After

BenchmarkNormalization-8   	 2270809	       541 ns/op	     736 B/op	       5 allocs/op

newly12 · 2020-11-30T02:40:11Z

on another thought, should I be able to configured whether skip normalization could also be defined at a pipeline level instead of current beat/global level? i.e. adding another fields SkipNormalization in beat.ProcessingConfig and have it to to be honored while creating the pipeline at here. The reason for this question is that if I guarantee the event generated by my input are already using all standard types, this should decrease the load a lot for an agent that is running hot.

Please let me know your thought on this, if you think it's acceptable, I can file a separate PR for this for add it in this one.. Thanks!

another edit, I tried to run filebeat and passing my own settings to it to skip normalization, it worked fine for a customized input, but failed for a normal log input, the program panics in addFields processor which tries to deepUpdate event's fields while for there is no Field initialized in event like this seems to be an internal event. Maybe it also makes sense to allow addFields to tolerant nil-fields event?

elasticmachine · 2020-11-30T06:15:09Z

Pinging @elastic/integrations-services (Team:Services)

urso · 2020-12-02T12:46:24Z

libbeat/common/event.go

@@ -59,8 +61,7 @@ func NewGenericEventConverter(keepNull bool) *GenericEventConverter {
 // Nil values in maps are dropped during the conversion. Any unsupported types
 // that are found in the MapStr are dropped and warnings are logged.
 func (e *GenericEventConverter) Convert(m MapStr) MapStr {
-	keys := make([]string, 0, 10)
-	event, errs := e.normalizeMap(m, keys...)
+	event, errs := e.normalizeMap(m, e.keys[:0]...)


If the keys have been constructed e.g. fmt.Sprintf, then we should allow them to be gc'ed. Meaning that we also need a e.keys = e.keys[:0] on function exit.

The keys array is used for error reporting only, which is supposed to happen not all that often. Instead of constructing and passing down the keys array we might want to handle this in the error path only. e.g.

type keyError struct { path []string message string cause error } func (e *keyError) Unwrap() error { return e.cause } func (e *keyError) Error() string { if len(e.path) == 0 { return fmt.Sprintf("%v: %v", e.message, e.cause) } return fmt.Sprintf("key=%v: %v: %v", joinKeys(e.path), e.message, e.cause) } func newError(msg string) *keyError { return &keyError{ message: msg } } func newErrorf(msg string, args ...interface{}) *keyError { return newError(fmt.Sprintf(msg, args...) } func wrapErr(cause error, msg string) *keyError { return &keyError{ message: msg, cause: cause}} func wrapErrf(cause error, msg string, args ...interface{}) *keyError { return wrapErr(cause, fmt.Sprintf(msg, args...)) }

Efficiently constructing the path can be tricky, as we have to construct 'path' when passing the error up. Either we do not care (it's an edge-case anyways), or we do so in the reverse order and just "reverse" the array in the top-level Convert method. E.g.

func (e *GenericEventConverter) Convert(m MapStr) MapStr { events, errs := e.normalizeMap(m) if len(errs) > 0 { for _, err := range errs { reverseKeys(err.path) } e.log.Warnf(...) } return event } func (e *keyError) addKey(k string) { e.path = append(e.path, k) } func (e *GenericEventConverter) normalizeMap(m MapStr) (MapStr, []*KeyError) { ... for key, value := range m { v, valueErrs := e.normalizeValue(value) if len(valueErrs) > 0 { for _, err := range valueErrs { err.addKey(key) } errs = append(errs, valueErrs...) } } ... } func (e *GenericEventConverter) normalizeValue(value interface{}) (interface{}, *KeyError) { ... }

It is a little more code, but a proper error type seems to help reducing allocations. In order to reduce the number of allocations in the error path one could pass an int to keep track of the level. e.g.

func newError(level int, message string) *keyError { return &keyError{ path: make([]string, 0, level), message: message, } } func (e *GenericEventConverter) normalizeMap(m MapStr, level int) (MapStr, []*KeyError) { ... for key, value := range m { v, valueErrs := e.normalizeValue(value, level+1) if len(valueErrs) > 0 { for _, err := range valueErrs { err.addKey(key) } errs = append(errs, valueErrs...) } } ... }

This replaces the "always track an array" with an int.

All in all, the change to keyError trades the good path overhead with more complexity in the error path. But errors in here are an edge case and should be addressed by the Beats input author, who produced an event that is not compatible with libbeat.

urso · 2020-12-02T12:53:15Z

libbeat/publisher/processing/processors.go

@@ -44,13 +44,13 @@ type processorFn struct {

 func newGeneralizeProcessor(keepNull bool) *processorFn {
 	logger := logp.NewLogger("publisher_processing")
+	g := common.NewGenericEventConverter(keepNull)


The g.keys is might be shared between multiple go-routines without a mutex when moving the constructor here. GenericEventConverter must be immutable in order to share it with multiple instances. With the introduction of keys as buffer, this is not the case anymore.

Even the returned function can be called from multiple go-routines (fortunately protected by a mutex I think).

yes after that g.keys is not safe anymore in a concurrent execution.. I thought GenericEventConverter is created for each pipeline and not shared by multiple instances(input I suppose?).

It maybe improper to make keys a buffer and shared. I think the gain from this change is mostly because of NewGenericEventConverter is now called once in newGeneralizeProcessor, not every time when processFn is invoked, as NewGenericEventConverter also news a logger.

Hi @urso , I would like to revert changes to g.keys, and have the only change to be this line (apart from testing code :) ). Making NewGenericEventConverter to be called only once to save the cost of "newing" the logger seems reasonable, as g is not shared and only in the scope of that processorFn.

Feel free to open another PR and mention this one from the new one (I will easily find the new PR). We can also discuss alternative implementation in a separate PRs and finally close the ones we don't like.

sounds good. :) field #22974

urso · 2020-12-02T12:59:39Z

libbeat/publisher/processing/default_test.go

+
+	fields := common.MapStr{"a": "b"}
+	for i := 0; i < b.N; i++ {
+		f := fields.Clone()


clone is expensive as well. Consider to use b.XTimer() in order to focus on the prog.Run only.

sure I will explore this a little and get back here

urso · 2020-12-02T13:01:59Z

libbeat/publisher/processing/default_test.go

+	prog, err := s.Create(beat.ProcessingConfig{}, false)
+	require.NoError(b, err)
+
+	fields := common.MapStr{"a": "b"}


Events can have different complexity, which the runtime clearly depends upon. How about:

cases := map[string]common.MapStr{ "event with single field": {"a": "b"}, } for name, event := range cases { b.Run(name, func(b *testing.B) { ... }) }

This allows us to easily add more cases now or in the future.

makes sense. will do.

urso · 2020-12-02T13:12:38Z

another edit, I tried to run filebeat and passing my own settings to it to skip normalization, it worked fine for a customized input, but failed for a normal log input, the program panics in addFields processor which tries to deepUpdate event's fields while for there is no Field initialized in event like this seems to be an internal event. Maybe it also makes sense to allow addFields to tolerant nil-fields event?

The event normalization step is about validating the event and ensuring events can correctly used by the different processors. The step also ensure that (via copying) there are no shared references that can change while the event is published. Event creation and publishing are async. Inputs or processors that share meta-data or modify common fields can lead to crashes otherwise.

As you noticed it is not always safe to disable normalization. This is why it is at the Beat level and not per input (some Beats modify events depending on your setting). Beats that disable the normalization do so because they do the "copy" themselves. For example by building the event as a go structure first and then have a final "conversation" step that copies the full event into a common.MapStr. This way there is no sharing possible.

Improving performance as you attempted here is a HUGHE improvement. All inputs and users will profit from this. From your mini-benchmark it looks like there is loads to gain here.

newly12 · 2020-12-02T14:57:12Z

Thanks a lot for the clarification!

I guess it's trade off between safety and performance, it makes sense that an event could be modified in its publish phase and this may surprise the output component, in cases a beat has multiple input sources and each behaves on their own, and this seems would eventually affect all..

urso · 2020-12-09T16:00:58Z

Closing this one on favor of: #22974

Improve event normalization performance

9b7769b

botelastic bot added the needs_team Indicates that the issue/PR needs a Team:* label label Nov 28, 2020

reuseable keys for GenericEventConverter

6012112

andresrc added the Team:Services (Deprecated) Label for the former Integrations-Services team label Nov 30, 2020

botelastic bot removed the needs_team Indicates that the issue/PR needs a Team:* label label Nov 30, 2020

andresrc assigned urso Dec 1, 2020

urso reviewed Dec 2, 2020

View reviewed changes

urso closed this Dec 9, 2020

newly12 deleted the normalizerProcessor branch December 10, 2020 01:26

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Improve event normalization performance #22781

Improve event normalization performance #22781

newly12 commented Nov 28, 2020

newly12 commented Nov 30, 2020 •

edited

Loading

elasticmachine commented Nov 30, 2020

urso Dec 2, 2020

urso Dec 2, 2020

newly12 Dec 2, 2020 •

edited

Loading

newly12 Dec 5, 2020

urso Dec 7, 2020

newly12 Dec 8, 2020

urso Dec 2, 2020

newly12 Dec 2, 2020

urso Dec 2, 2020

newly12 Dec 2, 2020

urso commented Dec 2, 2020 •

edited

Loading

newly12 commented Dec 2, 2020

urso commented Dec 9, 2020

Improve event normalization performance #22781

Improve event normalization performance #22781

Conversation

newly12 commented Nov 28, 2020

What does this PR do?

Why is it important?

Checklist

Author's Checklist

How to test this PR locally

Related issues

Use cases

Screenshots

Logs

newly12 commented Nov 30, 2020 • edited Loading

elasticmachine commented Nov 30, 2020

Choose a reason for hiding this comment

Choose a reason for hiding this comment

newly12 Dec 2, 2020 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

urso commented Dec 2, 2020 • edited Loading

newly12 commented Dec 2, 2020

urso commented Dec 9, 2020

newly12 commented Nov 30, 2020 •

edited

Loading

newly12 Dec 2, 2020 •

edited

Loading

urso commented Dec 2, 2020 •

edited

Loading