Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve equals check #22778

Merged
merged 7 commits into from
Dec 7, 2020
Merged

Improve equals check #22778

merged 7 commits into from
Dec 7, 2020

Conversation

newly12
Copy link
Contributor

@newly12 newly12 commented Nov 27, 2020

What does this PR do?

Improve equals check performance

Why is it important?

  • Improve equals check performance
  • improve log output
  • fix potentially wrong matches

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have made corresponding change to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.

Author's Checklist

  • [ ]

How to test this PR locally

Benchmark test

Logs

Benchmark added test case, before this change

pkg: github.com/elastic/beats/v7/libbeat/conditions
BenchmarkEquals_Check
BenchmarkEquals_Check-8   	 1529768	       829 ns/op
PASS

After

pkg: github.com/elastic/beats/v7/libbeat/conditions
BenchmarkEquals_Check
BenchmarkEquals_Check-8   	 6298933	       188 ns/op
PASS

@botelastic botelastic bot added the needs_team Indicates that the issue/PR needs a Team:* label label Nov 27, 2020
@andresrc andresrc added the Team:Services (Deprecated) Label for the former Integrations-Services team label Nov 30, 2020
@elasticmachine
Copy link
Collaborator

Pinging @elastic/integrations-services (Team:Services)

@botelastic botelastic bot removed the needs_team Indicates that the issue/PR needs a Team:* label label Nov 30, 2020
@urso
Copy link

urso commented Dec 2, 2020

Nice!!!

Have you considered to turn equalsValue into an interface and do something like this?

type equalsValue interface {
  Check(ValuesMap) bool
}

type equalsIntValue int
type equalsStringValue string
...

func (v equalsIntValue) Check(event ValuesMap) bool {
   ...
}

alternatively one can try functions as well like:

type equalsValue func(ValuesMap) bool

func equalsIntValue(i int) equalsValue {
  return func(event ValuesMap) bool {
    ...
  }
}

equalsValue is internal, so we can play with it. Just wondering which implementation would be the fastest here.

All in all I would prefer to move the loop body into a method on equalsValue.

@newly12
Copy link
Contributor Author

newly12 commented Dec 2, 2020

Thanks for the review. Both look to be more elegant ways to me.

I was trying to implement this and notice that it seems that in both implementation equalsValue has to take an interface{} as its parameter instead of ValuesMap, because the name of the field that is used to extract the value is still saved in Equals type. Please correct me if I am wrong, and let me know if I shall move on with the interface parameter.

type Equals map[string]equalsValue

Another notice is that warning logs will not be logged anymore after this change, since for a field Equals.check does not try to extract all supported types from the value but only its wanted type, FYI in case this concerns you.

@urso
Copy link

urso commented Dec 2, 2020

Another notice is that warning logs will not be logged anymore after this change, since for a field Equals.check does not try to extract all supported types from the value but only its wanted type, FYI in case this concerns you.

Yes, this is a concern. If checks do not work as expected for all events due to type mismatches, then the logs will point this out.

I was trying to implement this and notice that it seems that in both implementation equalsValue has to take an interface{} as its parameter instead of ValuesMap, because the name of the field that is used to extract the value is still saved in Equals type.

Ah, you are right. Didn't notice that value is an interface. In that case we would have type equalsValue interface { Check(value interface{}) (bool, error) }, right? Then we can log the error if the extraction failed.

@newly12
Copy link
Contributor Author

newly12 commented Dec 3, 2020

Ah, you are right. Didn't notice that value is an interface. In that case we would have type equalsValue interface { Check(value interface{}) (bool, error) }, right? Then we can log the error if the extraction failed.

yeah, I think the difference would be 1. previously warning logs is about all type are mismatched 2. now it warns if the value is not its wanted type. For example, if I have a config "strField": "foo", I got a event with "strField": 1, previously there is no warning log, because the value will be extracted as int, though it doesn't equal "foo", now there will be a warning, we expect string but we have a int extracted.

The result is still the same though.

@urso
Copy link

urso commented Dec 3, 2020

now it warns if the value is not its wanted type. For example, if I have a config "strField": "foo", I got a event with "strField": 1, previously there is no warning log, because the value will be extracted as int, though it doesn't equal "foo", now there will be a warning, we expect string but we have a int extracted.

I would say this is an improvement :)

Plus the old code did not take the type in the config in mind when comparing the results. E.g. if ExtractInt on the event did succeed, it would have compared the result with the integer value 0, if the actual setting in the config was a string. This could result in wrong true/false results.

The change will: improve performance, improve log output, fix potentially wrong matches.

@newly12
Copy link
Contributor Author

newly12 commented Dec 4, 2020

Hi,

I've added both implementation in the code for now for your reference, the benchmark result is attached below, overall it seems the implementation with new type wrapping int/str/bool has better performance, but I am a bit confused why it shows a lower than others when there are less checks, which I couldn't figure out at this moment..

BenchmarkEqualsWith1Conditions
BenchmarkEqualsWith1Conditions-8    	14561558	        77.5 ns/op	       0 B/op	       0 allocs/op
BenchmarkEquals2With1Conditions
BenchmarkEquals2With1Conditions-8   	13009749	        85.4 ns/op	       0 B/op	       0 allocs/op
BenchmarkEquals3With1Conditions
BenchmarkEquals3With1Conditions-8   	14077786	        79.3 ns/op	       0 B/op	       0 allocs/op
BenchmarkEqualsWith3Conditions
BenchmarkEqualsWith3Conditions-8    	 5666799	       205 ns/op	       0 B/op	       0 allocs/op
BenchmarkEquals2With3Conditions
BenchmarkEquals2With3Conditions-8   	 3171140	       384 ns/op	     216 B/op	       4 allocs/op
BenchmarkEquals3With3Conditions
BenchmarkEquals3With3Conditions-8   	 6365893	       195 ns/op	       0 B/op	       0 allocs/op
BenchmarkEqualsWith5Conditions
BenchmarkEqualsWith5Conditions-8    	 2231367	       543 ns/op	     245 B/op	       5 allocs/op
BenchmarkEquals2With5Conditions
BenchmarkEquals2With5Conditions-8   	 3334934	       337 ns/op	     197 B/op	       3 allocs/op
BenchmarkEquals3With5Conditions
BenchmarkEquals3With5Conditions-8   	 3765338	       470 ns/op	     245 B/op	       5 allocs/op
BenchmarkEqualsWith7Conditions
BenchmarkEqualsWith7Conditions-8    	 1584600	       674 ns/op	     315 B/op	       6 allocs/op
BenchmarkEquals2With7Conditions
BenchmarkEquals2With7Conditions-8   	 3286808	       404 ns/op	     221 B/op	       4 allocs/op
BenchmarkEquals3With7Conditions
BenchmarkEquals3With7Conditions-8   	 2253856	       486 ns/op	     245 B/op	       5 allocs/op

@urso
Copy link

urso commented Dec 4, 2020

It looks like the performance depends on the number of allocations. I modified your benchmark function to GC before a run and reset the timer, such that we only test the Check itself:

func benchmarkEquals(b *testing.B, f factory, fields map[string]interface{}) {
	e, err := f(fields)
	assert.NoError(b, err)

	runtime.GC()
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		e.Check(secdTestEvent)
	}
}

With this change the numbers for comparisons that did not fail like BenchmarkEquals.*With1 have been more consistent on my machine, even if I did run the complete suite. for example (without one or the other test did sometimes go up by 20ns on my machine):

$ go test -bench 'BenchmarkEquals.*With1' -benchmem
goos: linux
goarch: amd64
pkg: github.com/elastic/beats/v7/libbeat/conditions
BenchmarkEqualsWith1Conditions-32     	17529830	        66.4 ns/op	       0 B/op	       0 allocs/op
BenchmarkEquals2With1Conditions-32    	17563389	        67.6 ns/op	       0 B/op	       0 allocs/op
BenchmarkEquals3With1Conditions-32    	17860872	        65.9 ns/op	       0 B/op	       0 allocs/op
PASS

All in all the solution that allocates less will be the fastest:

$ go test -bench 'Benchmark' -benchmem
goos: linux
goarch: amd64
pkg: github.com/elastic/beats/v7/libbeat/conditions
BenchmarkSimpleCondition-32           	59731610	        19.7 ns/op	       0 B/op	       0 allocs/op
BenchmarkCombinedCondition-32         	 9768402	       124 ns/op	       0 B/op	       0 allocs/op
BenchmarkEqualsWith1Conditions-32     	19837894	        58.5 ns/op	       0 B/op	       0 allocs/op
BenchmarkEquals2With1Conditions-32    	18626834	        61.7 ns/op	       0 B/op	       0 allocs/op
BenchmarkEquals3With1Conditions-32    	20692213	        60.5 ns/op	       0 B/op	       0 allocs/op
BenchmarkEqualsWith3Conditions-32     	 6081150	       166 ns/op	       0 B/op	       0 allocs/op
BenchmarkEquals2With3Conditions-32    	 2765434	       570 ns/op	     216 B/op	       4 allocs/op
BenchmarkEquals3With3Conditions-32    	 6134802	       164 ns/op	       0 B/op	       0 allocs/op
BenchmarkEqualsWith5Conditions-32     	 1362459	       885 ns/op	     245 B/op	       5 allocs/op
BenchmarkEquals2With5Conditions-32    	 2274757	       566 ns/op	     221 B/op	       4 allocs/op
BenchmarkEquals3With5Conditions-32    	 1475354	       748 ns/op	     245 B/op	       5 allocs/op
BenchmarkEqualsWith7Conditions-32     	 1000000	      1099 ns/op	     315 B/op	       6 allocs/op
BenchmarkEquals2With7Conditions-32    	 1965396	       656 ns/op	     221 B/op	       4 allocs/op
BenchmarkEquals3With7Conditions-32    	 1529449	       771 ns/op	     245 B/op	       5 allocs/op
BenchmarkNetworkCondition-32          	 3951658	       287 ns/op	      16 B/op	       1 allocs/op
PASS

Equals2 was faster when it did allocate once, but for the With3Conditions test it was slower because it has had 4 allocations. Time to profile:

$ go test -bench 'BenchmarkEquals2With3Cond' -benchmem -memprofile mem.out -memprofilerate 1 && go tool pprof -http :6060 mem.out

This command will open an interactive window in your browser that allows you to inspect the run. In the grapgh click on equalsBoolValue to select the node and go to View -> Source in the menu. Now we see where the expensive allocation comes from:

    152            .          .           func (e equalsBoolValue) Check(value interface{}) bool { 
    153            .          .           	if bValue, err := ExtractBool(value); err != nil { 
    154            .          .           		return bValue == bool(e) 
    155            .          .           	} 
    156            .    26.75MB           	logp.L().Named(logName).Warnf("expected bool but got type %T in equals condition.", value) 
    157            .          .           	return false 
    158            .          .           } 

This is funny. Why does the Equals2 check logs a message and the others not?

The logp.L().Named(...) is somewhat expensive. Even more than the Warnf it seems. Idealy we would create the logger in NewEqualsCondition, and the store it in the Equal type. For Equal2/3 the same. We can either pass the logger as paramater into the check function, or have the check function return an error that gets printed. The Warnf statements might still allocated for passing the parameters and for the temporary buffer the log message is written into, before it is written.

The performance between good path (types match) and bad path (type mismatch) seems to differ quite heavily thanks due to the extra allocations. Looking at the runs without allocations I'd say it doesn't make a big difference which of the three solutions we use, them seem to be similar performance wise:

my run:

BenchmarkEqualsWith1Conditions-32     	19837894	        58.5 ns/op
BenchmarkEquals2With1Conditions-32    	18626834	        61.7 ns/op	
BenchmarkEquals3With1Conditions-32    	20692213	        60.5 ns/op
...
BenchmarkEqualsWith3Conditions-32     	 6081150	       166 ns/op
BenchmarkEquals3With3Conditions-32    	 6134802	       164 ns/op

a few ns/op difference is negligible and looks like noise.

I'd concentrate on the "good" path only first (just update the benchmark, such that no type errors are printed), select one of these implementations, and finally use the benchmark and memory profile to reduce allocations in the bad path.

Just for fun (equals 3 with allocating the logger upfront per equal function):

func equalsIntValue3(i uint64) equalsValueFunc {
	log := logp.L().Named(logName) // <- better to have the logger passed as parameter
	return func(value interface{}) bool {
		if sValue, err := ExtractInt(value); err == nil {
			return sValue == i
		}
		log.Warnf("expected int but got type %T in equals condition.", value)
		return false
	}
}

func equalsStringValue3(s string) equalsValueFunc {
	log := logp.L().Named(logName)
	return func(value interface{}) bool {
		if sValue, err := ExtractString(value); err == nil {
			return sValue == s
		}
		log.Warnf("expected string but got type %T in equals condition.", value)
		return false
	}
}

func equalsBoolValue3(b bool) equalsValueFunc {
	log := logp.L().Named(logName)
	return func(value interface{}) bool {
		if sValue, err := ExtractBool(value); err == nil {
			return sValue == b
		}
		log.Warnf("expected bool but got type %T in equals condition.", value)
		return false
	}
}

running the benchmark I get:

BenchmarkEqualsWith1Conditions-32     	19428231	        60.5 ns/op	       0 B/op	       0 allocs/op
BenchmarkEquals2With1Conditions-32    	18376104	        63.2 ns/op	       0 B/op	       0 allocs/op
BenchmarkEquals3With1Conditions-32    	17106399	        69.2 ns/op	       0 B/op	       0 allocs/op
BenchmarkEqualsWith3Conditions-32     	 8174004	       150 ns/op	       0 B/op	       0 allocs/op
BenchmarkEquals2With3Conditions-32    	 2119209	       518 ns/op	     216 B/op	       4 allocs/op
BenchmarkEquals3With3Conditions-32    	 7012340	       170 ns/op	       0 B/op	       0 allocs/op
BenchmarkEqualsWith5Conditions-32     	 2335826	       873 ns/op	     245 B/op	       5 allocs/op
BenchmarkEquals2With5Conditions-32    	 3561202	       573 ns/op	     221 B/op	       4 allocs/op
BenchmarkEquals3With5Conditions-32    	 3564328	       443 ns/op	      56 B/op	       1 allocs/op
BenchmarkEqualsWith7Conditions-32     	 1000000	      1087 ns/op	     315 B/op	       6 allocs/op
BenchmarkEquals2With7Conditions-32    	 2052055	       541 ns/op	     186 B/op	       3 allocs/op
BenchmarkEquals3With7Conditions-32    	 2262382	       526 ns/op	      56 B/op	       1 allocs/op

The single allocation is due the logger doing fmt.Sprintf and can't be removed.

@newly12
Copy link
Contributor Author

newly12 commented Dec 5, 2020

oops this is my fault.. I debugged equals2 finally got why it hits the logger statement in 3Condition test of equals2, it should be err == nil here. 🤦‍♂️

	if bValue, err := ExtractBool(value); err != nil {

and after that I am getting expected output

BenchmarkEqualsWith1Conditions
BenchmarkEqualsWith1Conditions-8    	14012263	        75.6 ns/op	       0 B/op	       0 allocs/op
BenchmarkEquals2With1Conditions
BenchmarkEquals2With1Conditions-8   	15052548	        79.3 ns/op	       0 B/op	       0 allocs/op
BenchmarkEquals3With1Conditions
BenchmarkEquals3With1Conditions-8   	15445071	        77.9 ns/op	       0 B/op	       0 allocs/op
BenchmarkEqualsWith3Conditions
BenchmarkEqualsWith3Conditions-8    	 5918541	       190 ns/op	       0 B/op	       0 allocs/op
BenchmarkEquals2With3Conditions
BenchmarkEquals2With3Conditions-8   	 6046137	       202 ns/op	       0 B/op	       0 allocs/op
BenchmarkEquals3With3Conditions
BenchmarkEquals3With3Conditions-8   	 6471944	       188 ns/op	       0 B/op	       0 allocs/op
BenchmarkEqualsWith5Conditions
BenchmarkEqualsWith5Conditions-8    	 2222293	       553 ns/op	     245 B/op	       5 allocs/op
BenchmarkEquals2With5Conditions
BenchmarkEquals2With5Conditions-8   	 3954058	       460 ns/op	     245 B/op	       5 allocs/op
BenchmarkEquals3With5Conditions
BenchmarkEquals3With5Conditions-8   	 2304970	       462 ns/op	     245 B/op	       5 allocs/op
BenchmarkEqualsWith7Conditions
BenchmarkEqualsWith7Conditions-8    	 1786755	       683 ns/op	     315 B/op	       6 allocs/op
BenchmarkEquals2With7Conditions
BenchmarkEquals2With7Conditions-8   	 2730859	       500 ns/op	     245 B/op	       5 allocs/op
BenchmarkEquals3With7Conditions
BenchmarkEquals3With7Conditions-8   	 2750624	       468 ns/op	     245 B/op	       5 allocs/op

Now seems equals3 is the choice, and as you have mentioned, logp.L().Named(...) is expensive, the way to avoid keeping getting the logger would be get it once at the time getting the "equals" instance.

I have updated the code to adopt equals3 and cleaned other code. Please have a review. Below is the new benchmark with the GC and reset timer prior to the benchmark loop.

goos: darwin
goarch: amd64
pkg: github.com/elastic/beats/v7/libbeat/conditions
BenchmarkSimpleCondition-8      47911032                22.7 ns/op             0 B/op          0 allocs/op
BenchmarkCombinedCondition-8     9228928               131 ns/op               0 B/op          0 allocs/op
BenchmarkEquals/1_condition-8   16472109                71.7 ns/op             0 B/op          0 allocs/op
BenchmarkEquals/3_conditions-8   6496636               188 ns/op               0 B/op          0 allocs/op
BenchmarkEquals/5_conditions-8   5979784               304 ns/op              56 B/op          1 allocs/op
BenchmarkEquals/7_conditions-8   3703812               305 ns/op              56 B/op          1 allocs/op
BenchmarkNetworkCondition-8      7482273               164 ns/op              16 B/op          1 allocs/op
PASS

It's been very helpful, I really appreciate it!

@elasticmachine
Copy link
Collaborator

elasticmachine commented Dec 5, 2020

💚 Build Succeeded

the below badges are clickable and redirect to their specific view in the CI or DOCS
Pipeline View Test View Changes Artifacts preview

Expand to view the summary

Build stats

  • Build Cause: urso commented: jenkins run the tests please

  • Start Time: 2020-12-07T16:41:32.847+0000

  • Duration: 59 min 21 sec

Test stats 🧪

Test Results
Failed 0
Passed 17337
Skipped 1373
Total 18710

💚 Flaky test report

Tests succeeded.

Expand to view the summary

Test stats 🧪

Test Results
Failed 0
Passed 17337
Skipped 1373
Total 18710

@newly12 newly12 changed the title Improve equals check performance Improve equals check Dec 5, 2020
@urso
Copy link

urso commented Dec 7, 2020

jenkins run the tests please

@urso
Copy link

urso commented Dec 7, 2020

The change looks good to me.

Thank you, this is a really great improvement in the hot code paths!

Do you by any chance have a benchmark run (with the new benchmarks) for the old code? I'd love to see how much performance has improved. You might want to check out benchcmp (e.g. see this blog post, not by me, just a random google search)

I started CI and will merge and port the change to the release branches if CI goes green.

@urso urso merged commit 5f3d8ac into elastic:master Dec 7, 2020
@urso urso added the needs_backport PR is waiting to be backported to other branches. label Dec 7, 2020
@urso
Copy link

urso commented Dec 7, 2020

Merged. The improvement should become available with the 7.11 release.

Thanks for the contribution! Performance improvements like this are really exciting!

@newly12
Copy link
Contributor Author

newly12 commented Dec 8, 2020

Thanks @urso . Please check comparison below.

$ benchstat old.txt new.txt                                                
name                   old time/op    new time/op    delta
SimpleCondition-8        22.5ns ± 7%    22.7ns ± 9%      ~     (p=0.525 n=18+18)
CombinedCondition-8       136ns ±16%     136ns ± 8%      ~     (p=0.146 n=20+17)
Equals/3_conditions-8     790ns ± 3%     193ns ± 8%   -75.52%  (p=0.000 n=18+19)
Equals/5_conditions-8    1.25µs ±21%    0.33µs ±12%   -73.88%  (p=0.000 n=16+16)
Equals/7_conditions-8     951ns ±22%     310ns ±11%   -67.36%  (p=0.000 n=20+16)
Equals/1_condition-8      270ns ± 5%      77ns ±16%   -71.34%  (p=0.000 n=17+20)
NetworkCondition-8        170ns ± 5%     171ns ± 7%      ~     (p=0.640 n=17+18)

name                   old alloc/op   new alloc/op   delta
SimpleCondition-8         0.00B          0.00B           ~     (all equal)
CombinedCondition-8       0.00B          0.00B           ~     (all equal)
Equals/3_conditions-8      192B ± 0%        0B       -100.00%  (p=0.000 n=20+20)
Equals/5_conditions-8      479B ± 4%       56B ± 0%   -88.31%  (p=0.000 n=17+18)
Equals/7_conditions-8      329B ±14%       56B ± 0%   -82.97%  (p=0.000 n=20+17)
Equals/1_condition-8      64.0B ± 0%      0.0B       -100.00%  (p=0.000 n=20+20)
NetworkCondition-8        16.0B ± 0%     16.0B ± 0%      ~     (all equal)

name                   old allocs/op  new allocs/op  delta
SimpleCondition-8          0.00           0.00           ~     (all equal)
CombinedCondition-8        0.00           0.00           ~     (all equal)
Equals/3_conditions-8      6.00 ± 0%      0.00       -100.00%  (p=0.000 n=20+20)
Equals/5_conditions-8      11.9 ± 7%       1.0 ± 0%   -91.58%  (p=0.000 n=17+19)
Equals/7_conditions-8      8.20 ±15%      1.00 ± 0%   -87.80%  (p=0.000 n=20+20)
Equals/1_condition-8       2.00 ± 0%      0.00       -100.00%  (p=0.000 n=20+20)
NetworkCondition-8         1.00 ± 0%      1.00 ± 0%      ~     (all equal)

@newly12 newly12 deleted the equal branch December 8, 2020 02:30
@urso urso added v7.11.0 and removed needs_backport PR is waiting to be backported to other branches. labels Dec 9, 2020
urso pushed a commit to urso/beats that referenced this pull request Dec 15, 2020
(cherry picked from commit 5f3d8ac)
urso pushed a commit that referenced this pull request Dec 16, 2020
(cherry picked from commit 5f3d8ac)

Co-authored-by: Peter Deng <yundeng@ebay.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Team:Services (Deprecated) Label for the former Integrations-Services team v7.11.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants