Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enhance Glob matching by using compiled Globs #554

Open
galen0624 opened this issue Sep 20, 2018 · 14 comments
Open

Enhance Glob matching by using compiled Globs #554

galen0624 opened this issue Sep 20, 2018 · 14 comments

Comments

@galen0624
Copy link
Collaborator

Glob Matching today has a performance hit when Fabio has a lot of services (>500). This can impact CPU performance significantly. As described in the performance section of the vendored package.
https://github.com/gobwas/glob

I have most of the code ready for using compiled Globs. I will PR once I get through some of the load testing.

@galen0624
Copy link
Collaborator Author

galen0624 commented Sep 21, 2018

It doesn't appear that the compiled globs helped the CPU issue all that much. My testing details are below. I'm going to run a CPU Profile at high load and post the details.
Load test on an 8core 8G VM.

Sep 20 21:46:49 fabio: "GlobMatchingDisabled": true,
Sep 20 21:46:49 fabio: "GlobCacheSize": 5000

Test 1 - 10K TPS 37% CPU

Sep 20 21:51:26 fabio: "GlobMatchingDisabled": false,
Sep 20 21:51:26 fabio: "GlobCacheSize": 5000

Test 2 - 2K TPS 30% CPU
Test 3 - 3K TPS 38% CPU
Test 4 - 4K TPS 98% CPU (TPS didn't exceed 3.7K)
Test 5 - 10K TPS 98% CPU (TPS didn't exceed 4.0K)

issue-554-performance-test

@galen0624
Copy link
Collaborator Author

galen0624 commented Sep 21, 2018

We have about 1500 Services in this ENV.

glob.cache.size = 5000
profile.mode = cpu
profile.path = /tmp
5K TPS Test

Based on the CPU profile (See below) it looks like the RLock and RUnlock are taking most of the CPU. I'm not sure how to avoid using those without creating race conditions and panics.
Another func we could look at is the stings.ToLower but that is tertiary to the Mutex locks

cpu1.pdf

@galen0624
Copy link
Collaborator Author

galen0624 commented Sep 21, 2018

I will PR the current code to double check my logic.

@galen0624
Copy link
Collaborator Author

Is this something we could look into - https://golang.org/doc/go1.9#sync-map

@galen0624
Copy link
Collaborator Author

galen0624 commented Sep 26, 2018

If we moved the compiled glob cache add/remove functions to the Route add/del functions using the Mutex locks we in theory should be able to remove the fast path lookup mutex Rlock/UNlock. That way we move the CPU intensive part (Rlock/RUnlock) to the "control plane" side of Fabio instead of the "data plane" .

Thoughts?

@galen0624
Copy link
Collaborator Author

Working through the previous comment I found that the read lock is still needed. There is another option sync.Map.

https://medium.com/@deckarep/the-new-kid-in-town-gos-sync-map-de24a6bf7c2c

Going to try and test that other method.

@aaronhurt
Copy link
Member

I've used the Sync.Map implementation successfully for a few high transaction environments. It works well when you have few updates but many concurrent reads which should be the case here. Will be interesting to see the results.

@galen0624
Copy link
Collaborator Author

@leprechau How have you dealt with the sync.Map.Load() returning an interface{}. I tried to type cast the return to a glob.Glob but the system panics.
I probably am not going about this the correct way.

	// fast path with read lock
	//c.mu.RLock()
	g, _ := c.m.Load(pattern)
	gr := g.(glob.Glob)

	//c.mu.RUnlock()
	if gr != nil {
		return gr, nil
	}

interface conversion: interface is nil, not glob.Glob

@aaronhurt
Copy link
Member

You probably want to do something like ...

    var val interface{}  // hold onto it after the if block
    if val, ok := mymap.Load(pattern); !ok {
        // value didn't load ... bail?
    }
    var obj my.Type  // same thing
    if obj, ok := val.(my.Type); !ok {
        // probably shouldn't happen ... bail
    }
    // you should have a valid obj of correct type to work with here

@galen0624
Copy link
Collaborator Author

galen0624 commented Sep 27, 2018

OK I tested using the sync.Map and it seems to be much better. I was able to achieve 6500 TPS. At 7000 TPS it would start dropping connections. The below test went from 1k,2k,3k,4k,5k,6k,7, 6.5k TPS. 8 core 8 gig fabio VM
"GlobMatchingDisabled": false,
"GlobCacheSize": 5000

See graph. I'll PR the new code for review.

issue-554-glob-cahche-using-syncmap

@galen0624
Copy link
Collaborator Author

@leprechau I could use some advice on how to handle some type assertions. The Glob is an interface and there are 21 types that implement the interface. When the glob compiles it returns that type but the sync.Map returns the interface{}. I have to find a way to assert to the specific type. I could put in a switch statement with the 21 types but that seems inefficient. Is there a way to cast the type at return? something like this return glb.(reflect.TypeOf(glb)) That snippet doesn't work BTW.

@galen0624
Copy link
Collaborator Author

#555 This code is complete, load tested, make test passes and ready for merge.

@galen0624
Copy link
Collaborator Author

@leprechau @magiconair Any thoughts on this PR? Should I update/change anything?

@aaronhurt
Copy link
Member

@galen0624 Left a comment in the PR.

aaronhurt added a commit that referenced this issue Jan 29, 2020
* consul: refactor service monitor

Refactor the set of functions which watch the consul state
and generate the route commands into a set of objects to make
them testable and extendable.

* consul: move build route command logic to separate object

... and finally add some tests.

* consul: fetch route updates concurrently

The code which updates the routing table from consul was using a
single go routine to fetch data from consul. This can be a slow
process if consul has lots of registered services.

This patch adds an option `registry.consul.serviceMonitors`
to increase the concurrency for the route updates.

* issue - 558 updated to include custom http.Transport

* added global polling interval for issue 558

* issue 558 updates and doco adds

* added compiled glob matching cache using LRU cache

* updated glob_cache to use sync.Map

* issue #554 refactored to use sync.Map instead of RWmutex locks

* updated glob_cache to support type casting of glob.Glob Interface

* added type assertion of glob.Glob interface instead of type switch

* rebased to master and fixed issue with load_test

* added Glob to grpc proxy

* removing polling interval

* updated tests and remove PollingInterval

Co-authored-by: Frank Schröder <frank.schroeder@gmail.com>
Co-authored-by: Aaron Hurt <ahurt@anbcs.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants