Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Performance issue - Glob matching with large number of services in consul #548

Closed
murphymj25 opened this issue Sep 12, 2018 · 13 comments
Closed
Milestone

Comments

@murphymj25
Copy link
Contributor

While testing SSL performance in one of our application dev environments, we noticed that we were not getting anywhere close the the requests per second volumes that we saw when we originally did the performance testing in the lab. In the lab we were seeing about 2000 requests per second per core, in dev, we were seeing less than 1000 requests per second on an 8 core node. Upon further investigation we confirmed that the performance was poor only when fabio was getting routes from a consul environment with a large number of services (our dev environment currently has about 2200 routes).

We did a pprof on the CPU and confirmed that github com/fabiolb/fabio/vendor/github com/gobwas/glob/ was the source of the high utilization. The pdf is attached below.

We are currently running 1.5.9; it looks like this performance may be related to #457. In that PR, it looks like the benchmark does show a decrease in performance compared to the glob/ryanuber-8 if gobwas/glob isn't compiled. This is further documented in the performance section of https://github.com/gobwas/glob.

In route/table.go We confirmed that Fabio is not using compiled Globs.

// matchingHosts returns all keys (host name patterns) from the
// routing table which match the normalized request hostname.
func (t Table) matchingHosts(req *http.Request) (hosts []string) {
    host := normalizeHost(req.Host, req.TLS != nil)
    for pattern := range t {
        normpat := normalizeHost(pattern, req.TLS != nil)
        g := glob.MustCompile(normpat)
        if g.Match(host) {
            hosts = append(hosts, pattern)
        }
    }
--SNIP
}

We rolled back to 1.5.8 and confirmed that the performance is much better. We are actively looking to either roll back to the glob package used in 1.5.8 or update the current route table to be compiled. Any thoughts on what would need to be updated to move forward with the compiled option? We'd be willing to submit the PR.

file.pdf

@galen0624

@aaronhurt
Copy link
Member

aaronhurt commented Sep 12, 2018

I'd like to see the table expanded to include pre-compiled matches. This would allow those to be used elsewhere and fix additional issues like SNI wildcard matching.

@galen0624
Copy link
Collaborator

galen0624 commented Sep 12, 2018

So something like below? I know the code wont work as is but just an example.

var NormalizedPatternGlobs []glob.Glob

// matchingHosts returns all keys (host name patterns) from the
// routing table which match the normalized request hostname.
func (t Table) matchingHosts(req *http.Request) (hosts []string) {
	host := normalizeHost(req.Host, req.TLS != nil)
	for pattern := range t {
		normpat := normalizeHost(pattern, req.TLS != nil)
                compiledGlobMatch(
		         if g.Match(host) {
			hosts = append(hosts, pattern)
		}
	}
--SNIP
}

func compiledGlobMatch(pattern string) glob.Glob {
            for i, v := range NormalizedPatternGlobs {
                 //Not sure exactly how to do the comparison 
                 if pattern == v {
                    return v
                 } else {
                   //not exactly sure the compile command yet
                  g:=glob.Compile(pattern)
                  //Memory leak here with no clean up
                  NormalizedPatternGlobs = append(NormalizedPatternGlobs, g)
                }

   }
}

@aaronhurt
Copy link
Member

aaronhurt commented Sep 12, 2018

Was just in the middle of a long response ... and you're definitely on the right track.

@aaronhurt
Copy link
Member

The current Table type is a simple map of Routes which is a slice of Route structs. To avoid compiling matches in-real-time we need to maintain a map of pre-compiled matches that can cross-referenced when looping over the table. This should be populated whenever an entry is added to the table. That could be stored separately from the Table type or part of a new object that would be referenced by the current Table map.

Thoughts? I'm very open but would like to think it through.

@aaronhurt
Copy link
Member

In the immediate example I'm afraid you could still have a penalty at least the first time the table is queried as population of the NormalizedPatternGlobs would still be done in real-time the first time the route is hit. I think it would be better if that could be done when the route is added to the table.

@aaronhurt
Copy link
Member

This could also alleviate the reasoning behind not implementing glob matches on SNI as being discussed in #547

@galen0624
Copy link
Collaborator

OK let me see what we can do over the next few days. At this point we have to fix our ENV. We are planning on moving to the previous package and redeploying. We will work on compiled globs and submit a PR after that. Also looks like Fabio moved to gomod. Have you had any issues with the experimental package?

To answer your other question -- Yes I think as routes get added the table should be updated. That makes the most sense. Also cleanup when a route is removed gets rid of the memory leak potential.

@galen0624
Copy link
Collaborator

FYI - After migrating to the previous glob package the performance issue disappeared. CPU usage with 4k TPS was around 30-40% with an 8 core VM. We will monitor over the next few days and then start working on an update to handle this situation in a more performant way.

@galen0624
Copy link
Collaborator

@leprechau @murphymj25
Update - After working through our requirements we decided that Glob matching isn't required for our configuration. I have updated Fabio to include a config item glob.matching.enabled (default true) so we have the option to disable it and use standard string compare. This will provide the best performance for our config. We are currently load testing with/without GLOB matching and will update this Issue with the results. I will submit the PR with the changes once I get the Go Tests updated either today or early next week.

@aaronhurt
Copy link
Member

Awesome, thank you.

@murphymj25
Copy link
Contributor Author

murphymj25 commented Sep 14, 2018

I have just completed testing our update with the switch for glob matching. For the test, the fabio load balancer has 8 cores and 8GB of memory. In this environment Fabio currently has 2400 routes. For testing we used wrk2 with a connection close header set to prevent keepalive connections.

Test 1:
Volume: 1,000 TPS
Duration: 90 seconds
Setting: Glob Enabled
Test 1 Result: CPU 70-72%

Test 2:
Volume: 10,000 TPS
Duration: 90 seconds
Setting: Glob Disabled
Test 2 Result: CPU 39-44%

I also uploaded a screenshot of our test results, it's a little difficult to see the request volumes because of the large difference in volume between the two tests, but you can clearly see the difference in CPU.

We've also verified that the older glob package (glob/ryanuber-8) performed significantly better than the new glob package (gobwas/glob). Based on #457, a compiled gobwas/glob might get performance closer to what we are seeing with glob disabled.

@galen0624 feel free to add anything i might have missed.

screen shot 2018-09-14 at 12 37 48 pm

@magiconair
Copy link
Contributor

Yeah, I somewhat suspected this. I've left some comments on the PR and I think that you can cache the compiled globs. See glob cache code in the PR.

magiconair added a commit that referenced this issue Sep 20, 2018
…g-comp

Issue #548 added enable/disable glob matching
@magiconair magiconair added this to the 1.5.10 milestone Sep 20, 2018
@magiconair
Copy link
Contributor

Closed by #550

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants