Memory leak in go-metrics library #530
We disabled metrics and let Fabio run for 24 hours. The memory consumption was level.
Thank you for that detailed analysis. I'll try to dig in and review the metrics code and see if I can find the leak over the next few days.
Doing a bit more digging, I think this relates to #476 and provides a bit of a push to get that going.
I agree with pushing #476 up; I think it's slated for 1.6. We would rather see the raw metrics instead of rolled-up ones. I was looking into rcrowley/go-metrics, and my colleague and I think it's related to the churn of node health. We have over 1,200 services registered in Consul, so node health is always updating. That means the blocking query from Fabio to Consul is somewhat pointless in our environment because the index is constantly being bumped, which in turn means the metric sets are updated constantly. We are wondering whether go-metrics fails to clean up the old sets when a node's health refreshes.
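A rough Go sketch of the suspected pattern, with made-up route/target names (this is not fabio's actual code): every health update can replace a target, so a fresh meter gets registered and the old one is dropped, and the open question is whether go-metrics fully releases the dropped meters.

```go
package main

import (
	"fmt"

	metrics "github.com/rcrowley/go-metrics"
)

// churn mimics constant node-health updates: every update can introduce a
// new target for a service, which means a fresh meter under a fresh name,
// while the meter for the previous target becomes stale.
func churn(r metrics.Registry, updates int) {
	for i := 0; i < updates; i++ {
		name := fmt.Sprintf("route.svc-a.target-%d", i)
		r.Register(name, metrics.NewMeter())
		if i > 0 {
			// The open question: does dropping the stale entry actually
			// release everything the library allocated for that meter?
			r.Unregister(fmt.Sprintf("route.svc-a.target-%d", i-1))
		}
	}
}

func main() {
	churn(metrics.NewRegistry(), 10000)
}
```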
I agree it's related to go-metrics. I was looking through their issue feed and found a few older issues that may be related: rcrowley/go-metrics#201, plus another project referenced in the go-metrics issues ... The upstream fixes in 197 and 206 look to have landed in a later version of go-metrics than is currently vendored into fabio. We may be able to get some resolution of the leak by updating the vendored version.
I vendored in the latest from rcrowley/go-metrics and turned metrics back on. I'll watch the memory consumption over the next day and update the issue with findings. I'll PR my fork/branch if it looks good and Fabio performs as expected.
…revision:e2704e165165ec55d062f5919b4b29494e9fa790. Issue - fabiolb#530
After vendoring in the latest go-metrics, it appears the memory leak still persists. Digging further into the issue, the NewMeter() function is consuming most of the memory. The "constructor" function has a new comment stipulating that Stop() must be called once the meter is no longer needed so it can be garbage-collected.
Looking through the fabio code, I don't see where the metrics Stop() func is called. If Stop isn't called, then our assumption about route/node-health churn might hold true based on the comments in the NewMeter function. I'm wondering where that call could be added.
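For reference, here is a minimal, self-contained sketch of the lifecycle those comments appear to require, assuming the newer vendored API where Meter exposes Stop() (names and timings are illustrative):

```go
package main

import (
	"time"

	metrics "github.com/rcrowley/go-metrics"
)

func main() {
	// NewMeter hands back a meter that the library keeps ticking in the
	// background until it is explicitly stopped.
	m := metrics.NewMeter()
	m.Mark(1)
	time.Sleep(time.Second)
	_ = m.Rate1()

	// Without this call the library keeps a reference to the meter even
	// after it has been unregistered from every registry, which is the
	// leak pattern suspected here.
	m.Stop()
}
```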
Right, that was in line with what I was reading.
Digging further, it looks like one of the new method signatures is relevant here.
I definitely think you're on the right path.
I was totally wrong about updating the gmRegistry struct, as that implements the fabio metrics Registry, not the rcrowley/go-metrics Registry. It looks like the appropriate spot to do this is where the route/service gets unregistered.
That sounds logical. Unregistering the service should definitely stop collecting metrics on said service.
So it already does that, but I don't see where the meters actually get stopped. We may have to add that in addition to the Unregister.
Again, I was wrong with my previous statement. Unregister eventually does make that call; see metrics/registry.go and route/table.go.
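As a rough, paraphrased sketch of that cleanup path (not fabio's actual source): stop the metric if it supports Stop, so the library's background ticker drops its reference, then remove it from the registry. The newer go-metrics reportedly performs this kind of check inside Unregister itself, which is why updating the vendored copy matters.

```go
package main

import (
	metrics "github.com/rcrowley/go-metrics"
)

// stoppable mirrors the interface check the newer go-metrics is reported
// to perform before removing a metric from the registry.
type stoppable interface{ Stop() }

// unregisterRoute stops the metric (if it can be stopped) and then removes
// it from the registry so the name can be reused without leaking the old
// meter's background state.
func unregisterRoute(r metrics.Registry, name string) {
	if m, ok := r.Get(name).(stoppable); ok {
		m.Stop()
	}
	r.Unregister(name)
}

func main() {
	r := metrics.NewRegistry()
	r.Register("route.svc-a.target-1", metrics.NewMeter())
	unregisterRoute(r, "route.svc-a.target-1")
}
```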
I redeployed it in the lab to see if this has any impact. I'll update tomorrow if it does.
…ackage to version - revision:e2704e165165ec55d062f5919b4b29494e9fa790
I realized earlier today that I had a mistake in my GOPATH. I am currently re-testing with the updated vendored go-metrics.
Awesome, great news. Thank you for all the work and analytics on this issue.
Thanks @galen0624 for the digging and the subsequent fix. Time to pick up the slack on fabio again. I need to back out the FastCGI change, which currently breaks the build, and then we can push a new release.
No worries. Glad to help.
We are running Fabio 1.5.9 in a production environment that has 1,200+ services. We have observed what appears to be a memory leak (see example graph). The graph shows a lab environment, but the growth rate in production is the same. We have to redeploy/restart the Fabio instance every couple of days due to this ongoing issue. If memory consumption grows above 60% (8 GB total on the box), Fabio starts to experience route-update delays of over 1 minute.
We did a pprof memory profile of Fabio (5 minute and 12 hour). The profiles seem to indicate that the issue is within the metrics code. See attached; a generic sketch of one way to capture such a heap profile follows the attachment links. We are currently running Fabio without metrics to see if that has any impact. I will update when I gather more data.
mem-5min.pdf
mem-12h.pdf
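The exact way the attached profiles were captured isn't stated; as a generic sketch (the output file name is made up), one way to write a heap profile from inside a Go process, for later inspection with go tool pprof, looks like this:

```go
package main

import (
	"log"
	"os"
	"runtime"
	"runtime/pprof"
)

// writeHeapProfile dumps the current heap allocations to path so the file
// can be analyzed afterwards with `go tool pprof`.
func writeHeapProfile(path string) error {
	f, err := os.Create(path)
	if err != nil {
		return err
	}
	defer f.Close()

	runtime.GC() // get up-to-date allocation data before writing
	return pprof.WriteHeapProfile(f)
}

func main() {
	if err := writeHeapProfile("fabio-heap.pprof"); err != nil {
		log.Fatal(err)
	}
}
```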
@murphymj25