Dynamic GOGC based on ready state #1194

Merged (6 commits) on Jan 14, 2019

Conversation

@Dieterbe (Contributor) commented Dec 29, 2018

This is a cleaned-up version of #1171; see that PR for more information and rationale.

I have tested it like so:

cd docker/docker-dev-custom-cfg-kafka
docker-compose up --force-recreate -V
# wait til ready...
fakemetrics backfill --kafka-mdm-addr localhost:9092 --offset 5h --period 10s --speedup 100 --mpo 20000

After that completes, restart MT with gc-percent-not-ready settings that gradually go down.
Here are the tested settings:

100
50
20
10

I always restarted right at the change of a minute.
If I were to do this again, I would restart a few seconds after the minute change, to give the instance a chance to save all chunks. Currently there's some missing data, but only from after the instance stabilized, so it doesn't really get in the way of the analysis.

snapshots:
https://snapshot.raintank.io/dashboard/snapshot/FoSdCguE9qi4eBXMZv3xxnRVW3EolcaN
https://snapshot.raintank.io/dashboard/snapshot/HsqiUL4vMEFFYf9oJDhLwve3Ibw5X248

I also used this patch:

diff --git a/cluster/node.go b/cluster/node.go
index 43bde4ee..940fded7 100644
--- a/cluster/node.go
+++ b/cluster/node.go
@@ -131,11 +131,14 @@ func (n HTTPNode) IsLocal() bool {
 // readyStateGCHandler adjusts the gcPercent value based on the node ready state
 func (n HTTPNode) readyStateGCHandler() {
        if gcPercent == gcPercentNotReady {
+               log.Infof("HANDLER. prio:%d, state:%q, total ready:%t. keeping stock GOGC", n.Priority, n.State, n.IsReady())
                return
        }
        if n.IsReady() {
+               log.Infof("HANDLER. prio:%d, state:%q, total ready:%t -> setting gc percent %d", n.Priority, n.State, true, gcPercent)
                debug.SetGCPercent(gcPercent)
        } else {
+               log.Infof("HANDLER. prio:%d, state:%q, total ready:%t -> setting gc percent %d", n.Priority, n.State, false, gcPercentNotReady)
                debug.SetGCPercent(gcPercentNotReady)
        }
 }

Here's the output of the last run:

2018-12-29 12:39:19.039 [INFO] HANDLER. prio:0, state:"NodeNotReady", total ready:false -> setting gc percent 10
2018-12-29 12:39:19.040 [INFO] HANDLER. prio:0, state:"NodeReady", total ready:true -> setting gc percent 100
2018-12-29 12:39:29.040 [INFO] HANDLER. prio:4357775, state:"NodeReady", total ready:false -> setting gc percent 10
2018-12-29 12:39:39.040 [INFO] HANDLER. prio:130, state:"NodeReady", total ready:false -> setting gc percent 10
2018-12-29 12:39:49.043 [INFO] HANDLER. prio:121, state:"NodeReady", total ready:false -> setting gc percent 10
2018-12-29 12:39:59.042 [INFO] HANDLER. prio:103, state:"NodeReady", total ready:false -> setting gc percent 10
2018-12-29 12:40:09.040 [INFO] HANDLER. prio:112, state:"NodeReady", total ready:false -> setting gc percent 10
2018-12-29 12:40:19.050 [INFO] HANDLER. prio:85, state:"NodeReady", total ready:false -> setting gc percent 10
2018-12-29 12:40:29.040 [INFO] HANDLER. prio:79, state:"NodeReady", total ready:false -> setting gc percent 10
2018-12-29 12:40:39.040 [INFO] HANDLER. prio:62, state:"NodeReady", total ready:false -> setting gc percent 10
2018-12-29 12:40:49.040 [INFO] HANDLER. prio:47, state:"NodeReady", total ready:false -> setting gc percent 10
2018-12-29 12:40:59.048 [INFO] HANDLER. prio:37, state:"NodeReady", total ready:false -> setting gc percent 10
2018-12-29 12:41:09.040 [INFO] HANDLER. prio:26, state:"NodeReady", total ready:false -> setting gc percent 10
2018-12-29 12:41:19.040 [INFO] HANDLER. prio:16, state:"NodeReady", total ready:false -> setting gc percent 10
2018-12-29 12:41:29.040 [INFO] HANDLER. prio:10, state:"NodeReady", total ready:true -> setting gc percent 100
2018-12-29 12:41:39.040 [INFO] HANDLER. prio:0, state:"NodeReady", total ready:true -> setting gc percent 100
2018-12-29 12:41:49.040 [INFO] HANDLER. prio:1, state:"NodeReady", total ready:true -> setting gc percent 100

Note that before it starts consuming data, the instance briefly marks itself as ready.
This should not happen on read-only nodes (they use a warm-up period), but I suspect the problem may still appear on read nodes if they don't start consuming the Kafka backlog in time; that is something to investigate further/separately.
But even for write nodes we don't want this to happen, as it means our GOGC control is temporarily not honored.

Conclusions as far as this feature goes:

  1. It works.
  2. With each run you can see that CPU usage goes up, GC runs go up, CPU spent in GC goes up, and less memory is needed.
  3. In my most extreme run, you trade a CPU increase from 265% to 440% (+66%) for a memory reduction from 450MB to 280MB (-38%). It also comes with an increase in ingestion time from 81s to 132s (+63%).
    Whether this is worth it depends. I suspect a milder setting, maybe 50, is good to get started.

Note: in real environments I expect the numbers to be different.

docs/config.md (review thread, resolved)
@woodsaj (Member) left a comment

I like this capability, but I don't like the idea of having a "Node" instance manipulating memory management tunables.

I think the code would be more maintainable if memory management was handled somewhere more central, i.e. its own package or github.com/grafana/metrictank/util. Ideally, I would like to see MT have a notification system that components can subscribe and publish to, so that any component can choose to be notified of events like the GC value changing, but that is outside the scope of this change.

We also need to write log messages when the GC value changes, and probably emit a metric of what the value is (probably best to do that in github.com/grafana/metrictank/stats/memory_reporter.go).

@replay (Contributor) commented Dec 31, 2018

Emitting a metric with the current setting is a really good idea; then we can easily compare the GOGC value with all the other metrics on one dashboard and watch what effect the changes have.

@Dieterbe (Contributor, Author) commented Jan 10, 2019

I think the code would be more maintainable if memory management was handled somewhere more central, i.e. its own package or github.com/grafana/metrictank/util. Ideally, I would like to see MT have a notification system that components can subscribe and publish to, so that any component can choose to be notified of events like the GC value changing, but that is outside the scope of this change.

Agreed on all counts, including an event system being out of scope here.
We could move the GC setting to another package, but the reality is that the Node is currently the de facto central place where we are informed about changes in ready state and priority, as well as where we track their current values. So merely moving the GC setting wouldn't buy us much, as it would still be driven by HTTPNode until we do a big refactor. Hence I would keep this as-is for now.

I'll add the log messages and metrics though.

@Dieterbe (Contributor, Author) commented:
There doesn't seem to be a way to query the runtime for the current GOGC value (!).
The closest I've found is https://golang.org/pkg/runtime/#MemStats NextGC (which is the target heap size at which the next GC cycle runs).
So instead of doing it from the memory reporter, we can track it with a gauge or something whenever we make changes (in the HTTPNode). I can put it in the memory.gc graphite namespace though, so it'll sit alongside the other metrics from the memory reporter.
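
A minimal sketch of that idea, assuming the stats package's Gauge32 constructor and Set method behave as they do elsewhere in MT; the helper and the memory.gc.gogc metric name are illustrative, not necessarily what this PR ships:

package cluster

import (
    "runtime/debug"

    "github.com/grafana/metrictank/stats"
)

// gogcGauge tracks the GOGC value we last applied, since the runtime cannot be
// queried for it. The metric name is illustrative; it sits under memory.gc so
// it lands next to the memory reporter's other metrics.
var gogcGauge = stats.NewGauge32("memory.gc.gogc")

// applyGCPercent sets the GC target and records the new value as a gauge.
func applyGCPercent(pct int) {
    debug.SetGCPercent(pct)
    gogcGauge.Set(pct)
}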

@woodsaj (Member) commented Jan 10, 2019

There doesn't seem to be a way to query the runtime for the current GOGC value

If you move the GC management into a separate package, you can just query that package.

E.g.:

package gc

import "runtime/debug"

var currentGC int

// CurrentGC returns the last GC percent that was set via SetGC.
func CurrentGC() int {
    return currentGC
}

// SetGC records and applies the new GC percent.
func SetGC(gc int) {
    currentGC = gc
    debug.SetGCPercent(gc)
}

Then you can have the cluster package call gc.SetGC() and have the stats package call gc.CurrentGC().
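
For illustration, a hypothetical sketch of that wiring on the cluster side, reusing the names from the diff above (the gc import path and the method name are assumptions, not part of this PR):

package cluster

import "github.com/grafana/metrictank/gc" // hypothetical package location

// applyReadyStateGC drives the gc package from the node's ready state, reusing
// gcPercent / gcPercentNotReady from the diff above; the stats package would
// then call gc.CurrentGC() when it reports.
func (n HTTPNode) applyReadyStateGC() {
    if n.IsReady() {
        gc.SetGC(gcPercent)
    } else {
        gc.SetGC(gcPercentNotReady)
    }
}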

@Dieterbe (Contributor, Author) commented:
True, but I'm trying to keep the stats package standalone and reusable for our other software.
But I guess making it depend on (import) a separate gc package isn't such a big problem.

@woodsaj (Member) commented Jan 10, 2019

What if we just set the GOGC env var whenever we change the GC settings? Then just have the stats package report on what the env var is set to. That should make it much more portable.
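
A minimal sketch of that idea, using only stdlib calls; the package and function names are illustrative, not from this PR:

package util

import (
    "os"
    "runtime/debug"
    "strconv"
)

// defaultGOGC is the Go runtime default when GOGC is unset.
const defaultGOGC = 100

// ApplyGCPercent sets the GC target and mirrors it into the GOGC env var,
// so any package (e.g. stats) can report it without MT-specific imports.
func ApplyGCPercent(pct int) {
    debug.SetGCPercent(pct)
    os.Setenv("GOGC", strconv.Itoa(pct))
}

// ReportedGCPercent returns the value currently advertised via GOGC,
// falling back to the Go default if the variable is unset or unparsable.
func ReportedGCPercent() int {
    if v, err := strconv.Atoi(os.Getenv("GOGC")); err == nil {
        return v
    }
    return defaultGOGC
}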

@Dieterbe (Contributor, Author) commented Jan 11, 2019

Okay, I added the metric and log for the GOGC change.
Preview of the dashboard update (check out the new row with the runtime stats including GOGC; there is some room for #1198 also):
https://snapshot.raintank.io/dashboard/snapshot/LY6e0kOtgl6j12IpkrRWTulpmcgm5Gn6?orgId=2

Also rebased on top of master.

@replay @woodsaj PTAL

@Dieterbe requested a review from replay on January 11, 2019 13:46
@replay (Contributor) left a comment

LGTM with two comments, neither of which is a blocker.

@Dieterbe changed the title from "Dynamic gogc based on ready state" to "Dynamic GOGC based on ready state" on Jan 11, 2019
@Dieterbe (Contributor, Author) commented:
Huh.

Merging can be performed automatically once the requested changes are addressed.

Pretty sure I did address them.

@woodsaj maybe if you approve now it'll work.

@Dieterbe deleted the dynamic-gogc branch on March 27, 2019 21:08