I like this capability, but I don't like the idea of having a "Node" instance manipulating memory management tunables.
I think the code would be more maintainable if memory management was handled somewhere more central, i.e. its own package or in github.com/grafana/metrictank/util. Ideally, I would like to see MT have a notification system that components can subscribe and publish to, so that any component can choose to be notified of events like the GC value changing, but that is outside the scope of this change.
We also need to write log messages when the GC value changes, and probably emit a metric of what the value is (probably best to do that in github.com/grafana/metrictank/stats/memory_reporter.go).
Emitting a metric with the current setting is a really good idea; then we can easily compare the GOGC value with all the other metrics on one dashboard and watch what effect the changes have.
Agreed on all counts, including an event system being out of scope here. I'll add the log messages and metrics though.
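Roughly, that could look like the sketch below. This is only an illustration: it uses expvar and the standard log package as stand-ins for metrictank's own stats package and memory_reporter.go, and the applyGOGC name and metric name are made up.

```go
package main

import (
	"expvar"
	"log"
	"runtime/debug"
)

// gogcGauge stands in for a gauge in metrictank's stats package.
var gogcGauge = expvar.NewInt("gogc")

// applyGOGC sets the GC target percentage, logs the change, and records
// the new value so it can be graphed next to the other metrics.
func applyGOGC(pct int) {
	prev := debug.SetGCPercent(pct)
	log.Printf("changing GOGC from %d to %d", prev, pct)
	gogcGauge.Set(int64(pct))
}

func main() {
	applyGOGC(50)
}
```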
There doesn't seem to be a way to query the runtime for the current GOGC value (!)
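(For reference: runtime/debug.SetGCPercent has no read-only counterpart, but it does return the previous setting, so the closest thing to a query is a set-and-restore round trip. The package and function names below are hypothetical.)

```go
package gcutil

import "runtime/debug"

// CurrentGOGC returns the current GC percentage. There is no getter in the
// runtime, but SetGCPercent returns the previous setting, so we set an
// arbitrary value and immediately restore what was there before.
func CurrentGOGC() int {
	prev := debug.SetGCPercent(100)
	debug.SetGCPercent(prev)
	return prev
}
```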
If you move the GC management into a separate package you can just query that package, e.g. you can have the cluster package call gc.SetGC() and have the stats package call gc.CurrentGC().
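A rough sketch of what such a package could look like, assuming it just wraps runtime/debug.SetGCPercent and remembers the last value applied. Only the SetGC/CurrentGC names come from the comment above; the package path, logging, and locking details are illustrative.

```go
// Package gc centralizes GOGC handling so any component can change
// or inspect the current setting.
package gc

import (
	"log"
	"runtime/debug"
	"sync"
)

var (
	mu      sync.Mutex
	current = 100 // Go's default; the GOGC env var may override this at startup
)

// SetGC applies a new GC percentage, logs the change, and remembers the value.
func SetGC(pct int) {
	mu.Lock()
	defer mu.Unlock()
	prev := debug.SetGCPercent(pct)
	log.Printf("gc: changing GOGC from %d to %d", prev, pct)
	current = pct
}

// CurrentGC returns the value most recently applied via SetGC.
func CurrentGC() int {
	mu.Lock()
	defer mu.Unlock()
	return current
}
```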
True, but I'm trying to keep the stats package standalone and reusable for our other software.
What if we just set the GOGC env var when we change the GC settings? Then just have the stats package report on what the env var is set to. That should make it much more portable.
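A sketch of that idea, with hypothetical helper names: the setter mirrors the value into the GOGC environment variable, and a standalone stats package only needs os.Getenv to report it.

```go
package main

import (
	"fmt"
	"os"
	"runtime/debug"
	"strconv"
)

// setGOGC applies the new percentage and mirrors it in the GOGC env var,
// so any package can read it without importing metrictank-specific code.
func setGOGC(pct int) {
	debug.SetGCPercent(pct)
	os.Setenv("GOGC", strconv.Itoa(pct))
}

func main() {
	setGOGC(50)
	// what a standalone stats package could report:
	fmt.Println("GOGC =", os.Getenv("GOGC"))
}
```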
(force-pushed from 46b6a99 to 714c52e)
Okay, I added the metric and log for the GOGC change. Also rebased on top of master.
LGTM with two comments, neither of which is a blocker.
This also fixes a bug where SingleNodeManager.SetState did not update the node's StateChange attribute.
Not sure why Grafana is making changes all over the place, but anyway the dashboard still looks good.
(force-pushed from 714c52e to bad477f)
Huh. Pretty sure I did, @woodsaj. Maybe if you approve now it'll work.
This is a cleaned-up version of #1171.
See that PR for more information and rationale.
I have tested it like so:
After that completes, restart MT with gc-percent-not-ready settings that gradually go down.
Here are the tested settings:
I always restarted right at the change of a minute.
If I were to do this again, I would restart a few seconds after the minute change, to give it a chance to save all chunks. Currently there's some missing data, but only after the instance stabilized, so it doesn't really get in the way of the analysis.
snapshots:
https://snapshot.raintank.io/dashboard/snapshot/FoSdCguE9qi4eBXMZv3xxnRVW3EolcaN
https://snapshot.raintank.io/dashboard/snapshot/HsqiUL4vMEFFYf9oJDhLwve3Ibw5X248
I also used this patch:
Here's the output of the last run:
Note that before it starts consuming data it briefly marks itself as ready.
This should not happen on read-only nodes (they use a warm-up period), but I suspect this problem may also appear on read nodes if they don't start consuming the Kafka backlog in time; that is something to investigate further/separately.
But even for write nodes we don't want this to happen, as it means our GOGC controls are temporarily not honored.
Conclusions as far as this feature goes:
Whether this is worth it depends. I suspect a milder approach, maybe 50, is good to get started.
Note: in real environments I expect the numbers to be different.