cmd/scollector: add a kill switch for total memory used by scollector #1866

Merged (1 commit), Aug 16, 2016

@gbrayut (Contributor) commented Aug 12, 2016

This should help prevent memory leaks in CGO or WMI from causing scollector to consume too much memory.

@@ -245,17 +245,27 @@ func main() {
 }
 collect.MaxQueueLen = conf.MaxQueueLen
 }
-maxMemMegaBytes := uint64(500)
+maxMemMB := uint64(500)

Contributor commented:

I'm not sure I love having the kill switch active by default. It may surprise some people.

Contributor Author replied:

It has always been active, but only for Go runtime memory. This PR adds a second check on the total memory used by the scollector process.

Because of a WMI leak, scollector was using over 1 GB on many systems (around 80 GB across all systems), which could have caused other systems to fail if we hadn't noticed.
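To make the two checks concrete, here is a minimal Go sketch. It assumes a hypothetical helper, `checkMemory`, that receives the Go runtime's heap allocation (from `runtime.ReadMemStats`) and a whole-process figure in megabytes from a process collector, and applies the same 500 MB limit as `maxMemMB` in the diff to both. It is not the merged scollector code; in particular, the real code may derive its total-memory limit differently.

```go
package main

import (
	"fmt"
	"runtime"
)

const bytesPerMB = 1024 * 1024

// checkMemory is a hypothetical helper illustrating the two checks:
//   - allocBytes: Go runtime heap allocations (runtime.MemStats.Alloc, in bytes)
//   - totalMB: whole-process memory in MB as reported by a process
//     collector; 0 means "not available" (collectors disabled).
// It returns a non-nil error if either figure exceeds maxMB.
func checkMemory(allocBytes, totalMB, maxMB uint64) error {
	// Alloc is reported in bytes, so convert the limit before comparing.
	if allocBytes > maxMB*bytesPerMB {
		return fmt.Errorf("memory max runtime reached: %d MB allocated, max %d MB",
			allocBytes/bytesPerMB, maxMB)
	}
	// Leaks in cgo or WMI are invisible to the Go runtime, so only the
	// process total can catch them.
	if totalMB != 0 && totalMB > maxMB {
		return fmt.Errorf("memory max total reached: %d MB in use, max %d MB",
			totalMB, maxMB)
	}
	return nil
}

func main() {
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	// totalMB is hard-coded to "unknown" here; in scollector it would come
	// from the process-monitoring collectors.
	if err := checkMemory(m.Alloc, 0, 500); err != nil {
		fmt.Println("would exit:", err)
		return
	}
	fmt.Println("memory within limits")
}
```

In an agent, a check like this would run on a timer (for example a `time.Ticker`), and a non-nil error would go to a fatal log call rather than being printed. Treating a zero total as "unknown" lets the runtime check keep working when the process collectors are disabled.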

@captncraig (Contributor) commented:

very nice. LGTM

@gbrayut (Contributor Author) commented Aug 12, 2016

The total-memory check only works if the process-monitoring collectors are enabled. Because it identifies scollector's own process by the PID returned from os.Getpid, I don't think scollector itself needs to be explicitly configured for monitoring.
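The PID-based matching can be sketched as follows. `procSample` and `ownMemoryMB` are hypothetical stand-ins for whatever the process collectors record; the point is only that selecting the entry whose PID equals `os.Getpid()` identifies scollector's own process regardless of which process names are configured for monitoring.

```go
package main

import (
	"fmt"
	"os"
)

// procSample is a hypothetical record of what a process collector
// gathers for each running process.
type procSample struct {
	PID      int
	Name     string
	MemoryMB uint64
}

// ownMemoryMB returns the memory of the sample whose PID matches pid.
// Matching on PID rather than Name means scollector does not have to be
// listed among the monitored processes. It reports false if none match.
func ownMemoryMB(samples []procSample, pid int) (uint64, bool) {
	for _, s := range samples {
		if s.PID == pid {
			return s.MemoryMB, true
		}
	}
	return 0, false
}

func main() {
	self := os.Getpid()
	// Illustrative data; a real collector would enumerate running processes.
	samples := []procSample{
		{PID: self + 1, Name: "other", MemoryMB: 4},
		{PID: self, Name: "scollector", MemoryMB: 42},
	}
	if mb, ok := ownMemoryMB(samples, self); ok {
		fmt.Printf("own memory: %d MB\n", mb)
	}
}
```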

I'm going to manually deploy this to a few systems, including ny-bosun01 (the Linux host with the largest scollector memory usage in our environment), to make sure there aren't any issues. I can then merge it on Monday.

@gbrayut (Contributor Author) commented Aug 12, 2016

It appears we don't log panics on Windows, so I changed the panic to a fatal log call. I also filed #1867 to find a way to log panics.

The fatal error will look like this:

[image: screenshot of the fatal error message]

@gbrayut (Contributor Author) commented Aug 16, 2016

The push above was just a rebase plus a fix for a missing multiplier:
[image: screenshot of the change]

Merging to master once the Travis CI builds pass.

@gbrayut merged commit aad1661 into master on Aug 16, 2016
@gbrayut deleted the scollector_mem branch on August 16, 2016 at 04:12