Memory (and Handle) leak on Windows Server 2016 #140

Open
safebear opened this issue Dec 16, 2020 · 10 comments

@safebear

I'm running grok exporter on Windows Server 2016 and seeing a memory leak:

[image not preserved]

It's leaking 75 MB a day, which is about 1 GB every two weeks.

Here's my config (sanitized):

---
global:
  config_version: 3
input:
  type: file
  paths:
  - ../../../../afolder/*.log
  - ../../../../anotherfolder/*.log
  - '../../../../Program Files/PostgreSQL/11/data/log/postgresql-*.log'
  readall: false
  fail_on_missing_logfile: false
  poll_interval: 10s
grok_patterns:
- removedToSanitizeFile
metrics:
- type: counter
  name: removedToSanitizeFile
  help: removedToSanitizeFile
  match: removedToSanitizeFile
  paths:
  - ../../../../removedToSanitizeFile
  labels:
    error_message: '{{.message}}'
    logfile: '{{base .logfile}}'
- type: counter
  name: removedToSanitizeFile
  help: removedToSanitizeFile
  match: removedToSanitizeFile
  path: ../../../../removedToSanitizeFile
  labels:
    error_message: '{{.message}}'
    logfile: '{{base .logfile}}'
- type: counter
  name: postgres_errors
  help: Number of postgres errors per error type.
  match: '%{POSTGRES_DATE} %{POSTGRES_CODE}:
  %{POSTGRES_USER},%{POSTGRES_DB},%{POSTGRES_APP},%{POSTGRES_CLIENT}
  %{POSTGRES_ERROR_MESSAGE:message}$'
  paths:
  - '../../../../Program Files/PostgreSQL/11/data/log/postgresql-*.log'
  labels:
    error_message: '{{.message}}'
    logfile: '{{base .logfile}}'
server:
  protocol: http
  port: 3903

Please can you let me know if there's anything I can do to investigate this memory leak further and pinpoint the cause of it? Would pprof help...? https://github.com/google/pprof

@fstab
Owner

fstab commented Dec 17, 2020

Yes, pprof is the right tool, but you need to add it to the source code and recompile grok_exporter in order to use it. If you do that, could you use the built-in-metrics-improvement branch? I'm currently rewriting the built-in metrics to make debugging memory and slow patterns easier, and it might help in addition to the pprof data to see the new built-in metric values.

If you cannot compile grok_exporter from source, either I can compile grok_exporter with pprof for you, or you might be able to give me an example logfile so I can try to reproduce it myself.
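
For readers unfamiliar with what "adding pprof to the source code" involves, here is a minimal sketch using the standard library's net/http/pprof package. It is not taken from grok_exporter: the function name newPprofMux and the port 6060 are assumptions made for illustration, and where such a call would be wired into the exporter's startup code is not stated in this thread.

```go
package main

import (
	"log"
	"net/http"
	"net/http/pprof"
)

// newPprofMux (hypothetical name) returns a mux that serves only the pprof
// endpoints, so the profiler stays off the exporter's own metrics port.
func newPprofMux() *http.ServeMux {
	mux := http.NewServeMux()
	// pprof.Index also serves the named profiles, e.g. /debug/pprof/heap
	// and /debug/pprof/goroutine.
	mux.HandleFunc("/debug/pprof/", pprof.Index)
	mux.HandleFunc("/debug/pprof/cmdline", pprof.Cmdline)
	mux.HandleFunc("/debug/pprof/profile", pprof.Profile)
	mux.HandleFunc("/debug/pprof/symbol", pprof.Symbol)
	mux.HandleFunc("/debug/pprof/trace", pprof.Trace)
	return mux
}

func main() {
	// Port 6060 is an arbitrary choice for this sketch. In a real program
	// this would typically run in a goroutine next to the main server.
	log.Fatal(http.ListenAndServe("localhost:6060", newPprofMux()))
}
```

With something like this running, a heap profile can be fetched with go tool pprof http://localhost:6060/debug/pprof/heap.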

@safebear
Author

Thanks, I'd like to do it myself so I can help fix any future issues. I'm guessing it's only an issue on Windows, as it's a handle leak. How do I build the Windows exe from source? There are only details for Mac and Linux in the README.

@fstab
Owner

fstab commented Dec 23, 2020

I'm building the Windows release on a Linux machine using ./hack/release.sh windows-amd64. This will run the fstab/grok_exporter-compiler-amd64 Docker image which contains a cross-compiler for creating the Windows executable.

Alternatively, you could try to run the Docker command directly, like this:

docker run -v ~/go/src/github.com/fstab/grok_exporter:/go/src/github.com/fstab/grok_exporter \
    --net none \
    --rm \
    -ti fstab/grok_exporter-compiler-amd64:v1.0.0-SNAPSHOT \
    ./compile-windows-amd64.sh \
    -ldflags "-X github.com/fstab/grok_exporter/exporter.Version=1.0.0-SNAPSHOT -X github.com/fstab/grok_exporter/exporter.BuildDate=2020-12-24 -X github.com/fstab/grok_exporter/exporter.Branch=built-in-metrics-improvement -X github.com/fstab/grok_exporter/exporter.Revision=0f46c8f" \
    -o dist/grok_exporter-1.0.0-SNAPSHOT.windows-amd64/grok_exporter.exe

@safebear
Author

safebear commented Jan 6, 2021

Hi, I can't pull the docker image:

Unable to find image 'fstab/grok_exporter-compiler-amd64:v1.0.0-SNAPSHOT' locally
docker: Error response from daemon: toomanyrequests: You have reached your pull rate limit. You may increase the limit by authenticating and upgrading: https://www.docker.com/increase-rate-limit.

I've tried for three days now :( Usually this limit is reset after 6 hours. Do you have an automated process that is using it up every day?

@fstab
Owner

fstab commented Jan 6, 2021

Oh, sorry, copy-and-paste error. The latest image is fstab/grok_exporter-compiler-amd64:v1.0.0.RC5. The v1.0.0-SNAPSHOT version is wrong.

@safebear
Author

safebear commented Jan 7, 2021

Same issue unfortunately:

Unable to find image 'fstab/grok_exporter-compiler-amd64:v1.0.0.RC5' locally
docker: Error response from daemon: toomanyrequests: You have reached your pull rate limit. You may increase the limit by authenticating and upgrading: https://www.docker.com/increase-rate-limit.

@fstab
Owner

fstab commented Jan 7, 2021

When I saw an error related to v1.0.0-SNAPSHOT I assumed it's the wrong tag causing the problem, but now I see that it's about Docker's new pull rate limit and not about the wrong tag.

I've never hit this limit; it's actually the first time I'm seeing this. My understanding is that this is a limit per client, so if you hit that limit there must be something pulling more than 100 images every 6 hours on your machine, not on mine. I just removed the image locally and pulled it again with docker pull fstab/grok_exporter-compiler-amd64:v1.0.0.RC5 and it worked without any issues.
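
To make the "100 pulls every 6 hours" figure concrete: as far as I know, Docker Hub reports the current limit in response headers named ratelimit-limit and ratelimit-remaining, with values of the form "100;w=21600" (a count and a window in seconds). Treat that format as an assumption to check against Docker's documentation. The sketch below only parses such a value; parseRateLimit is a hypothetical helper and the sample string is not real output from this thread.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseRateLimit (hypothetical helper) splits a header value such as
// "100;w=21600" into the pull count and the window length in seconds.
func parseRateLimit(value string) (count, windowSeconds int, err error) {
	parts := strings.Split(value, ";")
	count, err = strconv.Atoi(strings.TrimSpace(parts[0]))
	if err != nil {
		return 0, 0, fmt.Errorf("bad count in %q: %w", value, err)
	}
	for _, p := range parts[1:] {
		p = strings.TrimSpace(p)
		if strings.HasPrefix(p, "w=") {
			windowSeconds, err = strconv.Atoi(strings.TrimPrefix(p, "w="))
			if err != nil {
				return 0, 0, fmt.Errorf("bad window in %q: %w", value, err)
			}
		}
	}
	return count, windowSeconds, nil
}

func main() {
	// Sample value only; a real one would come from the registry's
	// response headers.
	count, window, err := parseRateLimit("100;w=21600")
	if err != nil {
		panic(err)
	}
	fmt.Printf("%d pulls per %d hours\n", count, window/3600)
}
```

A remaining count of zero on the affected machine would support the theory that something else on that client is using up the pulls.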

Meanwhile, I'm working on a long-running load test for grok_exporter. If I'm lucky, I'll find a way to reproduce this in a test setup.

@safebear
Author

safebear commented Jan 11, 2021

Hi, here are a couple of heap snapshots (run go tool pprof -http=:8080 heap.out for a cool UI); you can also 'diff' the two. I've also captured a dump of the goroutines (goroutine.out): link to wetransfer.

Hopefully they'll help. I've got a sneaking suspicion that it could be this problem: https://blog.detectify.com/2019/09/05/how-we-tracked-down-a-memory-leak-in-one-of-our-go-microservices/

But please let me know what you find out. Btw, where does grok_exporter log to on Windows?
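
The thread does not say how these snapshots were captured. One way to produce files like heap.out and goroutine.out from inside a Go process is the standard library's runtime/pprof package; the sketch below assumes that approach, and writeProfile and the output file names are hypothetical.

```go
package main

import (
	"fmt"
	"os"
	"runtime"
	"runtime/pprof"
)

// writeProfile (hypothetical helper) writes the named runtime profile
// ("heap", "goroutine", ...) to path and returns the bytes written.
func writeProfile(name, path string) (int64, error) {
	p := pprof.Lookup(name)
	if p == nil {
		return 0, fmt.Errorf("unknown profile %q", name)
	}
	f, err := os.Create(path)
	if err != nil {
		return 0, err
	}
	defer f.Close()
	if name == "heap" {
		runtime.GC() // refresh allocation statistics before sampling
	}
	// debug=0 selects the compressed format that go tool pprof reads.
	if err := p.WriteTo(f, 0); err != nil {
		return 0, err
	}
	info, err := f.Stat()
	if err != nil {
		return 0, err
	}
	return info.Size(), nil
}

func main() {
	for _, name := range []string{"heap", "goroutine"} {
		n, err := writeProfile(name, name+".out")
		if err != nil {
			fmt.Fprintln(os.Stderr, err)
			os.Exit(1)
		}
		fmt.Printf("wrote %s.out (%d bytes)\n", name, n)
	}
}
```

Two heap snapshots taken some hours apart can then be compared with go tool pprof -base heap1.out heap2.out, which shows only what changed between them.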

@safebear
Author

safebear commented Jan 12, 2021

Ok, I've taken another look and it seems that the memory leak may not be in your code, it could be in the prometheus client library for go: https://github.com/prometheus/client_golang/releases (according to pprof)

So it could be this issue: prometheus/client_golang#784

You're using version 1.7.1 (which predates the above issue), so I'm going to try rebuilding the code with version 1.9.0 and see if the leak goes away.

UPDATE: Nope, no joy.

@fstab
Owner

fstab commented Jan 13, 2021

Thanks a lot for your work on this. I took a look at the heap snapshots, but as you said there's nothing pointing to a memory leak there. I'll take a deeper look into this as well and keep you updated.
