net/http/pprof: Negative allocation counts produced by /debug/pprof/allocs?seconds=1 #49171
Comments
@prattmic @hyangah this might be of interest to you. I'm still trying to nail the exact root cause of this, but so far I've failed. Maybe something in
Thanks for filing this issue. I will take a look. You wrote:
Just to confirm, you are saying that the allocation site (PC value) is the same for the two samples, but the symbolization is different? This seems very unexpected.
What are the "p0" and "p1" to which you refer?
Also surprising given the bisection, since that commit really only makes a difference if the inliner is enabled... but I suppose that will make it all the more fun to track down :-)
Yes. I was also very surprised to see this.
The before/after profiles for delta profiling. They are referenced on the line calling
I agree, the whole bug is very strange 😅. But please retest anything stated here if it seems to make no sense. I haven't done the bisection myself, and there is also a chance that some of the information I provided is not correct. But hopefully the provided sample program will offer a good way to reproduce and analyze this on your end.
Hey @thanm, did you end up investigating this? If yes, did you find any more clues for the root cause?
I am getting a fix ready for this. I spent a little while wild goose-chasing, but I think I've made some progress.
This took a while to puzzle out, but basically what's happening here is that we're losing an inline mark as a result of the fix for issue #46234 (CL 320913). Prior to the CL we had this line of code, which was removed as part of CL 320913. The
Change https://golang.org/cl/366494 mentions this issue:
@thanm thank you so much for debugging and fixing this! Seems to work great based on my testing with the code from above.
What version of Go are you using (go version)?
Does this issue reproduce with the latest release?
Yes, but not previous releases, e.g. go1.16.9.
What operating system and processor architecture are you using (go env)?
go env Output
What did you do?
Continuous allocation profiling of a production application using delta profiles similar to running /debug/pprof/allocs?seconds=60 in a loop.
Below is the smallest reproduction of the issue that I've managed to create so far. It executes an APM instrumentation workload and continuously checks the delta allocation profile for negative values.
main.go
Sorry about not being able to make it smaller than this, but so far I wasn't able to trigger the problem with simpler workloads.
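For readers who want a feel for what "checking a delta allocation profile for negative values" means, here is a minimal sketch. It is not the reproduction program from this issue: it assumes a simplified setup that takes two in-process snapshots with runtime.MemProfile and subtracts them per raw stack, instead of fetching pprof profiles over HTTP. The helper names snapshot and negativeDeltas are made up for illustration.

```go
package main

import (
	"fmt"
	"runtime"
)

// snapshot returns the cumulative allocated-object count per allocation
// stack. Stacks are keyed by raw program counters, so matching between
// snapshots never depends on symbolization.
func snapshot() map[[32]uintptr]int64 {
	n, _ := runtime.MemProfile(nil, true)
	for {
		// Leave some headroom, the profile may grow between calls.
		records := make([]runtime.MemProfileRecord, n+50)
		var ok bool
		n, ok = runtime.MemProfile(records, true)
		if !ok {
			continue // slice was too small, retry with the new n
		}
		counts := make(map[[32]uintptr]int64, n)
		for _, r := range records[:n] {
			counts[r.Stack0] += r.AllocObjects
		}
		return counts
	}
}

// negativeDeltas reports every stack whose count is lower in after
// than in before, together with the (negative) difference.
func negativeDeltas(before, after map[[32]uintptr]int64) map[[32]uintptr]int64 {
	neg := make(map[[32]uintptr]int64)
	for stack, b := range before {
		if d := after[stack] - b; d < 0 {
			neg[stack] = d
		}
	}
	return neg
}

var sink [][]byte // keeps the allocations below reachable

func main() {
	before := snapshot()
	for i := 0; i < 1000; i++ {
		sink = append(sink, make([]byte, 64<<10))
	}
	runtime.GC() // memory profile data is published after GC cycles
	after := snapshot()
	fmt.Println("stacks with negative alloc_objects:", len(negativeDeltas(before, after)))
}
```

Because the cumulative counts should only grow, any entry returned by negativeDeltas would point at a matching problem between the two snapshots rather than at real program behavior.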
What did you expect to see?
alloc_objects should never be negative. So the example program should exit after 10s printing "unable to reproduce issue, shutting down".
What did you see instead?
The example program detects negative alloc_objects and prints debugging output like shown below.
Additionally go tool pprof -http=:6060 alloc.pprof throws JS errors when opening an alloc flamegraph for the profile.
A few things to note:
19853706, 19850287), but the filename/line symbolization for 19853706 is different between the samples (option.go:483 vs main.go:483).
19853706 appears to be correct.
profile.Merge() call that can't match some stack traces with the same addrs from p0 and p1 because they got symbolized differently (i.e. the origin of this problem is unrelated to delta profiles, but delta profiles are where this is having a very bad impact, so I'm using it as the main example).
runtime.MemProfileRate = 1 on top of main() seems to make the test program always pass.
-gcflags='-l' doesn't fix the issue. pprof also reports samples that claim that inlining is still happening, see example below (note how loc 17 has two lines). However -m doesn't output any inlining taking place 🤔
Theories
My best guess is that this is somehow related to go1.17 being able to inline functions containing closures.
Most samples with negative alloc values that I've seen involve functions that call functions containing closures. The issue also doesn't seem to be reproducible in go1.16.
However, the issue can still be reproduced in go 1.17 when disabling inlining with -l, so I'm not sure ...
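To illustrate the matching problem described in the notes above, here is a minimal sketch of how a delta can turn negative when the same address is symbolized differently in the before and after profiles. It is a toy model, not pprof's implementation: the sample type, the key functions, and the counts 10 and 12 are hypothetical, and it assumes that samples are only combined when both the address and the symbolized line agree. The address and file names are the ones quoted in this issue; which profile carried which symbolization is an assumption.

```go
package main

import "fmt"

// sample is a simplified stand-in for one allocation profile sample.
type sample struct {
	addr         uint64 // leaf address of the stack
	line         string // symbolized file:line for that address
	allocObjects int64  // cumulative allocation count
}

// delta computes after minus before, combining samples that share a key.
func delta(before, after []sample, key func(sample) string) map[string]int64 {
	out := make(map[string]int64)
	for _, s := range before {
		out[key(s)] -= s.allocObjects
	}
	for _, s := range after {
		out[key(s)] += s.allocObjects
	}
	return out
}

// byAddr matches samples on the raw address only.
func byAddr(s sample) string { return fmt.Sprintf("%d", s.addr) }

// byAddrAndLine matches samples on the address and its symbolization.
func byAddrAndLine(s sample) string { return fmt.Sprintf("%d@%s", s.addr, s.line) }

func main() {
	// Same address in both profiles, but symbolized differently.
	p0 := []sample{{19853706, "option.go:483", 10}}
	p1 := []sample{{19853706, "main.go:483", 12}}

	fmt.Println(delta(p0, p1, byAddr))
	// prints map[19853706:2]
	fmt.Println(delta(p0, p1, byAddrAndLine))
	// prints map[19853706@main.go:483:12 19853706@option.go:483:-10]
}
```

With address-only matching the delta is the expected +2. Once the symbolization is part of the key, the before sample has no partner to cancel against and shows up as -10, which is the kind of negative alloc_objects value the example program detects.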