-
Notifications
You must be signed in to change notification settings - Fork 804
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Prevents panic using freed slice in pool #2000
Prevents panic using freed slice in pool #2000
Conversation
…king Error() calls which reference underlying labels Signed-off-by: Owen Diehl <ow.diehl@gmail.com>
I find it hard to see how this particular change makes the program entirely safe; please file an issue detailing the panics so that we can understand more about what went wrong. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't seen the stack trace yet, but looking at the change history I suspect the regression has been introduced in the PR #1922.
Before the PR #1922, the metric labels were stringified at the error creation time; after the change, a reference to the labels is kept within the error and stringified when the WrappedError()
is call, which occurs after the client.ReuseSlice()
is called. In this scenario, the change introduced in this PR fixes it.
I agree with @bboreham that would be better add a comment to explain it. Would be also interesting to see if we can cover this regression with a unit test: does a test concurrently calling Push()
and run with -race
spot it?
We should also note the change in behaviour here: previously requests were not re-used if they caused a hard error (i.e. an error that is not simply validation). This was discussed in #1863 (which is almost exactly the same PR). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had faced similar issues while testing WAL and thought it was something to do with the WAL and not an existing issue. Had put the same fix in #1103!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm approving this on the basis it did crash and we can fix up the other items later.
See #2004 for suggestion about warnings for future maintainers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
…king Error() calls which reference underlying labels (cortexproject#2000) Signed-off-by: Owen Diehl <ow.diehl@gmail.com>
Here is the full panic. This was an Grafana Labs' release from (internal) r70 branch, commit bc6996bf, based on master 1a3b008 + some PRs: #1749, #1878, #1935, #1960. Here is link to the line 317 in ingester.go:
|
Do we know the path through |
Actually I think it does. It's not the |
Hold on. I think my previous message is incorrect. The labels are stored in the This stuff is tricky tho. |
That seems sufficiently difficult to prove I'm going to change the code so we don't have to. |
This defers putting a slice back into the pool (ingester push path) to prevent panicking Error() calls which reference the underlying labels.
This defers returning an allocated slice to its pool until after use. Previously we saw panics on
Error()
calls which held references to the underlying timeseries label set.