
Prevents panic using freed slice in pool #2000

Merged

Conversation

@owen-d (Contributor) commented Jan 18, 2020

This defers returning an allocated slice to its pool (ingester push path) until after use. Previously we saw panics on Error() calls which held references to the underlying timeseries label set.

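A minimal sketch of the failure mode and the fix, using sync.Pool and illustrative names rather than the actual Cortex types (lazyErr, pushBuggy, and pushFixed are stand-ins):

```go
package sketch

import (
	"fmt"
	"strings"
	"sync"
)

// lazyErr renders its labels only when Error() is called, mirroring how
// the ingester's validationError holds a reference to the label set.
type lazyErr struct{ labels []string }

func (e *lazyErr) Error() string {
	return "bad sample for series " + strings.Join(e.labels, ",")
}

var slicePool = sync.Pool{
	New: func() interface{} { return make([]string, 0, 16) },
}

// Buggy shape: the slice is recycled before the error that references it
// is formatted, so Error() reads memory the pool may have handed out again.
func pushBuggy(labels []string) error {
	verr := &lazyErr{labels: labels}
	slicePool.Put(labels)                              // slice may now be reused by another request
	return fmt.Errorf("push failed: %s", verr.Error()) // reads labels too late
}

// Fixed shape: defer the pool return so it runs only after the return value
// (and any Error() calls inside this function) are done with the labels.
func pushFixed(labels []string) error {
	defer slicePool.Put(labels)
	verr := &lazyErr{labels: labels}
	return fmt.Errorf("push failed: %s", verr.Error())
}
```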
@owen-d changed the title from "Prevents error using freed slice in pool" to "Prevents panic using freed slice in pool" on Jan 18, 2020
@bboreham (Contributor)
I find it hard to see how this particular change makes the program entirely safe; please file an issue detailing the panics so that we can understand more about what went wrong.
Consider also adding comments to help future maintainers avoid the path to disaster.

@pracucci (Contributor) left a comment

I haven't seen the stack trace yet, but looking at the change history I suspect the regression was introduced in PR #1922.

Before PR #1922, the metric labels were stringified at error creation time; after the change, a reference to the labels is kept within the error and stringified only when WrappedError() is called, which occurs after client.ReuseSlice() is called. In this scenario, the change introduced in this PR fixes it.
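The difference, roughly, in an illustrative sketch (these types are stand-ins, not the exact Cortex code):

```go
package sketch

import (
	"fmt"

	"github.com/prometheus/prometheus/pkg/labels"
)

// Before #1922 (eager): the labels are rendered into the error string at
// creation time, so the error stays valid after the slice is recycled.
func newValidationErrEager(ls labels.Labels) error {
	return fmt.Errorf("sample rejected for series %s", ls) // %s calls ls.String() now
}

// After #1922 (lazy): the error keeps a labels reference and renders it only
// when Error() runs; if that happens after client.ReuseSlice(), it reads
// recycled label strings.
type validationErrLazy struct{ ls labels.Labels }

func (e *validationErrLazy) Error() string {
	return fmt.Sprintf("sample rejected for series %s", e.ls) // reads ls only here
}
```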

I agree with @bboreham that it would be better to add a comment to explain it. It would also be interesting to see if we can cover this regression with a unit test: would a test concurrently calling Push(), run with -race, spot it? A sketch of the idea follows.
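A minimal sketch of such a test, with a stand-in lazy error type instead of the real Push() (all names here are hypothetical):

```go
package sketch

import (
	"sync"
	"testing"
)

// lazySeriesErr is a minimal stand-in for validationError.
type lazySeriesErr struct{ labels []string }

func (e *lazySeriesErr) Error() string { return "invalid sample for " + e.labels[0] }

// TestErrorAfterReuse mimics the Push path: one goroutine formats the error
// (as WrappedError() did) while another recycles the slice and hands it to
// the "next push". Run with `go test -race`; the detector should flag the
// unsynchronized read/write on the shared backing array.
func TestErrorAfterReuse(t *testing.T) {
	pool := sync.Pool{New: func() interface{} { return make([]string, 1) }}

	for i := 0; i < 1000; i++ {
		ls := pool.Get().([]string)
		ls[0] = "series-a"
		err := &lazySeriesErr{labels: ls}

		var wg sync.WaitGroup
		wg.Add(2)
		go func() {
			defer wg.Done()
			_ = err.Error() // late read, as in the panic
		}()
		go func() {
			defer wg.Done()
			pool.Put(ls)
			next := pool.Get().([]string) // often returns the same array
			next[0] = "series-b"          // concurrent write err still sees
		}()
		wg.Wait()
	}
}
```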

@bboreham (Contributor)

We should also note the change in behaviour here: previously requests were not re-used if they caused a hard error (i.e. an error that is not simply validation). This was discussed in #1863 (which is almost exactly the same PR).
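A sketch of that behaviour change (illustrative names; validate stands in for the push-path validation):

```go
package sketch

import (
	"errors"
	"sync"
)

func validate(req []string) error {
	if len(req) == 0 {
		return errors.New("hard error")
	}
	return nil
}

// Before: a hard error returned early, so the request slice was never put
// back into the pool and simply fell to the garbage collector.
func pushBefore(pool *sync.Pool, req []string) error {
	if err := validate(req); err != nil {
		return err // not recycled on this path
	}
	pool.Put(req)
	return nil
}

// After: the deferred Put means the slice is recycled on every path,
// including hard errors, which is the behaviour change noted above.
func pushAfter(pool *sync.Pool, req []string) error {
	defer pool.Put(req)
	return validate(req)
}
```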

@codesome (Contributor) left a comment

I had faced similar issues while testing the WAL and thought it was something to do with the WAL rather than an existing issue. I had put the same fix in #1103!

@bboreham (Contributor) left a comment

I'm approving this on the basis that it did crash, and we can fix up the other items later.

See #2004 for suggestion about warnings for future maintainers.

@pracucci (Contributor) left a comment

LGTM

@pracucci merged commit add279f into cortexproject:master on Jan 20, 2020
pracucci pushed a commit to grafana/cortex that referenced this pull request on Jan 20, 2020.
@pstibrany (Contributor) commented Jan 20, 2020

Here is the full panic. This was a Grafana Labs release from the (internal) r70 branch, commit bc6996bf, based on master 1a3b008 plus some PRs: #1749, #1878, #1935, #1960.

Here is a link to line 317 in ingester.go:
https://github.com/grafana/cortex/blob/bc6996bf3277bf9700115f90172f8691b4a37816/pkg/ingester/ingester.go#L317

panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x460740]
goroutine 1019278284 [running]:
bytes.(*Buffer).WriteString(0xc1fddb3100, 0x0, 0x4, 0x0, 0x0, 0x0)
    /usr/local/go/src/bytes/buffer.go:186 +0xc8
github.com/prometheus/prometheus/pkg/labels.Labels.String(0xc1823fef00, 0xa, 0x28, 0xc0b71b6300, 0x0)
    /go/pkg/mod/github.com/prometheus/prometheus@v1.8.2-0.20191126064551-80ba03c67da1/pkg/labels/labels.go:58 +0xe8
github.com/cortexproject/cortex/pkg/ingester.(*validationError).Error(0xc19c8edb30, 0x1, 0x1)
    /go/src/github.com/cortexproject/cortex/pkg/ingester/errors.go:65 +0x97
github.com/cortexproject/cortex/pkg/ingester.(*validationError).WrappedError(0xc19c8edb30, 0xc216ef0ca0, 0x314d015)
    /go/src/github.com/cortexproject/cortex/pkg/ingester/errors.go:72 +0x2f
github.com/cortexproject/cortex/pkg/ingester.(*Ingester).Push(0xc000117400, 0x3805540, 0xc241c97830, 0xc23bef7740, 0xc000117400, 0xc241c97830, 0xc216ef0c00)
    /go/src/github.com/cortexproject/cortex/pkg/ingester/ingester.go:317 +0x464
github.com/cortexproject/cortex/pkg/ingester/client._Ingester_Push_Handler.func1(0x3805540, 0xc241c97830, 0x3000f80, 0xc23bef7740, 0x3805540, 0xc241c97830, 0x0, 0x0)
    /go/src/github.com/cortexproject/cortex/pkg/ingester/client/cortex.pb.go:2923 +0x86
github.com/weaveworks/common/middleware.ServerUserHeaderInterceptor(0x3805540, 0xc241c977d0, 0x3000f80, 0xc23bef7740, 0xc23bef7780, 0xc23bef77a0, 0x2e978e0, 0xc241c977d0, 0x2bed160, 0x5172260)
    /go/pkg/mod/github.com/weaveworks/common@v0.0.0-20190822150010-afb9996716e4/middleware/grpc_auth.go:38 +0x9e
github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1.1.1(0x3805540, 0xc241c977d0, 0x3000f80, 0xc23bef7740, 0x30e9500, 0xc21539e100, 0x3805540, 0xc241c977d0)
    /go/pkg/mod/github.com/grpc-ecosystem/go-grpc-middleware@v1.1.0/chain.go:25 +0x63
github.com/opentracing-contrib/go-grpc.OpenTracingServerInterceptor.func1(0x3805540, 0xc241c97770, 0x3000f80, 0xc23bef7740, 0xc23bef7780, 0xc23bef7800, 0x0, 0x0, 0x0, 0x0)
    /go/pkg/mod/github.com/opentracing-contrib/go-grpc@v0.0.0-20180928155321-4b5a12d3ff02/server.go:57 +0x2f9
github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1.1.1(0x3805540, 0xc241c97770, 0x3000f80, 0xc23bef7740, 0x117, 0x129, 0xc062868800, 0xc1e5cc17a0)
    /go/pkg/mod/github.com/grpc-ecosystem/go-grpc-middleware@v1.1.0/chain.go:25 +0x63
github.com/weaveworks/common/middleware.UnaryServerInstrumentInterceptor.func1(0x3805540, 0xc241c97770, 0x3000f80, 0xc23bef7740, 0xc23bef7780, 0xc23bef7820, 0x4c62c6, 0x5e2237bb, 0x5c066c1, 0x119ad3c286ef6f)
    /go/pkg/mod/github.com/weaveworks/common@v0.0.0-20190822150010-afb9996716e4/middleware/grpc_instrumentation.go:17 +0xa3
github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1.1.1(0x3805540, 0xc241c97770, 0x3000f80, 0xc23bef7740, 0x4144cf, 0x7f0e16efd168, 0x20308e, 0x100)
    /go/pkg/mod/github.com/grpc-ecosystem/go-grpc-middleware@v1.1.0/chain.go:25 +0x63
github.com/weaveworks/common/middleware.GRPCServerLog.UnaryServerInterceptor(0x384dc60, 0xc0007aa9e0, 0x0, 0x3805540, 0xc241c97770, 0x3000f80, 0xc23bef7740, 0xc23bef7780, 0xc23bef7860, 0xc0b71b6300, ...)
    /go/pkg/mod/github.com/weaveworks/common@v0.0.0-20190822150010-afb9996716e4/middleware/grpc_logging.go:28 +0x98
github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1.1.1(0x3805540, 0xc241c97770, 0x3000f80, 0xc23bef7740, 0xc21539e000, 0x0, 0xc1a1fd4a10, 0x40ca48)
    /go/pkg/mod/github.com/grpc-ecosystem/go-grpc-middleware@v1.1.0/chain.go:25 +0x63
github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1(0x3805540, 0xc241c97770, 0x3000f80, 0xc23bef7740, 0xc23bef7780, 0xc23bef77a0, 0xc1a1fd4a80, 0x62f50d, 0x2e978e0, 0xc241c97770)
    /go/pkg/mod/github.com/grpc-ecosystem/go-grpc-middleware@v1.1.0/chain.go:34 +0xd5
github.com/cortexproject/cortex/pkg/ingester/client._Ingester_Push_Handler(0x30f4ae0, 0xc000117400, 0x3805540, 0xc241c97770, 0xc214e98b40, 0xc00094e360, 0x3805540, 0xc241c97770, 0xc211153900, 0x1294)
    /go/src/github.com/cortexproject/cortex/pkg/ingester/client/cortex.pb.go:2925 +0x14b
google.golang.org/grpc.(*Server).processUnaryRPC(0xc000474780, 0x38469e0, 0xc0b575e480, 0xc21539e000, 0xc001676ea0, 0x512cdc0, 0x0, 0x0, 0x0)
    /go/pkg/mod/google.golang.org/grpc@v1.25.1/server.go:1007 +0x460
google.golang.org/grpc.(*Server).handleStream(0xc000474780, 0x38469e0, 0xc0b575e480, 0xc21539e000, 0x0)
    /go/pkg/mod/google.golang.org/grpc@v1.25.1/server.go:1287 +0xd97
google.golang.org/grpc.(*Server).serveStreams.func1.1(0xc0e2587fb0, 0xc000474780, 0x38469e0, 0xc0b575e480, 0xc21539e000)
    /go/pkg/mod/google.golang.org/grpc@v1.25.1/server.go:722 +0xbb
created by google.golang.org/grpc.(*Server).serveStreams.func1
    /go/pkg/mod/google.golang.org/grpc@v1.25.1/server.go:720 +0xa1

@bboreham (Contributor)

Do we know the path through wrapWithUser(err, userID) doesn't retain a reference?

@pracucci (Contributor)

> Do we know the path through wrapWithUser(err, userID) doesn't retain a reference?

Actually, I think it does. It's not wrapWithUser() itself, but the fact that err still survives once the function exits:
https://github.com/cortexproject/cortex/pull/1960/files#diff-01dca72d6e6e6e585153f2fbef6ae2dfL311-L312

@pracucci (Contributor)

Hold on. I think my previous message is incorrect. The labels are stored in the validationError, but if such an error occurs we consider it a partial error and continue. wrapWithUser(err, userID) is hit for any other error type, which will not hold the labels (because we store them only in validationError); see the sketch after this comment.

This stuff is tricky tho.
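The control flow being described, in a sketch with minimal stand-ins for the real types (illustrative only):

```go
package sketch

import "fmt"

// validationError is a minimal stand-in for the ingester's error type.
type validationError struct{ series string }

func (e *validationError) Error() string       { return "invalid sample for " + e.series }
func (e *validationError) WrappedError() error { return fmt.Errorf("rpc: %s", e.Error()) }

// wrapWithUser wraps without reading any labels.
func wrapWithUser(err error, userID string) error {
	return fmt.Errorf("user %s: %w", userID, err)
}

// push sketches the two paths: validation errors are partial (the loop
// continues and the labels reference survives until WrappedError), while
// any other error returns immediately through wrapWithUser and retains no
// labels, assuming no other error type captures the label slice.
func push(appends []func() error, userID string) error {
	var lastPartialErr *validationError
	for _, appendFn := range appends {
		if err := appendFn(); err != nil {
			if verr, ok := err.(*validationError); ok {
				lastPartialErr = verr
				continue
			}
			return wrapWithUser(err, userID)
		}
	}
	if lastPartialErr != nil {
		// must run before the request slice goes back to the pool
		return lastPartialErr.WrappedError()
	}
	return nil
}
```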

@bboreham (Contributor)

> because we store them only in validationError

That seems sufficiently difficult to prove that I'm going to change the code so we don't have to.
