Ingester blowing up to tens of thousands of goroutines #4324

Closed
bboreham opened this issue Jun 28, 2021 · 2 comments

Comments

@bboreham
Contributor

This is very similar to #858, but I decided to open a new issue as the code involved is different in this case.
Metrics from production about three months ago show one ingester hitting 400,000 goroutines, but we have little clear data beyond that.

To try to reproduce this, I deliberately limited the ingester's CPU and fired in 500 requests/sec from Avalanche.

Beginning of goroutine dump:

goroutine profile: total 17421
8689 @ 0x43b2c5 0x44cf05 0x44ceee 0x46e5e7 0x47d805 0x47ef70 0x47ef02 0x9d09a7 0x9c6fc7 0x9c6f50 0x2082e45 0x2082de0 0x2082ea6 0x20799dd 0x156b2e9 0x20db263 0xb9ab23 0xd0b2a4 0x20c6cb6 0xb9ab23 0xb9e7e2 0xb9ab23 0xd0eafa 0xb9ab23 0xd0b814 0xb9ab23 0xb9ad17 0x154fdb0 0xb439cb 0xb47b8c 0xb5648b 0x472701
#	0x46e5e6	sync.runtime_SemacquireMutex+0x46							/usr/local/go/src/runtime/sema.go:71
#	0x47d804	sync.(*Mutex).lockSlow+0x104								/usr/local/go/src/sync/mutex.go:138
#	0x47ef6f	sync.(*Mutex).Lock+0x8f									/usr/local/go/src/sync/mutex.go:81
#	0x47ef01	sync.(*RWMutex).Lock+0x21								/usr/local/go/src/sync/rwmutex.go:111
#	0x9d09a6	github.com/prometheus/prometheus/tsdb.(*isolation).newAppendID+0x46			/backend-enterprise/vendor/github.com/prometheus/prometheus/tsdb/isolation.go:126
#	0x9c6fc6	github.com/prometheus/prometheus/tsdb.(*Head).appender+0x46				/backend-enterprise/vendor/github.com/prometheus/prometheus/tsdb/head.go:1193
#	0x9c6f4f	github.com/prometheus/prometheus/tsdb.(*Head).Appender+0xaf				/backend-enterprise/vendor/github.com/prometheus/prometheus/tsdb/head.go:1189
#	0x2082e44	github.com/prometheus/prometheus/tsdb.(*DB).Appender+0x3a4				/backend-enterprise/vendor/github.com/prometheus/prometheus/tsdb/db.go:797
#	0x2082ddf	github.com/cortexproject/cortex/pkg/ingester.(*userTSDB).Appender+0x33f			/backend-enterprise/vendor/github.com/cortexproject/cortex/pkg/ingester/ingester_v2.go:149
#	0x2082ea5	github.com/cortexproject/cortex/pkg/ingester.(*Ingester).v2Push+0x405			/backend-enterprise/vendor/github.com/cortexproject/cortex/pkg/ingester/ingester_v2.go:777
#	0x20799dc	github.com/cortexproject/cortex/pkg/ingester.(*Ingester).Push+0x8dc			/backend-enterprise/vendor/github.com/cortexproject/cortex/pkg/ingester/ingester.go:475
#	0x156b2e8	github.com/cortexproject/cortex/pkg/ingester/client._Ingester_Push_Handler.func1+0x88	/backend-enterprise/vendor/github.com/cortexproject/cortex/pkg/ingester/client/ingester.pb.go:2565
#	0x20db262	github.com/cortexproject/cortex/pkg/cortex.ThanosTracerUnaryInterceptor+0xa2		/backend-enterprise/vendor/github.com/cortexproject/cortex/pkg/cortex/tracing.go:14
#	0xb9ab22	github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1.1.1+0x62		/backend-enterprise/vendor/github.com/grpc-ecosystem/go-grpc-middleware/chain.go:25
#	0xd0b2a3	github.com/weaveworks/common/middleware.ServerUserHeaderInterceptor+0xa3		/backend-enterprise/vendor/github.com/weaveworks/common/middleware/grpc_auth.go:38
#	0x20c6cb5	github.com/cortexproject/cortex/pkg/util/fakeauth.SetupAuthMiddleware.func1+0x115	/backend-enterprise/vendor/github.com/cortexproject/cortex/pkg/util/fakeauth/fake_auth.go:27
#	0xb9ab22	github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1.1.1+0x62		/backend-enterprise/vendor/github.com/grpc-ecosystem/go-grpc-middleware/chain.go:25
#	0xb9e7e1	github.com/opentracing-contrib/go-grpc.OpenTracingServerInterceptor.func1+0x301		/backend-enterprise/vendor/github.com/opentracing-contrib/go-grpc/server.go:57
#	0xb9ab22	github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1.1.1+0x62		/backend-enterprise/vendor/github.com/grpc-ecosystem/go-grpc-middleware/chain.go:25
#	0xd0eaf9	github.com/weaveworks/common/middleware.UnaryServerInstrumentInterceptor.func1+0x99	/backend-enterprise/vendor/github.com/weaveworks/common/middleware/grpc_instrumentation.go:32
#	0xb9ab22	github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1.1.1+0x62		/backend-enterprise/vendor/github.com/grpc-ecosystem/go-grpc-middleware/chain.go:25
#	0xd0b813	github.com/weaveworks/common/middleware.GRPCServerLog.UnaryServerInterceptor+0x93	/backend-enterprise/vendor/github.com/weaveworks/common/middleware/grpc_logging.go:29
#	0xb9ab22	github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1.1.1+0x62		/backend-enterprise/vendor/github.com/grpc-ecosystem/go-grpc-middleware/chain.go:25
#	0xb9ad16	github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1+0xd6		/backend-enterprise/vendor/github.com/grpc-ecosystem/go-grpc-middleware/chain.go:34
#	0x154fdaf	github.com/cortexproject/cortex/pkg/ingester/client._Ingester_Push_Handler+0x14f	/backend-enterprise/vendor/github.com/cortexproject/cortex/pkg/ingester/client/ingester.pb.go:2567
#	0xb439ca	google.golang.org/grpc.(*Server).processUnaryRPC+0x52a					/backend-enterprise/vendor/google.golang.org/grpc/server.go:1210
#	0xb47b8b	google.golang.org/grpc.(*Server).handleStream+0xd0b					/backend-enterprise/vendor/google.golang.org/grpc/server.go:1533
#	0xb5648a	google.golang.org/grpc.(*Server).serveStreams.func1.2+0xaa				/backend-enterprise/vendor/google.golang.org/grpc/server.go:871

6866 @ 0x43b2c5 0x44cf05 0x44ceee 0x46e5e7 0x47d805 0x47ef70 0x47ef02 0x9d0c5e 0x9c9254 0x9b3775 0x20844a7 0x20799dd 0x156b2e9 0x20db263 0xb9ab23 0xd0b2a4 0x20c6cb6 0xb9ab23 0xb9e7e2 0xb9ab23 0xd0eafa 0xb9ab23 0xd0b814 0xb9ab23 0xb9ad17 0x154fdb0 0xb439cb 0xb47b8c 0xb5648b 0x472701
#	0x46e5e6	sync.runtime_SemacquireMutex+0x46							/usr/local/go/src/runtime/sema.go:71
#	0x47d804	sync.(*Mutex).lockSlow+0x104								/usr/local/go/src/sync/mutex.go:138
#	0x47ef6f	sync.(*Mutex).Lock+0x8f									/usr/local/go/src/sync/mutex.go:81
#	0x47ef01	sync.(*RWMutex).Lock+0x21								/usr/local/go/src/sync/rwmutex.go:111
#	0x9d0c5d	github.com/prometheus/prometheus/tsdb.(*isolation).closeAppend+0x3d			/backend-enterprise/vendor/github.com/prometheus/prometheus/tsdb/isolation.go:152
#	0x9c9253	github.com/prometheus/prometheus/tsdb.(*headAppender).Commit+0x633			/backend-enterprise/vendor/github.com/prometheus/prometheus/tsdb/head.go:1521
#	0x9b3774	github.com/prometheus/prometheus/tsdb.dbAppender.Commit+0x34				/backend-enterprise/vendor/github.com/prometheus/prometheus/tsdb/db.go:817
#	0x20844a6	github.com/cortexproject/cortex/pkg/ingester.(*Ingester).v2Push+0x1a06			/backend-enterprise/vendor/github.com/cortexproject/cortex/pkg/ingester/ingester_v2.go:896
#	0x20799dc	github.com/cortexproject/cortex/pkg/ingester.(*Ingester).Push+0x8dc			/backend-enterprise/vendor/github.com/cortexproject/cortex/pkg/ingester/ingester.go:475
#	0x156b2e8	github.com/cortexproject/cortex/pkg/ingester/client._Ingester_Push_Handler.func1+0x88	/backend-enterprise/vendor/github.com/cortexproject/cortex/pkg/ingester/client/ingester.pb.go:2565
#	0x20db262	github.com/cortexproject/cortex/pkg/cortex.ThanosTracerUnaryInterceptor+0xa2		/backend-enterprise/vendor/github.com/cortexproject/cortex/pkg/cortex/tracing.go:14
#	0xb9ab22	github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1.1.1+0x62		/backend-enterprise/vendor/github.com/grpc-ecosystem/go-grpc-middleware/chain.go:25
#	0xd0b2a3	github.com/weaveworks/common/middleware.ServerUserHeaderInterceptor+0xa3		/backend-enterprise/vendor/github.com/weaveworks/common/middleware/grpc_auth.go:38
#	0x20c6cb5	github.com/cortexproject/cortex/pkg/util/fakeauth.SetupAuthMiddleware.func1+0x115	/backend-enterprise/vendor/github.com/cortexproject/cortex/pkg/util/fakeauth/fake_auth.go:27
#	0xb9ab22	github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1.1.1+0x62		/backend-enterprise/vendor/github.com/grpc-ecosystem/go-grpc-middleware/chain.go:25
#	0xb9e7e1	github.com/opentracing-contrib/go-grpc.OpenTracingServerInterceptor.func1+0x301		/backend-enterprise/vendor/github.com/opentracing-contrib/go-grpc/server.go:57
#	0xb9ab22	github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1.1.1+0x62		/backend-enterprise/vendor/github.com/grpc-ecosystem/go-grpc-middleware/chain.go:25
#	0xd0eaf9	github.com/weaveworks/common/middleware.UnaryServerInstrumentInterceptor.func1+0x99	/backend-enterprise/vendor/github.com/weaveworks/common/middleware/grpc_instrumentation.go:32
#	0xb9ab22	github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1.1.1+0x62		/backend-enterprise/vendor/github.com/grpc-ecosystem/go-grpc-middleware/chain.go:25
#	0xd0b813	github.com/weaveworks/common/middleware.GRPCServerLog.UnaryServerInterceptor+0x93	/backend-enterprise/vendor/github.com/weaveworks/common/middleware/grpc_logging.go:29
#	0xb9ab22	github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1.1.1+0x62		/backend-enterprise/vendor/github.com/grpc-ecosystem/go-grpc-middleware/chain.go:25
#	0xb9ad16	github.com/grpc-ecosystem/go-grpc-middleware.ChainUnaryServer.func1+0xd6		/backend-enterprise/vendor/github.com/grpc-ecosystem/go-grpc-middleware/chain.go:34
#	0x154fdaf	github.com/cortexproject/cortex/pkg/ingester/client._Ingester_Push_Handler+0x14f	/backend-enterprise/vendor/github.com/cortexproject/cortex/pkg/ingester/client/ingester.pb.go:2567
#	0xb439ca	google.golang.org/grpc.(*Server).processUnaryRPC+0x52a					/backend-enterprise/vendor/google.golang.org/grpc/server.go:1210
#	0xb47b8b	google.golang.org/grpc.(*Server).handleStream+0xd0b					/backend-enterprise/vendor/google.golang.org/grpc/server.go:1533
#	0xb5648a	google.golang.org/grpc.(*Server).serveStreams.func1.2+0xaa				/backend-enterprise/vendor/google.golang.org/grpc/server.go:871

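So, when the ingester can't keep up, many goroutines can end up blocked on access to TSDB. The two stacks above, both waiting on a write lock in TSDB's isolation code (in newAppendID and closeAppend), account for 15,555 of the 17,421 goroutines.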
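To see at a glance where goroutines are piling up, the counts in a dump like the one above can be tallied per call site. The following is a rough sketch, not an existing tool: the names `group` and `tally` are made up for illustration, the sample input is an abbreviated copy of the dump in this issue, and "call site" is taken to mean the first frame outside the `sync` and `runtime` packages.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// group is one stack from a text goroutine profile (hypothetical type).
type group struct {
	count int    // goroutines sharing this stack
	site  string // first frame outside the sync and runtime packages
}

// tally parses the text goroutine profile format shown in this issue.
func tally(profile string) (total int, groups []group) {
	sc := bufio.NewScanner(strings.NewReader(profile))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		fields := strings.Fields(line)
		switch {
		case strings.HasPrefix(line, "goroutine profile: total "):
			total, _ = strconv.Atoi(fields[3])
		case len(fields) >= 2 && fields[1] == "@":
			// Header of a stack: "<count> @ <addresses...>".
			if n, err := strconv.Atoi(fields[0]); err == nil {
				groups = append(groups, group{count: n})
			}
		case len(fields) >= 3 && fields[0] == "#" && len(groups) > 0:
			// Frame line: "# <addr> <func>+<offset> <file:line>".
			g := &groups[len(groups)-1]
			fn := fields[2]
			if i := strings.LastIndex(fn, "+0x"); i >= 0 {
				fn = fn[:i]
			}
			if g.site == "" && !strings.HasPrefix(fn, "sync.") &&
				!strings.HasPrefix(fn, "runtime.") {
				g.site = fn
			}
		}
	}
	return total, groups
}

// sample is abbreviated from the dump above: address lists and most
// frames are left out.
const sample = `goroutine profile: total 17421
8689 @ 0x43b2c5 0x44cf05
#	0x47ef01	sync.(*RWMutex).Lock+0x21	/usr/local/go/src/sync/rwmutex.go:111
#	0x9d09a6	github.com/prometheus/prometheus/tsdb.(*isolation).newAppendID+0x46	/backend-enterprise/vendor/github.com/prometheus/prometheus/tsdb/isolation.go:126

6866 @ 0x43b2c5 0x44cf05
#	0x47ef01	sync.(*RWMutex).Lock+0x21	/usr/local/go/src/sync/rwmutex.go:111
#	0x9d0c5d	github.com/prometheus/prometheus/tsdb.(*isolation).closeAppend+0x3d	/backend-enterprise/vendor/github.com/prometheus/prometheus/tsdb/isolation.go:152
`

func main() {
	total, groups := tally(sample)
	listed := 0
	for _, g := range groups {
		fmt.Printf("%6d  %s\n", g.count, g.site)
		listed += g.count
	}
	fmt.Printf("%d of %d goroutines are in the listed stacks\n", listed, total)
}
```

On the abbreviated sample, this reports 15555 of 17421 goroutines in the two listed stacks.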

Just as in #858, it seems to me that once the number of goroutines goes beyond some limit we would be better off failing immediately than trying to carry on with the request.

@pracucci
Contributor

> Just as in #858, it seems to me that once the number of goroutines goes beyond some limit we would be better off failing immediately than trying to carry on with the request.

In #3992, support was added for a hard limit on the maximum number of inflight push requests.

@bboreham
Contributor Author

Thanks. IMO that should be defaulted to something like 10,000, so everyone gets the protection.
