When the ingester is starting, it spawns thousands of goroutines #4393
Comments
Can you upload the actual profile? I find those graphviz diagrams unreadable.
@bboreham The -ingester.instance-limits.max-inflight-push-requests option is not a good fit here. The goroutine count only climbs into the thousands while the ingester is starting; once it has finished starting, there are only about 100 goroutines. To get the ingester through startup I would need to add many machines just to avoid the lock contention, which is expensive. I did try -ingester.instance-limits.max-inflight-push-requests, but because each request carries so many timeseries, the limit would have to be set very low, so it does not help while the ingester is starting. I also checked the reason: the mutex is always held at "github.com/prometheus/prometheus/tsdb.(*Head).getOrCreateWithID+0x214". The root cause is that:
So the solution may be to reduce the cost of sorting the postings series IDs.
There may also be another way, like #4408: if a tenant can be split into many DBs that can still be queried at once, then when the ingester pushes there would be many DBs per tenant, which would decrease the lock contention.
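For illustration, here is a minimal sketch of the kind of lock sharding being discussed, assuming nothing about the actual Cortex or Prometheus internals (shardedPostings, numShards and the series-key hashing are all hypothetical names): each series hashes to one of N shards with its own mutex, so concurrent pushes only contend when they land on the same shard.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sync"
)

const numShards = 16

// shard owns its own lock and its own slice of series IDs, standing in
// for a per-shard postings structure.
type shard struct {
	mu  sync.Mutex
	ids []uint64
}

// shardedPostings spreads series creation across shards, so goroutines
// only contend with others that hash to the same shard.
type shardedPostings struct {
	shards [numShards]shard
}

// add hashes the series key to pick a shard and appends the ID under
// that shard's lock only.
func (p *shardedPostings) add(seriesKey string, id uint64) {
	h := fnv.New32a()
	h.Write([]byte(seriesKey))
	s := &p.shards[h.Sum32()%numShards]

	s.mu.Lock()
	s.ids = append(s.ids, id)
	s.mu.Unlock()
}

func main() {
	p := &shardedPostings{}
	var wg sync.WaitGroup
	for i := 0; i < 1000; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			p.add(fmt.Sprintf("series_%d", i%200), uint64(i))
		}(i)
	}
	wg.Wait()
	fmt.Println("ids in shard 0:", len(p.shards[0].ids))
}
```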
Now it looks like #3349. We had a PR proposed which sharded locks. I think you're saying: All of that code is now from Prometheus; it would be necessary to create a benchmark there to demonstrate the issue, and a PR to fix it there. By the way, you said "cortex version: release-1.9.0", but this line from the profile extract does not match:
Here is line 1800 for v1.9.0:
Version 1.8 is a much better match:
I looked at the Prometheus code here: which, if series IDs were in random order, would be very slow.
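For context, the slow path being described looks roughly like the following simplified sketch (a paraphrase of the general append-then-repair pattern, not the actual Prometheus code): each new series ID is appended to a per-label postings list and then swapped backwards until the list is sorted again, so IDs arriving out of order make every insert walk back through the list.

```go
package main

import "fmt"

// addSorted appends id to a postings list and repairs ordering by swapping
// it backwards until the list is sorted again. With IDs arriving in roughly
// increasing order the loop exits immediately; with IDs in random order it
// can walk far back on every insert, which is the slow path being described.
func addSorted(list []uint64, id uint64) []uint64 {
	list = append(list, id)
	for i := len(list) - 1; i >= 1; i-- {
		if list[i] >= list[i-1] {
			break
		}
		list[i], list[i-1] = list[i-1], list[i]
	}
	return list
}

func main() {
	var list []uint64
	for _, id := range []uint64{1, 2, 5, 3, 4} { // 3 and 4 arrive late
		list = addSorted(list, id)
	}
	fmt.Println(list) // [1 2 3 4 5]
}
```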
@bboreham Yes, the pprof profile is from release-1.8; I copied the wrong one. And yes, I am saying the seriesID increases: it seems the seriesID is created and incremented by the Prometheus tsdb.Head: https://github.com/prometheus/prometheus/blob/bb05485c79/tsdb/head.go#L1185 The seriesID is increased by that code, and in Prometheus it seems the seriesID is then passed on. But Cortex only has one DB, and the DB only has one Head, so when v2Push is called by many goroutines, lastSeriesID is incremented by Prometheus Head.getOrCreate; because of the Head.getOrCreateWithID lock, the ID may not be inserted into the MemPostings right away. So when a goroutine gets the lock and sends its ID to head.MemPostings, that ID may be smaller than the last seriesID already inserted. In my test, when the Cortex ingester is starting, this lock leaves the seriesIDs out of order, so sorting the seriesIDs costs a large amount of time.
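Here is a minimal, self-contained illustration of that ordering race, using hypothetical names rather than the real Cortex/Prometheus code: IDs are handed out in strictly increasing order, but the goroutine that commits to the shared postings slice is whichever one wins the lock, so the committed order is usually not sorted.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
	"sync/atomic"
)

func main() {
	var (
		lastSeriesID uint64     // monotonically increasing ID source
		mu           sync.Mutex // stands in for the postings lock
		committed    []uint64   // order in which IDs reach the postings
		wg           sync.WaitGroup
	)

	for i := 0; i < 1000; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			// Step 1: take the next series ID (always increasing).
			id := atomic.AddUint64(&lastSeriesID, 1)
			// Step 2: insert it under the lock. Whichever goroutine wins
			// the lock first commits first, so a larger ID can land
			// before a smaller one.
			mu.Lock()
			committed = append(committed, id)
			mu.Unlock()
		}()
	}
	wg.Wait()

	sorted := sort.SliceIsSorted(committed, func(i, j int) bool {
		return committed[i] < committed[j]
	})
	fmt.Println("committed in sorted order:", sorted) // usually false
}
```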
Describe the bug
Hi, when an ingester is just starting, the goroutines increase to thousands. But after waiting about 1h, once the ingester has started, the goroutines decrease to about 100.
There are two behaviors:
when starting:
when started:
To Reproduce
Expected behavior
When starting, why are there so many goroutines?
Storage Engine
thanks