
[RFC] Feature/ingester stream backpressure #6217

Closed

Conversation

@splitice (Contributor) commented May 21, 2022

What this PR does / why we need it:

This PR adds backpressure to the responses from the ingester's Query() stream. A similar approach could also be used for QuerySample().

This is done through the addition of an Ack(id) method and a limit on the maximum number of in-flight responses in the stream (currently 20).

This is built upon my branch for #6216.
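
To make the concept concrete for reviewers, here is a minimal, self-contained sketch of the idea (not the PR's actual code; the names maxInflight, streamLimiter, acquire and ack are illustrative): the ingester may have at most a fixed number of un-acked responses outstanding on a query stream, and each Ack(id) from the querier frees one slot before the next batch is sent.

    // Illustrative sketch only; identifiers are hypothetical, not taken from the PR.
    package main

    import (
        "context"
        "fmt"
    )

    const maxInflight = 20 // the PR currently limits in-flight responses to 20

    // streamLimiter bounds how many responses can be sent but not yet acked.
    type streamLimiter struct {
        slots chan struct{} // buffered channel used as a counting semaphore
    }

    func newStreamLimiter(n int) *streamLimiter {
        return &streamLimiter{slots: make(chan struct{}, n)}
    }

    // acquire blocks until fewer than n responses are un-acked, or the query is cancelled.
    func (l *streamLimiter) acquire(ctx context.Context) error {
        select {
        case l.slots <- struct{}{}:
            return nil
        case <-ctx.Done():
            return ctx.Err()
        }
    }

    // ack frees one in-flight slot; it would be driven by the querier's Ack(id) call.
    func (l *streamLimiter) ack() {
        select {
        case <-l.slots:
        default: // tolerate duplicate or late acks
        }
    }

    func main() {
        ctx := context.Background()
        l := newStreamLimiter(maxInflight)
        for i := 0; i < 3; i++ {
            if err := l.acquire(ctx); err != nil {
                return
            }
            fmt.Println("send batch", i) // stand-in for stream.Send(batch)
            l.ack()                      // stand-in for the querier acking batch i
        }
    }

Bounding un-acked responses in this way caps how much response data the ingester keeps buffered per query stream, which is the memory-pressure concern this PR targets.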

Which issue(s) this PR fixes:
I suspect this would fix #5804.

Special notes for your reviewer:

This represents an RFC on the concept. It still needs a cleanup, which it will get if the concept is reviewed positively.

Checklist

  • Documentation added
  • Tests updated
  • Is this an important fix or new feature? Add an entry in the CHANGELOG.md.
  • Changes that require user attention or interaction to upgrade are documented in docs/sources/upgrading/_index.md

@splitice splitice requested a review from a team as a code owner May 21, 2022 06:33
@grafanabot (Collaborator)

./tools/diff_coverage.sh ../loki-main/test_results.txt test_results.txt ingester,distributor,querier,querier/queryrange,iter,storage,chunkenc,logql,loki

Change in test coverage per package. Green indicates 0 or positive change, red indicates that test coverage for a package fell.

+           ingester	0.4%
+        distributor	0.3%
+            querier	0%
+ querier/queryrange	0%
-               iter	-0.4%
+            storage	0%
+           chunkenc	0%
+              logql	0%
+               loki	0%

pkg/ingester/instance.go (outdated review thread, resolved)
@splitice splitice force-pushed the feature/ingester-stream-backpressure branch from fbe7ba5 to 519ba84 on May 21, 2022 08:04
@grafanabot (Collaborator)

./tools/diff_coverage.sh ../loki-main/test_results.txt test_results.txt ingester,distributor,querier,querier/queryrange,iter,storage,chunkenc,logql,loki

Change in test coverage per package. Green indicates 0 or positive change, red indicates that test coverage for a package fell.

+           ingester	0.4%
-        distributor	-0.3%
+            querier	0%
+ querier/queryrange	0%
-               iter	-0.4%
+            storage	0%
+           chunkenc	0%
+              logql	0%
+               loki	0%

@splitice splitice force-pushed the feature/ingester-stream-backpressure branch from 471b791 to 220c443 on May 21, 2022 08:35
@grafanabot (Collaborator)

./tools/diff_coverage.sh ../loki-main/test_results.txt test_results.txt ingester,distributor,querier,querier/queryrange,iter,storage,chunkenc,logql,loki

Change in test coverage per package. Green indicates 0 or positive change, red indicates that test coverage for a package fell.

+           ingester	0.4%
+        distributor	0%
+            querier	0.1%
+ querier/queryrange	0%
-               iter	-0.8%
+            storage	0%
+           chunkenc	0%
+              logql	0%
+               loki	0%


@DylanGuedes (Contributor) left a comment

A few (not so important) things. Just out of curiosity, what would the backpressure improve? And since you used the term RFC, do you have a doc or anything explaining the reasoning behind the idea?

Comment on lines +579 to +581:

    if err != nil {
        return &logproto.AckResponse{}, err
    }

Contributor: (same error as the one from line 573)

Suggested change (remove the duplicate check):

    if err != nil {
        return &logproto.AckResponse{}, err
    }

    queryIngester := instance.queries[req.Id]
    instance.queryMtx.Unlock()
    if queryIngester != nil {
        queryIngester.ReleaseAck()

Contributor: Hmm, nitpicking, but maybe you should call it SendAck() instead of ReleaseAck()? (Since SendAck will send an empty struct message to the channel.)
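
For context on this naming discussion, a tiny hedged sketch of the shape the reviewer describes (the type and channel names queryIngester and acks are hypothetical, not the PR's code): the renamed SendAck() would push an empty struct onto the query's ack channel, which the streaming side consumes before sending more data.

    // Hypothetical sketch of the suggested rename; not the PR's actual code.
    package main

    import "fmt"

    type queryIngester struct {
        acks chan struct{} // the streaming goroutine receives from this channel
    }

    // SendAck signals that one in-flight response was consumed by sending an
    // empty struct on the query's ack channel.
    func (q *queryIngester) SendAck() {
        select {
        case q.acks <- struct{}{}:
        default: // the query may already have finished; don't block the Ack RPC
        }
    }

    func main() {
        q := &queryIngester{acks: make(chan struct{}, 1)}
        q.SendAck()
        fmt.Println("pending acks:", len(q.acks)) // prints 1
    }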

pkg/ingester/ingester_query.go (review thread, resolved)
@splitice (Contributor, Author) commented Jun 6, 2022

I dare say that between #6241 and this, these are the two main causes of ingester memory peaks. Whether or not this is the direction the Loki team wants to go is worthy of comment.

@stale (bot) commented Jul 10, 2022

Hi! This issue has been automatically marked as stale because it has not had any
activity in the past 30 days.

We use a stalebot among other tools to help manage the state of issues in this project.
A stalebot can be very useful in closing issues in a number of cases; the most common
is closing issues or PRs where the original reporter has not responded.

Stalebots are also emotionless and cruel and can close issues which are still very relevant.

If this issue is important to you, please add a comment to keep it open. More importantly, please add a thumbs-up to the original issue entry.

We regularly sort for closed issues which have a stale label sorted by thumbs up.

We may also:

  • Mark issues as revivable if we think it's a valid issue but isn't something we are likely
    to prioritize in the future (the issue will still remain closed).
  • Add a keepalive label to silence the stalebot if the issue is very common/popular/important.

We are doing our best to respond to, organize, and prioritize all issues, but it can be a challenging task; our sincere apologies if you find yourself at the mercy of the stalebot.

@stale stale bot added the stale label (a stale issue or PR that will automatically be closed) on Jul 10, 2022
@splitice (Contributor, Author)

Go away

@stale stale bot removed the stale label on Jul 10, 2022
@stale (bot) commented Aug 13, 2022

Hi! This issue has been automatically marked as stale because it has not had any
activity in the past 30 days.

We use a stalebot among other tools to help manage the state of issues in this project.
A stalebot can be very useful in closing issues in a number of cases; the most common
is closing issues or PRs where the original reporter has not responded.

Stalebots are also emotionless and cruel and can close issues which are still very relevant.

If this issue is important to you, please add a comment to keep it open. More importantly, please add a thumbs-up to the original issue entry.

We regularly sort for closed issues which have a stale label sorted by thumbs up.

We may also:

  • Mark issues as revivable if we think it's a valid issue but isn't something we are likely
    to prioritize in the future (the issue will still remain closed).
  • Add a keepalive label to silence the stalebot if the issue is very common/popular/important.

We are doing our best to respond to, organize, and prioritize all issues, but it can be a challenging task; our sincere apologies if you find yourself at the mercy of the stalebot.

@stale stale bot added the stale label on Aug 13, 2022
@splitice (Contributor, Author)

Go away

@stale stale bot removed the stale label on Aug 13, 2022
@DylanGuedes DylanGuedes added the keepalive label (an issue or PR that will be kept alive and never marked as stale) on Aug 13, 2022
@MasslessParticle (Contributor)

Hey @splitice, thanks for your contribution!

We've added a new way to get visibility on issues like this. Please check out LIDs. I'm going to close this for now. If you'd still like to discuss it, please open a LID.

Labels: keepalive (an issue or PR that will be kept alive and never marked as stale), size/XXL
Projects: none yet
Development: successfully merging this pull request may close these issues: [bug] ingester goroutine leak
4 participants