Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Try to drain incoming and peer queues for an amount of time #1440

Merged
merged 2 commits into from
Nov 19, 2024

Conversation

MikeGoldsmith
Copy link
Contributor

@MikeGoldsmith MikeGoldsmith commented Nov 19, 2024

Which problem is this PR solving?

We're seeing that incoming and peer queues can sometimes build up and it takes a long time to recover. A theory is that this is because the collect loop has so many options, we should allow multiple messages to be read from the incoming and peer queues whenever that it's selected to be worked on.

Short description of the changes

  • Add a new utility func that can read messages from a given span channel for up to 100 milliseconds, exiting early if the channel is empty
  • Update incoming and peer channels to use new func

@MikeGoldsmith MikeGoldsmith self-assigned this Nov 19, 2024
@MikeGoldsmith MikeGoldsmith requested a review from a team as a code owner November 19, 2024 17:15
@MikeGoldsmith MikeGoldsmith added this to the v2.9 milestone Nov 19, 2024
@MikeGoldsmith MikeGoldsmith marked this pull request as draft November 19, 2024 17:16

// let't try to process as many spans as we can in the next 100ms
// TODO: make timer configurable?
timer := time.NewTimer(time.Millisecond * 100)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right now each collect loop iteration takes between 10ms to 20ms. Setting the timer to 100ms means we probably are going to process 5-10 items off of the queue each time. Is that enough?
I wonder if we should drain the queue based on the number of items process instead of a timer to make sure we are always draining X amount of items for each iteration.

I don't know which strategy is better. We can try the timer based first to see how it goes

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the timer approach is worth trying. If we want it to be more flexible, we could add a temporary config option while we're testing it out.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, maybe we can add a metric here to see how many items we drain from the queue each time?

Copy link
Contributor

@VinozzZ VinozzZ left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would love to give this a try in production!

@MikeGoldsmith MikeGoldsmith marked this pull request as ready for review November 19, 2024 18:24
Copy link
Contributor

@kentquirk kentquirk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks interesting. Let's try it!

@VinozzZ VinozzZ merged commit fc9f0eb into main Nov 19, 2024
5 checks passed
@VinozzZ VinozzZ deleted the mike/greedy-queues branch November 19, 2024 19:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants