feat: redistribute remaining traces during shutdown #1261

Merged: VinozzZ merged 23 commits into main from yingrong.drain_traces_on_shutdown_part2 on Aug 9, 2024

Conversation

@VinozzZ VinozzZ commented Jul 31, 2024

Which problem is this PR solving?

When a node leaves the cluster, gossip the news, then calculate the new destination for each trace that node was holding on to and forward the trace there. This makes removing a node from the Refinery cluster much less disruptive than it is today.

Short description of the changes

  • During shutdown, the collector first stops accepting new traces by exiting from the collect() function.
  • Traces are distributed to three Go channels based on their state:
    - late spans
    - traces that have already received their root span or have exceeded their TraceTimeout
    - traces that can't have their decisions made yet
  • A goroutine is created to pull items off the three Go channels. Late spans, expired traces, and traces with a root span are sent to Honeycomb. Traces that don't have a decision yet are forwarded to their new home (another Refinery peer).
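
For readers who want a picture of the flow described above, here is a minimal, self-contained sketch. It is not the code from this PR: the `trace` struct, its fields, the `classify`, `consume`, `drain`, and `demoTraces` names, and the in-memory `sent`/`forwarded` slices are all hypothetical stand-ins for Refinery's real types, the Honeycomb transmission, and the peer transmission. It assumes a late span is one that arrives for a trace whose decision was already made.

```go
package main

import (
	"fmt"
	"time"
)

// trace is a hypothetical stand-in for a cached trace (not Refinery's type).
type trace struct {
	ID      string
	Decided bool      // assumption: a decision already exists, so this is a late span
	HasRoot bool      // the root span has arrived
	SendBy  time.Time // assumption: the moment TraceTimeout expires
}

type state int

const (
	lateSpan state = iota
	readyToSend
	needsForward
)

// classify picks one of the three shutdown states for a trace.
func classify(t trace, now time.Time) state {
	switch {
	case t.Decided:
		return lateSpan
	case t.HasRoot || !now.Before(t.SendBy):
		return readyToSend
	default:
		return needsForward
	}
}

// result records where each trace went and how many passed through each state.
type result struct {
	sent, forwarded []string
	counts          map[state]int
}

// consume drains all three channels until each one is closed. Setting a
// closed channel to nil disables its select case.
func consume(late, ready, forward <-chan trace, res *result) {
	for late != nil || ready != nil || forward != nil {
		select {
		case t, ok := <-late:
			if !ok {
				late = nil
				continue
			}
			res.counts[lateSpan]++
			res.sent = append(res.sent, t.ID) // stands in for sending to Honeycomb
		case t, ok := <-ready:
			if !ok {
				ready = nil
				continue
			}
			res.counts[readyToSend]++
			res.sent = append(res.sent, t.ID) // stands in for sending to Honeycomb
		case t, ok := <-forward:
			if !ok {
				forward = nil
				continue
			}
			res.counts[needsForward]++
			res.forwarded = append(res.forwarded, t.ID) // stands in for forwarding to a peer
		}
	}
}

// drain distributes the remaining traces to the three channels and waits
// for the consumer goroutine to finish.
func drain(traces []trace, now time.Time) result {
	late, ready, forward := make(chan trace), make(chan trace), make(chan trace)
	res := result{counts: map[state]int{}}
	done := make(chan struct{})
	go func() {
		defer close(done)
		consume(late, ready, forward, &res)
	}()
	for _, t := range traces {
		switch classify(t, now) {
		case lateSpan:
			late <- t
		case readyToSend:
			ready <- t
		case needsForward:
			forward <- t
		}
	}
	close(late)
	close(ready)
	close(forward)
	<-done
	return res
}

// demoTraces returns one example trace per case plus the reference time.
func demoTraces() ([]trace, time.Time) {
	now := time.Now()
	return []trace{
		{ID: "a", Decided: true},
		{ID: "b", HasRoot: true, SendBy: now.Add(time.Minute)},
		{ID: "c", SendBy: now.Add(-time.Second)},
		{ID: "d", SendBy: now.Add(time.Minute)},
	}, now
}

func main() {
	traces, now := demoTraces()
	res := drain(traces, now)
	fmt.Println("sent:", res.sent, "forwarded:", res.forwarded)
}
```

The per-state counts in `result` play the role of the counter mentioned in the description. A real collector would record them through its metrics layer instead of a map.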

I also added unit tests for the distribution logic, as well as a counter to monitor the number of traces that go through each state.

Addresses #1204

@VinozzZ VinozzZ added this to the v2.8 milestone Jul 31, 2024
@VinozzZ VinozzZ added the type: enhancement New feature or request label Jul 31, 2024
@VinozzZ VinozzZ force-pushed the yingrong.drain_traces_on_shutdown_part2 branch from 8fd7247 to b192cf8 Compare August 1, 2024 23:35
@VinozzZ VinozzZ force-pushed the yingrong.drain_traces_on_shutdown_part2 branch from e2eed00 to addbe9e Compare August 1, 2024 23:48
@VinozzZ VinozzZ marked this pull request as ready for review August 2, 2024 00:04
@VinozzZ VinozzZ requested a review from a team as a code owner August 2, 2024 00:04
@kentquirk kentquirk left a comment

I'd like to see some minor changes, please, but this is good stuff.

Review threads:
  • app/app_test.go (resolved)
  • collect/cache/cuckooSentCache.go (outdated, resolved)
  • collect/collect.go (resolved)
  • collect/collect.go (outdated, resolved)
  • collect/collect.go (outdated, resolved)
  • collect/collect.go (outdated, resolved)
@VinozzZ VinozzZ requested a review from kentquirk August 6, 2024 19:58
@kentquirk kentquirk left a comment

One question about telemetry, but I think we've got this one looking correct.

Review thread: collect/collect.go (resolved)
@VinozzZ VinozzZ merged commit 8bbdda7 into main Aug 9, 2024
5 checks passed
@VinozzZ VinozzZ deleted the yingrong.drain_traces_on_shutdown_part2 branch August 9, 2024 21:33
Labels
type: enhancement New feature or request
2 participants