fix: deal with orphan traces and expired traces #1408

Merged
merged 19 commits into from
Nov 7, 2024

Conversation

VinozzZ
Contributor

@VinozzZ VinozzZ commented Nov 1, 2024

Which problem is this PR solving?

Two special kinds of trace are distinguished by the trace's SendBy value: orphan traces and expired traces.
An orphan trace is one whose SendBy passed more than 4 times the configured trace timeout ago.
An expired trace is one whose SendBy passed more than 2 times the configured trace timeout ago.
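
The two thresholds above can be sketched as a small classifier. This is only an illustration of the definitions, not the code in this PR: the names `traceAge`, `classify`, and `classifyAt` are hypothetical, and the strict "more than" comparison is assumed from the wording above.

```go
package main

import "time"

// traceAge is a hypothetical label for how far past SendBy a trace is.
type traceAge int

const (
	ageNormal  traceAge = iota // within 2x the trace timeout
	ageExpired                 // more than 2x the trace timeout past SendBy
	ageOrphan                  // more than 4x the trace timeout past SendBy
)

// classify takes the time elapsed since SendBy and the configured trace
// timeout. The orphan check comes first because every orphan trace also
// satisfies the expired threshold.
func classify(overdue, traceTimeout time.Duration) traceAge {
	switch {
	case overdue > 4*traceTimeout:
		return ageOrphan
	case overdue > 2*traceTimeout:
		return ageExpired
	default:
		return ageNormal
	}
}

// classifyAt is a convenience wrapper that derives the overdue duration
// from the current time and the trace's SendBy deadline.
func classifyAt(now, sendBy time.Time, traceTimeout time.Duration) traceAge {
	return classify(now.Sub(sendBy), traceTimeout)
}
```

With a 10s timeout, a trace whose SendBy passed 25s ago would be classified as expired, and one whose SendBy passed 45s ago as orphan.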

For an orphan trace, we should go ahead and make the decision ourselves, since the decider has no knowledge of it. Therefore, under memory pressure, we should eject orphan traces as well.

This PR also makes sure that decision spans sent for expired traces are not added to the trace cache on the peers, and that the signal is sent only once per expired trace.

Short description of the changes

  • eject orphan traces in checkAlloc
  • add a new meta.refinery.expired_trace flag in the decision span sent for expired traces
  • add a new Retried flag on the trace object
  • clean up collect loop
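
One way the `meta.refinery.expired_trace` flag and the `Retried` flag could work together is sketched below. It is a guess at the shape of the logic, not the merged implementation: the `trace` and `span` types, the `trace_id` key, and the functions `expiredSignal` and `shouldCache` are all hypothetical. Only the flag names come from this PR.

```go
package main

// trace is a hypothetical stand-in for the collector's trace object.
type trace struct {
	ID      string
	Retried bool // set once the expired-trace signal has been sent
}

// span is a hypothetical stand-in for a decision span's fields.
type span map[string]any

// expiredSignal builds the decision span for an expired trace.
// It returns false on every call after the first, so the signal is
// sent only once per trace.
func expiredSignal(t *trace) (span, bool) {
	if t.Retried {
		return nil, false
	}
	t.Retried = true
	return span{
		"trace_id":                    t.ID, // hypothetical field name
		"meta.refinery.expired_trace": true,
	}, true
}

// shouldCache is what a receiving peer might check: decision spans
// flagged as belonging to an expired trace are not added to the cache.
func shouldCache(s span) bool {
	expired, _ := s["meta.refinery.expired_trace"].(bool)
	return !expired
}
```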

@VinozzZ VinozzZ force-pushed the yingrong/debug_collect_loop branch from 13a4a9e to cdb839f Compare November 6, 2024 00:35
MikeGoldsmith pushed a commit that referenced this pull request Nov 7, 2024
## Which problem is this PR solving?

To reduce ingest latency, move the work of publishing kept trace
decisions into a separate goroutine so it doesn't block the ingestion
of incoming data.

This code has been running in kibble for a few days and was copied from
the debug branch #1408

## Short description of the changes

- create a buffer for kept trace decisions
- make publishing kept trace decisions non-blocking in the `collect` loop
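
A minimal sketch of the buffer-plus-goroutine pattern described above, assuming a buffered channel is used as the buffer. The names `decisionBuffer`, `enqueue`, and `run` are hypothetical, and reporting `false` when the buffer is full is an assumption about overflow handling that the commit message does not specify.

```go
package main

// decisionBuffer is a hypothetical buffer of kept trace IDs waiting
// to be published.
type decisionBuffer struct {
	ch chan string
}

func newDecisionBuffer(size int) *decisionBuffer {
	return &decisionBuffer{ch: make(chan string, size)}
}

// enqueue is what the collect loop would call. It never blocks: if the
// buffer is full it returns false and the caller decides what to do
// (assumption: the real code may handle overflow differently).
func (b *decisionBuffer) enqueue(traceID string) bool {
	select {
	case b.ch <- traceID:
		return true
	default:
		return false
	}
}

// run is meant to be started in its own goroutine. It publishes each
// buffered decision until the channel is closed, then closes done.
func (b *decisionBuffer) run(publish func(traceID string), done chan<- struct{}) {
	for id := range b.ch {
		publish(id)
	}
	close(done)
}
```

The `select` with a `default` case is what keeps the `collect` loop from stalling on a slow publish; the cost is that the caller has to handle a full buffer explicitly.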
@VinozzZ VinozzZ changed the title fix: debug incoming_queue and peer_queue throughput fix: deal with orphan traces and expired traces Nov 7, 2024
@VinozzZ VinozzZ added the type: enhancement New feature or request label Nov 7, 2024
@VinozzZ VinozzZ marked this pull request as ready for review November 7, 2024 16:45
@VinozzZ VinozzZ requested a review from a team as a code owner November 7, 2024 16:45
@VinozzZ VinozzZ added this to the v2.9 milestone Nov 7, 2024
@VinozzZ VinozzZ merged commit de51caf into main Nov 7, 2024
5 checks passed
@VinozzZ VinozzZ deleted the yingrong/debug_collect_loop branch November 7, 2024 17:06