
Ingester search production readiness #932

Closed · 10 of 12 tasks
mdisibio opened this issue Sep 2, 2021 · 4 comments · Fixed by #943
Comments

mdisibio (Contributor) commented Sep 2, 2021

Ingester search was recently merged in #806, but it is marked experimental. This issue tracks the additional work needed to consider it production-ready:

mdisibio added this to the v1.2 milestone Sep 2, 2021
joe-elliott (Member) commented Sep 7, 2021

We are seeing traceheaders use quite a bit of disk space in the ingesters, more than previously estimated. Consider the following tasks as well:

joe-elliott (Member) commented Oct 8, 2021

Two issues we've seen in ops at ~1.5M spans/second while searching continuously:

  • Twice in the past 7 days this has been logged:
    a117bf98-f9d7-4429-8023-54fe82ca18f5 read /var/tempo/wal/search/a117bf98-f9d7-4429-8023-54fe82ca18f5:1:searchdata: file already closed
    This is due to a race condition in searchWal. Determine how to bubble this error up correctly or avoid it entirely (see the sketch after this list).
  • Occasionally ingester goroutines spike into the tens of thousands and search becomes very slow. This resolves after a few seconds, but we should research it and attempt to prevent it.
    [screenshot attached]
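
To make the first bullet concrete, the following is a minimal Go sketch (not Tempo's actual searchWal code; `guardedFile` and its methods are hypothetical) of the suspected pattern: one goroutine closes a WAL search data file while another is still reading it, which surfaces as the os-level "file already closed" error. The guard serializes Close against in-flight reads so a reader gets a clean, explicit error instead.

```go
package main

import (
	"fmt"
	"os"
	"sync"
)

// guardedFile wraps an *os.File so Close cannot race with in-flight reads.
type guardedFile struct {
	mtx    sync.RWMutex
	f      *os.File
	closed bool
}

// ReadAt holds a read lock so the file cannot be closed mid-read.
func (g *guardedFile) ReadAt(p []byte, off int64) (int, error) {
	g.mtx.RLock()
	defer g.mtx.RUnlock()
	if g.closed {
		return 0, fmt.Errorf("search data file is closed")
	}
	return g.f.ReadAt(p, off)
}

// Close waits for readers to finish, then closes exactly once.
func (g *guardedFile) Close() error {
	g.mtx.Lock()
	defer g.mtx.Unlock()
	if g.closed {
		return nil
	}
	g.closed = true
	return g.f.Close()
}

func main() {
	f, err := os.CreateTemp("", "searchdata")
	if err != nil {
		panic(err)
	}
	defer os.Remove(f.Name())
	_, _ = f.WriteString("example search data")

	g := &guardedFile{f: f}

	// Concurrent read and close: without the guard this can log
	// "read ...: file already closed"; with it the reader sees a
	// well-defined error that can be bubbled up or ignored.
	var wg sync.WaitGroup
	wg.Add(2)
	go func() {
		defer wg.Done()
		buf := make([]byte, 7)
		if _, err := g.ReadAt(buf, 0); err != nil {
			fmt.Println("read error:", err)
		}
	}()
	go func() {
		defer wg.Done()
		_ = g.Close()
	}()
	wg.Wait()
}
```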

mdisibio (Contributor, Author) commented

> Two issues we've seen in ops at ~1.5M spans/second while searching continuously:

#1033 definitely addresses the first issue, and likely the second as well: when a deadlock occurs and the ingest path stalls, many goroutines are created for the queued-up ingest traffic that continues to arrive (see the sketch below).
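
A tiny, self-contained illustration of that failure mode (purely illustrative, not Tempo code): when the ingest path blocks, here simulated by a held mutex, each queued request still parks a goroutine behind the lock, so the goroutine count climbs until the stall clears.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
	"time"
)

func main() {
	var ingestMtx sync.Mutex

	// Simulate a stalled ingest path by holding the lock.
	ingestMtx.Lock()

	// Traffic keeps arriving; each request blocks behind the stalled path.
	for i := 0; i < 10000; i++ {
		go func() {
			ingestMtx.Lock()
			ingestMtx.Unlock()
		}()
	}

	time.Sleep(100 * time.Millisecond)
	fmt.Println("goroutines while stalled:", runtime.NumGoroutine())

	// Release the stall; the queued goroutines drain quickly,
	// matching the "resolves after a few seconds" behavior described above.
	ingestMtx.Unlock()
	time.Sleep(500 * time.Millisecond)
	fmt.Println("goroutines after drain:", runtime.NumGoroutine())
}
```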

mdisibio (Contributor, Author) commented

> Occasionally ingester goroutines spike into the tens of thousands and search becomes very slow. This resolves after a few seconds, but we should research it and attempt to prevent it.

This was already improved by #1033, and #1076 further reduces contention between search and ingest traffic.
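
For context, here is a generic Go sketch of one common way to reduce this kind of contention between a write-heavy path (ingest) and a read-heavy path (search): guard the shared block list with a sync.RWMutex and have searches copy a snapshot under the read lock, so the expensive scan runs outside any lock. The `instance`/`block` types are hypothetical, and this is not necessarily how #1076 implements it.

```go
package main

import (
	"fmt"
	"sync"
)

type block struct{ id int }

// instance holds a block list shared by ingest and search.
type instance struct {
	mtx    sync.RWMutex
	blocks []*block
}

// addBlock is the ingest-side write; it holds the write lock only long
// enough to append.
func (i *instance) addBlock(b *block) {
	i.mtx.Lock()
	i.blocks = append(i.blocks, b)
	i.mtx.Unlock()
}

// snapshotBlocks is the search-side read; many searches can hold the read
// lock concurrently, and the expensive scan happens on the returned copy.
func (i *instance) snapshotBlocks() []*block {
	i.mtx.RLock()
	defer i.mtx.RUnlock()
	out := make([]*block, len(i.blocks))
	copy(out, i.blocks)
	return out
}

func main() {
	inst := &instance{}
	var wg sync.WaitGroup

	// Concurrent ingest...
	wg.Add(1)
	go func() {
		defer wg.Done()
		for n := 0; n < 1000; n++ {
			inst.addBlock(&block{id: n})
		}
	}()

	// ...and concurrent searches over snapshots.
	for s := 0; s < 4; s++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for n := 0; n < 100; n++ {
				_ = inst.snapshotBlocks()
			}
		}()
	}

	wg.Wait()
	fmt.Println("total blocks:", len(inst.snapshotBlocks()))
}
```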
