Update example config for handling tail-sampling and span metric generation when horizontally scaling collectors #6260

Open · wants to merge 2 commits into main from scaling-stateful-collectors-detailed-example
Conversation

@swar8080 (Contributor) commented on Feb 9, 2025

Hello, we used this page's documentation when setting up tail sampling and span metric generation. The guide was helpful, but there were a couple of things it could have mentioned that would have made our implementation smoother:

  1. The guide currently suggests two separate collector deployments: one just for load-balancing spans and another for processing the load-balanced spans. We started with two deployments but found it easier to maintain a single deployment of collectors responsible for both tasks. However, it wasn't obvious (to us at least) that this was possible, and I'm guessing others would prefer this approach too.
  2. There wasn't any example configuration showing how the load-balancing exporter, span metrics connector, and tail sampling processor all fit together, so it took some jumping between READMEs, and some unintentional violations of the single-writer assumption (which we weren't aware of), to get it right.

So this documentation change shows how all of the components work together in a single collector deployment, roughly along the lines sketched below.
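The sketch below is illustrative only: the ports, the DNS resolver hostname, the sampling policy, and the backend exporter are placeholders, not values taken from the PR itself. Each collector instance both load-balances incoming spans and processes the load-balanced streams.

```yaml
receivers:
  otlp:                        # spans arriving from applications
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
  otlp/for_tail_sampling:      # spans load-balanced to this instance by trace ID
    protocols:
      grpc:
        endpoint: 0.0.0.0:14317
  otlp/for_span_metrics:       # spans load-balanced to this instance by service name
    protocols:
      grpc:
        endpoint: 0.0.0.0:24317

exporters:
  loadbalancing/tail_sampling:
    routing_key: traceID       # all spans of one trace land on the same instance
    protocol:
      otlp:
        tls:
          insecure: true
    resolver:
      dns:
        hostname: collector-headless.observability.svc.cluster.local  # placeholder
        port: 14317
  loadbalancing/span_metrics:
    routing_key: service       # all spans of one service land on the same instance,
                               # preserving the single-writer assumption for metrics
    protocol:
      otlp:
        tls:
          insecure: true
    resolver:
      dns:
        hostname: collector-headless.observability.svc.cluster.local  # placeholder
        port: 24317
  otlp/backend:
    endpoint: backend.example.com:4317  # placeholder

processors:
  tail_sampling:
    decision_wait: 10s
    policies:
      - name: keep-errors      # placeholder policy
        type: status_code
        status_code:
          status_codes: [ERROR]

connectors:
  spanmetrics: {}

service:
  pipelines:
    traces:                    # fan incoming spans out to both load-balanced streams
      receivers: [otlp]
      exporters: [loadbalancing/tail_sampling, loadbalancing/span_metrics]
    traces/tail_sampling:      # consumes the trace-ID-routed stream
      receivers: [otlp/for_tail_sampling]
      processors: [tail_sampling]
      exporters: [otlp/backend]
    traces/span_metrics:       # consumes the service-routed stream
      receivers: [otlp/for_span_metrics]
      exporters: [spanmetrics]
    metrics/span_metrics:
      receivers: [spanmetrics]
      exporters: [otlp/backend]
```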

@swar8080 swar8080 requested a review from a team as a code owner February 9, 2025 19:10
@swar8080 swar8080 force-pushed the scaling-stateful-collectors-detailed-example branch from dd7bf67 to eb5c3ab on February 9, 2025 19:10
@opentelemetrybot opentelemetrybot requested review from a team and bogdandrutu and removed request for a team February 9, 2025 19:11
…ing exporter, tail-sampling processor, and span metrics connector together when scaled to multiple collector instances.

Also removes language and configuration suggesting that load balancing should be a separate deployment of collectors from the collectors doing the tail sampling and span metric generation. It's easier to maintain a single deployment responsible for both load balancing and processing of the load-balanced data, but the pattern for doing this may not be obvious at first.
@swar8080 swar8080 force-pushed the scaling-stateful-collectors-detailed-example branch from eb5c3ab to b184128 on February 9, 2025 19:14
@opentelemetrybot opentelemetrybot requested a review from a team February 9, 2025 19:14
@svrnm (Member) commented on Feb 10, 2025

thanks @swar8080! @open-telemetry/collector-approvers PTAL

@jpkrohling (Member) left a comment


I like the change, but I'd really prefer to have three collector instances (three config files) instead of one with three pipelines. The reason is that each pipeline (load-balancer, tail-sampler, span metrics) has a different load profile and would scale differently.

You absolutely CAN do it as depicted here, but since not everyone will understand the nuances, I'd prefer the official documentation to have each pipeline be its own deployment.
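To sketch what that split could look like, pipeline sections only (the file names and wiring below are illustrative, not part of this PR):

```yaml
# collector-gateway.yaml: only receives from applications and load-balances
service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [loadbalancing/tail_sampling, loadbalancing/span_metrics]

# collector-tail-sampling.yaml: consumes the trace-ID-routed stream
service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [tail_sampling]
      exporters: [otlp/backend]

# collector-span-metrics.yaml: consumes the service-routed stream
service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [spanmetrics]
    metrics:
      receivers: [spanmetrics]
      exporters: [otlp/backend]
```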


    traces/span_metrics:
      receivers:
        - otlp/for_tail_sampling

Suggested change:
-        - otlp/for_tail_sampling
+        - otlp/for_span_metrics

@swar8080 (Contributor, Author) commented

Hi @jpkrohling, thanks for reviewing

Separating span metrics and tail sampling into different collectors is a good callout. A spike in span volume or a span metric cardinality explosion would likely only cause problems for one deployment and not the other. I'll go ahead and change this into three collector configurations, or two if the below persuades you otherwise :).

Maybe we missed the benefits of load balancing being its own deployment. Both setups have to receive incoming spans, and load balancing to the same deployment has twice as many receives and exports. But it seems like the extra memory for load balancing is dwarfed by the memory needed for tail sampling and span metrics, which also grows with span volume. For CPU, the load-balancing exporter used a lot before this optimization, but now our pprof shows it as a small percentage of total CPU time. So for us it didn't seem worth the effort of maintaining another deployment, which is why we ended up consolidating it. That saved us another deployment to monitor, and one fewer file to jump between when working on our collector config, since we did some filtering/edits before load balancing. So maybe two separate deployments, each load balancing to itself, would be a good setup for a lot of users? Roughly like the sketch below.
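The tail-sampling deployment is shown; the span-metrics one would have the same shape, with `routing_key: service` and the spanmetrics connector instead. Pipeline and receiver names are just placeholders for illustration:

```yaml
# Sketch of one of the two deployments (tail sampling). It receives spans
# from applications and load-balances them by trace ID back to a second
# OTLP receiver on the same deployment, so no separate gateway deployment
# is needed.
service:
  pipelines:
    traces/incoming:
      receivers: [otlp]                # spans from applications
      exporters: [loadbalancing]       # routing_key: traceID, resolves to this deployment
    traces/tail_sampling:
      receivers: [otlp/load_balanced]  # target of the loadbalancing exporter
      processors: [tail_sampling]
      exporters: [otlp/backend]
```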
