Tail-Based Sampling - Scalability Issues #4758
@songy23 @inigohu @tigrannajaryan Can you please share a sample config for the below?
Most of the routing happens on headers; for this, we will have to iterate over span batches and route spans based on their trace IDs. How does this work when we use Jaeger gRPC? Refer: https://github.com/open-telemetry/opentelemetry-collector/blob/master/docs/performance.md
Please review the fix for open-telemetry/opentelemetry-collector#1 above.
An option with no external dependencies might look like:
I believe this is very roughly what Cassandra does. During scale up/down, traces will be scattered across multiple collectors as the hash ring settles. I suppose these traces would then not be eligible for tail-based sampling, and the orphaned spans would automatically be flushed to the backend? Edit: I'm new to the codebase and still learning. Could this be implemented as a "rebalancing" pipeline processor? It would either forward a span to a different collector or allow it to continue being processed by the current pipeline, based on the hash of the trace ID. Reviewing the code now with this in mind.
I'd like to share the design document for scaling tail-based sampling for review: https://docs.google.com/document/d/1HEI8E9VFcv_QoA8k4YLoO8Md7XWiE5vqE3pIhKJozTQ/edit?usp=sharing /cc @open-telemetry/collector-maintainers @joe-elliott
@annanay25 I thought along the same lines, but making otel implement service discovery seems like overkill, and this design is not scalable at high load. Every otel collector instance starts distributing spans to the respective collector instances. It is more challenging when we have to unpack gRPC batches and redistribute spans. Failover scenarios and the tail-sampling decision window add more edge cases and complexity. We need a consistent and reliable solution that will work at scale here.
@annanay25 @joe-elliott Maybe using an in-memory distributed cache would help here. We can run the otel collector in two modes: one mode would only fill the cache, and the other would actually make tail-sampling decisions based on the filter criteria. What do you think?
@annanay25 Load balancers support ring hashing based on a header value. The otel collector, when configured with the tail sampler, can unpack gRPC batches and hit the ALBs with spans batched by a grouping criterion such as trace ID or node ID, shipping spans to the respective otel collector instance handling that trace ID.
@vikrambe One of our goals with the proposed design was to avoid taking on any external dependencies. We recognize that at a certain scale, tools like load balancers or in-memory caches would definitely make things easier. Perhaps a broader discussion about this particular feature, and what kind of external dependencies the otel-collector would be willing to take on, is necessary here?
@joe-elliott I don't think we would need any external dependencies in the otel collector to keep this simple and still solve the problem. One option I can think of is to enhance the otel collector's agent mode to batch spans and add headers that would help route the batches to the appropriate otel collector instance. Routing batches based on headers can be done by nearly any load balancer.
I can totally understand that, but if there's a high-quality embedded distributed caching library out there, it might be worth using it instead of implementing a new one from scratch. @vikrambe's suggestion of golang/groupcache might be worth investigating. One advantage I can see in using groupcache instead of the proposed mod-N hash is that cache misses would not break tail-based sampling, as the data would be retrieved from a remote ring member.
If you can use Kafka, I think we're pretty close: jaegertracing/jaeger#425 (comment)
I am finding this discussion very interesting. How does it relate to the comment in the last paragraph here: https://github.com/open-telemetry/opentelemetry-collector/blob/master/docs/performance.md? It says that we would need an LB supporting trace-ID-based routing to scale horizontally. Does this present a practical option when using a certain combination of protocols? For example, I am sending batches of traces via thrift_http. I assume it would be enough if each batch carried only traces with the same trace ID and every HTTP request had a trace-ID header for the traces it transports.
@annanay25 I have read your post on how Grafana Labs enables horizontally scalable tail sampling in the OpenTelemetry Collector. Are you planning to contribute your new processor, the aggregate processor, to the open-telemetry/opentelemetry-collector-contrib project?
I think scalable tail-based sampling is a must-have feature and should live in the main repository.
Is there any update on standardizing horizontal scaling for tail-based sampling?
Not yet. We started a discussion as part of open-telemetry/opentelemetry-collector#1724, and I do plan on working on a PoC based on my last comments there. The idea is that the load balancer would direct all spans for specific traces to the same collector, and that collector would be configured to do tail-based sampling.
Apologies for being unresponsive here. I somehow managed to miss these notifications. I would be more than happy to contribute the aggregate processor to the contrib repo. However, some concerns were raised in a previous PR where I attempted the same, so I would like to address them as well this time, and work towards something that the community has a consensus on. @jpkrohling has already done some amazing new work in this area, so it would be best to build upon that.
Is this still relevant? Given that we have the load balancer in place, it should be easy to build a scalable tail-based sampling solution now.
@jpkrohling Could you please describe in more detail the solution with a load balancer?
Sure: you'll need two layers of collectors. The first is the load balancer(s), a simple collector with the load-balancing exporter configured: https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/master/exporter/loadbalancingexporter The second is a cluster of collectors with the tail-sampling processor, or the groupbytrace processor plus your own sampling processor, in case you have your own distribution. If that's still too abstract, I can try to come up with a concrete example, but it might take a while.
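For reference, a minimal sketch of such a first-layer configuration, assuming OTLP in and out and a headless DNS name that resolves to the second-layer collectors (the hostname is a placeholder, and exact field names can vary between collector versions):

```yaml
receivers:
  otlp:
    protocols:
      grpc:

exporters:
  loadbalancing:
    protocol:
      otlp:
        tls:
          insecure: true
    resolver:
      dns:
        # placeholder: DNS name / headless service resolving to the second-layer collectors
        hostname: sampling-collectors.example.com
        port: 4317

service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [loadbalancing]
```

By default the exporter routes by trace ID, so all spans of a trace end up on the same second-layer collector.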
@jpkrohling I have a couple of queries.
@jpkrohling Thanks for the loadbalancingexporter. This is helpful...
For 1, I'm not sure I understand the problem. Both the tail-sampling processor and groupbytrace will wait for a specific amount of time, under the assumption that all spans for the trace will have arrived by then. The filter criteria are the second part of the tail-sampling processor, known as the "policy" there. We started thinking about splitting this into its own processor, but it wouldn't bring anything that doesn't exist already, at least not in the first release. If none of the existing processors/policies work for you, you can build your own and pack it as part of your custom build of the otelcol.
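For reference, the waiting window and the policies mentioned above map to a tail_sampling configuration roughly like the one below; policy names and thresholds are illustrative, and this section goes into the traces pipeline of the second-layer collectors:

```yaml
processors:
  tail_sampling:
    # how long to buffer spans of a trace before evaluating the policies
    decision_wait: 10s
    policies:
      # keep every trace that contains an error
      - name: keep-errors
        type: status_code
        status_code:
          status_codes: [ERROR]
      # keep traces slower than 500 ms
      - name: keep-slow-traces
        type: latency
        latency:
          threshold_ms: 500
```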
Wow, this is really interesting; I didn't know that. I hope this pattern will be well documented in the future.
Where would you expect to see this information? I can create some documentation about it. This belongs to the contrib repository because it's still understood to be a "niche" feature. The more people using this feature and providing feedback, the higher the chances that it will get moved to the core ;-)
I don't think this is possible right now, but I would expect to see the balancing setup documentation somewhere here.
When not doing tail-based sampling, you can do load balancing using the regular techniques. This specialized load balancer is needed only if you need to ensure that the same trace ID is consistently sent to the same backend.
I agree. But IMHO tail-based sampling is a quite common scenario.
@jpkrohling is there a solution for the question from the issue topic?
It seems that tail sampling is unable to set a percentage of samples.
You are right -- policies can't be chained at the moment, though this is something we discussed in the past.
Can I do that sampling in another way?
Not with the current processors. Building one isn't that complicated, though. Here's how the policy parts of the tail-based sampling processor look when split from that processor: jpkrohling@fbac498
@heruv1m Here is the implementation. We can use the loadbalancingexporter to co-locate spans belonging to the same trace ID on a single OpenTelemetry Collector instance. Once you co-locate all the spans for a given trace ID on a single collector instance, you can use the implementation below. It's not available in the master repo, as the implementation was turned down, but you can build your own OpenTelemetry Collector binary with these changes. Test case: vikrambe/opentelemetry-collector@b3372f2 @heruv1m @jpkrohling I have an implementation of the composite policy here: open-telemetry/opentelemetry-collector@1a3aed2. Let me know if it helps; I can open a PR on the contrib repo. Design document for the composite policy: https://docs.google.com/document/d/10wpIv3TtXgOik05smHm3nYeBX48Bj76TCMxPy8e1NZw/edit#heading=h.ecy5l2puwtp4
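For readers arriving later: the composite policy described in the design document rate-limits across an ordered list of sub-policies. A purely illustrative sketch of what such a configuration could look like (field names, rates, and sub-policies are illustrative and depend on the implementation/version you run):

```yaml
processors:
  tail_sampling:
    decision_wait: 10s
    policies:
      - name: composite-policy
        type: composite
        composite:
          # overall budget shared across the sub-policies
          max_total_spans_per_second: 1000
          policy_order: [errors, everything-else]
          composite_sub_policy:
            - name: errors
              type: status_code
              status_code:
                status_codes: [ERROR]
            - name: everything-else
              type: always_sample
          rate_allocation:
            - policy: errors
              percent: 60
            - policy: everything-else
              percent: 40
```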
@vikrambe I think your processor is what I need. |
@vikrambe, it's just what I need.
@vikrambe, do you think there's still something to do here?
@jpkrohling I think we have all the pieces here to solve this problem at scale now. We can close this issue. Should we put all these pieces together in a document/blog? Please suggest.
Documentation would be great, perhaps starting as a blog post and migrating later to the official docs. Are you up to writing something?
Sure, @jpkrohling, I would be happy to write a blog post on this topic, but I would definitely need your review and suggestions. I will start working on a draft and share it soon.
Closing the issue as per the comment above.
I have a question though. Is there a way of making the load-balancing exporter collector layer highly available, to ensure no traces get lost due to downtime? 🤔
You can place a regular load balancer in front of the cluster of trace-ID load balancers. Just make sure that they have a consistent ring view; otherwise, each node will forward data to a different backing collector. The current DNS resolver can still be used in this scenario, but make sure to set the refresh periodicity to a low number, so that the nodes are out of sync only for brief moments.
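For reference, on each first-layer node that could look roughly like the following, assuming the DNS resolver (the hostname is a placeholder, and field names may vary by collector version):

```yaml
exporters:
  loadbalancing:
    protocol:
      otlp:
        tls:
          insecure: true
    resolver:
      dns:
        # placeholder: headless DNS name of the backing (tail-sampling) collectors
        hostname: sampling-collectors.example.com
        port: 4317
        interval: 5s   # re-resolve often so all load-balancer nodes converge on the same ring quickly
        timeout: 1s
```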
I like the idea of tail-based sampling. I would like to enhance it in two ways:
Consider a case where we are tail-sampling based on span->http.status_code=500.
Please share your thoughts on the above.