
Tail-Based Sampling - Scalability Issues #4758

Closed
vikrambe opened this issue Oct 21, 2019 · 45 comments
Labels
comp:jaeger Jaeger related issues

Comments

@vikrambe
Contributor

I like the idea of tail-based sampling. I would like to enhance tail-based sampling in two ways.

Consider a case where we are tail-sampling based on span->http.status_code=500.

  1. I want to sample 100% of all http.status_code "500"s and 10% of http.status_code "200"s. I would like to create a new sampler that combines the string_tag_filter and probabilistic samplers (a config sketch follows below).
  2. What happens when spans for a trace land in different OTel Collector instances? One instance could sample the trace when it finds http.status_code 500 while another decides not to sample. I was thinking of using groupcache to solve this problem.

Please share your thoughts on the above.
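For readers landing here later: with the tail_sampling processor that eventually shipped in opentelemetry-collector-contrib, this combination can be sketched roughly as follows. Policy names are illustrative, and this assumes http.status_code is stored as a string attribute (the numeric_attribute policy would be the equivalent for numeric values):

```yaml
processors:
  tail_sampling:
    decision_wait: 10s
    policies:
      # Keep 100% of traces that contain a span with http.status_code = "500"
      - name: errors-keep-all
        type: string_attribute
        string_attribute:
          key: http.status_code
          values: ["500"]
      # Keep roughly 10% of traces whose spans carry http.status_code = "200"
      - name: ok-keep-10-percent
        type: and
        and:
          and_sub_policy:
            - name: only-200s
              type: string_attribute
              string_attribute:
                key: http.status_code
                values: ["200"]
            - name: probabilistic-10
              type: probabilistic
              probabilistic:
                sampling_percentage: 10
```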
@vikrambe
Contributor Author

@songy23 @inigohu @tigrannajaryan Can you please share a sample config for the below?

  • Scale-out by putting one or more collectors behind a load balancer or k8s service, but the load balancer must support traceID-based routing (i.e. all spans for a given traceID need to be received by the same collector instance)

Most of the routing happens on headers; for this we would have to iterate over span batches and route spans based on their traceIDs. How does this work when we use Jaeger gRPC?

Refer: https://github.com/open-telemetry/opentelemetry-collector/blob/master/docs/performance.md

vikrambe referenced this issue in vikrambe/opentelemetry-collector Oct 27, 2019
@vikrambe
Contributor Author

Please review the fix for open-telemetry/opentelemetry-collector#1 above, i.e.:

I would want to sample 100% of all http.status_code "500"s and 10% of http.status_code "200"s. I would like to create a new sampler that combines string_tag_filter and probabilistic sampler.

vikrambe referenced this issue in vikrambe/opentelemetry-collector Oct 28, 2019
@joe-elliott
Contributor

joe-elliott commented Nov 12, 2019

An option with no external dependencies might look like:

  • Consistent hash ring communicated via a gossip based protocol. Hashicorp has a go implementation: https://github.com/hashicorp/memberlist
  • Upon receiving a span a collector hashes the trace id and forwards it to the appropriate collector

I believe this is very roughly what Cassandra does.

During scale up/down traces will be scattered across multiple collectors as the hash ring settles. I suppose these traces would then not be eligible for tail based sampling and the orphaned spans would automatically be flushed to the backend?

Edit:
After thinking about it further, a consistent hash ring is probably overkill and mod-N hashing is most likely fine. A scaling event will cause more orphaned spans, but this may be acceptable as long as it's documented. I continue to think a gossip protocol is a good no-external-dependency way of adding collector discovery.

I'm new to the codebase and still learning. Can this be implemented as a "rebalancing" pipeline processor that either forwards a span to a different collector or lets it continue through the current pipeline, based on the hash of the trace ID? I'm reviewing the code now with this in mind.

@annanay25
Contributor

I'd like to share the design document for scaling tail based sampling for review - https://docs.google.com/document/d/1HEI8E9VFcv_QoA8k4YLoO8Md7XWiE5vqE3pIhKJozTQ/edit?usp=sharing

/cc @open-telemetry/collector-maintainers @joe-elliott

@vikrambe
Contributor Author

@annanay25 I thought along the same lines, but making otel implement service discovery seems like overkill, and this design is not scalable at high load. Every otel collector instance starts distributing spans to the respective collector instances. It becomes more challenging when we have to unpack gRPC batches and redistribute spans. Failover scenarios and the tail sampling decision window add more edge cases and complexity. We need a consistent and reliable solution that will work at scale here.

@vikrambe
Contributor Author

@annanay25 @joe-elliott Maybe an in-memory distributed cache would help here. We could run the otel collector in two modes: one mode would only fill the cache, and the other would actually make tail sampling decisions based on the filter criteria. What do you think?

@vikrambe
Contributor Author

I'd like to share the design document for scaling tail based sampling for review - https://docs.google.com/document/d/1HEI8E9VFcv_QoA8k4YLoO8Md7XWiE5vqE3pIhKJozTQ/edit?usp=sharing

/cc @open-telemetry/collector-maintainers @joe-elliott

@annanay25 Load balancers support ring-hash routing based on a header value. The OTel Collector, when configured with the tail sampler, could unpack gRPC batches, group spans by a criterion such as TraceID or nodeId, and hit the ALBs with those batches, shipping spans to the respective otel collector instance handling the TraceID being processed.

@joe-elliott
Contributor

@vikrambe One of our goals with the proposed design was to avoid taking on any external dependencies. We recognize that at a certain scale, tools like load balancers or in-memory caches would definitely make things easier.

Perhaps a broader discussion is needed here about this particular feature and what kinds of external dependencies the otel-collector would be willing to take on?

@vikrambe
Contributor Author

vikrambe commented Dec 16, 2019

@joe-elliott I don't think we would need any external dependencies in the otel collector. To keep it simple and still solve this problem, one option I can think of is to enhance the otel collector's agent mode to batch spans and add headers that would help route the batches to the appropriate otel collector instance. Routing batches based on headers can be done by virtually any load balancer.

@jpkrohling
Member

One of our goals with the proposed design was to avoid taking on any external dependencies.

I can totally understand that, but if there's a high-quality embedded distributed caching library out there, it might be worth using it instead of implementing a new one from scratch. @vikrambe's suggestion of golang/groupcache might be worth investigating.

One advantage I can see in using groupcache instead of the proposed mod-N hash is that cache misses would not break tail-based sampling, as the data would be retrieved from a remote ring member.

@william-tran

If you can use kafka, I think we're pretty close: jaegertracing/jaeger#425 (comment)
Use two tiers of otel-collector with kafka in the middle.
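As a rough sketch of that two-tier layout, assuming the kafka exporter/receiver from contrib and its trace-ID partitioning option (the option and defaults shown here come from current contrib components and may not have existed when this was written):

```yaml
# Tier 1: stateless collectors write spans to Kafka, partitioned by trace ID,
# so all spans of a trace land on the same partition (and thus the same consumer).
receivers:
  otlp:
    protocols:
      grpc:
exporters:
  kafka:
    brokers: ["kafka:9092"]
    topic: otlp_spans
    protocol_version: 2.0.0
    partition_traces_by_id: true   # assumption: supported by the contrib kafka exporter
service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [kafka]
---
# Tier 2: collectors consume from Kafka and apply tail-based sampling,
# each instance seeing complete traces.
receivers:
  kafka:
    brokers: ["kafka:9092"]
    topic: otlp_spans
    protocol_version: 2.0.0
processors:
  tail_sampling:
    decision_wait: 30s
    policies:
      - name: keep-errors
        type: status_code
        status_code:
          status_codes: [ERROR]
exporters:
  otlp:
    endpoint: backend:4317
    tls:
      insecure: true
service:
  pipelines:
    traces:
      receivers: [kafka]
      processors: [tail_sampling]
      exporters: [otlp]
```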

@ehmke

ehmke commented May 28, 2020

I am finding this discussion very interesting. How does this relate to the comment in the last paragraph here: https://github.com/open-telemetry/opentelemetry-collector/blob/master/docs/performance.md

It says that we would need an LB supporting traceID-based routing to scale horizontally. Is this a practical option when using a certain combination of protocols? For example, I am sending batches of traces via thrift_http. I assume it would be enough if the batches carried only traces with the same trace ID and if every HTTP request had the trace-ID header of the traces it is transporting.

@secat

secat commented Jun 22, 2020

@annanay25 I have read your post on How Grafana Labs enables horizontally scalable tail sampling in the OpenTelemetry Collector.

Are you planning to contribute your new processor called the aggregate processor to the open-telemetry/opentelemetry-collector-contrib project?

@morigs
Contributor

morigs commented Aug 7, 2020

I think scalable tail-based sampling is a must-have feature and should live in the main repository.
Also, IMHO this feature is required for GA.
So should we wait for a PR from @annanay25 or implement something similar?

@secat

secat commented Oct 2, 2020

Is there any update on how to standardize horizontal scaling for tail-based sampling?

@jpkrohling
Member

jpkrohling commented Oct 2, 2020

Not yet, we started a discussion as part of open-telemetry/opentelemetry-collector#1724. I do plan on working on a PoC based on my last comments there. The idea would be that the load balancer would direct all spans for specific traces to the same collector, and that collector would be configured to do tail-based sampling.

@annanay25
Contributor

Apologies for being unresponsive here. I somehow managed to miss these notifications.

I would be more than happy to contribute the aggregate processor to the contrib repo. However, some concerns were raised in a previous PR where I attempted the same, so I would like to address them as well this time, and work towards something that the community has a consensus on. @jpkrohling has already done some amazing new work in this area, so it would be best to build upon that.

@jpkrohling
Member

Is this still relevant? Given that we have the load balancer in place, it should now be easy to build a scalable tail-based sampling solution.

@morigs
Contributor

morigs commented Jan 20, 2021

@jpkrohling Could you please describe in more detail the solution with a load balancer?

@jpkrohling
Member

Sure: you'll need two layers of collectors. The first layer is the load balancer(s), which would be a simple collector with the load-balancing exporter configured: https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/master/exporter/loadbalancingexporter

The second would be a cluster of collectors with the tail-sampling processor, or the groupbytrace processor + your own sampling processor, in case you have your own distribution.

If that's still too abstract, I can try to come up with a concrete example, but it might take a while.
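In the meantime, here is a minimal sketch of those two layers using the loadbalancing exporter's DNS resolver; hostnames, endpoints, and the sample policy are placeholders:

```yaml
# Layer 1: "load balancer" collectors that route spans by trace ID.
receivers:
  otlp:
    protocols:
      grpc:
exporters:
  loadbalancing:
    protocol:
      otlp:
        tls:
          insecure: true
    resolver:
      dns:
        hostname: otelcol-sampling.observability.svc.cluster.local
service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [loadbalancing]
---
# Layer 2: sampling collectors; each one receives all spans for the traces
# routed to it and can make a tail-based decision.
receivers:
  otlp:
    protocols:
      grpc:
processors:
  tail_sampling:
    decision_wait: 10s
    policies:
      - name: keep-errors
        type: status_code
        status_code:
          status_codes: [ERROR]
exporters:
  otlp:
    endpoint: backend:4317
    tls:
      insecure: true
service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [tail_sampling]
      exporters: [otlp]
```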

@vikrambe
Contributor Author

@jpkrohling I have a couple of queries.

  1. How will it handle async calls/long-lived transactions?
  2. Load balancing is one part of this issue. We also need some kind of rule engine to evaluate filter criteria, with AND/OR capabilities, to decide whether to sample the traces. Maybe we should create a separate issue to track it?

@vikrambe
Contributor Author

@jpkrohling Thanks for the loadbalancingexporter. This is helpful...

@jpkrohling
Member

For 1, not sure I understand the problem. Both the tail sampling processor and the groupbytrace will wait for a specific amount of time, under the assumption that all spans for the trace will have arrived by then.

The filter criteria are the second part of the tail sampling processor, known as the "policy" there. We started thinking about splitting this into its own processor, but it wouldn't bring anything that doesn't exist already, at least not in the first release.

If none of the existing processors/policies work for you, you can build your own and package it as part of your custom build of the otelcol.
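For completeness, such a custom build can be assembled with the OpenTelemetry Collector builder; a minimal manifest might look like the following, where the custom processor's module path and the version numbers are purely illustrative:

```yaml
# builder-config.yaml for the OpenTelemetry Collector builder (ocb)
dist:
  name: otelcol-custom
  output_path: ./dist
processors:
  # hypothetical module containing your own sampling/policy processor
  - gomod: github.com/example/customsamplingprocessor v0.1.0
  # plus anything you still want from contrib, e.g. the tail sampling processor
  - gomod: github.com/open-telemetry/opentelemetry-collector-contrib/processor/tailsamplingprocessor v0.48.0
# receivers/exporters entries omitted for brevity
```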

@morigs
Contributor

morigs commented Jan 21, 2021

Wow, this is really interesting; I didn't know that. I hope this pattern will be well documented in the future.
My only concern is that the loadbalancingexporter is in the contrib repository, which doesn't feel quite official.
Maybe it should be moved to the main collector repo then.

@jpkrohling
Member

Where would you expect to see this information? I can create some documentation about it. This belongs in contrib because it's still considered a "niche" feature. The more people use this feature and provide feedback, the higher the chances it will be moved to core ;-)

@morigs
Contributor

morigs commented Jan 21, 2021

I don't think this is possible right now, but I would expect to see the balancing setup documentation somewhere here.
I think load balancing is a must, because scaling is impossible without it.
It would be cool to involve the core team members in this discussion

@jpkrohling
Member

When not doing tail-based sampling, you can do load balancing using the regular techniques. This specialized load balancer is needed only if you need to ensure that the same trace ID is consistently sent to the same backend.

@morigs
Contributor

morigs commented Jan 22, 2021

I agree. But IMHO tail-based sampling is a quite common scenario.

@heruv1m

heruv1m commented May 26, 2021

@jpkrohling is there a solution for the question from the issue topic?

I would want to sample 100% of all http.status_code "500"s and 10% of http.status_code "200"s

It seems that tail sampling is unable to set a percentage of samples.

@jpkrohling
Member

You are right -- policies can't be chained at the moment, though this is something we discussed in the past.

@heruv1m

heruv1m commented May 26, 2021

Can I do that sampling in another way?

@jpkrohling
Member

Not with the current processors. Building one isn't that complicated, though. Here's what the policy parts of tail-based sampling look like when split from that processor: jpkrohling@fbac498

@vikrambe
Contributor Author

@heruv1m Here is the implementation. We can use the loadbalancingexporter to co-locate spans belonging to the same traceID on a single OpenTelemetry Collector instance. Once all the spans for a given traceID are co-located in a single collector instance, you can use the implementation below. It's not available in the master repo, as the implementation was turned down, but you can build your own OpenTelemetry Collector binary with the changes below.

TestCase: vikrambe/opentelemetry-collector@b3372f2
Implementation: vikrambe/opentelemetry-collector@1ac1887

@heruv1m @jpkrohling I have an implementation of a composite policy here: open-telemetry/opentelemetry-collector@1a3aed2. Let me know if it helps; I can open a PR on the contrib repo.

Design document for the composite policy: https://docs.google.com/document/d/10wpIv3TtXgOik05smHm3nYeBX48Bj76TCMxPy8e1NZw/edit#heading=h.ecy5l2puwtp4
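For reference, the composite policy as it later landed in the contrib tail sampling processor lets you order sub-policies and split a spans-per-second budget between them; a rough sketch along the lines of the original request (names and limits are illustrative):

```yaml
processors:
  tail_sampling:
    decision_wait: 10s
    policies:
      - name: composite-status-policy
        type: composite
        composite:
          max_total_spans_per_second: 1000
          policy_order: [errors-500, ok-sampled]
          composite_sub_policy:
            # Always keep traces containing http.status_code = "500"
            - name: errors-500
              type: string_attribute
              string_attribute:
                key: http.status_code
                values: ["500"]
            # Keep roughly 10% of everything else
            - name: ok-sampled
              type: probabilistic
              probabilistic:
                sampling_percentage: 10
          rate_allocation:
            - policy: errors-500
              percent: 50
            - policy: ok-sampled
              percent: 50
```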

@heruv1m

heruv1m commented May 26, 2021

@vikrambe I think your processor is what I need.
A PR in the contrib repo would be nice.

@karthikinamanamelluri

karthikinamanamelluri commented Jul 5, 2021

@vikrambe
Could you please refer to the contrib PR here or to the fork if created out of contrib.

bogdandrutu transferred this issue from open-telemetry/opentelemetry-collector Aug 20, 2021
@vikrambe
Contributor Author

@vikrambe
Could you please refer to the contrib PR here or to the fork if created out of contrib.

#4396 -> Planning to work on the review comments over the weekend.

alolita added the comp:jaeger Jaeger related issues label Sep 8, 2021
@XiaoWeiKIN

it's just what I need

@vikrambe
Contributor Author

#4396

#4958 -> This is already merged. Use the composite tail sampling policy.

@jpkrohling
Member

@vikrambe, do you think there's still something to do here?

@vikrambe
Contributor Author

vikrambe commented Feb 8, 2022

@vikrambe, do you think there's still something to do here?

@jpkrohling I think we have all the pieces here to solve this problem at scale now.

  1. groupByTraceProcessor
  2. loadbalancingexporter
  3. tailSampler with the composite policy; with the latest changes we can also add an AND clause to the policy

We can close this issue. Should we put all these pieces together in a document/blog post (a minimal sketch of how they compose follows below)? Please suggest.
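A compact sketch of how the sampling-tier pipeline composes those pieces (the first tier runs the loadbalancing exporter shown earlier in this thread; processor names as in contrib):

```yaml
service:
  pipelines:
    traces:
      receivers: [otlp]
      # groupbytrace buffers and groups spans per trace before the tail sampler runs
      processors: [groupbytrace, tail_sampling]
      exporters: [otlp]
```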

@jpkrohling
Member

Documentation would be great, perhaps starting as a blog post and migrating later to the official docs. Are you up to writing something?

@vikrambe
Contributor Author

vikrambe commented Feb 8, 2022

Documentation would be great, perhaps starting as a blog post and migrating later to the official docs. Are you up to writing something?

Sure, @jpkrohling, I would be happy to write a blog post on this topic, but I would definitely need your review and suggestions. I will start working on a draft and share it soon.

@codeboten
Contributor

codeboten commented Jun 27, 2022

We can close this issue.

Closing the issue as per the comment above

@AkselAllas

I have a question though.

Is there a way to make the loadbalancingexporter collector layer highly available, to ensure no traces get lost due to downtime? 🤔

@jpkrohling
Member

You can place a regular load balancer in front of the cluster of trace-ID load balancers. Just make sure they have a consistent view of the ring; otherwise, each node will forward data to a different backing collector. The current DNS resolver can still be used in this scenario, but make sure to set the refresh periodicity to a low value, so that the nodes are out of sync only for brief moments.
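In configuration terms, that refresh periodicity is the DNS resolver's interval on the loadbalancing exporter; something along these lines, where the hostname and values are placeholders:

```yaml
exporters:
  loadbalancing:
    protocol:
      otlp:
        tls:
          insecure: true
    resolver:
      dns:
        # headless service resolving to every sampling-tier collector instance
        hostname: otelcol-sampling-headless.observability.svc.cluster.local
        interval: 5s   # low refresh so all load-balancer nodes converge on the same ring quickly
        timeout: 1s
```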
