
How sampler.type=remote works #832

Closed
xihw opened this issue May 18, 2018 · 31 comments

@xihw

xihw commented May 18, 2018

" Remote (sampler.type=remote, which is also the default) sampler consults Jaeger agent for the appropriate sampling strategy to use in the current service. This allows controlling the sampling strategies in the services from a central configuration in Jaeger backend, or even dynamically (see Adaptive Sampling). "

This is excerpted from the Jaeger docs, and it looks pretty confusing to me. Can you help me understand it with the following questions?

  1. "consults Jaeger agent for the appropriate sampling strategy": as far as I know, there are two places to configure the sampling rate, jaeger-client and jaeger-collector. What role does the Jaeger agent play here?

  2. "This allows controlling the sampling strategies in the services from a central configuration in Jaeger backend": does "a central configuration in Jaeger backend" mean jaeger-collector?

  3. What if we use a Zipkin client + Jaeger backend (jaeger-collector + jaeger-ui + storage)? In this case we don't have jaeger-agent running, so how does the "remote consulting" work?

  4. A follow-up question on 3: without jaeger-agent, how is batching handled? According to Zipkin's docs (https://zipkin.io/pages/architecture.html), I interpret "Transport" as "jaeger-agent". In the Zipkin client + Jaeger backend scenario, do we drop "Transport"?

@yurishkuro
Member

yurishkuro commented May 18, 2018

  1. agent proxies the requests to the collector, so that the client does not need to know where collectors are located (agent is usually on the localhost)
  2. yes, configuration (and soon adaptive calculations) come from the collectors, but clients receive them via agent
  3. remotely controlled samplers are only supported by Jaeger clients, not Zipkin clients.
  4. Not sure which "batch" you are referring to.
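The division of roles in answers 1 and 2 can be modelled with a few in-memory stand-ins. This is a minimal sketch, not Jaeger code: `Collector`, `Agent` and `RemoteSamplerClient` are hypothetical classes, a strategy is reduced to a single probabilistic rate, and the 0.001 default is an illustrative value. What it shows is that the client only ever talks to the local agent, which forwards the request to wherever the collector lives.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Collector:
    """Hypothetical stand-in for the central sampling configuration."""
    strategies: Dict[str, float] = field(default_factory=dict)
    default_rate: float = 0.001  # illustrative default, not taken from Jaeger

    def get_sampling_strategy(self, service: str) -> float:
        # Per-service strategy if one is configured, otherwise the default.
        return self.strategies.get(service, self.default_rate)


@dataclass
class Agent:
    """Runs next to the service (usually on localhost) and knows where the collectors are."""
    collector: Collector

    def get_sampling_strategy(self, service: str) -> float:
        # The agent makes no decision itself; it only proxies the request.
        return self.collector.get_sampling_strategy(service)


@dataclass
class RemoteSamplerClient:
    """Client side: knows only the local agent, never the collector address."""
    agent: Agent
    service: str

    def fetch_rate(self) -> float:
        return self.agent.get_sampling_strategy(self.service)


collector = Collector(strategies={"frontend": 0.2})
client = RemoteSamplerClient(agent=Agent(collector), service="frontend")
print(client.fetch_rate())  # 0.2
```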

@xihw
Author

xihw commented May 18, 2018

  1. Okay, based on the two configurations described at:
    https://www.jaegertracing.io/docs/sampling/#ClientSamplingConfiguration
    https://www.jaegertracing.io/docs/sampling/#CollectorSamplingConfiguration
    and considering the following scenarios, what percentage of spans are reported from the service to the agent? What percentage are sent from the agent to the collector? And what percentage are saved to storage by the collector?

a. ClientSamplingConfiguration says probabilistic 0.1, and CollectorSamplingConfiguration says probabilistic 0.2

b. ClientSamplingConfiguration says remote, and CollectorSamplingConfiguration says probabilistic 0.2

  2. By "batch" I mean sending spans to the collector in batches to avoid heavy traffic. That's my understanding of the description here: https://www.jaegertracing.io/docs/architecture/ (search for 'batch'). With a Zipkin client + Jaeger backend, I don't think we have a batching mechanism, and that's my concern.

@black-adder
Contributor

  1. Any span that is reported by the service will be persisted, i.e., the sampling decision is made only once. In your example, the ClientSamplingConfiguration will be used instead of the CollectorSamplingConfiguration, so the sampling probability will be 0.1. If you instead use sampler.type=remote in the ClientSamplingConfiguration, the client will use the CollectorSamplingConfiguration of 0.2. (The client MUST be configured with sampler.type=remote in order to receive sampling rates from the collector; otherwise it uses the sampling rate provided by the service owner.)
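The rule above can be written as a small decision function. This is a hedged sketch: `effective_sampling_rate` is a hypothetical helper, it covers only the const, probabilistic and remote types, and it treats a remote sampler's own param as the rate in effect until the collector's strategy has been fetched (the fallback behavior described later in this thread).

```python
from typing import Optional


def effective_sampling_rate(client_type: str, client_param: float,
                            collector_rate: Optional[float]) -> float:
    """Return the probability the client actually uses for new traces.

    client_type / client_param mirror sampler.type / sampler.param;
    collector_rate is what the collector hands out (None if not fetched yet).
    """
    if client_type == "const":
        # const: a non-zero param samples everything, zero samples nothing.
        return 1.0 if client_param != 0 else 0.0
    if client_type == "probabilistic":
        # A locally configured sampler ignores the collector entirely.
        return client_param
    if client_type == "remote":
        # Use the collector's strategy once fetched; until then the param is the initial rate.
        return collector_rate if collector_rate is not None else client_param
    raise ValueError(f"this sketch does not handle sampler type {client_type!r}")


# Scenario a: client probabilistic 0.1, collector 0.2 -> 0.1 is used.
print(effective_sampling_rate("probabilistic", 0.1, 0.2))  # 0.1
# Scenario b: client remote, collector 0.2 -> 0.2 is used.
print(effective_sampling_rate("remote", 0.1, 0.2))  # 0.2
```

In both scenarios the returned rate is also the fraction that ends up in storage, because nothing downstream re-samples.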

@black-adder
Contributor

  1. The Jaeger clients are designed to always batch spans before sending them. If no jaeger-agent is present, the Go and Java Jaeger clients can be configured to send batches of spans over HTTP.
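Client-side batching can be sketched as a reporter that buffers finished spans and hands them to a sender once a size threshold is reached. This is a simplified illustration, not the Jaeger client implementation: `BatchingReporter` is hypothetical, and `send_batch` stands in for whichever transport is configured (UDP to the agent, or HTTP directly to the collector). Real clients typically also flush on a timer and on shutdown.

```python
from typing import Callable, List


class BatchingReporter:
    """Buffers finished spans and passes them to `send_batch` in batches (hypothetical)."""

    def __init__(self, send_batch: Callable[[List[dict]], None], max_batch: int = 100):
        self._send_batch = send_batch
        self._max_batch = max_batch
        self._buffer: List[dict] = []

    def report(self, span: dict) -> None:
        self._buffer.append(span)
        if len(self._buffer) >= self._max_batch:
            self.flush()

    def flush(self) -> None:
        # Send whatever is buffered, then start a fresh buffer.
        if self._buffer:
            batch, self._buffer = self._buffer, []
            self._send_batch(batch)


sent: List[List[dict]] = []
reporter = BatchingReporter(sent.append, max_batch=3)
for i in range(7):
    reporter.report({"span_id": i})
reporter.flush()
print([len(b) for b in sent])  # [3, 3, 1]
```

Because the transport is just a callable, the same batching logic works whether spans go to an agent or straight to a collector.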

@xihw
Author

xihw commented May 18, 2018

Any span that is reported by the service will be persisted

You mean persisted into storage?

In your example, the ClientSamplingConfiguration will be used instead of the CollectorSamplingConfiguration so the sampling probability will be 0.1

Sorry, I'm still confused: where is the 0.1 applied? service -> agent, agent -> collector, or collector -> storage? Or all of them (in which case 0.1 * 0.1 * 0.1 would finally be stored in the DB, right)?

@black-adder
Contributor

The sampling rate is only applied at the service, so 0.1 of traces will be stored in the DB.
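A quick simulation makes the arithmetic concrete: the 0.1 is applied once, at the service that starts the trace, and downstream hops reuse the propagated decision, so the stored fraction is about 0.1 rather than 0.1 * 0.1 * 0.1. This is a toy model built on Python's `random`; `trace_request` is a hypothetical helper and does not reproduce how any Jaeger client draws its decision.

```python
import random
from typing import List


def trace_request(rng: random.Random, rate: float, hops: int = 3) -> List[bool]:
    """Return the sampled flag seen by each of `hops` services for one request."""
    flags = [rng.random() < rate]  # the root service makes the only decision
    for _ in range(hops - 1):
        flags.append(flags[-1])  # downstream services inherit the flag from the context
    return flags


rng = random.Random(42)
n = 100_000
traces = [trace_request(rng, 0.1) for _ in range(n)]
stored = sum(1 for flags in traces if flags[0])
print(stored / n)  # roughly 0.1, not 0.001
```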

@xihw
Author

xihw commented May 18, 2018

OK! So the sampling only happens in the service, before spans are sent out.
One more question:

configuration (and soon adaptive calculations) come from the collectors, but clients receive them via agent

What is the flow? From the service's standpoint, is it pull or push?
And when does it happen: once when the service starts, or periodically?

@black-adder
Contributor

Services pull from the agent every minute; this is configurable: https://github.com/jaegertracing/jaeger-client-go/blob/master/config/config.go#L86

We haven't done this yet but I've always wanted to do push. It's on my personal road map.
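The pull model can be sketched as a poller that re-fetches the strategy once the refresh interval (one minute by default, per the comment above) has elapsed. Assumptions: `SamplingStrategyPoller` is hypothetical, `fetch` stands in for the HTTP call to the agent, and time is passed in explicitly so the example runs instantly instead of sleeping.

```python
from typing import Callable, List


class SamplingStrategyPoller:
    """Pull-based refresh: re-fetch the strategy whenever `interval` seconds have passed."""

    def __init__(self, fetch: Callable[[], float], initial_rate: float,
                 interval: float = 60.0):
        self._fetch = fetch
        self._interval = interval
        self.rate = initial_rate
        self._next_poll = 0.0  # poll on the first tick

    def tick(self, now: float) -> None:
        if now >= self._next_poll:
            self.rate = self._fetch()
            self._next_poll = now + self._interval


calls: List[int] = []


def fake_fetch() -> float:
    calls.append(1)  # record each pull so we can count them
    return 0.2


poller = SamplingStrategyPoller(fake_fetch, initial_rate=1.0)
for t in (0, 30, 59, 60, 61, 125):
    poller.tick(t)
print(len(calls), poller.rate)  # 3 0.2
```

Pulls happen at t=0, t=60 and t=125; the ticks in between reuse the cached rate.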

@xihw
Author

xihw commented May 18, 2018

Can you also help me understand another sampling propagation question:
Will a service generate a span for an incoming request before deciding whether to sample it?
Will unsampled spans be propagated between services?

@black-adder
Contributor

Sampling and generation of a span happens roughly at the same time. Context is always propagated between services (even if unsampled).

@xihw
Author

xihw commented May 19, 2018

If service B receives a request with a context saying something like {"span_a", "unsampled"}, B will still create a span as a child of "span_a" and keep propagating the context, but won't report the span. Is that correct?

@black-adder
Contributor

yes
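The behavior confirmed above can be illustrated with Jaeger's native propagation header, `uber-trace-id`, whose value has the form `{trace-id}:{span-id}:{parent-span-id}:{flags}` (hex-encoded, sampled bit `0x01`). This is a simplified sketch: `handle_request` is a hypothetical handler, and the `reported` list stands in for a real reporter. Service B always creates a child span and forwards the context with the same flag, but only reports the span when the flag is set.

```python
import secrets
from dataclasses import dataclass
from typing import List


@dataclass
class SpanContext:
    trace_id: int
    span_id: int
    parent_id: int
    sampled: bool


def parse_uber_trace_id(header: str) -> SpanContext:
    # {trace-id}:{span-id}:{parent-span-id}:{flags}, all hex
    trace_id, span_id, parent_id, flags = header.split(":")
    return SpanContext(int(trace_id, 16), int(span_id, 16), int(parent_id, 16),
                       sampled=bool(int(flags, 16) & 0x01))


def format_uber_trace_id(ctx: SpanContext) -> str:
    flags = 0x01 if ctx.sampled else 0x00
    return f"{ctx.trace_id:x}:{ctx.span_id:x}:{ctx.parent_id:x}:{flags:x}"


def handle_request(incoming_header: str, reported: List[SpanContext]) -> str:
    """Always create a child span and propagate it; report it only if sampled."""
    parent = parse_uber_trace_id(incoming_header)
    child = SpanContext(parent.trace_id, secrets.randbits(64) or 1,
                        parent.span_id, parent.sampled)  # inherit the decision
    if child.sampled:
        reported.append(child)  # hypothetical stand-in for the span reporter
    return format_uber_trace_id(child)  # header sent on to the next service


reported: List[SpanContext] = []
outgoing = handle_request("abc:1:0:0", reported)  # unsampled incoming context
print(reported, outgoing.endswith(":0"))  # [] True
```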

@xihw
Author

xihw commented May 19, 2018

OK, so does that mean it's possible to trace every request by putting its span info into the logs, even though we do sampling?
If so, do you have any resources showing how to do that?

@black-adder
Contributor

I'm not sure I understand the question. Are you asking if span logs are always persisted even if we do sampling?

@xihw
Author

xihw commented May 19, 2018

I'm asking whether it is possible to use a logging feature like MDC (http://www.baeldung.com/mdc-in-log4j-2-logback) to log the trace ID for every single request even if we do sampling.

@black-adder
Contributor

Yes, you can log the trace ID for every request, but since you're sampling, some logs will have trace IDs without a persisted trace.

@black-adder
Contributor

This is a golang example: https://github.com/jaegertracing/jaeger/blob/master/examples/hotrod/pkg/log/spanlogger.go

However, here we're doing more than just logging the trace ID: we're dual-logging to both the log reporter and the span.
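In Python, a rough analogue of MDC is a `contextvars.ContextVar` plus a `logging.Filter` that stamps every record with the current trace ID, whether or not the trace is sampled. The `SpanLogger` below imitates the dual-logging idea of the hotrod example (always to the logger, and into the span's logs when sampled). It is a hypothetical sketch, not a port of `spanlogger.go`, and `span_logs` is a plain list standing in for a real span's log API.

```python
import contextvars
import logging
from typing import List, Optional

# Current request's trace ID, similar in spirit to an MDC entry in log4j/logback.
current_trace_id = contextvars.ContextVar("trace_id", default="-")


class TraceIdFilter(logging.Filter):
    """Attach the current trace ID to every record, sampled or not."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.trace_id = current_trace_id.get()
        return True


class SpanLogger:
    """Log to the regular logger always, and into the span's logs only when sampled."""

    def __init__(self, logger: logging.Logger, sampled: bool,
                 span_logs: Optional[List[str]] = None):
        self._logger = logger
        self._sampled = sampled
        self.span_logs = span_logs if span_logs is not None else []

    def info(self, msg: str) -> None:
        self._logger.info(msg)
        if self._sampled:
            self.span_logs.append(msg)  # hypothetical stand-in for span logging


logger = logging.getLogger("checkout")
handler = logging.StreamHandler()
handler.addFilter(TraceIdFilter())
handler.setFormatter(logging.Formatter("trace_id=%(trace_id)s %(message)s"))
logger.addHandler(handler)
logger.setLevel(logging.INFO)

current_trace_id.set("4bf92f3577b34da6")
SpanLogger(logger, sampled=False).info("payment authorized")
# stderr: trace_id=4bf92f3577b34da6 payment authorized
```

As noted above, some of these trace IDs will not correspond to a stored trace, since only sampled spans are reported.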

@black-adder
Contributor

Closing the issue; feel free to reopen it if you have more questions.

@ecourreges-orange

1. agent proxies the requests to the collector, so that the client does not need to know where collectors are located (agent is usually on the localhost)

The agent proxies the config request to the collector through which connection? The TChannel or gRPC, whichever one is connected?
These docs don't really explain where the sampling config is sent through:
https://www.jaegertracing.io/docs/1.14/getting-started/#all-in-one
https://www.jaegertracing.io/docs/1.14/deployment/#collectors
Also it would be a nice improvement to know which are encryptable or encrypted by default,
This page does not detail which protocols between which components support encryption:
#458

Thank you.

@yurishkuro
Member

The agent proxies the config request to the collector through which connection? The TChannel or gRPC, whichever one is connected?

whichever one you configure on the agent. We recommend gRPC.

Also it would be a nice improvement to know which are encryptable or encrypted by default,

See #1718

These docs don't really explain where the sampling config is sent through:

Can you elaborate what can be improved in the docs? If you're using remote sampler, then the sampling configuration is defined in the collectors, and is pulled by the clients periodically client<-agent<-collector

@ecourreges-orange

Thanks, now it's all coming together from the info in the different referenced GitHub issues.

Can you elaborate what can be improved in the docs? If you're using remote sampler, then the sampling configuration is defined in the collectors, and is pulled by the clients periodically client<-agent<-collector

For improvements to the doc, here are ideas:

  • add a column or an additional comment about TLS/encryption support to the port/protocol/function tables in Getting Started and Deployment (all-in-one and collector)
  • add to these tables the fact that jaeger-agent doesn't just send spans over gRPC but also pulls config.
  • add a different architecture diagram here:
    https://www.jaegertracing.io/docs/1.14/architecture/
    one with more of a network view, showing the actual protocols and ports on the arrows, whereas it currently shows the "control flow" as a dotted red line, which is not a network connection.
    Or split it per component if it becomes a graphic nightmare.

Thanks.

@vamshi67

Does 'remote' sampling work with http-sender? In my AKS cluster setup, I haven't configured 'jaeger-agent'.

@yurishkuro
Member

Sampler has nothing to do with Sender, it's an independent component. It can work with both the agent and the collector.

@vamshi67

Thanks, Yuri, for the quick response. I really appreciate your help on this.

I'm using the Jaeger K8s operator and have the following sampling strategy in the ConfigMap:

    apiVersion: v1
    data:
      sampling: '{"default_strategy":{"operation_strategies":[{"operation":"/health","param":0,"type":"probabilistic"},{"operation":"/metrics","param":0,"type":"probabilistic"}],"param":0.1,"type":"probabilistic"}}'
    kind: ConfigMap
    metadata:
      creationTimestamp: "2020-06-15T23:42:14Z"
      labels:
        app: jaeger
        app.kubernetes.io/component: sampling-configuration
        app.kubernetes.io/instance: jaeger
        app.kubernetes.io/managed-by: jaeger-operator
        app.kubernetes.io/name: jaeger-sampling-configuration
        app.kubernetes.io/part-of: jaeger
      name: jaeger-sampling-configuration
      namespace: monitoring

Note: we're using the monitoring namespace instead of observability.
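To see what this strategy means for individual endpoints, here is a minimal resolver over the JSON stored in the ConfigMap. It is a sketch of one plausible reading (per-operation override first, then the default `param`), it handles only `probabilistic` strategies, and `sampling_rate_for` is a hypothetical helper rather than the collector's or client's actual logic.

```python
import json

# The "sampling" value from the jaeger-sampling-configuration ConfigMap above.
SAMPLING_JSON = (
    '{"default_strategy":{"operation_strategies":['
    '{"operation":"/health","param":0,"type":"probabilistic"},'
    '{"operation":"/metrics","param":0,"type":"probabilistic"}],'
    '"param":0.1,"type":"probabilistic"}}'
)


def sampling_rate_for(config: dict, operation: str) -> float:
    """Per-operation override first, then the default strategy's param."""
    default = config["default_strategy"]
    for op in default.get("operation_strategies", []):
        if op["operation"] == operation and op["type"] == "probabilistic":
            return float(op["param"])
    if default["type"] != "probabilistic":
        raise ValueError("this sketch only handles probabilistic strategies")
    return float(default["param"])


config = json.loads(SAMPLING_JSON)
for op in ("/health", "/metrics", "/orders"):
    print(op, sampling_rate_for(config, op))
# /health 0.0
# /metrics 0.0
# /orders 0.1
```

Under this reading, health and metrics probes are never sampled and every other operation is sampled at 10%.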

Client application has the following properties:

    sampler.type=const
    sampler.sampling-rate=1

Since these properties are defined in the application's properties file, I'm overriding them using k8s environment variables. I have set sampler.type to remote. As I don't know what value sampling-rate should have when sampler.type is remote, I set it to 1.

With this, when I created the pod, every trace was sampled and collected. I'm not sure why it isn't honoring the remote configuration.

Am I missing anything?

@yurishkuro
Member

The numeric value of 1 is treated as 100% default probability when the sampler cannot contact the backend. It's possible that in your deployment it cannot reach the backend and never gets the 0.1 probability. The sampler emits metrics about unsuccessful configuration pulls.
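The fallback described above can be sketched like this: the sampler starts with the configured param as its rate (1 here, i.e. sample everything) and only switches once a pull succeeds, counting failures so they can be surfaced as metrics. `RemoteSampler`, `fetch` and the counter names are hypothetical; the real clients expose their own metric names.

```python
from typing import Callable


class RemoteSampler:
    """Keeps the initial rate until a pull succeeds; counts pull outcomes (hypothetical)."""

    def __init__(self, initial_rate: float, fetch: Callable[[], float]):
        self.rate = initial_rate  # the configured param acts as the initial/default rate
        self._fetch = fetch
        self.failed_pulls = 0
        self.successful_pulls = 0

    def poll(self) -> None:
        try:
            new_rate = self._fetch()
        except OSError:
            # Backend unreachable: keep sampling at the current rate.
            self.failed_pulls += 1
            return
        self.rate = new_rate
        self.successful_pulls += 1


def unreachable() -> float:
    raise ConnectionRefusedError("sampling endpoint not reachable")


sampler = RemoteSampler(initial_rate=1.0, fetch=unreachable)
sampler.poll()
print(sampler.rate, sampler.failed_pulls)  # 1.0 1
```

With an unreachable backend the sampler stays at 100%, which matches the "every trace is collected" symptom; a growing failure count is the signal to look for.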

@tcoln

tcoln commented Aug 20, 2020

1. agent proxies the requests to the collector, so that the client does not need to know where collectors are located (agent is usually on the localhost)

2. yes, configuration (and soon adaptive calculations) come from the collectors, but clients receive them via agent

3. remotely controlled samplers are only supported by Jaeger clients, not Zipkin clients.

4. Not sure which "batch" you are referring to.

Dear Yuri,
I have a question: if remotely controlled samplers are only supported via the agent, and the agent pulls config via gRPC+protobuf, then what is sampling.thrift for?

@yurishkuro
Member

Previously the agent used Thrift to retrieve sampling strategies from the collector. Now it uses protobuf, but the clients consume sampling strategies as JSON, and that JSON is still generated from Thrift.
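On the client side, consuming the JSON amounts to an HTTP GET against the agent plus some parsing. This sketch assumes the agent serves strategies at `http://localhost:5778/sampling?service=<name>` and that the response carries `strategyType` and `probabilisticSampling.samplingRate` fields, with the type given either as an enum name or as a Thrift enum number (0 for probabilistic). Treat the path and the response shape as assumptions; only the probabilistic case is handled.

```python
import json
from urllib.parse import urlencode

# Assumed agent endpoint for client sampling requests.
AGENT_SAMPLING_URL = "http://localhost:5778/sampling"


def strategy_url(service: str) -> str:
    return f"{AGENT_SAMPLING_URL}?{urlencode({'service': service})}"


def parse_probabilistic_rate(body: str) -> float:
    """Extract the sampling rate from an (assumed) strategy response."""
    strategy = json.loads(body)
    if strategy.get("strategyType") not in ("PROBABILISTIC", 0):
        raise ValueError("this sketch only handles probabilistic strategies")
    return float(strategy["probabilisticSampling"]["samplingRate"])


print(strategy_url("frontend"))  # http://localhost:5778/sampling?service=frontend
print(parse_probabilistic_rate(
    '{"strategyType": "PROBABILISTIC", "probabilisticSampling": {"samplingRate": 0.1}}'))  # 0.1
```

A real fetch would pass `strategy_url(...)` to `urllib.request.urlopen` and read the body; that step is left out so the sketch runs without an agent.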

@tcoln

tcoln commented Aug 23, 2020

Previously agent was using Thrift to retrieve sampling from collector. Now it uses protobuf, but the clients consume sampling as JSON, and that JSON is still generated from Thrift.

You mean the sampling strategies were previously sent from the collector to agents via Thrift, but now via protobuf+gRPC? I know clients get the sampling JSON over HTTP on port 5778. What I care about is how the collector sends them to the agent.

@yurishkuro
Member

collector to agent is grpc

@tcoln

tcoln commented Aug 24, 2020

collector to agent is grpc

Thanks, yuri.

@sharninder

The numeric value of 1 is treated as 100% default probability when the sampler cannot contact the backend. It's possible that in your deployment it cannot reach the backend and never gets the 0.1 probability. The sampler emits metrics about unsuccessful configuration pulls.

I'm not sure this is completely correct, or there is a bug in this code path. I'm setting the sampler type to remote and leaving the param unset, yet the param value is being set to 1 by default even when the remote strategy actually has a param of 0.5. Seems like a bug to me.
