Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow overriding telemetry providers #4970

Open
mx-psi opened this issue Mar 8, 2022 · 26 comments
Open

Allow overriding telemetry providers #4970

mx-psi opened this issue Mar 8, 2022 · 26 comments
Labels
area:service collector-telemetry healthchecker and other telemetry collection issues priority:p2 Medium

Comments

@mx-psi
Copy link
Member

mx-psi commented Mar 8, 2022

Is your feature request related to a problem? Please describe.

When embedding the OpenTelemetry Collector on an existing application, said application may have an existing telemetry system for logging, metrics and/or tracing. With the current API, while some features of the telemetry may be configurable through the configuration provider (e.g. one can enable/disable the metrics telemetry), the telemetry can't be easily integrated into existing systems.

Currently, the logging features can be overriden since one can pass LoggingOptions,

// LoggingOptions provides a way to change behavior of zap logging.
LoggingOptions []zap.Option
which effectively allows one to change the zap.Core and have arbitrary logging, but the metrics and traces providers cannot be modified apart from the config exposed in the YAML.

Describe the solution you'd like

A new TelemetryFactory can be passed to CollectorSettings. This provider would have a type similar to:

type TelemetryFactory func(*config.ServiceTelemetry) telemetry.TelemetrySettings

A default TelemetryFactory is provided for distributions that want to keep the current behavior or build upon the current behavior. This factory can take options such as zap.Logger options.

The LoggingOptions field is eventually removed, since it's superseded by the default factory.

Describe alternatives you've considered

CollectorSettings could instead have new MetricsOptions and TracingOptions of some kind, although this seems more restrictive.

Additional context

I think this will allow creating the telemetry providers at the same point in the initialization as today, but care should be taken to ensure that this can be done as soon as possible.

The factory for telemetry settings should be extendable in the future for new types of telemetry that OpenTelemetry may add. This is effectively the same problem as the one we have with component factories, and we can apply the same solution here.

@pmm-sumo
Copy link
Contributor

I think having more flexible options here could be also helpful when implementing own telemetry reporting capability of OpAMP (or any other use-case where collector's own telemetry is sent out)

@bogdandrutu
Copy link
Member

@pmm-sumo if otel-go library is not capable of supporting that (via an extra exporter for example) then we failed to implement the library sdk. I don't think for your use-case this issue is relevant.

@bogdandrutu
Copy link
Member

@mx-psi I am failing to see what the proposed solution looks like. Let's start by first define what "capabilities"/"mean" to run the collector as library.

When embedding the OpenTelemetry Collector on an existing application, said application may have an existing telemetry system for logging, metrics and/or tracing.

Do you want to use the same "instance of otel-go sdk" between the collector library and the other parts of the binary? If yes, why? Do we need to support this?

We can easily build factories of factories and settings of settings, but let's define what we try to achieve. Your point about not changing global state like the grpc logger I get it, but interaction between collector library and the rest of the application is confusing.

I also would be curios for example, the watching for config changes? How that interacts with your rest application? The restart logic, etc.

What I am trying to say is that I am not sure what the contract should be, and where we draw the line here.

@bogdandrutu
Copy link
Member

One idea that I discussed with @mx-psi, also related to #4605 is for them to manually create their own "fixed" pipeline, so they don't need to depend on the Service which brings things like "ways to configure pipelines" or "config changes monitoring", etc.

@mx-psi
Copy link
Member Author

mx-psi commented Mar 18, 2022

Apologies for the delay in replying!

One idea that I discussed with @mx-psi, also related to #4605 is for them to manually create their own "fixed" pipeline, so they don't need to depend on the Service which brings things like "ways to configure pipelines" or "config changes monitoring", etc.

IMO, for this to be comfortable to use, we would need a function/struct/some public API that allows one to construct a fixed pipeline and handles the plumbing of telemetry, component.BuildInfo, component.Host...

I agree that we don't need service.Collector for this use case (we don't need the config providers complexity, and we don't want the modification of global state), but still we probably need something like service.Service to be exposed. If there is no opposition, I will look at what needs to be done to expose service.Service and create a different issue to discuss it.

I think this discussion (fixed pipelines defined in Go code) may also be of interest to @Aneurysm9 if a bridge between Collector components and the Go SDK is a use case we want to eventually support. It may even be interesting to explicitly define what use cases we want to support, to drive design decisions in the future.

@dmitryax dmitryax added the priority:p2 Medium label Mar 18, 2022
@bogdandrutu
Copy link
Member

@mx-psi I would first look into a prototype of using the "components" and manually connect them in a pipeline without exposing the "Service". Let's try a prototype/experiment.

@mx-psi
Copy link
Member Author

mx-psi commented Mar 28, 2022

Opened a PoC on #5107, PTAL!

@bogdandrutu
Copy link
Member

I am a bit confused... I was looking also into the version where in your case you don't use anything in the service at all.

@mx-psi
Copy link
Member Author

mx-psi commented Mar 29, 2022

Oh, okay, completely misread your comment, sorry. Alright, I will try this out too this week :)

@Aneurysm9
Copy link
Member

The telemetry provider PoC looks nice, but I also was thinking that bogdan was looking for building a pipeline without a service. That is the capability that would be most interesting to us in the Go SDK to enable re-use of the collector's exporters.

@mx-psi
Copy link
Member Author

mx-psi commented Mar 29, 2022

Yeah, sorry, as I said, I read Bogdan's message completely wrong 😅 I will try out that too at some point this week

@sveiss
Copy link

sveiss commented Mar 30, 2022

We also have a use case for this feature:

We run an internal distribution of the collector to add some internally-built pipeline components. We do use the service package from the collector (our top-level code is similar to otelcontribcol in opentelemetry-collector-contrib).

We instrument our components with tracing, and we'd like to export these spans from the collector. It would also be nice to get the collector's internal spans, so our resulting traces aren't missing parents.

(We actually had this working back when the collector-internal tracing used OpenCensus, although I doubt it was an intended use case.)

@mx-psi's POC was enough for me to get this working again in our distribution wrapper without any further modifications to the collector itself.

@bogdandrutu
Copy link
Member

@sveiss I understand the problem and that the current proposal for @mx-psi solves that. But there is a conceptual disagreement in my mind, which is what is the "code part" that is responsible to create these providers. Right now we give users chance to configure them via the service Configuration, so it seems very unnatural to pass a "factory provider" that will ignore the user configuration. I think if you want to use our collector/service then your custom code should be one of the components of the collector (receiver/processor/exporter/extension) and if that is the case and you use the "provider" pass to the Create func than tracing will work.

@mx-psi
Copy link
Member Author

mx-psi commented Apr 4, 2022

it seems very unnatural to pass a "factory provider" that will ignore the user configuration

To be clear, a telemetry provider in #5107 can ignore user configuration, but it doesn't have to: the relevant method in the interface is SetupTelemetry(cfg config.ServiceTelemetry) (component.TelemetrySettings, error) precisely so that configuration can be taken into account. Also, note that we already provide a way to customize telemetry via the Logging Options field, which can effectively ignore user configuration for logs (in fact, we kind of already do that on Windows).

That said, I think your point in general is valid (in my use case for example I would either ignore configuration or pass a fixed config always). I am still working on the 'pipelines without service' PoC, but I haven't quite figured out how to handle configuration, since there does not seem to be a very comfortable way to merge with the default config (best idea I have so far is do config.Config -> config.Map -> config.Config via Unmarshal, which is not quite satisfying).

@bogdandrutu
Copy link
Member

I think for configuration you do:

factory := otlpexporter.NewFactory()
otlpCfg := factory.NewDefaultConfiguration().(*otlpexporter.Config)
otlpCfg.HttpClientConfig = ...
factory.NewExporter(..., otlpCfg, ...)

You never go to/from the yaml map.

@bogdandrutu
Copy link
Member

@mx-psi also a bit not sure if you should simply ignore the concept of default. And simply construct the component config. My motivation is because these defaults are probably too connected to the way we configure the collector as a service.

@Aneurysm9
Copy link
Member

@mx-psi also a bit not sure if you should simply ignore the concept of default. And simply construct the component config. My motivation is because these defaults are probably too connected to the way we configure the collector as a service.

That seems like a dangerous assumption.

You never go to/from the yaml map.

I think not being able to use the YAML configuration would make re-use of collector exporters by the Go SDK overly complicated and too tightly coupled. It might work, but it would not be a good user experience.

@bogdandrutu
Copy link
Member

@Aneurysm9 Go SDK does not have a YAML config.. why do you need that?

@Aneurysm9
Copy link
Member

  1. we would like to have a YAML config for the Go SDK even if we don't have one now.
  2. not having a YAML config for the Go SDK does not mean that we could not have one for a collector exporter bridge.

@sveiss
Copy link

sveiss commented Apr 4, 2022

@sveiss I understand the problem and that the current proposal for @mx-psi solves that. But there is a conceptual disagreement in my mind, which is what is the "code part" that is responsible to create these providers.

I added a comment because I think my use case is a little different to the one described by @mx-psi in the original issue.

I'm trying to run a collector with extras (custom processors/extensions/internal configuration library). We're building our own top-level otelcol command only because we don't want to fork the collector repo, so we're trying to keep the wrapper as thin as possible. We want to delegate as much as we can to the service package.

Using the collector as a library in an existing application is different, and I can see why you'd want to avoid the collector service infrastructure in this case.

I'm hoping the solution we end up with allows us to keep leveraging as much of the collector service code as possible.

I think if you want to use our collector/service then your custom code should be one of the components of the collector (receiver/processor/exporter/extension) and if that is the case and you use the "provider" pass to the Create func than tracing will work.

Our custom code is already structured as collector components, and we can get spans exported from them easily. The PoC in #5107 lets us set up a TracerProvider and use it to export both our own spans and the collector's through the same pipeline. We couldn't do that previously, because the collector creates its own TracerProvider in func (*Collector) Run.

@mx-psi
Copy link
Member Author

mx-psi commented Apr 5, 2022

I opened #5149 so we can discuss a concrete implementation instead of theoretical concerns. I explained in #5149 (comment) the problems I see with configuration.

I am still not entirely sure if @bogdandrutu or @Aneurysm9 were thinking about something entirely different from this, if so, please provide feedback and we can try and iterate to something closer to what you had in mind.

I would also suggest putting this issue on hold and splitting off the 'pipeline without service' discussion to a separate issue, to avoid further polluting the discussion here, if you think it makes sense (even if my solution looks more like #5149 in the end, it may be that @sveiss' still needs a telemetry provider).

@jaronoff97
Copy link
Contributor

+1 am interested in this! @mx-psi any chance you're still interested in working on this?

@mx-psi
Copy link
Member Author

mx-psi commented Jul 20, 2023

I opened #8111 as draft to move this forward. PTAL at the open questions there!

@mx-psi
Copy link
Member Author

mx-psi commented Jan 26, 2024

@jaronoff97 @jmacd @bogdandrutu @ptodev

I am making some (slow) progress with this, first by fixing some issues in the telemetry initialization (see #9384). My end goal would be that we have a factory for telemetry, just like we have for other components. This needs a big refactor, but what vendors would interact with is:

// CreateSettings holds configuration for building Telemetry.
type CreateSettings struct {
	BuildInfo         component.BuildInfo
	AsyncErrorChannel chan error
	ZapOptions        []zap.Option // maybe we can get rid of this
}

// Config for telemetry.
type Config any

// Factory is factory interface for telemetry.
// This interface cannot be directly implemented. Implementations must
// use the NewFactory to implement it.
type Factory interface {
	CreateDefaultConfig() Config
	CreateLogger(ctx context.Context, set Settings, cfg Config) (*zap.Logger, error)
	CreateTracerProvider(ctx context.Context, set Settings, cfg Config) (trace.TracerProvider, error)
	CreateMeterProvider(ctx context.Context, set Settings, cfg Config) (metric.MeterProvider, error)
	CreateResource(ctx context.Context, set Settings, cfg Config) (*resource.Resource, error)
 
	// More methods could be added here in the future if necessary 
	// Either a noop implementation would be provided if not available
	// Or we would just error out and handle the noop-ness at the service initialization
	unexportedFactoryFunc()
}

// FactoryOption apply changes to Factory.
type FactoryOption interface { /* sealed interface with no public methods */ }

type CreateLoggerFunc func(context.Context, Settings, Config) (*zap.Logger, error)
func WithLogger(c CreateLoggerFunc) FactoryOption {  /* ... */ }

type CreateMeterProviderFunc func(context.Context, Settings, Config) (meter.MeterProvider, error)
func WithMeterProvider(c CreateMeterProviderFunc) FactoryOption {  /* ... */ }

type CreateTracerProviderFunc func(context.Context, Settings, Config) (trace.TracerProvider, error)
func WithTracerProvider(c CreateTracerProviderFunc) FactoryOption {  /* ... */ }

type CreateResourceFunc func(context.Context, Settings, Config) (*resource.Resource, error)
func WithResource(c CreateResourceFunc) FactoryOption {  /* ... */ }

func NewFactory(createDefaultConfig component.CreateDefaultConfigFunc, options ...FactoryOption) Factory {  /* ... */ }

We could make this work through the builder. I think this addresses some of the concerns that have been brought up in the past:

  • You can have arbitrary configuration for the service::telemetry section
  • You don't ignore the user configuration
  • No need to expose more internals from the Collector like the graph package

Since this is a big chunk of work, I want to first know if:

  1. (For vendors and end-users interested in this) Does this address your use case?
  2. (For people who have reviewed the many iterations of this) Does this address your concerns re: the public API we would be exposing?

@jmacd
Copy link
Contributor

jmacd commented Jan 26, 2024

This addresses my concerns! As a vendor, we are running custom OTel collectors inside our SaaS and have exactly the stated reasons for wanting more control over telemetry settings used by those collectors, so that they fit in with our overall organization's telemetry schema (and since we have existing Golang code to do so, already targeting the OTel-Go instrumentation APIs). Currently we have hacked this support together using an extension and by explicitly overriding TelemetrySettings in specific components.

Your proposals would give us a way to customize the telemetry produced in the collector completely, not just for individual components. This would also let us use and experiment with alternative OTel-Go SDKs inside an OpenTelemetry Collector, too.

mx-psi added a commit that referenced this issue Feb 13, 2024
**Description:** 

Refactor meter provider initialization; removes `telemetryInitializer`
and replaces it by a MeterProvider wrapper that owns the lifetime of the
OpenCensus registry and the servers associated with the MeterProvider.

**Link to tracking Issue:** Relates to #4970 (first refactor before
trying out the factory pattern in an internal package)
codeboten added a commit that referenced this issue Apr 16, 2024
**Description:** 

Moves logging messages about the meter provider outside of the meter
provider initialization.

While working on
#4970 (comment),
I realized there is an implicit dependency in the initialization order
of the different telemetry components.

I tried making this work and make the factory have a single
`CreateTelemetrySettings`, but this results in an awkward API, so I am
trying the alternative here: don't log during the meter provider
initialization, but do so outside of it.

**Link to tracking Issue:** Relates to #4970

---------

Co-authored-by: Alex Boten <223565+codeboten@users.noreply.github.com>
mx-psi added a commit that referenced this issue May 21, 2024
<!--Ex. Fixing a bug - Describe the bug and how this fixes the issue.
Ex. Adding a feature - Explain what this achieves.-->
#### Description

Switches `service/telemetry` to a factory pattern. To avoid adding a lot
of public API in one go:
1. the actual factory builder is in an internal package
2. I have not added the `CreateMeterProvider` method yet

There are two goals with this: one is to make progress on #4970, the
other is to allow initializing telemetry sooner:

<!-- Issue number if applicable -->
#### Link to tracking issue
Updates #4970. 

<!--Describe what testing was performed and which tests were added.-->
#### Testing

Updates existing tests to use `NewFactory`
andrzej-stencel pushed a commit to andrzej-stencel/opentelemetry-collector that referenced this issue May 27, 2024
<!--Ex. Fixing a bug - Describe the bug and how this fixes the issue.
Ex. Adding a feature - Explain what this achieves.-->
#### Description

Switches `service/telemetry` to a factory pattern. To avoid adding a lot
of public API in one go:
1. the actual factory builder is in an internal package
2. I have not added the `CreateMeterProvider` method yet

There are two goals with this: one is to make progress on open-telemetry#4970, the
other is to allow initializing telemetry sooner:

<!-- Issue number if applicable -->
#### Link to tracking issue
Updates open-telemetry#4970. 

<!--Describe what testing was performed and which tests were added.-->
#### Testing

Updates existing tests to use `NewFactory`
steves-canva pushed a commit to Canva/opentelemetry-collector that referenced this issue Jun 14, 2024
<!--Ex. Fixing a bug - Describe the bug and how this fixes the issue.
Ex. Adding a feature - Explain what this achieves.-->
#### Description

Switches `service/telemetry` to a factory pattern. To avoid adding a lot
of public API in one go:
1. the actual factory builder is in an internal package
2. I have not added the `CreateMeterProvider` method yet

There are two goals with this: one is to make progress on open-telemetry#4970, the
other is to allow initializing telemetry sooner:

<!-- Issue number if applicable -->
#### Link to tracking issue
Updates open-telemetry#4970. 

<!--Describe what testing was performed and which tests were added.-->
#### Testing

Updates existing tests to use `NewFactory`
@mx-psi
Copy link
Member Author

mx-psi commented Jul 31, 2024

Removed this from the Collector v1 board based on #10643

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:service collector-telemetry healthchecker and other telemetry collection issues priority:p2 Medium
Projects
None yet
Development

Successfully merging a pull request may close this issue.

8 participants