Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

AddSource with predicate #5353

Closed
wants to merge 7 commits into from

Conversation

lmolkova
Copy link

@lmolkova lmolkova commented Feb 13, 2024

Fixes dotnet/aspnetcore#4371

Changes

Adds ability to add sources using predicate for ActivitySource:

  • this allows better control over sources (e.g. Azure.* enables all Azure SDK sources, but what if I want to listen to all but Azure.Verbose.Stuff)
  • allows to specify desired sources without using (undocumented?) wildcards or adding all of them one-by-one
  • enablement based on version (or other activity source properties such as scope attributes)
  • also partially mitigates issues behind Allow suppressing nested activities without dependency on OTel #4642

Context: open-telemetry/opentelemetry-specification#3867
It's a common need for libraries to emit multiple sources and let users decide which ones they want (or all of them).

Merge requirement checklist

  • CONTRIBUTING guidelines followed (license requirements, nullable enabled, static analysis, etc.)
  • Unit tests added/updated
  • Appropriate CHANGELOG.md files updated for non-trivial changes
  • Changes in public API reviewed (if applicable)

@reyang
Copy link
Member

reyang commented Feb 14, 2024

  • this allows better control over sources (e.g. Azure.* enables all Azure SDK sources, but what if I want to listen to all but Azure.Verbose.Stuff)
  • allows to specify desired sources without using (undocumented?) wildcards or adding all of them one-by-one

Would you provide a concrete use case? For example, if the goal is to avoid "Azure.Verbose.Stuff", does "Azure.Error.Stuff" + "Azure.Info.Stuff" + "Azure.Xyz.Stuff" (should be a manageable list) work?

I think any user provided callback on the hot path is something dangerous and we should put a high bar #5349 (comment).

@lmolkova
Copy link
Author

lmolkova commented Feb 14, 2024

The specific use-case is Cosmos DB:

  • it creates activities for public API calls on Azure.CosmosDB.Operations source
  • it will create transport-level activities for underlying transport calls with Azure.CosmosDB.Requests source
  • The default experience would be that Azure.CosmosDB.Operations is enabled by default, but some users may want to enable transport-level calls in dev/test environment, etc.

There expected to be a few sources per each Azure SDK (already is), documented beforehand that look like Azure.ServiceBus.SenderClient, Azure.ServiceBus.ProcessorClient, etc.

More use cases and spec-level discussion here open-telemetry/opentelemetry-specification#3867
As a reminder, this callback is nothing more than more flexible AddSource("Azure.*") one.

This callback is not on the hot path - it's executed when source is created (e.g. once per source during process or client instance lifetime) and I don't see the dangers you're talking about in this case.

@reyang
Copy link
Member

reyang commented Feb 14, 2024

The specific use-case is Cosmos DB:

  • it creates activities for public API calls on Azure.CosmosDB.Operations source
  • it will create transport-level activities for underlying transport calls with Azure.CosmosDB.Requests source
  • The default experience would be that Azure.CosmosDB.Operations is enabled by default, but some users may want to enable transport-level calls in dev/test environment, etc.

There expected to be a few sources per each Azure SDK (already is), documented beforehand that look like Azure.ServiceBus.SenderClient, Azure.ServiceBus.ProcessorClient, etc.

Could these be categorized using the names? Down the path would there be indefinite number of special cases, and how would the user manage it (e.g. keeping add special cases in the predicate?)

@lmolkova
Copy link
Author

lmolkova commented Feb 14, 2024

Could these be categorized using the names? Down the path would there be indefinite number of special cases, and how would the user manage it (e.g. keeping add special cases in the predicate?)

they are categorized using names.
But as a user, if I don't use predicate, I would write an endless list to enable them

  • Azure.Storage.Blobs.*
  • Azure.Storage.Queues.*
  • Azure.Storage.DataLake.*
  • Azure.SeviceBus.Process.*
    ....

With <Root>.* if you want to exclude one source, you have to enable all but that one, i.e. list all that you want to enable.
With predicate, if you want to exclude one source, you add one excluded source to the predicate.

There could be better solutions:

So yes, predicate does not solve all the problems, but it's a simple evolution of wildcard enablement and solves the problem of disabling noisy source or picking better defaults than startsWith. It does not block any door for other solution.

@lmolkova lmolkova force-pushed the add-source-predicate branch from c493a0d to 2e04ba1 Compare February 22, 2024 02:18
Copy link

codecov bot commented Feb 22, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 84.86%. Comparing base (6250307) to head (91c5de1).
Report is 188 commits behind head on main.

Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #5353      +/-   ##
==========================================
+ Coverage   83.38%   84.86%   +1.48%     
==========================================
  Files         297      282      -15     
  Lines       12531    12257     -274     
==========================================
- Hits        10449    10402      -47     
+ Misses       2082     1855     -227     
Flag Coverage Δ
unittests ?
unittests-Solution-Experimental 84.98% <100.00%> (?)
unittests-Solution-Stable 84.83% <100.00%> (?)

Flags with carried forward coverage won't be shown. Click here to find out more.

Files Coverage Δ
.../OpenTelemetry.Api/Metrics/MeterProviderBuilder.cs 100.00% <ø> (ø)
...c/OpenTelemetry.Api/Trace/TracerProviderBuilder.cs 100.00% <ø> (ø)
...nTelemetry/Internal/OpenTelemetrySdkEventSource.cs 81.73% <100.00%> (-0.17%) ⬇️
.../Metrics/Builder/MeterProviderBuilderExtensions.cs 98.52% <100.00%> (-1.48%) ⬇️
...lemetry/Metrics/Builder/MeterProviderBuilderSdk.cs 98.46% <100.00%> (+0.10%) ⬆️
src/OpenTelemetry/Metrics/MeterProviderSdk.cs 94.73% <100.00%> (+6.89%) ⬆️
...y/Trace/Builder/TracerProviderBuilderExtensions.cs 100.00% <100.00%> (ø)
...elemetry/Trace/Builder/TracerProviderBuilderSdk.cs 96.66% <100.00%> (+0.23%) ⬆️
src/OpenTelemetry/Trace/TracerProviderSdk.cs 99.65% <100.00%> (+0.38%) ⬆️

... and 56 files with indirect coverage changes

@lmolkova lmolkova marked this pull request as ready for review February 22, 2024 03:21
@lmolkova lmolkova requested a review from a team February 22, 2024 03:21
Copy link
Contributor

github-actions bot commented Mar 1, 2024

This PR was marked stale due to lack of activity and will be closed in 7 days. Commenting or Pushing will instruct the bot to automatically remove the label. This bot runs once per day.

@github-actions github-actions bot added the Stale Issues and pull requests which have been flagged for closing due to inactivity label Mar 1, 2024
@lmolkova
Copy link
Author

lmolkova commented Mar 2, 2024

Thanks for the pointers to MetricsBuilder!

Currently it'd only work along with OpenTelemetry.Extensions.Hosting, but not in the vanilla OpenTelemetry package.
If we want to make IMetricsBuilder mechanism work with OpenTelemetry, it'd mean:

  1. taking dependency on Microsoft.Extensions.Diagnostics.Abstractions
  2. adding something like MeterProviderBuilder.SetMetricsBuilder(IMetricsBuilder) (or EnableMetrics|DisableMetrics equivalents).

Still, IMetricsBuilder does not provide the flexibility to filter meters/instruments based on version or scope attributes. It seems it's mostly intended for IConfiguration scenarios and not code-based configuration.


So, I'd like to get @open-telemetry/dotnet-maintainers / @open-telemetry/dotnet-approvers opinion on how they see more generic metric configuration in vanilla otel package:

  1. via IMetricsBuilder. Then I think this PR should not do anything. Allow/block-listing already works with something like
builder.Services.AddMetrics(b => b.EnableMetrics("Microsoft.")
                                  .DisableMetrics("Microsoft.AspNetCore.Routing"));
builder.Services.AddOpenTelemetry().WithMetrics(m => m.AddOtlpExporter());

Making IMetricsBuilder work directly with vanilla OTel doesn't need to happen here.

  1. Via AddMeter(Predicate) - happy to add it to the PR

I created dotnet/runtime#100317 to see if something like IMetricsBuilder can be supported for tracing and if AddSource<Predicate> might become redundant in the near future.

@github-actions github-actions bot removed the Stale Issues and pull requests which have been flagged for closing due to inactivity label Mar 2, 2024
@lmolkova
Copy link
Author

lmolkova commented Mar 4, 2024

A quick update on dotnet/runtime#100317: based on the offline conversation with @noahfalk there are no plans to provide IMetricsBuilder equivalent for tracing in .NET 9 and no other asks for it than this one

@lmolkova
Copy link
Author

lmolkova commented Mar 9, 2024

Capturing the discussion we had on 3/4 during SIG meeting:

  • we can proceed with experimental API for the time being
  • we want consistency with metrics, i.e. MeterProviderBuilder should get a similar extension method
  • we should consider similar AddView API as an example
    • and use similar precautions with callbacks (exception handling)
  • need to document what-ifs (ordering, etc)

@lmolkova lmolkova force-pushed the add-source-predicate branch from c7b13bd to fbf27ff Compare March 12, 2024 00:54
@lmolkova
Copy link
Author

lmolkova commented Mar 12, 2024

@open-telemetry/dotnet-maintainers / @open-telemetry/dotnet-approvers

thank you for the feedback, I believe I addressed it. Please take a look

Capturing the discussion we had on 3/4 during SIG meeting:

  • we can proceed with experimental API for the time being
  • we want consistency with metrics, i.e. MeterProviderBuilder should get a similar extension method
  • we should consider similar AddView API as an example
    • and use similar precautions with callbacks (exception handling)
  • need to document what-ifs (ordering, etc)

AddView method takes Func<Instrument, MetricStreamConfiguration> callback and MetricStreamConfiguration allows to drop, set cardinality, etc.

I don't envision a need to return a config. Given this API is experimental now I suggest that we don't block on this now, but will consider it if/when it goes stable.

@lmolkova lmolkova force-pushed the add-source-predicate branch from 6eea832 to c7bb924 Compare March 12, 2024 01:14
@lmolkova lmolkova force-pushed the add-source-predicate branch from c7bb924 to 91c5de1 Compare March 12, 2024 01:36
Copy link
Contributor

@utpilla utpilla left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we should pursue this change for the following reasons:

  1. Selectively disabling tracers (ActivitySource) is a general problem that needs to be solved across all the SDK implementations and is not a .NET specific issue. The spec already has an issue that covers this problem. I don't see the need to get ahead of ourselves now when we can wait to see how spec decides to tackle this.
  2. I'm against adding experimental public APIs for which there is no end in sight as to when and how they get stabilized. I know that we can always remove experimental stuff but why create unnecessary maintenance burden in the first place?
  3. This is not significant enough to be rushed into considering the very small number of people that have complained about this. On top of that, it's not a blocker. AddSource method accepts an array of strings. Curating the list of ActivitySource to listen to is a one-time thing so it shouldn't be a recurring inconvenience. For more unusual scenarios such as if you are the library author and you keep updating the list of ActivitySource used in your library frequently, then you should consider providing a public API to make it easy for your users to curate the list of required ActivitySources which they can then simply pass to the AddSource method.

@martinjt
Copy link
Member

There's a couple of points I think are interesting here.

First, I get asked this all the time, so I don't think it's a small number of users. Not all users come onto github, are an internal Microsoft Department, or have a support contract with Microsoft. I don't think it's valid to dismiss this based on that argument.

Until recently, there hasn't been a tonne of libraries implementing multiple sources using namespaces. I've recently worked with 3 big OSS projects that have added them, and will likely be thinking about that as an approach. That said, they're only looking at 2/3 sources, so it's not a big thing for them.

The thing I get call for is "On by default, then disable". So what they essentially want is:

builder.Services.AddOpenTelemetry()
    .WithTracing(tpb => 
        tpb.AddSource("*")
             .RemoveSource("System.Data.SqlClient")
             )

That is a very common usecase, where people want "everything", but then to selectively turn it off. An example would be how the Node instrumentations library works.

I'm not sure predicate is the right way to do this, but adding a RemoveSource() method feels both consistent, and useful to a large number of users currently, and a growing number of users as the adoption increases.

@utpilla
Copy link
Contributor

utpilla commented Mar 18, 2024

First, I get asked this all the time, so I don't think it's a small number of users.

We prefer assessing the demand for something based on actual users upvoting/commenting on an issue rather than someone's personal accounts such as " I get asked this all the time".

Not all users come onto github, are an internal Microsoft Department, or have a support contract with Microsoft. I don't think it's valid to dismiss this based on that argument.

I'm not sure where you're getting this from. The reasons for blocking this PR are already mentioned in this comment and they apply to your proposal of adding RemoveSource as well.

@martinjt
Copy link
Member

99% of users will not come here to request features and comment on PRs. The sentiment comes from outreach and talking to users, something that I do all the time.

Ignoring what users are saying, and basing everything on upvotes here is not the way we build adoption.

The vast majority of features I see being added are actually from internal conversations that you're having with the runtime time or internal Microsoft teams, so it feels like you're already judging demand based on the conversations you and the rest of the team are having (such as the enhancements you're making to enable Aspire).

I get that you may not trust me as I'm not internal, but dismissing what I'm telling you I'm hearing from the community is utterly wrong.

@cijothomas
Copy link
Member

I get that you may not trust me as I'm not internal

I get that you may not trust me as I'm not internal

If you have any concerns about how the maintainers/approvers of this repo are acting, you can raise it with OpenTelemetry Governance Committee.
Such unnecessary comments here are not really required and is not productive.

@lmolkova
Copy link
Author

I hope you folks appreciated non-Microsoft contributors to this repo more than this, especially when they share user feedback.

image

@martinjt
Copy link
Member

Thanks @cijothomas, that is the route we're taking, this was the last piece of evidence I needed. Feel free to reach out if you want to discuss.

@cijothomas
Copy link
Member

Thanks @cijothomas, that is the route we're taking, this was the last piece of evidence I needed. Feel free to reach out if you want to discuss.

No, I don't have anything to discuss.

Copy link
Contributor

This PR was marked stale due to lack of activity and will be closed in 7 days. Commenting or Pushing will instruct the bot to automatically remove the label. This bot runs once per day.

@github-actions github-actions bot added Stale Issues and pull requests which have been flagged for closing due to inactivity and removed Stale Issues and pull requests which have been flagged for closing due to inactivity labels Mar 27, 2024
Copy link
Contributor

github-actions bot commented Apr 4, 2024

This PR was marked stale due to lack of activity and will be closed in 7 days. Commenting or Pushing will instruct the bot to automatically remove the label. This bot runs once per day.

@github-actions github-actions bot added Stale Issues and pull requests which have been flagged for closing due to inactivity and removed Stale Issues and pull requests which have been flagged for closing due to inactivity labels Apr 4, 2024
Copy link
Contributor

This PR was marked stale due to lack of activity and will be closed in 7 days. Commenting or Pushing will instruct the bot to automatically remove the label. This bot runs once per day.

@github-actions github-actions bot added Stale Issues and pull requests which have been flagged for closing due to inactivity and removed Stale Issues and pull requests which have been flagged for closing due to inactivity labels Apr 13, 2024
Copy link
Contributor

This PR was marked stale due to lack of activity and will be closed in 7 days. Commenting or Pushing will instruct the bot to automatically remove the label. This bot runs once per day.

@github-actions github-actions bot added the Stale Issues and pull requests which have been flagged for closing due to inactivity label Apr 22, 2024
@lmolkova lmolkova closed this Apr 24, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Stale Issues and pull requests which have been flagged for closing due to inactivity
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants