
Add initial experimental .NET CLR runtime metrics #1035

Merged
36 commits merged into open-telemetry:main on Aug 23, 2024

Conversation

stevejgordon
Contributor

@stevejgordon stevejgordon commented May 14, 2024

Fixes #956

Changes

Adds proposed experimental .NET CLR runtime metrics to the semantic conventions. Based on discussions with the .NET runtime team, the implementation plan will be to port the existing metrics from OpenTelemetry.Instrumentation.Runtime as directly as possible into the runtime itself. The names have been modified to align with the runtime environment metrics conventions.
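For context, metric definitions in files like `model/metrics/clr-metrics.yaml` follow the semantic-convention model schema. The sketch below is purely illustrative; the group id, metric name, unit, and attribute reference are hypothetical and are not the actual definitions from this PR:

```yaml
# Hypothetical example of a semantic-convention metric definition.
# The id, metric_name, unit, and attribute ref are illustrative only,
# not the content actually proposed in this PR.
groups:
  - id: metric.dotnet.gc.collections
    type: metric
    metric_name: dotnet.gc.collections
    stability: experimental
    brief: "Number of garbage collections that have occurred since process start."
    instrument: counter
    unit: "{collection}"
    attributes:
      - ref: dotnet.gc.heap.generation
        requirement_level: recommended
```

The generated markdown tables discussed later in this thread are produced from definitions in this shape by the repository's code-generation targets.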

Merge requirement checklist

@stevejgordon stevejgordon requested review from a team May 14, 2024 10:23
@stevejgordon
Contributor Author

I'm not entirely sure why markdownlint fails on the autogenerated tables.

@trisch-me
Contributor

Have you run both attribute-registry-generation and table-generation?

@stevejgordon
Contributor Author

Have you run both attribute-registry-generation and table-generation?

I thought I had done so as the contents were updated, but I can try those again. I'm having to hack around the tooling a bit to get things running on Windows.

@stevejgordon
Contributor Author

@trisch-me I've updated the formatting per your suggestion and rerun both of those make targets. Neither made any changes to the markdown files.

@trisch-me
Contributor

Alternatively, you could wait until #1000 is merged; it has a fix for this bug.

@trisch-me
Contributor

@stevejgordon #1000 is merged, so please re-run the code generation locally and update your files. Thanks!

@stevejgordon
Contributor Author

Thanks, @trisch-me. Looks good!

Member

@gregkalapos gregkalapos left a comment


Happy to see this 🎉

Left some comments below.


linux-foundation-easycla bot commented May 17, 2024

CLA Signed

The committers listed above are authorized under a signed CLA.

Contributor

@lmolkova lmolkova left a comment


Thank you for working on this!

The main concerns from my side:

  • It seems we're designing 'proper' CLR metrics based on the information we can get today, but native runtime instrumentation can do much better and provide:
    • GC duration histogram
    • lock contention duration
    • ...
  • I'm no expert in all of the runtime details, but I assume that some metrics are used much more than others (e.g. CPU, GC, heap size, thread pool), while things like JIT metrics may be more advanced and specialized. I wonder if it's possible to start with basic perf analysis (e.g. CPU, memory/GC) and then move on to more specific metrics in follow-up PRs?
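The richer native-runtime instrumentation mentioned above (for example, a GC pause-duration histogram) could be modeled in the same YAML schema. This is a hedged sketch only; the name, unit, and brief are assumptions for illustration, not conventions adopted by this PR:

```yaml
# Hypothetical sketch of a GC pause-duration histogram definition.
# The metric name, unit, and wording are illustrative assumptions.
groups:
  - id: metric.dotnet.gc.pause.duration
    type: metric
    metric_name: dotnet.gc.pause.duration
    stability: experimental
    brief: "Time the runtime has spent paused for garbage collection."
    instrument: histogram
    unit: "s"
```

A histogram instrument like this requires hooks inside the runtime itself, which is why it goes beyond what the existing contrib instrumentation can report.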

@trask
Member

trask commented May 28, 2024

/easycla

@lmolkova
Contributor

lmolkova commented Jun 3, 2024

Looking at the discussions in this PR, I want to reiterate the proposal #1035 (review):

Let's think about the user experience: which metrics do users want to see first? My assumption is CPU, memory (from all sources, ideally in one or a few metrics), GC, and maybe threads. Let's try to come up with a few metrics that don't require users to have deep prior expertise in .NET memory management or to know the subtle differences between different .NET flavors.

If we need some advanced, precise things, they should come as an addition to basic CPU/mem/GC things.
Let's focus on basics first.

@noahfalk
Contributor

noahfalk commented Jun 3, 2024

If we need some advanced, precise things, they should come as an addition to basic CPU/mem/GC things.
Let's focus on basics first.

Are you suggesting we review some things first and other things later? Or do you mean "Let's see if we can avoid including some of these metrics in .NET 9 at all?" If it's just review ordering, I'd say no problem. If the goal is for these conventions not to contain all the metrics in the current OTel .NET runtime instrumentation, I think that leads to trouble. One of the major scenarios will be people migrating from the older metrics to these, and every metric we remove makes that migration harder or discourages them from migrating at all. If we want a simple set of metrics for folks just getting started, I think a better approach would be docs or a pre-made dashboard, not excluding metrics from the underlying instrumentation.

EDIT: Just to add, I know a few of the metrics may feel a bit advanced or niche, but they are there because customer feedback asked for them. We took a bunch of things out during the move from Windows Performance Counters to EventCounters, but customers told us we had cut too deep and asked us to restore some of the metrics that were important to them.


This PR was marked stale due to lack of activity. It will be closed in 7 days.

@github-actions github-actions bot added the Stale label Aug 23, 2024
@lmolkova
Contributor

I took the liberty of resolving the last two discussions, applied @noahfalk's suggestion on one of them to match what's documented in the runtime, and regenerated the tables.

With that, I believe this is ready to go.

@lmolkova lmolkova merged commit d04bc0d into open-telemetry:main Aug 23, 2024
13 of 14 checks passed
@stevejgordon stevejgordon deleted the dotnet-runtime-metrics branch August 23, 2024 09:22
stevejgordon added a commit to elastic/kibana that referenced this pull request Nov 4, 2024
## Summary

Create a dedicated "portable dashboard" for OTel .NET.

This uses metrics available in the
[contrib](https://github.com/open-telemetry/opentelemetry-dotnet-contrib)
runtime metrics library. These metrics are opt-in and not enabled by
default in the vanilla SDK. Our Elastic distro brings in the package and
enables them by default. Therefore, the dashboard will only work if a)
the customer uses our distro or b) they enable the metrics themselves
when using the vanilla SDK.

Further, work is ongoing to define [semantic conventions for .NET
runtime
metrics](open-telemetry/semantic-conventions#1035).
Once complete, the metrics will be implemented directly in the .NET
runtime BCL and be available with no additional dependencies. The goal
is to achieve that by .NET 9, which is not guaranteed. At that point,
the metric names will change to align with the semantic conventions.
This is not ideal, but it is our only option if we want to provide some
form of runtime dashboard with the current metrics and OTel distro.

As with #182107, this dashboard uses a table for some of the data, and that table doesn't seem to reflect the correct date filtering. Until there is a solution, this PR will remain in draft, or we can consider dropping the table from the initial dashboard.


![image](https://github.com/elastic/kibana/assets/3669103/0be46495-e09f-4f4e-81e1-5f69361d5781)
kibanamachine pushed a commit to kibanamachine/kibana that referenced this pull request Nov 4, 2024
(cherry picked from commit 0600309)

Successfully merging this pull request may close these issues.

Create CLR metrics semantic convention