-
Notifications
You must be signed in to change notification settings - Fork 772
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use OTLP push mode in Prometheus tutorial #4874
Use OTLP push mode in Prometheus tutorial #4874
Conversation
Codecov Report
@@ Coverage Diff @@
## main #4874 +/- ##
==========================================
- Coverage 83.19% 83.18% -0.01%
==========================================
Files 293 293
Lines 11984 11984
==========================================
- Hits 9970 9969 -1
- Misses 2014 2015 +1
Flags with carried forward coverage won't be shown. Click here to find out more. |
I have pushed a change to update protocol to http protobuf. With that I can see the metrics in prometheus However, I still see that error message on console
I have tried myapp > otel-collector > prometheus(otlp endpoint) as well. That shows the same error. Looking in to this further to find the source of that error. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. We can update other examples too (Separately) 🎉
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
Kudos specifically to using console/OTLP exporters to explain the concept and good visual aids.
@@ -29,7 +29,12 @@ public static void Main() | |||
{ | |||
using var meterProvider = Sdk.CreateMeterProviderBuilder() | |||
.AddMeter("MyCompany.MyProduct.MyLibrary") | |||
.AddPrometheusHttpListener() | |||
.AddConsoleExporter() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How do you feel about eliminating the console exporter, and the steps of the tutorial that check the console? Partly I worry that people doing quick copy/paste will have an extra exporter they don't need, and partly checking the console makes the tutorial longer. It didn't feel like seeing the data in the console was adding much value vs. seeing it in Prometheus.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The default export interval is 60 seconds https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/metrics/sdk.md#periodic-exporting-metricreader.
I was considering the following scenarios:
- A user tried the example, had the application running for 20 seconds without seeing data, got impatient and gave up.
- A user ran into some issue while trying to tweak the code, metrics stopped flowing to Prometheus and the user ran out of idea how to troubleshoot.
I think the 1st one can be easily solved by explicitly specifying a shorter export period (e.g. 3 seconds). The problem is that we're recommending something that's against production practice.
The 2nd one is something I'm not sure, we can tell the user to enable internal troubleshooting logs, it's just much harder.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Partly I worry that people doing quick copy/paste will have an extra exporter they don't need, and partly checking the console makes the tutorial longer.
Now that you've pointed it out, I've been thinking about this - how about that we just deprecate the stable version of Console Exporter from NuGet, and keep releasing it as alpha version? Or completely disable it if the application is running in Release mode (making it no-op), unless a special environment variable is specified?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A simpler version is here #4878.
(I convinced myself that users can follow the console tutorial so we don't need to duplicate the content here)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thoughts for issue (1) above
- Short term: I'd suggest leaving the update interval at 60 seconds in the initial snippet, calling out in the text that the updates will take some time, and then demonstrating in a separate tiny code snippet how the code could be changed to make it update faster.
- Long term: I feel like there is a chronic problem in telemetry systems to use high latency by default that is a difficult or impossible setting to change. High latency is presumably more cost-efficient for long-term high scale monitoring, but it is awful for anything interactive such as demos, testing telemetry data flow, or seeing the effect of changes during deployments and incident response. Ideally telemetry protocols, ingestion mechanisms and tools would collaborate so that some initial signal of connectivity occurs with extremely low latency. Even better, though I imagine technically more involved, would be allowing the end-user/tool to request lower latency transmission on-demand while interactive work is being done.
For (2) its probably useful for OTel to have a troubleshooting guide for how to handle telemetry that isn't reaching the intended destination. The tutorial could link to that guide.
how about that we just deprecate the stable version of Console Exporter from NuGet, and keep releasing it as alpha version? Or completely disable it if the application is running in Release mode (making it no-op), unless a special environment variable is specified?
Are you hearing from users that accidentally enabling the console exporter is a common problem? My guess is that as long as our samples don't encourage people to turn it on by default that would be sufficient protection and nothing more drastic would be needed. But if feedback showed that wasn't sufficient I'm not opposed to taking a stronger measure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For (2) its probably useful for OTel to have a troubleshooting guide for how to handle telemetry that isn't reaching the intended destination. The tutorial could link to that guide.
There is already a doc https://github.com/open-telemetry/opentelemetry-dotnet/tree/main/src/OpenTelemetry#troubleshooting, I think the SDK troubleshooting story still has lots of room for improvements.
Are you hearing from users that accidentally enabling the console exporter is a common problem?
Not very often, but I've seen couple times developers left the Console Exporter enabled in production and I asked them to remove it.
Stepping back a little bit, I hope that we have one console exporter (including the Console logging provider) that is production ready, and it follows certain standards (so it can better integrate with existing systems such as Fluentbit / OTel Collector) instead of using our own taste while formatting the output.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the SDK troubleshooting story still has lots of room for improvements.
Something is better than nothing, but agreed plenty of room for improvement. What I was envisioning would be a topic within the trouble-shooting guide that gets a little more specific such as: "I configured sending telemetry via OTLP, but the telemetry backend doesn't show any data has arrived." Then it would discuss how to get additional console/log data, what are signs the data is being sent, where would errors appear, and what are some common errors.
.AddOtlpExporter(options => | ||
{ | ||
options.Endpoint = new Uri("http://localhost:9090/api/v1/otlp/v1/metrics"); | ||
options.Protocol = OpenTelemetry.Exporter.OtlpExportProtocol.HttpProtobuf; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: A using statement above and not including the full namespace here would be a more typical style.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@@ -71,7 +98,9 @@ end | |||
|
|||
Instrument --> | Measurements | MeterProvider | |||
|
|||
MeterProvider --> | Metrics | MetricReader --> | Pull | PrometheusHttpListener | |||
MeterProvider --> | Metrics | MetricReader --> | Push | OtlpExporter |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Talking about MetricReader feels like a level of detail unneeded for the tutorial. I expect user's mental model is something like:
Recorded measurements -> OpenTelemetry -> OtlpExporter -> Remote Prometheus server
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See if this is better #4878.
Prometheus has added OTLP support https://prometheus.io/docs/prometheus/latest/feature_flags/#otlp-receiver.
Similar to #3940, this PR switches the tutorial from PrometheusExporter (PrometheusHttpListener to be precise) to OtlpExporter (which is stable).