Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use OTLP push mode in Prometheus tutorial #4874

Merged
merged 6 commits into from
Sep 22, 2023

Conversation

reyang
Copy link
Member

@reyang reyang commented Sep 21, 2023

Prometheus has added OTLP support https://prometheus.io/docs/prometheus/latest/feature_flags/#otlp-receiver.
Similar to #3940, this PR switches the tutorial from PrometheusExporter (PrometheusHttpListener to be precise) to OtlpExporter (which is stable).

@reyang
Copy link
Member Author

reyang commented Sep 21, 2023

I hit an error saying OTLP Exporter is using snappy compression while Prometheus only supports gzip

image

@codecov
Copy link

codecov bot commented Sep 21, 2023

Codecov Report

Merging #4874 (2eaec64) into main (535c819) will decrease coverage by 0.01%.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #4874      +/-   ##
==========================================
- Coverage   83.19%   83.18%   -0.01%     
==========================================
  Files         293      293              
  Lines       11984    11984              
==========================================
- Hits         9970     9969       -1     
- Misses       2014     2015       +1     
Flag Coverage Δ
unittests 83.18% <ø> (-0.01%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

@cijothomas
Copy link
Member

I hit an error saying OTLP Exporter is using snappy compression while Prometheus only supports gzip

image

Let me try another language OTLP Exporter to see if it works...

@vishweshbankwar
Copy link
Member

I have pushed a change to update protocol to http protobuf. With that I can see the metrics in prometheus
image

However, I still see that error message on console

ts=2023-09-22T16:42:44.685Z caller=write_handler.go:205 level=error component=web msg="Error decoding remote write request" err="unsupported compression: snappy. Only "gzip" or no compression supported"
ts=2023-09-22T16:42:44.686Z caller=dedupe.go:112 component=remote level=error remote_name=5653f3 url=http://localhost:9090/api/v1/otlp/v1/metrics msg="non-recoverable error" count=4 exemplarCount=0 err="server returned HTTP status 400 Bad Request: unsupported compression: snappy. Only "gzip" or no compression supported"

I have tried myapp > otel-collector > prometheus(otlp endpoint) as well. That shows the same error. Looking in to this further to find the source of that error.

@reyang reyang marked this pull request as ready for review September 22, 2023 21:03
@reyang reyang requested a review from a team September 22, 2023 21:03
Copy link
Member

@vishweshbankwar vishweshbankwar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. We can update other examples too (Separately) 🎉

Copy link
Contributor

@Yun-Ting Yun-Ting left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.
Kudos specifically to using console/OTLP exporters to explain the concept and good visual aids.

docs/metrics/getting-started-prometheus-grafana/README.md Outdated Show resolved Hide resolved
@CodeBlanch CodeBlanch merged commit f468dd8 into open-telemetry:main Sep 22, 2023
49 checks passed
@@ -29,7 +29,12 @@ public static void Main()
{
using var meterProvider = Sdk.CreateMeterProviderBuilder()
.AddMeter("MyCompany.MyProduct.MyLibrary")
.AddPrometheusHttpListener()
.AddConsoleExporter()
Copy link

@noahfalk noahfalk Sep 22, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How do you feel about eliminating the console exporter, and the steps of the tutorial that check the console? Partly I worry that people doing quick copy/paste will have an extra exporter they don't need, and partly checking the console makes the tutorial longer. It didn't feel like seeing the data in the console was adding much value vs. seeing it in Prometheus.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The default export interval is 60 seconds https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/metrics/sdk.md#periodic-exporting-metricreader.

I was considering the following scenarios:

  1. A user tried the example, had the application running for 20 seconds without seeing data, got impatient and gave up.
  2. A user ran into some issue while trying to tweak the code, metrics stopped flowing to Prometheus and the user ran out of idea how to troubleshoot.

I think the 1st one can be easily solved by explicitly specifying a shorter export period (e.g. 3 seconds). The problem is that we're recommending something that's against production practice.
The 2nd one is something I'm not sure, we can tell the user to enable internal troubleshooting logs, it's just much harder.

Copy link
Member Author

@reyang reyang Sep 22, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Partly I worry that people doing quick copy/paste will have an extra exporter they don't need, and partly checking the console makes the tutorial longer.

Now that you've pointed it out, I've been thinking about this - how about that we just deprecate the stable version of Console Exporter from NuGet, and keep releasing it as alpha version? Or completely disable it if the application is running in Release mode (making it no-op), unless a special environment variable is specified?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A simpler version is here #4878.
(I convinced myself that users can follow the console tutorial so we don't need to duplicate the content here)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thoughts for issue (1) above

  • Short term: I'd suggest leaving the update interval at 60 seconds in the initial snippet, calling out in the text that the updates will take some time, and then demonstrating in a separate tiny code snippet how the code could be changed to make it update faster.
  • Long term: I feel like there is a chronic problem in telemetry systems to use high latency by default that is a difficult or impossible setting to change. High latency is presumably more cost-efficient for long-term high scale monitoring, but it is awful for anything interactive such as demos, testing telemetry data flow, or seeing the effect of changes during deployments and incident response. Ideally telemetry protocols, ingestion mechanisms and tools would collaborate so that some initial signal of connectivity occurs with extremely low latency. Even better, though I imagine technically more involved, would be allowing the end-user/tool to request lower latency transmission on-demand while interactive work is being done.

For (2) its probably useful for OTel to have a troubleshooting guide for how to handle telemetry that isn't reaching the intended destination. The tutorial could link to that guide.

how about that we just deprecate the stable version of Console Exporter from NuGet, and keep releasing it as alpha version? Or completely disable it if the application is running in Release mode (making it no-op), unless a special environment variable is specified?

Are you hearing from users that accidentally enabling the console exporter is a common problem? My guess is that as long as our samples don't encourage people to turn it on by default that would be sufficient protection and nothing more drastic would be needed. But if feedback showed that wasn't sufficient I'm not opposed to taking a stronger measure.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For (2) its probably useful for OTel to have a troubleshooting guide for how to handle telemetry that isn't reaching the intended destination. The tutorial could link to that guide.

There is already a doc https://github.com/open-telemetry/opentelemetry-dotnet/tree/main/src/OpenTelemetry#troubleshooting, I think the SDK troubleshooting story still has lots of room for improvements.

Are you hearing from users that accidentally enabling the console exporter is a common problem?

Not very often, but I've seen couple times developers left the Console Exporter enabled in production and I asked them to remove it.
Stepping back a little bit, I hope that we have one console exporter (including the Console logging provider) that is production ready, and it follows certain standards (so it can better integrate with existing systems such as Fluentbit / OTel Collector) instead of using our own taste while formatting the output.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the SDK troubleshooting story still has lots of room for improvements.

Something is better than nothing, but agreed plenty of room for improvement. What I was envisioning would be a topic within the trouble-shooting guide that gets a little more specific such as: "I configured sending telemetry via OTLP, but the telemetry backend doesn't show any data has arrived." Then it would discuss how to get additional console/log data, what are signs the data is being sent, where would errors appear, and what are some common errors.

@reyang reyang deleted the reyang/prometheus-otlp branch September 22, 2023 22:34
.AddOtlpExporter(options =>
{
options.Endpoint = new Uri("http://localhost:9090/api/v1/otlp/v1/metrics");
options.Protocol = OpenTelemetry.Exporter.OtlpExportProtocol.HttpProtobuf;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: A using statement above and not including the full namespace here would be a more typical style.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@@ -71,7 +98,9 @@ end

Instrument --> | Measurements | MeterProvider

MeterProvider --> | Metrics | MetricReader --> | Pull | PrometheusHttpListener
MeterProvider --> | Metrics | MetricReader --> | Push | OtlpExporter

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Talking about MetricReader feels like a level of detail unneeded for the tutorial. I expect user's mental model is something like:
Recorded measurements -> OpenTelemetry -> OtlpExporter -> Remote Prometheus server

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See if this is better #4878.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants