Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid warning on sporadic connection failures #39021

Merged
merged 7 commits into from
Mar 6, 2024

Conversation

jeanbisutti
Copy link
Contributor

@jeanbisutti jeanbisutti commented Feb 29, 2024

Sometimes, the OTel exporter fails to send the telemetry data because of a time out:

WARN  c.a.m.o.e.i.p.TelemetryPipeline - Sending telemetry to the ingestion service: Channel response timed out after 60000 milliseconds. (...) (future warnings will be aggregated and logged once every 5 minutes)
java.util.concurrent.TimeoutException: Channel response timed out after 60000 milliseconds.
        at com.azure.core.http.netty.implementation.ResponseTimeoutHandler.responseTimedOut(ResponseTimeoutHandler.java:58)
        at com.azure.core.http.netty.implementation.ResponseTimeoutHandler.lambda$handlerAdded$0(ResponseTimeoutHandler.java:45)
        at io.netty.util.concurrent.PromiseTask.runTask(PromiseTask.java:98)
        at io.netty.util.concurrent.ScheduledFutureTask.run(ScheduledFutureTask.java:153)
        at io.netty.util.concurrent.AbstractEventExecutor.runTask(AbstractEventExecutor.java:174)
        at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:167)
        at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:470)
        at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:403)
        at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997)
        at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
        at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
        at java.lang.Thread.run(Thread.java:748)
WARN  c.a.m.o.e.i.p.TelemetryPipeline - In the last 5 minutes, the following operation has failed 1 times (out of 30): Sending telemetry to the ingestion service:
 * finishConnect(..) failed: Connection refused: (1 times)

It's not really an issue if it happens from time to time because the OTel exporter will save the telemetry data on the disk and will resend them later.

@github-actions github-actions bot added the OpenTelemetry OpenTelemetry instrumentation label Feb 29, 2024
@@ -26,7 +24,7 @@ public static TelemetryItemExporter createTelemetryItemExporter(HttpPipeline htt
if (tempDir == null) {
telemetryPipelineListener =
new DiagnosticTelemetryPipelineListener(
"Sending telemetry to the ingestion service", false, " (telemetry will be lost)");
"Sending telemetry to the ingestion service", false, true, " (telemetry will be lost)");
Copy link
Contributor Author

@jeanbisutti jeanbisutti Feb 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No retry in this case, so exceptions should be logged

@@ -41,6 +41,7 @@ class LocalFileSender implements Runnable {
new DiagnosticTelemetryPipelineListener(
"Sending telemetry to the ingestion service (retry from disk)",
true,
true,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Log exceptions on retry

@@ -101,7 +101,7 @@ public void onResponse(TelemetryPipelineRequest request, TelemetryPipelineRespon
public void onException(TelemetryPipelineRequest request, String reason, Throwable throwable) {
if (!NetworkFriendlyExceptions.logSpecialOneTimeFriendlyException(
throwable, request.getUrl().toString(), friendlyExceptionThrown, logger)) {
if (logRetryableFailures) {
if (logOnException) {
Copy link
Contributor Author

@jeanbisutti jeanbisutti Feb 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If logOnException is false, then OperationLogger won't record a failure.

Then, AggregatingLogger won't record a warning:

Then,

WARN  c.a.m.o.e.i.p.TelemetryPipeline - Sending telemetry to the ingestion service: Channel response timed out after 60000 milliseconds. (...) (future warnings will be aggregated and logged once every 5 minutes)
java.util.concurrent.TimeoutException: Channel response timed out after 60000 milliseconds.
        at com.azure.core.http.netty.implementation.ResponseTimeoutHandler.responseTimedOut(ResponseTimeoutHandler.java:58)
        at com.azure.core.http.netty.implementation.ResponseTimeoutHandler.lambda$handlerAdded$0(ResponseTimeoutHandler.java:45)
        at io.netty.util.concurrent.PromiseTask.runTask(PromiseTask.java:98)
        at io.netty.util.concurrent.ScheduledFutureTask.run(ScheduledFutureTask.java:153)
        at io.netty.util.concurrent.AbstractEventExecutor.runTask(AbstractEventExecutor.java:174)
        at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:167)
        at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:470)
        at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:403)
        at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997)
        at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
        at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
        at java.lang.Thread.run(Thread.java:748)

won't be displayed:

And

WARN  c.a.m.o.e.i.p.TelemetryPipeline - In the last 5 minutes, the following operation has failed 1 times (out of 30): Sending telemetry to the ingestion service:
 * finishConnect(..) failed: Connection refused: (1 times)

won't be displayed:

Copy link
Contributor Author

@jeanbisutti jeanbisutti left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code processing the Breeze response has not changed:

@trask
Copy link
Member

trask commented Mar 1, 2024

IIRC, microsoft/ApplicationInsights-Java#3564 should fix for Java agent? (and we could make similar change in standalone exporter)

@azure-sdk
Copy link
Collaborator

API change check

API changes are not detected in this pull request.

@jeanbisutti
Copy link
Contributor Author

IIRC, microsoft/ApplicationInsights-Java#3564 should fix for Java agent? (and we could make similar change in standalone exporter)

So, the Java agent has his own code for this.

I see that your preference is also to disable the logs related to the Breeze response processing (when a retry is possible). I am fine with that and has updated this PR in this way.

@trask @heyam Please have another look.

…/com/azure/monitor/opentelemetry/exporter/implementation/utils/AzureMonitorHelper.java

Co-authored-by: Trask Stalnaker <trask.stalnaker@gmail.com>
@jeanbisutti jeanbisutti requested a review from trask March 5, 2024 11:02
@jeanbisutti jeanbisutti merged commit fad02b4 into main Mar 6, 2024
19 checks passed
@jeanbisutti jeanbisutti deleted the avoid-warning-on-sporadic-connection-failures branch March 6, 2024 13:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
OpenTelemetry OpenTelemetry instrumentation
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants