Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Expand the set of retryable exceptions in JdkHttpSender #5942

Merged

Conversation

jack-berg
Copy link
Member

@mikelaspina has been running a modified version of JdkHttpSender in production and has found some sharp edges with respect to which exceptions are considered retryable.

This PR expands the set to include all IOExceptions which are not instances of SSLException. Based off his experience running this in prod, all the IOExceptions we ran into qualify as retryable. They're also difficult to identify by their message so this PR opts out of specific exceptions instead of the existing behavior of opting in.

@jack-berg jack-berg requested a review from a team October 26, 2023 19:37
@codecov
Copy link

codecov bot commented Oct 26, 2023

Codecov Report

All modified and coverable lines are covered by tests ✅

Files Coverage Δ
...ry/exporter/sender/jdk/internal/JdkHttpSender.java 89.38% <100.00%> (ø)

... and 2 files with indirect coverage changes

📢 Thoughts on this report? Let us know!.

Copy link

@mikelaspina mikelaspina left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@jack-berg jack-berg added this to the 1.32.0 milestone Nov 3, 2023
// received"
// Known retryable HttpTimeoutException messages: "request timed out"
// Known retryable HttpConnectTimeoutException messages: "HTTP connect timed out"
return !(throwable instanceof SSLException);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should

static boolean isRetryableException(IOException e) {
if (!(e instanceof SocketTimeoutException)) {
return false;
}
String message = e.getMessage();
// Connect timeouts can produce SocketTimeoutExceptions with no message, or with "connect timed
// out"
return message == null || message.toLowerCase(Locale.ROOT).contains("connect timed out");
}

be also modified?

Copy link
Member Author

@jack-berg jack-berg Nov 9, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure the behavior of the JDK HttpClient is applicable to OkHttp client. I think in principle we should retry in all the same scenarios for both the JDK HttpClient and OkHttp, but not sure how each failure mode manifests as exceptions in OkHttp.

Copy link
Member

@trask trask left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there any documented preference for err'ing on the side of at-most-once or at-least-once delivery?

@jack-berg
Copy link
Member Author

Yes there's this language:

In edge cases (e.g. on reconnections, network interruptions, etc) the client has no way of knowing if recently sent data was delivered if no acknowledgement was received yet. The client will typically choose to re-send such data to guarantee delivery, which may result in duplicate data on the server side. This is a deliberate choice and is considered to be the right tradeoff for telemetry data.

@jack-berg jack-berg merged commit f44d991 into open-telemetry:main Nov 10, 2023
18 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants