Spec Conformance Review: OTLP Exporter Retries #1632

Closed
1 of 2 tasks
MadVikingGod opened this issue Mar 2, 2021 · 6 comments · Fixed by #1832
Labels: pkg:exporter:otlp (Related to the OTLP exporter package)


@MadVikingGod
Contributor

MadVikingGod commented Mar 2, 2021

The OTLP exporter MUST retry on transient errors.

Acceptance Criteria:

@XSAM
Member

XSAM commented Mar 3, 2021

Part of #1603

@MrAlias MrAlias added this to the RC1 milestone Mar 3, 2021
@MrAlias
Contributor

MrAlias commented Mar 3, 2021

The gRPC driver for the OTLP exporter accepts a config that has a retry policy enabled by default:

	// DefaultServiceConfig is the gRPC service config used if none is
	// provided by the user.
	//
	// For more info on gRPC service configs:
	// https://github.com/grpc/proposal/blob/master/A6-client-retries.md
	//
	// For more info on the RetryableStatusCodes we allow here:
	// https://github.com/open-telemetry/oteps/blob/be2a3fcbaa417ebbf5845cd485d34fdf0ab4a2a4/text/0035-opentelemetry-protocol.md#export-response
	//
	// Note: MaxAttempts > 5 are treated as 5. See
	// https://github.com/grpc/proposal/blob/master/A6-client-retries.md#validation-of-retrypolicy
	// for more details.
	DefaultServiceConfig = `{
		"methodConfig":[{
			"name":[
				{ "service":"opentelemetry.proto.collector.metrics.v1.MetricsService" },
				{ "service":"opentelemetry.proto.collector.trace.v1.TraceService" }
			],
			"retryPolicy":{
				"MaxAttempts":5,
				"InitialBackoff":"0.3s",
				"MaxBackoff":"5s",
				"BackoffMultiplier":2,
				"RetryableStatusCodes":[
					"CANCELLED",
					"DEADLINE_EXCEEDED",
					"RESOURCE_EXHAUSTED",
					"ABORTED",
					"OUT_OF_RANGE",
					"UNAVAILABLE",
					"DATA_LOSS"
				]
			}
		}]
	}`

This was added here for reference.

@MadVikingGod
Contributor Author

Yes, but according to the gRPC documentation, retries can only be enabled with an environment variable.

Unless we rewrite the retry logic in our own code, I don't think this can be fixed. We could at least document that retries only take effect when GRPC_GO_RETRY=on is set.

@MadVikingGod
Contributor Author

I did a bit of experimenting with it. Here is an example of a client not retrying, and how to get it to retry.
https://github.com/MadVikingGod/opentelemetry-go/tree/retry-example/exporters/otlp/example

@MrAlias
Contributor

MrAlias commented Mar 4, 2021

> Yes, but according to GRPC documents you can't enable retry except with an environment variable.

Ah, right, we ran into this in the old issues as well. One option that was proposed was to follow the collector and write our own implementation: #561 (comment)

@MrAlias
Contributor

MrAlias commented Apr 8, 2021

We may need to write our own back-off/retry algorithm (like the collector) instead of relying on the gRPC method.

This issue was closed.