[exporter/otlp] Retry RESOURCE_EXHAUSTED only if the server returns RetryInfo #5147
Codecov Report
@@ Coverage Diff @@
## main #5147 +/- ##
==========================================
+ Coverage 90.52% 90.65% +0.12%
==========================================
Files 187 187
Lines 11041 11052 +11
==========================================
+ Hits 9995 10019 +24
+ Misses 824 814 -10
+ Partials 222 219 -3
Can we instead change that case to determine whether the retry info has been provided in the response? The spec requires retry of transient errors and says that
@Aneurysm9 Makes sense, gonna work on this.
afd826c to 02cc1e1
@Aneurysm9 I updated the PR to separate "always-retryable" and "never-retryable" error codes. Does it make sense? Meanwhile, I am going to write some tests for that, as those seem to be missing (which turned out to be tricky).
This seems like a good approach.
I do not understand what this change does. A couple questions:
- Is the OTLP specification correct, or does something need to be changed there?
- If the specification is correct, which part of the specification is currently violated by the implementation in this repo that this PR fixes?
(Blocking temporarily to avoid accidental merging until all is clarified).
@tigrannajaryan This implementation effectively treats It doesn't violate the spec as this only modifies the client behavior to drop data immediately if |
This does appear to contradict what the spec says:
So, to make it clear, you are saying that RESOURCE_EXHAUSTED should not be an unconditional "Yes" in that table. It should be conditional:
Is this correct?
@tigrannajaryan Exactly.
So, I think we need to update the spec to reflect this. I believe this can be considered a bug in the specification, so it is an allowed change.
@tigrannajaryan Should I prepare the change to the spec before proceeding with this?
Yes, that would be great.
02cc1e1 to fea9bf6
I made the code follow the new addition to the spec and added the test to check |
Much better now, thanks.
Let's wait for open-telemetry/opentelemetry-specification#2480 to be merged before merging this.
4cc4819 to 3acf067
@Aneurysm9 @bogdandrutu any further comments, or good to merge?
@svrakitin please resolve conflicts, and I will merge
@svrakitin unit tests are failing
@bogdandrutu Yeah, I didn't notice that the tests now use |
…etryInfo (open-telemetry#5147) This makes us retry on `ResourceExhausted` only if retry info is provided. It also makes the code a bit more explicit. In my case it caused an issue in production where upstream denied requests above `max_recv_msg_size` and we kept retrying. **Link to tracking Issue:** open-telemetry#1123
…eturns RetryInfo (open-telemetry#5147)" This reverts commit cfc536c.
…server returns RetryInfo (open-telemetry#5147)"" This reverts commit c981ea7.
Description:
This makes us retry on `ResourceExhausted` only if retry info is provided. It also makes the code a bit more explicit.
In my case it caused an issue in production where upstream denied requests above `max_recv_msg_size` and we kept retrying.
Link to tracking Issue: #1123