Bugfix against retry logic #576
Conversation
After adding the instant retry amount logic to the code, this line of code could cause the transmissions to never back off.
{
    long backOffSeconds = backOffMillis / 1000;
    InternalLogger.INSTANCE.info("App is throttled, telemetry will be blocked for %s seconds.", backOffSeconds);
    InternalLogger.INSTANCE.logAlways(InternalLogger.LoggingLevel.TRACE, "App is throttled, telemetry will be blocked for %s seconds.", backOffSeconds);
Will this flood the log and make users unhappy?
Well, I think it has more consequences than just making users unhappy. Writing logs to disk (i.e. a file) or to the console is expensive for the application, and during peak load there can be some throttling in sending telemetry; the extensive logging can then slow a production application down and lead to missing important transactions (say, in a high-throughput e-commerce platform). This should not be logAlways.
I agree with @dhaval24, this shouldn't be logAlways. I would go with debug.
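For illustration, a minimal sketch of the suggested change, reusing the exact InternalLogger call already shown in the hunk above (whether the final level ends up as info or debug is still the reviewers' call):

```java
// Sketch: demote the throttling message from logAlways(TRACE) to a level that
// users can filter out; the info(...) overload is the one shown in the diff above.
long backOffSeconds = backOffMillis / 1000;
InternalLogger.INSTANCE.info("App is throttled, telemetry will be blocked for %s seconds.", backOffSeconds);
```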
Still reviewing...
Previous review was for wrong PR. Sorry. Still reviewing this...
@@ -189,10 +189,8 @@ public boolean send(Transmission transmission) {
    respString = EntityUtils.toString(respEntity);
    retryAfterHeader = response.getFirstHeader(RESPONSE_THROTTLING_HEADER);

    // After the third time through this dispatcher we should reset the counter and
    // then fail to second TransmissionOutput
    if (code > HttpStatus.SC_PARTIAL_CONTENT && transmission.getNumberOfSends() >= MAX_RESEND) {
The variable MAX_RESEND is no longer needed
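For context, a rough sketch of what the check could look like once the hard-coded MAX_RESEND constant is removed; the name maxInstantRetries is an assumption based on the setter discussed further down, not the actual field in the PR:

```java
// Hypothetical shape of the updated check: the configurable instant-retry count
// takes the place of the removed MAX_RESEND constant.
if (code > HttpStatus.SC_PARTIAL_CONTENT && transmission.getNumberOfSends() >= maxInstantRetries) {
    // stop instant retries here and fall back to the second TransmissionOutput / backoff path
}
```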
@debugthings I have a few recommendations on this one, if you can take a look.
        return;
    }

    Date date = Calendar.getInstance().getTime();
    date.setTime(date.getTime() + 1000 * suspendInSeconds);
Can we add explicit parentheses here: date.setTime(date.getTime() + (1000 * suspendInSeconds))? I know that multiplication takes precedence over addition, but I would still prefer to be explicit to avoid any mistakes.
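Applied to the hunk above, the suggestion would read as follows (same behavior, parentheses added only for readability):

```java
Date date = Calendar.getInstance().getTime();
date.setTime(date.getTime() + (1000 * suspendInSeconds));
```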
 * Set the number of retries before performing a back off operation.
 * @param maxInstantRetries Number of retries
 */
public void setMaxInstantRetries(int maxInstantRetries) {
I think we should always keep the minimum instant retries at 3. Allowing instant retries to be set to 0 will again invite similar trouble on constrained networks. I would suggest that we change the condition to reflect that.
instantRetries defaults to 3 if it is not set with this method; I changed the logic to only apply the value if it is in the range [1..10]. I agree that 0 would put us into a condition where we back off too soon, but I disagree that we should always keep this at 3 and give no option to lower it.
Good to go.
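A minimal sketch of the range guard described in this thread; only the default of 3 and the accepted range [1..10] come from the discussion, and the field name instantRetryAmount is a placeholder, not the actual name in the PR:

```java
/**
 * Set the number of retries before performing a back off operation.
 * @param maxInstantRetries Number of retries
 */
public void setMaxInstantRetries(int maxInstantRetries) {
    // Accept only values in [1..10]; out-of-range values are ignored and the
    // default of 3 stays in effect. The field name below is hypothetical.
    if (maxInstantRetries >= 1 && maxInstantRetries <= 10) {
        this.instantRetryAmount = maxInstantRetries;
    }
}
```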
* Fix null ref check in telemetry correlation Utils (#541)
* Fix null ref check in TelemetryCorrelationUtils
* Modifying log level to warning
* Updating Changelog
* Fix handling of NaN and +/-Infinity in JSON serializer (#499)
* Handle NaN and +/-Infinity in metrics
* Default NaN/Infinity serialization to 0 to be consistent with other AI SDKs and make the code compatible with Java 6
* Fixed javadoc errors and added section to generate pom.xml with all builds (#551)
* Updating version number to 2.0.0
* Implementing Retry logic [Reliable Channel] [STABLE Branch] (#561)
* Initial commit of retry and backoff logic fixes
* Fixing warnings on files I touched this round
* Fix the Eclipse UI from screaming about the docker Constants
* Fixed backoff logic to use existing method. Added more logging to the sender channel.
* Added the partial response handler, more logging
* Added gson to core. Fixed backoff manager to keep original functionality. Added extension to return the timeout values as expected before.
* Added unit tests.
* Fixing string typed ArrayList<> to List<> per Dhaval
* Missed one
* Making tests consistent.
* Added javadoc comments, simplified logic for a few methods
* Added exception logging per @dhaval24. Fixed formatting on touched files
* Updates per last round of commits: moved the Handlers out of the concrete package to the common package to keep the same consistency. Removed a couple of unnecessary methods. Added docs.
* Latest fixes
* Add MaxInstantRetry: added MaxInstantRetry configuration to allow for instantaneous retry on a failed transmission.
* Javadoc updates: javadoc and formatting updates
* NumberFormatException fix: added null check
* JavaDocs for TPM
* Fixing FixedRateSampling to work in old and new versions of sampling (#540): overriding the default sampling percentage when a sampling percentage is programmatically specified by the user.
* Upgrade to logback v1.2.3 (#565)
* Reliable channel: replacing logAlways "TRACE" statements with "info" (#571)
* Reliable channel: close resources in finally block. (#572)
* Reliable channel: close resources in finally block.
* Change logging to warning when closing resources
* Bugfix against retry logic (#576)
* Refactor
* BUGFIX: logic would never back off. After adding the instant retry amount logic to the code, this line of code could cause the transmissions to not back off.
* Changes requested
* Fixed javadoc tags that caused build errors while executing `javadoc` gradle task (#578)
* Update Changelog
* Fix link in changelog
* Fix another link in changelog
* Update gradle.properties
* Fix customizing pom.xml in Gradle build (#582)
* Fix customizing pom.xml in Gradle build
* Insert license after first row in pom.xml
* Filter artifacts relocated by shadow task from pom dependencies: match artifacts by groupId; fixes #583
* Generate a pom file "beside" the artifact jar file
When reviewing this code to implement a few refactored items, I noticed that there was still a hard-coded retry value in the logic. This PR fixes that.
As it stands, there is still a mix of tabs and spaces in the files, which causes some additional changed lines in the commit that don't need to be there.