
Bugfix against retry logic #576

Merged: 3 commits into microsoft:2.0.0-STABLE on Feb 27, 2018

Conversation

debugthings (Contributor):

When reviewing this code to implement a few refactored items, I noticed that there was still a hard-coded retry count in the logic. This PR fixes that.

There is also still a mix of tabs and spaces in the files, which adds some lines to the commit that don't need to be there.

After the instant-retry-count logic was added, this line of code could cause transmissions to never back off.
{
long backOffSeconds = backOffMillis / 1000;
InternalLogger.INSTANCE.info("App is throttled, telemetry will be blocked for %s seconds.", backOffSeconds);
InternalLogger.INSTANCE.logAlways(InternalLogger.LoggingLevel.TRACE, "App is throttled, telemetry will be blocked for %s seconds.", backOffSeconds);
Contributor commented:

Will this flood the log and make users unhappy?

Contributor commented:

Well, I think it has more consequences than just making users unhappy. Writing logs to disk (i.e., to a file) or to the console is expensive for the application. During peak load there may be some throttling of telemetry sends, and the extensive logging this produces can slow a production application down and lead to missing important transactions (say, in a high-throughput e-commerce platform). This should not be logAlways.

Contributor commented:

I agree with @dhaval24, this shouldn't be logAlways. I would go with debug.
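
For reference, a minimal sketch of the demotion being discussed, reusing the info overload already shown in the snippet above (the follow-up change in #571 ultimately replaced these logAlways TRACE calls with info):

    // Instead of emitting the throttling message unconditionally via logAlways(...),
    // log it at a level that respects the configured verbosity:
    long backOffSeconds = backOffMillis / 1000;
    InternalLogger.INSTANCE.info("App is throttled, telemetry will be blocked for %s seconds.", backOffSeconds);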

@grlima (Contributor) left a comment:

Still reviewing...

@grlima (Contributor) left a comment:

Previous review was for the wrong PR. Sorry. Still reviewing this...

@@ -189,10 +189,8 @@ public boolean send(Transmission transmission) {
respString = EntityUtils.toString(respEntity);
retryAfterHeader = response.getFirstHeader(RESPONSE_THROTTLING_HEADER);

// After the third time through this dispatcher we should reset the counter and
// then fail to second TransmissionOutput
if (code > HttpStatus.SC_PARTIAL_CONTENT && transmission.getNumberOfSends() >= MAX_RESEND) {
Contributor commented:

The variable MAX_RESEND is no longer needed
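
For illustration only (this is not the PR's actual diff): once the instant-retry count is configurable, the hard-coded MAX_RESEND can give way to the configured value. The helper name shouldBackOff and the parameter maxInstantRetries below are hypothetical:

    import org.apache.http.HttpStatus;

    // Hypothetical helper: a transmission is handed off to backoff handling only after it
    // has exhausted the configured number of instant retries (default 3 per the discussion below).
    static boolean shouldBackOff(int statusCode, int numberOfSends, int maxInstantRetries) {
        return statusCode > HttpStatus.SC_PARTIAL_CONTENT && numberOfSends >= maxInstantRetries;
    }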

@dhaval24 (Contributor) left a comment:

@debugthings I have a few recommendations on this one, if you can take a look.

return;
}

Date date = Calendar.getInstance().getTime();
date.setTime(date.getTime() + 1000 * suspendInSeconds);
Contributor commented:

Can we add explicit parentheses here: date.setTime(date.getTime() + (1000 * suspendInSeconds))?
I know that multiplication takes precedence over addition, but I would still prefer to be explicit to avoid any mix-ups later.
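
For clarity, the suggested form of those two lines (same behavior, with the grouping made explicit; suspendInSeconds is whatever value the surrounding method computed):

    // Same computation as in the diff above, with the multiplication grouped explicitly.
    Date date = Calendar.getInstance().getTime();
    date.setTime(date.getTime() + (1000 * suspendInSeconds));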

* Set the number of retries before performing a back off operation.
* @param maxInstantRetries Number of retries
*/
public void setMaxInstantRetries(int maxInstantRetries) {
Contributor commented:

I think we should always keep the minimum instant retries at 3. Allowing instant retries to be set to 0 will again invite similar troubles on constrained networks. I would suggest changing the condition to reflect that.

@debugthings (Contributor, author) commented:

instantRetries defaults to 3 if it is not set with this method; I changed the logic so the value is only applied if it is in the range [1..10]. I agree that 0 would put us in a condition where we back off too soon, but I disagree that we should always keep this at 3 and give no option to lower it.
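
A rough sketch of that guard, under the assumptions stated in the reply above (field name and structure are illustrative, not the PR's exact code):

    /**
     * Set the number of retries before performing a back off operation.
     * @param maxInstantRetries Number of retries, applied only if in the range [1..10]
     */
    public void setMaxInstantRetries(int maxInstantRetries) {
        // Only accept values in [1..10]; otherwise keep the current value (default 3).
        if (maxInstantRetries >= 1 && maxInstantRetries <= 10) {
            this.maxInstantRetries = maxInstantRetries;
        }
    }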

@grlima (Contributor) left a comment:

Good to go.

@grlima grlima merged commit 18b6fe2 into microsoft:2.0.0-STABLE Feb 27, 2018
littleaj added a commit that referenced this pull request Mar 8, 2018
* Fix null ref check in telemetry correlation Utils (#541)

* Fix null ref check in TelemetryCorrelationUtils

* Modifying log level to warning

* Updating Changelog

* Fix handling of NaN and +/-Infinity in JSON serializer (#499)

* Handle NaN and +/-Infinity in metrics

* Default NaN/Infinity serialization to 0 to be consistent with other AI SDKs and make the code compatible with Java 6

* fixed javadoc errors and added section to generate pom.xml with all builds (#551)

* Updating version number to 2.0.0

* Implementing Retry logic [Reliable Channel] [STABLE Branch] (#561)

* Initial commit of retry and backoff logic fixes

* Fixing warnings on files I touched this round

* Fix the eclipse UI from screaming about the docker Constants

* Fixed backoff logic to use existing method. Added more logging to the sender channel.

* Added the partial response handler, more logging

* Added gson to core. Fixed backoff manager to keep original functionality. Added extension to return the timeout values as expected before.

* Added unit tests.

* Fixing string typed ArrayList<> to List<> per Dhaval

* Missed one

* Making tests consistent.

* Added javadoc comments, simplified logic for a few methods

* Added exception logging per @dhaval24. Fixed formatting on touched files

* Updates per last round of commits

Moved the Handlers out of the concrete package to the common package for consistency. Removed a couple of unnecessary methods. Added docs.

* Latest fixes

* Add MaxInstantRetry

Added MaxInstantRetry configuration to allow for instantaneous retry on a failed transmission.

* Javadoc Updates

Javadoc and formatting updates

* NumberFormatException fix

Added null check

* JavaDocs for TPM

* Fixing FixedRateSampling to work in old and new version of sampling (#540)

Overriding the default sampling percentage when a sampling percentage is programmatically specified by the user.

* upgrade to logback v1.2.3 (#565)

* Reliable channel: replacing logAlways "TRACE" statements with "info" (#571)

* Reliable channel: close resources in finally block. (#572)

* Reliable channel: close resources in finally block.

* change logging to warning when closing resources

* Bugfix against retry logic (#576)

* Refactor

* BUGFIX Logic would never backoff

After the instant-retry-count logic was added, this line of code could cause transmissions to never back off.

* Changes requested

* Fixed javadocs tags, that caused build errors while executing `javadoc` gradle task (#578)

* Update Changelog

* Fix link in changelog

* Fix another link in changelog

* Update gradle.properties

* Fix customizing pom.xml in Gradle build (#582)

* Fix customizing pom.xml in Gradle build

* Insert license after 1. row in pom.xml

* Filter artifacts relocated by shadow task from pom dependencies

- match artifacts by groupId
- fixes #583 

* Generate a pom file "beside" the artifact jar file