Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implementing Retry logic [Reliable Channel] [STABLE Branch] #561

Merged
merged 18 commits into from
Feb 16, 2018

Conversation

debugthings
Copy link
Contributor

Completion of PR #556 into Stable Branch

Added retry logic that is similar to the .NET framework.

  • Removes logic from NetworkTransmissionOutput
  • Uses observers to handle error conditions
  • Added failed transmission retry
  • Added partial transmission retry
  • Aligned response codes to match .NET
  • Uses existing ExponentialBackOffTimesPolicy

NOTE This clobbers the change in #385 referenced here

https://github.com/Microsoft/ApplicationInsights-Java/blob/7a5e572b2ffd3256898c6bd05dc3bec6754e3629/core/src/main/java/com/microsoft/applicationinsights/internal/channel/common/TransmissionNetworkOutput.java#L172

All exceptions are now handled by ErrorHandler in this line.

https://github.com/Microsoft/ApplicationInsights-Java/blob/094383a25b0f98fe6aa2dd6dd58bae10259492e6/core/src/main/java/com/microsoft/applicationinsights/channel/concrete/inprocess/ErrorHandler.java#L39-L42

For significant contributions please make sure you have completed the following items:

  • Design discussion issue #
  • Changes in public surface reviewed
  • CHANGELOG.md updated

throttling = Boolean.valueOf(namesAndValues.get("Throttling"));
developerMode = Boolean.valueOf(namesAndValues.get(DEVELOPER_MODE_NAME));
try {
maxInstantRetries = Integer.parseInt(INSTANT_RETRY_NAME);
Copy link
Contributor

@grlima grlima Feb 14, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this will throw. Shouldn't it be Integer.parseInt(namesAndValues.get(INSTANT_RETRY_NAME)); ?

Copy link
Contributor

@dhaval24 dhaval24 Feb 14, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes good catch. This will blow. But I am not quite sure, it seems that this value comes as string from XML. In case of user specifies wrong value which cannot be converted to string then this would throw. Let's have try catch around this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes very good catch. Fixed and added null check.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Integer.parseInt does throw NumberFormatException which I already had a catch for. It should only log if we have ERROR tracing on.

@dhaval24
Copy link
Contributor

@debugthings just wanted to understand did you changed / added anything into this PR which is different from the previous one which we reviewed extensively? Do I need to go through this PR again?


private final static String INSTANT_RETRY_NAME = "MaxInstantRetry";
private final static int DEFAULT_MAX_INSTANT_RETRY = 3;
private final static int DEFAULT_MAX_TELEMETRY_BUFFER_CAPACITY = 500;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This number also seems low to me. I think default max capacity can be double

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's the same as before, so we should not change it in this PR.

private final static int DEFAULT_MAX_INSTANT_RETRY = 3;
private final static int DEFAULT_MAX_TELEMETRY_BUFFER_CAPACITY = 500;
private final static int MIN_MAX_TELEMETRY_BUFFER_CAPACITY = 1;
private final static int MAX_MAX_TELEMETRY_BUFFER_CAPACITY = 1000;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is also definitely less. I think this should be definitely increased.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's the same as before, so we should not change it in this PR.

telemetry.getContext().getProperties().put("DeveloperMode", "true");
}

if (telemetrySampler != null) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this just kept for maintaining backward compatibility with Sampling V1? Nonetheless, with this major version we should definitely scrap off the old sampling algorithm as it uses user id based sampling which is not desirable for sure.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dhaval24 - Again, nothing new here. The same code was there before. We can address unrelated changes as separate PR's.

* - The sampler
*/
@Override
public void setSampler(TelemetrySampler telemetrySampler) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be deprecated. It is the old sampling strategy which we do not encourage any more. If we are not scrapping it off entirely then atleast please use @depracted tag here

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dhaval24 - same here. Unrelated to this PR.

}
}

public void setMaxInstantRetries(int maxInstantRetries) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Java docs are still missing here!

}
}

public int getMaxInstantRetries() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here please update java docs here.

Copy link
Contributor

@dhaval24 dhaval24 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

overall looks good to me. Just some minor nit-pickings

Copy link
Contributor

@grlima grlima left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One change needed when reading instant retry configuration. Need to get value from "namesAndValues" map. See comment. Thanks!


public final class TransmissionPolicyManager implements Stoppable, TransmissionHandlerObserver {

private int INSTANT_RETRY_AMOUNT = 3; // Should always be set by the creator of this class
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is using the java naming convention for constants. For member variables, camelCase is preferred, but I'm not going to block the PR for this :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair enough.

@debugthings
Copy link
Contributor Author

@dhaval24 I did add the ability to set the max retry amount from the XML so it will be available if that number is too high or low. Outside of that, it was adding javadocs or simple refactor

@dhaval24
Copy link
Contributor

@debugthings for me it's approved :-) I do not have any additional concerns

@grlima grlima merged commit 1fad688 into microsoft:2.0.0-STABLE Feb 16, 2018
Copy link

@AlexKlimovMS AlexKlimovMS left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still reviewing

if (backOffMillis > 0)
{
long backOffSeconds = backOffMillis / 1000;
InternalLogger.INSTANCE.logAlways(InternalLogger.LoggingLevel.TRACE, "App is throttled, telemetry will be blocked for %s seconds.", backOffSeconds);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would this call become very noisy with no option to turn off on big load?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Possibly, I can remove the log always, The backoff is called in large increments so it would become chatty if there were multiple threads and if we were unable to send data for a large amount of time.


// List of transmission policies implemented as handlers
private List<TransmissionHandler> transmissionHandlers;

// The future date the the transmission is blocked

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cleaning spaces irregularities is always good idea. Code should be clean logically and stylistically =)

*/
public void clearBackoff() {
policyState.setCurrentState(TransmissionPolicy.UNBLOCKED);
backoffManager.onDoneSending();

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Spaces

}
if (bufferedReader != null) {
bufferedReader.close();
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be closed in finally statement?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes this if should definitely go in finally!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lines 125--130 should be wrapped in the finally block below

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup, moved to finally and added try/catch around those close() statements as well.

littleaj added a commit that referenced this pull request Mar 8, 2018
* Fix null ref check in telemetry correlation Utils (#541)

* Fix null ref check in TelemetryCorrelationUtils

* Modifying log level to warning

* Updating Changelog

* Fix handling of NaN and +/-Infinity in JSON serializer (#499)

* Handle NaN and +/-Infinity in metrics

* Default NaN/Infinity serialization to 0 to be consistent with other AI SDKs and make the code compatible with Java 6

* fixed javadoc errors and added section to generate pom.xml with all builds (#551)

* Updating version number to 2.0.0

* Implementing Retry logic [Reliable Channel] [STABLE Branch] (#561)

* Initial commit of retry and backoff logic fixes

* Fixing warnings on files I touched this round

* Fix the eclipse UI from screaming about the docker Contstants

* Fixed backoff logic to use existing method. Added more logging to the sender channel.

* Added the partial response handler, more logging

* Added gson to core. Fixed backoff manager to keep original functionality. Added extension to return the timeout values as expected before.

* Added unit tests.

* Fixing string typed ArrayList<> to List<> per Dhaval

* Missed one

* Making tests consistent.

* Added javadoc comments, simplified logic for a few methods

* Added exception logging per @dhaval24. Fixed formatting on touched files

* Updates per last round of commits

Moved the Handlers out of the concrete package to the common package to keep the same consistency.  Removed a couple of unessecary methods. Added docs.

* Latest fixes

* Add MaxInstantRetry

Added MaxInstantRetry configuration to allow for instantaneous retry on a failed transmission.

* Javadoc Updates

Javadoc and formatting updates

* NumberFormatException fix

Added null check

* JavaDocs for TPM

* Fixing FixedRateSampling to work in old and new version of sampling (#540)

Overriding default sampling percentage when programatically specified sampling percentage by user.

* upgrade to logback v1.2.3 (#565)

* Reliable channel: replacing logAlways "TRACE" statements with "info" (#571)

* Reliable channel: close resources in finally block. (#572)

* Reliable channel: close resources in finally block.

* change logging to warning when closing resources

* Bugfix against retry logic (#576)

* Refactor

* BUGFIX Logic would never backoff

After adding the instant retry amount logic to the code this line of code could cause the transmissions to not back off.

* Changes requested

* Fixed javadocs tags, that caused build errors while executing `javadoc` gradle task (#578)

* Update Changelog

* Fix link in changelog

* Fix another link in changelog

* Update gradle.properties

* Fix customizing pom.xml in Gradle build (#582)

* Fix customizing pom.xml in Gradle build

* Insert license after 1. row in pom.xml

* Filter artifacts relocated by shadow task from pom dependencies

- match artifacts by groupId
- fixes #583 

* Generate a pom file "beside" the artifact jar file
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants