Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

- Fixing forever loop if something is in illegal state. … #405

Closed
wants to merge 42 commits into from

Conversation

andresol
Copy link

@andresol andresol commented Aug 7, 2017

E.g. Web app is stopped and some code tries to access classes that is removed from class path.

…Eg. Web app is stopped and some code tries to access classes that is on in classpath.
@@ -176,6 +176,9 @@ public boolean send(Transmission transmission) {
if (ioe instanceof ConnectTimeoutException) {
transmissionPolicyManager.suspendInSeconds(TransmissionPolicy.BLOCKED_BUT_CAN_BE_PERSISTED, DEFAULT_BACKOFF_TIME_SECONDS);
}
} catch (IllegalStateException e) {
suspendTransmissions(TransmissionPolicy.BLOCKED_AND_CANNOT_BE_PERSISTED, response);
Copy link
Member

@Dmitry-Matveev Dmitry-Matveev Aug 17, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like suspendTransmission() will exit at the very first line. In case of the IllegalStateException there is no RetryAfter header to read and get the time to retry from. This would mean that the behavior will be just to return here which is not different from general catch (Exception e) block. Are you sure that this suspension fixes the issue with the forever loop?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is correct, and my changes will not work. I fixed the "forever loop bug" in 1.0.8 by adding transmissionPolicyManager.suspendInSeconds(...) I did not notice that suspendTransmissions() is very different from transmissionPolicyManager.suspendInSeconds(..) that is master branch, but I have tested and the forever loop also occure with latest code (10.07.2017)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you think replacing it with suspendInSeconds() or just dropping the telemetry item if an IllegalStateException happens is the right choice here to continue with this PR?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for late reply. I was on vacation. I think dropping the telemetry item is the right choice. The java environment is not in an appropriate state to continue. Is the best way to just shut down by calling TransmissionPolicyManager.stop()?

adnanyaqoobvirk and others added 10 commits August 19, 2017 18:57
Do not set telemetry timestamp if it already exists
None of the other SDKs use the `APPLICATION_INSIGHTS_IKEY` environment
variable to configure the instrumentation key for Application Insights
and instead use `APPINSIGHTS_INSTRUMENTATIONKEY`.

For consistency with the other SDKs (e.g. Dot-Net https://aka.ms/n2ruy1
or Node https://aka.ms/v8r0pk) this Java SDK should also support the
more common environment variable name to avoid confusion for developers
who have experience with the non-Java SDKs.
Add second environment variable to configure ikey
Fixes custom telemetry timestamp bug
@@ -176,6 +177,9 @@ public boolean send(Transmission transmission) {
if (ioe instanceof ConnectTimeoutException) {
transmissionPolicyManager.suspendInSeconds(TransmissionPolicy.BLOCKED_BUT_CAN_BE_PERSISTED, DEFAULT_BACKOFF_TIME_SECONDS);
}
} catch (IllegalStateException e) {
transmissionPolicyManager.stop(SHUTDOWN_TIME, TimeUnit.SECONDS);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@andresol , Is there a way this code (or even app) can get out of the illegal state with one of the next method invocations being successful?

I know I suggested stop() myself, but, on the second thought, I now realize that it leads to all sender threads being stopped and never trying again if they are not re-initialized. So, a single IllegalStateException will stop telemetry from the app up until the next app restart. With this, if there is a possibility that the code may recover on its own - we'd better let it try (unobtrusively of course, without infinite loop :) )

Sorry that it turned out to be a lengthy discussion with back and forth, I just do not want to jump into the conclusion with the severe side effects. Let me know your thoughts.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Dmitry-Matveev i agree. It is better to let it try. DEFAULT_BACKOFF_TIME_SECONDS may be a bit long time. It would be a very "big" bug if the app can recover from a illegall state exception.

I had to fix the problem in 1.0.8 with transmissionPolicyManager.suspendInSeconds() because in just mater of minutes it filled like 60 gig of diskspace. It works very well. Not filling up my server:) I will make new changes.

@Dmitry-Matveev
Copy link
Member

@andresol, there is another PR coming into the same area that will enable back-off for all exceptions to make the "while(true)" 100% CPU loop in case of networking issues less pronounced. In that PR all exceptions will have the same back-off. Would you like to keep 30 seconds or 300 looks OK in this case? If yes, can you please switch to `DEFAULT_BACKOFF" to simplify merge for the second PR above coming after this one?

@andresol
Copy link
Author

Sorry for late answer. Have been very busy. I think DEFAULT_BACKOFF is ok, but shorter back off time is in my opinion to be prefered. I have changed back too DEFAULT_BACKOFF and if we have a lot of complains about server is hanging for long time and so on we could change it.

leantk and others added 6 commits September 15, 2017 09:42
Delay retrying to send telemetry if an exception is hit
…Eg. Web app is stopped and some code tries to access classes that is on in classpath.
…likely not recover. This will prevent forever loop.
# Conflicts:
#	core/src/main/java/com/microsoft/applicationinsights/internal/channel/common/TransmissionNetworkOutput.java
@Dmitry-Matveev
Copy link
Member

@andresol , change looks good, however I see that this merge is now trying to update many files, have you pulled/merged master locally?

@andresol
Copy link
Author

yes. I merged in master branch locally. Too keep my branch up to date and too update the code to match the merge of #420.

@microsoft microsoft deleted a comment from msftclas Sep 26, 2017
@microsoft microsoft deleted a comment from msftclas Sep 26, 2017
@dhaval24
Copy link
Contributor

@andresol we would really like to merge this contribution but there seems to be too many recreated files in this PR where as change is not that big. Would it be possible for you to create a new PR from the latest master with the required set of changes so we can merge it. Let me know if that is not possible because the only fear here is that it can affect the linear flow of commits on the repository.

@@ -295,9 +296,11 @@ public void write(String name, String value) throws IOException {
}

private void writeName(String name) throws IOException {
String truncatedString = truncate(name, 150);
String sanitizedName = SanitizationUtils.sanitizeStringForJSON(truncatedString, true);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd recommend to look at existing libraries for json processing, such as Jackson where you can use ObjectMapper to write string/object as json.

@littleaj
Copy link
Contributor

#501 addresses this issue

@littleaj littleaj closed this Jan 10, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.