-
Notifications
You must be signed in to change notification settings - Fork 199
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
- Fixing forever loop if something is in illegal state. … #405
Conversation
…Eg. Web app is stopped and some code tries to access classes that is on in classpath.
Increasing version number to 1.0.10
Adding ThirdPartyNotices.txt
@@ -176,6 +176,9 @@ public boolean send(Transmission transmission) { | |||
if (ioe instanceof ConnectTimeoutException) { | |||
transmissionPolicyManager.suspendInSeconds(TransmissionPolicy.BLOCKED_BUT_CAN_BE_PERSISTED, DEFAULT_BACKOFF_TIME_SECONDS); | |||
} | |||
} catch (IllegalStateException e) { | |||
suspendTransmissions(TransmissionPolicy.BLOCKED_AND_CANNOT_BE_PERSISTED, response); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like suspendTransmission()
will exit at the very first line. In case of the IllegalStateException
there is no RetryAfter
header to read and get the time to retry from. This would mean that the behavior will be just to return here which is not different from general catch (Exception e)
block. Are you sure that this suspension fixes the issue with the forever loop?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is correct, and my changes will not work. I fixed the "forever loop bug" in 1.0.8 by adding transmissionPolicyManager.suspendInSeconds(...) I did not notice that suspendTransmissions() is very different from transmissionPolicyManager.suspendInSeconds(..) that is master branch, but I have tested and the forever loop also occure with latest code (10.07.2017)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would you think replacing it with suspendInSeconds()
or just dropping the telemetry item if an IllegalStateException
happens is the right choice here to continue with this PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for late reply. I was on vacation. I think dropping the telemetry item is the right choice. The java environment is not in an appropriate state to continue. Is the best way to just shut down by calling TransmissionPolicyManager.stop()?
Do not set telemetry timestamp if it already exists
None of the other SDKs use the `APPLICATION_INSIGHTS_IKEY` environment variable to configure the instrumentation key for Application Insights and instead use `APPINSIGHTS_INSTRUMENTATIONKEY`. For consistency with the other SDKs (e.g. Dot-Net https://aka.ms/n2ruy1 or Node https://aka.ms/v8r0pk) this Java SDK should also support the more common environment variable name to avoid confusion for developers who have experience with the non-Java SDKs.
Add second environment variable to configure ikey
Fixes custom telemetry timestamp bug
Update ChangeLog
@@ -176,6 +177,9 @@ public boolean send(Transmission transmission) { | |||
if (ioe instanceof ConnectTimeoutException) { | |||
transmissionPolicyManager.suspendInSeconds(TransmissionPolicy.BLOCKED_BUT_CAN_BE_PERSISTED, DEFAULT_BACKOFF_TIME_SECONDS); | |||
} | |||
} catch (IllegalStateException e) { | |||
transmissionPolicyManager.stop(SHUTDOWN_TIME, TimeUnit.SECONDS); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@andresol , Is there a way this code (or even app) can get out of the illegal state with one of the next method invocations being successful?
I know I suggested stop()
myself, but, on the second thought, I now realize that it leads to all sender threads being stopped and never trying again if they are not re-initialized. So, a single IllegalStateException
will stop telemetry from the app up until the next app restart. With this, if there is a possibility that the code may recover on its own - we'd better let it try (unobtrusively of course, without infinite loop :) )
Sorry that it turned out to be a lengthy discussion with back and forth, I just do not want to jump into the conclusion with the severe side effects. Let me know your thoughts.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Dmitry-Matveev i agree. It is better to let it try. DEFAULT_BACKOFF_TIME_SECONDS may be a bit long time. It would be a very "big" bug if the app can recover from a illegall state exception.
I had to fix the problem in 1.0.8 with transmissionPolicyManager.suspendInSeconds() because in just mater of minutes it filled like 60 gig of diskspace. It works very well. Not filling up my server:) I will make new changes.
…likely not recover. This will prevent forever loop.
…com/Microsoft/ApplicationInsights-Java into dhaval/RevertSetSampleRateNullCheck
…eRateNullCheck Dhaval/revert set sample rate null check
@andresol, there is another PR coming into the same area that will enable back-off for all exceptions to make the "while(true)" 100% CPU loop in case of networking issues less pronounced. In that PR all exceptions will have the same back-off. Would you like to keep 30 seconds or 300 looks OK in this case? If yes, can you please switch to `DEFAULT_BACKOFF" to simplify merge for the second PR above coming after this one? |
* Jedis Visitor Changes * Hooking up Jedis visitor * remove printstack * Update changelog * fixed typo
…g truncate method
Merge Dhdoshi/sanitize json into master
Sorry for late answer. Have been very busy. I think DEFAULT_BACKOFF is ok, but shorter back off time is in my opinion to be prefered. I have changed back too DEFAULT_BACKOFF and if we have a lot of complains about server is hanging for long time and so on we could change it. |
Delay retrying to send telemetry if an exception is hit
…Eg. Web app is stopped and some code tries to access classes that is on in classpath.
…likely not recover. This will prevent forever loop.
# Conflicts: # core/src/main/java/com/microsoft/applicationinsights/internal/channel/common/TransmissionNetworkOutput.java
@andresol , change looks good, however I see that this merge is now trying to update many files, have you pulled/merged master locally? |
yes. I merged in master branch locally. Too keep my branch up to date and too update the code to match the merge of #420. |
@andresol we would really like to merge this contribution but there seems to be too many recreated files in this PR where as change is not that big. Would it be possible for you to create a new PR from the latest master with the required set of changes so we can merge it. Let me know if that is not possible because the only fear here is that it can affect the linear flow of commits on the repository. |
@@ -295,9 +296,11 @@ public void write(String name, String value) throws IOException { | |||
} | |||
|
|||
private void writeName(String name) throws IOException { | |||
String truncatedString = truncate(name, 150); | |||
String sanitizedName = SanitizationUtils.sanitizeStringForJSON(truncatedString, true); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd recommend to look at existing libraries for json processing, such as Jackson where you can use ObjectMapper
to write string/object as json.
#501 addresses this issue |
E.g. Web app is stopped and some code tries to access classes that is removed from class path.