10s latency introduced when no external network connectivity #5295
Comments
Possibly caused by adding this? Maybe a DNS resolution that is timing out.
@necrolyte2 can you capture a thread dump during the 10s pause so we can see where the blocking is occurring?
@trask How do we get a thread dump from Java? Sorry, I know very little about the Java ecosystem. The JVM is running within a Docker container inside of a VM, so I'm not entirely sure how to attach and get the thread dump.

We were able to simply do a DNS call on the VM running the container where the app is running, and were seeing a 5s timeout.

Wifi Up
Wifi Down
It really feels like it is being caused by some sort of DNS lookup, and I think @dgund14 found some code introduced in 1.7.0 that does one. These InetAddress lookup calls are tricky because how lookups are done depends on the host that is running the code. From my Mac laptop, I get an immediate UnknownHostException, whereas in the container it does not receive an exception, but instead just waits until it gets a response from DNS.
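On the question above about capturing a thread dump: the usual route is the JDK's `jstack` tool run against the JVM's process ID (which is what was used later in this thread). When attaching from outside the container is awkward, a dump can also be produced from inside the JVM. The following is only a minimal sketch of that idea using the standard `java.lang.management` API; the class name `InProcessThreadDump` is made up for illustration and is not part of the agent.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Hypothetical helper, not part of the javaagent: prints a jstack-like dump
// of every live thread in the current JVM.
final class InProcessThreadDump {

    /** Returns the name, state and full stack trace of all live threads. */
    static String dump() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        StringBuilder out = new StringBuilder();
        // false/false: skip lock details, which keeps the call cheap and portable.
        for (ThreadInfo info : threads.dumpAllThreads(false, false)) {
            out.append('"').append(info.getThreadName()).append("\" ")
               .append(info.getThreadState()).append('\n');
            // Format frames manually; ThreadInfo.toString() truncates deep stacks.
            for (StackTraceElement frame : info.getStackTrace()) {
                out.append("\tat ").append(frame).append('\n');
            }
            out.append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(dump());
    }
}
```

Calling `dump()` from a diagnostic endpoint or a timer while the 10s pause is happening should show which frame the request thread is blocked in.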
hey @necrolyte2, I don't believe the code link above points to a method that performs any DNS resolution.
It looks like getHostName() can trigger a reverse lookup, and it is used on line 33 of that file.
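To illustrate the point about `getHostName()`: `InetAddress.getHostAddress()` only formats the address bytes, while `getHostName()` may ask the system resolver for a reverse lookup and block until the resolver answers or times out. The sketch below shows one possible way to bound such a call with a timeout. It is not the fix that went into the agent; the class `BoundedHostLookup`, the method `lookupWithTimeout`, and the 500 ms budget are all assumptions made for this example.

```java
import java.net.InetAddress;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Supplier;

// Hypothetical helper, not part of the javaagent.
final class BoundedHostLookup {

    /** Runs a possibly blocking lookup on a daemon thread; returns fallback on timeout or failure. */
    static String lookupWithTimeout(Supplier<String> lookup, long timeoutMillis, String fallback) {
        ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "host-lookup");
            t.setDaemon(true); // a stuck resolver call must not keep the JVM alive
            return t;
        });
        try {
            Callable<String> task = lookup::get;
            Future<String> future = executor.submit(task);
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException | ExecutionException e) {
            return fallback;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return fallback;
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        InetAddress loopback = InetAddress.getByAddress(new byte[] {127, 0, 0, 1});
        // No resolver involved: this just formats the four bytes.
        System.out.println(loopback.getHostAddress()); // prints 127.0.0.1
        // May trigger a reverse lookup, so give it a bounded time budget.
        System.out.println(lookupWithTimeout(loopback::getHostName, 500, loopback.getHostAddress()));
    }
}
```

The trade-off is that a timed-out lookup leaves a worker thread blocked in the resolver until the OS gives up, so avoiding the reverse lookup entirely, as discussed in the following comments, is the simpler option where the host name is not essential.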
@dgund14 thx! indeed that looks problematic, I will push a PR to experiment with removing that
Here is the output from jstack while the 10s timeout is occurring
@necrolyte2 thx! unfortunately I'm a little confused by where the stack trace is pointing to:
Just want to confirm: is this result consistent? (The stack trace during the pause points to this location, and you don't see the slowness when the javaagent is not attached?)
It doesn't occur with any version below 1.7.0 of the agent |
Can you capture another thread dump to see if you happen to get any other locations?
Previous dump was while running the agent with the changes from your draft PR.
oh, this is good, so maybe the PR fixes the issue? or is there still a difference with/without the patched agent?
Using these changes (I was lazy, so I just made them all return null) seems to fix the issue. I'll quickly use your code changes to verify that PR as well.
Both of these were run without connectivity to the upstream DNS server, but with the code in your draft PR
Same test, but with code from main branch
Looks good!
awesome, thx for the help @necrolyte2 @dgund14!!
Just as a follow-up, I ran the same tests with the 1.5.2 agent to compare. It produced the same results as the PR, which is good to see (as further validation).
Describe the bug
There appears to be some sort of blocking occurring when there is no network connection available for the application (for example, after disconnecting WiFi).
Steps to reproduce
REPRO
I tested using the above with versions 1.5.2, 1.6.2, 1.7.0, 1.7.2, 1.9.2, and 1.10.1
1.5.2 and 1.6.2 did not have the issue, but all other versions >= 1.7.0 did
What did you expect to see?
Calling the endpoint should take less than 100ms
What did you see instead?
A 10s latency is introduced
What version are you using?
Any version >= 1.7.0
Environment
Compiler: AdoptOpenJDK 11
Additional context
The environments where our apps run frequently lose connectivity to the internet, which is how this bug was uncovered.