Resolve spinlock issue in AcceptorExecutor thread #1303
Merged
I ran into an issue where an AcceptorExecutor thread would not shut down and was using 100% CPU. This thread is shut down when a PeerEurekaNode is removed from the list of nodes.
Here is a snippet of the thread dump:
You can see in the heap dump screenshot below that shutdown was properly called and received by the thread. You will also notice that all three queues (reprocessQueue, acceptorQueue and pendingTasks) are empty, which causes the drainInputQueues while loop to spin without ever exiting.
If you were unlucky with the timing of the shutdown call, you could hit a condition where nothing had been added to the pendingTasks queue, and both the reprocess and acceptor queues would stay empty because the executor was already shut down.
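To make the condition concrete, here is a minimal, simplified sketch of the kind of loop described above. It is not the actual Eureka source: the class name `DrainLoopSketch`, the `exitOnShutdown` switch, and the `maxIterations` cap are assumptions added for illustration so that the spinning variant terminates, and the early exit on shutdown is only one possible way to break the loop.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical, simplified model of the drain loop; not the real AcceptorExecutor.
public class DrainLoopSketch {
    final BlockingQueue<String> acceptorQueue = new LinkedBlockingQueue<>();
    final Deque<String> reprocessQueue = new ArrayDeque<>();
    final Deque<String> pendingTasks = new ArrayDeque<>();
    final AtomicBoolean isShutdown = new AtomicBoolean(false);

    // Returns the number of loop iterations performed.
    // maxIterations is an illustrative cap so the spinning variant stops.
    int drainInputQueues(boolean exitOnShutdown, int maxIterations) {
        int iterations = 0;
        do {
            iterations++;
            // Move everything from the input queues into pendingTasks.
            while (!reprocessQueue.isEmpty()) {
                pendingTasks.addLast(reprocessQueue.pollFirst());
            }
            acceptorQueue.drainTo(pendingTasks);

            if (isShutdown.get()) {
                if (exitOnShutdown) {
                    break; // assumed fix: leave the loop once shutdown is seen
                }
                // Without the early exit, nothing blocks here, so the loop spins.
            } else if (pendingTasks.isEmpty()) {
                // All queues are empty: block briefly waiting for new work.
                try {
                    String task = acceptorQueue.poll(10, TimeUnit.MILLISECONDS);
                    if (task != null) {
                        pendingTasks.addLast(task);
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    break;
                }
            }
        } while (iterations < maxIterations
                && (!reprocessQueue.isEmpty() || !acceptorQueue.isEmpty() || pendingTasks.isEmpty()));
        return iterations;
    }

    public static void main(String[] args) {
        DrainLoopSketch sketch = new DrainLoopSketch();
        sketch.isShutdown.set(true); // shutdown arrives while all queues are empty

        // Spins until the illustrative cap is reached.
        System.out.println("without early exit: " + sketch.drainInputQueues(false, 1000));
        // Exits on the first pass.
        System.out.println("with early exit: " + sketch.drainInputQueues(true, 1000));
    }
}
```

With all three queues empty and the shutdown flag set, the variant without the early exit never blocks and never satisfies its exit condition, which matches the 100% CPU symptom; in the sketch it only stops because of the artificial cap.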
I wrote a quick and dirty unit test locally to reproduce the issue, but since this is a timing issue, the test would sometimes pass even without the fix in place. With the fix applied, I was never able to make the test fail. Because the test was not very robust, I decided not to add it. Let me know if you have an idea on how to test this elegantly! Thanks!