Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Resolve spinlock issue in AcceptorExecutor thread #1303

Merged
merged 1 commit into from
Jun 10, 2020
Merged

Resolve spinlock issue in AcceptorExecutor thread #1303

merged 1 commit into from
Jun 10, 2020

Conversation

larosek
Copy link
Contributor

@larosek larosek commented Jun 9, 2020

I ran into an issue where an AcceptorExecutor thread would not shutdown and was using 100% cpu. This thread shutdown will happen when a PeerEurekaNode is removed from the list of nodes.

Here is a snippet of the thread dump:

"TaskAcceptor-target_ec2-xx-xx-xx-xxx.compute-1.amazonaws.com" - Thread t@47
   java.lang.Thread.State: RUNNABLE
	at com.netflix.eureka.util.batcher.AcceptorExecutor$AcceptorRunner.drainInputQueues(AcceptorExecutor.java:234)
	at com.netflix.eureka.util.batcher.AcceptorExecutor$AcceptorRunner.run(AcceptorExecutor.java:189)
	at java.base@11.0.7/java.lang.Thread.run(Thread.java:834)
   Locked ownable synchronizers:
	- None

You can see in the heap dump screenshot below that shutdown was properly called and received by the thread. Also, you will notice that all 3 queues (reprocessQueue, acceptorQueue and pendingTasks) are empty, causing the drainInputQueues while loop to get stuck.

heapdump

If you were to be unlucky, when shutdown was called, you could fall into a condition where nothing was added to the pendingTasks queue and both reprocess and acceptor queue would stay empty since the executor was shutdown.

I did a quick and dirty unit test locally to reproduce the issue, but since this is a timing issue, the test would sometime pass without the fix in position. Of course, when the fix was there I was not able to make the test fail. Because of that, I decided not to add the unit test since it was not very robust. Let me know if you have an idea on how to elegantly test this! Thanks!

@troshko111
Copy link
Contributor

troshko111 commented Jun 10, 2020

Yeah the way it is today is a while(true) loop which does not have an exit condition when the shutdown is set but pending is empty, so the outer loop never gets a chance to check and exit. Thanks for the PR!

@troshko111 troshko111 merged commit 6d7a1e1 into Netflix:master Jun 10, 2020
@larosek larosek deleted the fix/spinlock-issue branch June 10, 2020 19:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants