Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Race condition where the HystrixMetricsPublisherThreadPool isn't referencing the correct ThreadPoolExecutor instance #270

Conversation

chrisgray
Copy link
Contributor

Race condition exists when constructing a HystrixThreadPool where the ThreadPoolExecutor used by HystrixThreadPool.Factory is not the same instance as the one that is associated with the HystrixMetricsPublisherThreadPool.

This causes a disconnect between the ThreadPoolExecutor's metrics and those supplied by HystrixMetricsThreadPool.

Now the ThreadPoolExecutor itself is retrieved from the HystrixMetricsThreadPool object, because it is protected behind the pattern of a ConcurrentMap#putIfAbsent. This seemed much more difficult to do with the ConcurrencyStrategy
as then any further custom implementations would not have similar concurrency guarantees.

A test has been added to show that the ThreadPoolExecutor construct is indeed correct and the same as the one associated with the metrics publisher. It was a bit difficult to construct a test to prove the bad case and good case at the same time. Once the solution was in place, I had to turn the failing test case into a validation test case.

In addition this exposed some static state that was kept around between some tests that needed to be cleaned up. In particular the HystrixMetricsPublisherFactory has a singleton object that needed to be reset (even though it's a singleton) in order to validate the test. I'm not particularly happy with this approach, so I'd be happy for any help here.

… ThreadPoolExecutor used by HystrixThreadPool.Factory is not the same instance as the one that is associated with the HystrixMetricsPublisherThreadPool.

This causes a disconnect between the ThreadPoolExecutor's metrics and those supplied by HystrixMetricsThreadPool.

Now the ThreadPoolExecutor itself is retrieved from the HystrixMetricsThreadPool object, because it is protected behind the pattern of a ConcurrentMap#putIfAbsent. This seemed much more difficult to do with the ConcurrencyStrategy
as then any further custom implementations would not have similar concurrency guarantees.

A test has been added to show that the ThreadPoolExecutor construct is indeed correct and the same as the one associated with the metrics publisher. It was a bit difficult to construct a test to prove the bad case and good case at the same time. Once the solution was in place, I had to turn the failing test case into a validation test case.

In addition this exposed some static state that was kept around between some tests that needed to be cleaned up. In particular the HystrixMetricsPublisherFactory has a singleton object that needed to be reset (even though it's a singleton) in order to validate the test. I'm not particularly happy with this approach, so I'd be happy for any help here.
@cloudbees-pull-request-builder

Hystrix-pull-requests #137 SUCCESS
This pull request looks good

@benjchristensen
Copy link
Contributor

Thank you @chrisgray for finding this and submitting a fix. This is why we're still at Release Candidate status so I'm very appreciative of you spending time to help on this.

I know of at least 2 more things I need to look at before 1.4.0 leaves RC.

It will be a few more days before I have time in my schedule to thoroughly review this so I'm going to leave it open for now, but will definitely come back to this once I can.

@RyanAD
Copy link

RyanAD commented Jul 11, 2014

We are using Hystrix and have run into this issue as well. I wrote a test for this which fails when using master but passes as soon as I merge this pull request into my local fork. Thanks @chrisgray

@benjchristensen
Copy link
Contributor

Getting into this again finally ... and cc @mattrjacobs who is helping out.

@mattrjacobs
Copy link
Contributor

Thanks for the contribution, @chrisgray. I reviewed and manually merged via #361

chrisgray pushed a commit to yammer/tenacity that referenced this pull request Apr 8, 2015
…ategy because the Netflix/Hystrix#270 was merged into 1.4.0+ version of Hystrix.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants