Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Race condition #5359

Merged
merged 8 commits into from
Oct 17, 2023
Merged

Race condition #5359

merged 8 commits into from
Oct 17, 2023

Conversation

dtbaum
Copy link
Contributor

@dtbaum dtbaum commented Jul 7, 2023

Fixed race condition, see:
#5358

Fixed race condition, see:
 eclipse-ee4j#5358
@dtbaum dtbaum mentioned this pull request Jul 7, 2023
@jansupol
Copy link
Contributor

@dtbaum Can you please elaborate on how the race condition happens?

@dtbaum
Copy link
Contributor Author

dtbaum commented Jul 13, 2023

@jansupol The race condition occurs when multiple independent Jersey client instances, running in parallel threads, make a GET request.

When the first GET request is in progress, all parallel requests from other Jersey clients (other client instances, but same SSLContext and same target), in other threads, fail with SSLHandshakeException: PKIX path building failed.
The first caller uses the correct trust store (which is defined in used SSLContext), the parallel callers use the java default trusstore.. The problem is not working if (DEFAULT_SSL_SOCKET_FACTORY.get() == suc.getSSLSocketFactory()) condition. For subsequent requests it returns suc.getSSLSocketFactory() default factory, BUT another object reference to default factory, therefore the reference equality doesn't work here. A deep equality check, like working .equals() or removing the condition would solve it.
Once the first GET request is completed, all subsequent requests work without error.
Here is the code which demonstrates this issue:
https://github.com/dtbaum/jersey-bug-report

If you get any issues getting the example to run, please let me know. I would add it as a broken JUnit test on this jersey branch.

@dtbaum
Copy link
Contributor Author

dtbaum commented Jul 20, 2023

@jansupol: Did you succeed in reproducing the problem?
The reason for the exception is following: Due to mentioned reference equality check, the "parallel" threads do not use the specified SSLContext, but the default SSLContext instead.
Consequently, the server certificate validation occurs not against the specified truststore, but against the default truststore, leading to the mentioned exception.

@jansupol
Copy link
Contributor

Thank you for the reproducer @dtbaum. The fix breaks SslHttpUrlConnectorTest.testSSLWithCustomSocketFactory, but we'll check if there is any possible fix for it.

@jansupol
Copy link
Contributor

Related #4815

This bug has a long history. Basically, there is a bug in JDK that we try to workaround, but there seem to be multiple requirements that go against each other.

@dtbaum
Copy link
Contributor Author

dtbaum commented Jul 27, 2023

@jansupol: As much as I understand (which isn't much :-) ) a deep, reliable equality check, like working .equals() instead of object reference equality, would help, without violating other requirements?

@dtbaum dtbaum closed this Jul 27, 2023
@dtbaum dtbaum reopened this Jul 27, 2023
@dtbaum dtbaum changed the title Fixed race condition Race condition Jul 31, 2023
jenkins-daemon pushed a commit to hortonworks/cloudbreak that referenced this pull request Sep 8, 2023
…cketFactory is created in jersey client. The same issue is opened: eclipse-ee4j/jersey#5359 . patching the HttpUrlConnector may help the problem. Modifying the build only in integration test to use this patch
@dtbaum
Copy link
Contributor Author

dtbaum commented Oct 5, 2023

@jansupol: I think, I have a fix for the race condition, without violating #4757/#4566.
I have made the method HttpUrlConnectorProvider.ConnectionFactory.getConnection() thread-safe:
This ensures that parallel initialization in the HttpsURLConnection.getDefaultSSLSocketFactory() is avoided.
See the screenshot below for call hierarchy:
image

@jansupol jansupol linked an issue Oct 11, 2023 that may be closed by this pull request
srowen pushed a commit to apache/spark that referenced this pull request Oct 29, 2023
### What changes were proposed in this pull request?
This pr aims upgrade Jersey from 2.40 to 2.41.

### Why are the changes needed?
The new version bring some improvements, like:
- eclipse-ee4j/jersey#5350
- eclipse-ee4j/jersey#5365
- eclipse-ee4j/jersey#5436
- eclipse-ee4j/jersey#5296

and some bug fix, like:
- eclipse-ee4j/jersey#5359
- eclipse-ee4j/jersey#5405
- eclipse-ee4j/jersey#5423
- eclipse-ee4j/jersey#5435
- eclipse-ee4j/jersey#5445

The full release notes as follows:
- https://github.com/eclipse-ee4j/jersey/releases/tag/2.41

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Pass GitHub Actions

### Was this patch authored or co-authored using generative AI tooling?
No

Closes #43490 from LuciferYang/SPARK-45636.

Lead-authored-by: YangJie <yangjie01@baidu.com>
Co-authored-by: yangjie01 <yangjie01@baidu.com>
Signed-off-by: Sean Owen <srowen@gmail.com>
@fwippe
Copy link

fwippe commented Sep 4, 2024

@senivam @jansupol @dtbaum Please note that this change has been having a drastic impact on performance, in particular when using Jersey client for multiple services: Only one connection can be created at a time, across all routes. If the intention had been to just synchronize HttpsURLConnection.getDefaultSSLSocketFactory(), as an alternative you could have introduced a lazy one-off supplier initializing the socket factory. See #5738 for performance impact in practice.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Race condition
4 participants