-
Notifications
You must be signed in to change notification settings - Fork 28.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-29923][SQL][TESTS] Set io.netty.tryReflectionSetAccessible for Arrow on JDK9+ #26552
Conversation
cc @BryanCutler |
@dongjoon-hyun I'm not too familiar with Netty and this issue, but it did come about in https://issues.apache.org/jira/browse/ARROW-3191 for Arrow 0.14.0. Is this something that will affect consumers of Spark jars? I actually thought this was defined in the Arrow POM, but looks like it's not possibly due to this comment apache/arrow#4522 (comment). cc @emkornfield who might have some more insight. |
Thank you for comments, @BryanCutler . I'm also very surprised because this happens suddenly.
If we don't have any proper follow-up fix for this, I need to revert your commit about Arrow 0.15.1 because our Jenkins is broken. I'm trying to find a way to avoid that. |
Test build #113901 has finished for PR 26552 at commit
|
I am not sure this change in the PR actually sets the system property for tests. I think it needs to be set in the surefire plugin config. Like in |
Yes. This is a headache for Arrow users. I'll switch to |
If that works, I'd ask, why not default to allowing this access in Arrow? CCing @BryanCutler as an expert. |
Yes. Right.
And, this means they don't want to enable this by default in code-side. I don't know the reason either. |
Retest this please. |
Retest this please. |
Oh, @srowen . Do you mean enabling this in our code base (somewhere arrow-related code)? |
Test build #113903 has finished for PR 26552 at commit
|
Test build #113904 has finished for PR 26552 at commit
|
Thanks for helping out with this @dongjoon-hyun and @srowen. I think this should be taken care of in the Arrow code-base since by default it causes a failure, I'm not sure why it wasn't but I'll followup with the Arrow community about it. In the meantime, I understand if we need to revert the upgrade from #26133 but we should also downgrade the Jenkins pyarrow at the same time or else it will fail too. It looks like the last test failed also. I don't know where the proper place to put this is, but I'll try to run some tests also. |
Test build #113906 has finished for PR 26552 at commit
|
No worry, @BryanCutler . There is a progress. It passed all Scala/Java UTs. Now, it seems that we need to add this conf to PySpark/SparkR UT. |
cc @HyukjinKwon |
For pyspark, the patch below seems to work, but it would only fix unit tests and not all usage diff --git a/python/run-tests.py b/python/run-tests.py
index 5bcf8b0..ecf5859 100755
--- a/python/run-tests.py
+++ b/python/run-tests.py
@@ -87,6 +87,7 @@ def run_individual_python_test(target_dir, test_name, pyspark_python):
# Also override the JVM's temp directory by setting driver and executor options.
spark_args = [
+ "--conf", "spark.driver.extraJavaOptions=-Dio.netty.tryReflectionSetAccessible=true",
"--conf", "spark.driver.extraJavaOptions=-Djava.io.tmpdir={0}".format(tmp_dir),
"--conf", "spark.executor.extraJavaOptions=-Djava.io.tmpdir={0}".format(tmp_dir),
"pyspark-shell" |
Thank you! |
Ur, do we have two |
For PySpark, I created another PR to do faster iteration. For SparkR, I created another PR for the same reason. I'll merge them all together into this PR later. |
Test build #113911 has finished for PR 26552 at commit
|
I'll merge this to recover JDK11 Jenkins job. |
Test build #113905 has finished for PR 26552 at commit
|
Test build #113912 has finished for PR 26552 at commit
|
We can remove this later after Apache Arrow has the proper logic inside itself. |
OK, at least we need to mark this as a release-notes item, until it's undone. BTW this isn't really an Arrow behavior, but a Netty one: We can try to set this system property at startup in Spark, though I don't know if we can do it early enough. I don't know if Arrow can or wants to try to set this, likewise? like if not set, set to true? This isn't ideal for JDK 9+ users as several things won't work unless their deployment also sets this value - and same could be true of any Arrow user. See also https://gitter.im/netty/netty?at=5ce3e8d513e9854e334005d7 which suggests using a different allocator avoids this, but I don't know if it's directly helpful. I think the issue is that Arrow is reaching into its PlatformDependent class directly. |
Yes. I created SPARK-29924 Document Arrow requirement in JDK9+ for that. Since telease note might be insufficient, I created the issue yesterday. |
For turning on the flag in Arrow we can discuss it again. When this issue came up the first time, it seemed like it shouldn't be Arrow's job to force this, but I'm happy to reconsider and I suppose we can come up with a mechanism to allow end users to opt out if for some reason there is a need to turn it off (we might also be able to investigate moving away from the components that require this). |
Thank you, @emkornfield ! |
Thanks @emkornfield , I'll start a thread in the Arrow dev list and we can discuss there. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for my late response. LGTM for now.
### What changes were proposed in this pull request? This adds a note for additional setting for Apache Arrow library for Java 11. ### Why are the changes needed? Since Apache Arrow 0.14.0, an additional setting is required for Java 9+. - https://issues.apache.org/jira/browse/ARROW-3191 It's explicitly documented at Apache Arrow 0.15.0. - https://issues.apache.org/jira/browse/ARROW-6206 However, there is no plan to handle that inside Apache Arrow side. - https://issues.apache.org/jira/browse/ARROW-7223 In short, we need to document this for the users who is using Arrow-related feature on JDK11. For dev environment, we handle this via [SPARK-29923](#26552) . ### Does this PR introduce any user-facing change? Yes. ### How was this patch tested? Generated document and see the pages. ![doc](https://user-images.githubusercontent.com/9700541/73096611-0f409d80-3e9a-11ea-804b-c6b5ec7bd78d.png) Closes #27356 from dongjoon-hyun/SPARK-JDK11-ARROW-DOC. Authored-by: Dongjoon Hyun <dhyun@apple.com> Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
What changes were proposed in this pull request?
This PR aims to add
io.netty.tryReflectionSetAccessible=true
to the testing configuration for JDK11 because this is an officially documented requirement of Apache Arrow.Apache Arrow community documented this requirement at
0.15.0
(ARROW-6206).Why are the changes needed?
After ARROW-3191, Arrow Java library requires the property
io.netty.tryReflectionSetAccessible
to be set to true for JDK >= 9. After #26133, JDK11 Jenkins job seem to fail.Does this PR introduce any user-facing change?
No.
How was this patch tested?
Pass the Jenkins with JDK11.