Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-48021][ML][BUILD] Add --add-modules=jdk.incubator.vector to JavaModuleOptions #46246

Closed
wants to merge 3 commits into from

Conversation

panbingkun
Copy link
Contributor

@panbingkun panbingkun commented Apr 26, 2024

What changes were proposed in this pull request?

The pr aims to:

  • add --add-modules=jdk.incubator.vector to JavaModuleOptions
  • remove jdk.incubator.foreign and -Dforeign.restricted=warn from SparkBuild.scala

Why are the changes needed?

1.jdk.incubator.vector
First introduction: #30810
https://github.com/apache/spark/pull/30810/files#diff-6f545c33f2fcc975200bf208c900a600a593ce6b170180f81e2f93b3efb6cb3e
image

Why should we add --add-modules=jdk.incubator.vector to JavaModuleOptions,
Because when we only add --add-modules=jdk.incubator.vector to SparkBuild.scala, it will only take effect when compiling, as follows:

build/sbt "mllib-local/Test/runMain org.apache.spark.ml.linalg.BLASBenchmark"
...
image

However, when we use spark-submit, it is as follows:

./bin/spark-submit --class org.apache.spark.ml.linalg.BLASBenchmark /Users/panbingkun/Developer/spark/spark-community/mllib-local/target/scala-2.13/spark-mllib-local_2.13-4.0.0-SNAPSHOT-tests.jar
image

Obviously, --add-modules=jdk.incubator.vector does not take effect in the Spark runtime, so I propose adding --add-modules=jdk.incubator.vector to the JavaModuleOptions(Spark runtime options) so that we can improve performance by using hardware-accelerated BLAS operations by default.

After this patch(add --add-modules=jdk.incubator.vector to the JavaModuleOptions), as follows:
image

2.jdk.incubator.foreign and -Dforeign.restricted=warn
A.First introduction: #32253
https://github.com/apache/spark/pull/32253/files#diff-6f545c33f2fcc975200bf208c900a600a593ce6b170180f81e2f93b3efb6cb3e
image
Use dev.ludovic.netlib:blas:1.3.2, the class ForeignLinkerBLAS uses jdk.incubator.foreign.* in this version, so we need to add jdk.incubator.foreign and -Dforeign.restricted=warn to SparkBuild.scala
https://github.com/apache/spark/pull/32253/files#diff-9c5fb3d1b7e3b0f54bc5c4182965c4fe1f9023d449017cece3005d3f90e8e4d8
image
https://github.com/luhenry/netlib/blob/v1.3.2/blas/src/main/java/dev/ludovic/netlib/blas/ForeignLinkerBLAS.java#L36
image

However, with the iterative development of dev.ludovic.netlib, ForeignLinkerBLAS has experienced one major change, as following:
luhenry/netlib@48e923c
image
From now on (V3.0.0), jdk.incubator.foreign.* will not be used in dev.ludovic.netlib

Currently, Spark has used the dev.ludovic.netlib of version v3.0.3. In this version, ForeignLinkerBLAS has be removed.
https://github.com/apache/spark/blob/master/pom.xml#L191

Double check (jdk.incubator.foreign cannot be found in the netlib source code):
image

So we can completely remove options jdk.incubator.foreign and -Dforeign.restricted=warn.

B.For JDK 21
(PS: This is to explain the historical reasons for the differences between the current code logic and the initial ones)
(Just because Spark made changes to support JDK 21)
https://issues.apache.org/jira/browse/SPARK-44088
image

Does this PR introduce any user-facing change?

No.

How was this patch tested?

  • Manually test
  • Pass GA.

Was this patch authored or co-authored using generative AI tooling?

No.

@github-actions github-actions bot added the BUILD label Apr 26, 2024
@panbingkun panbingkun changed the title [Only Test] [SPARK-48021][ML][BUILD] Add --add-modules=jdk.incubator.vector to JavaModuleOptions Apr 27, 2024
@panbingkun panbingkun marked this pull request as ready for review April 27, 2024 14:14
@panbingkun
Copy link
Contributor Author

@srowen
Copy link
Member

srowen commented Apr 27, 2024

Before this flag was gated on Java 21 - it's OK to set this on earlier versions? OK if so

@panbingkun
Copy link
Contributor Author

panbingkun commented Apr 27, 2024

Before this flag was gated on Java 21 - it's OK to set this on earlier versions? OK if so

Yes, the JDK version of the above manual test environment (local) is 17.
image

Currently, the version supported by Spark Master is JDK 17 & JDK 21

@luhenry
Copy link
Contributor

luhenry commented Apr 27, 2024

Thank you for looking into that! Let me know what I should do to update dev.ludovic.netlib further for the needs of Spark

@panbingkun
Copy link
Contributor Author

panbingkun commented Apr 27, 2024

Thank you for looking into that! Let me know what I should do to update dev.ludovic.netlib further for the needs of Spark

Thank all for providing detailed descriptions during the previous PR submission and review process. That's why I can easily analyze and track the details of history ❤️

@zhengruifeng
Copy link
Contributor

also cc @WeichenXu123

Copy link
Member

@dongjoon-hyun dongjoon-hyun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1, LGTM.

@dongjoon-hyun
Copy link
Member

Merged to master. Thank you, @panbingkun and all!

@LuciferYang
Copy link
Contributor

LuciferYang commented Apr 28, 2024

@panbingkun we should add --add-modules=jdk.incubator.vector to extraJavaTestArgs in pom.xml too

spark/pom.xml

Lines 305 to 323 in 64d3219

<extraJavaTestArgs>
-XX:+IgnoreUnrecognizedVMOptions
--add-opens=java.base/java.lang=ALL-UNNAMED
--add-opens=java.base/java.lang.invoke=ALL-UNNAMED
--add-opens=java.base/java.lang.reflect=ALL-UNNAMED
--add-opens=java.base/java.io=ALL-UNNAMED
--add-opens=java.base/java.net=ALL-UNNAMED
--add-opens=java.base/java.nio=ALL-UNNAMED
--add-opens=java.base/java.util=ALL-UNNAMED
--add-opens=java.base/java.util.concurrent=ALL-UNNAMED
--add-opens=java.base/java.util.concurrent.atomic=ALL-UNNAMED
--add-opens=java.base/jdk.internal.ref=ALL-UNNAMED
--add-opens=java.base/sun.nio.ch=ALL-UNNAMED
--add-opens=java.base/sun.nio.cs=ALL-UNNAMED
--add-opens=java.base/sun.security.action=ALL-UNNAMED
--add-opens=java.base/sun.util.calendar=ALL-UNNAMED
-Djdk.reflect.useDirectMethodHandle=false
-Dio.netty.tryReflectionSetAccessible=true
</extraJavaTestArgs>

@panbingkun
Copy link
Contributor Author

@panbingkun we should add --add-modules=jdk.incubator.vector to extraJavaTestArgs in pom.xml too

spark/pom.xml

Lines 305 to 323 in 64d3219

<extraJavaTestArgs>
-XX:+IgnoreUnrecognizedVMOptions
--add-opens=java.base/java.lang=ALL-UNNAMED
--add-opens=java.base/java.lang.invoke=ALL-UNNAMED
--add-opens=java.base/java.lang.reflect=ALL-UNNAMED
--add-opens=java.base/java.io=ALL-UNNAMED
--add-opens=java.base/java.net=ALL-UNNAMED
--add-opens=java.base/java.nio=ALL-UNNAMED
--add-opens=java.base/java.util=ALL-UNNAMED
--add-opens=java.base/java.util.concurrent=ALL-UNNAMED
--add-opens=java.base/java.util.concurrent.atomic=ALL-UNNAMED
--add-opens=java.base/jdk.internal.ref=ALL-UNNAMED
--add-opens=java.base/sun.nio.ch=ALL-UNNAMED
--add-opens=java.base/sun.nio.cs=ALL-UNNAMED
--add-opens=java.base/sun.security.action=ALL-UNNAMED
--add-opens=java.base/sun.util.calendar=ALL-UNNAMED
-Djdk.reflect.useDirectMethodHandle=false
-Dio.netty.tryReflectionSetAccessible=true
</extraJavaTestArgs>

Okay, let me do it.

LuciferYang pushed a commit that referenced this pull request Apr 28, 2024
…ector` to maven compile args

### What changes were proposed in this pull request?
The pr is following up #46246
The pr aims to  add `--add-modules=jdk.incubator.vector` to maven `compile args`.

### Why are the changes needed?
As commented by LuciferYang , we need to be consistent in `maven` compile.
#46246 (comment)
<img width="907" alt="image" src="https://github.com/apache/spark/assets/15246973/26163da2-f27d-4ec2-893f-d9282b68aec1">

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Pass GA.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #46259 from panbingkun/SPARK-48021.

Authored-by: panbingkun <panbingkun@baidu.com>
Signed-off-by: yangjie01 <yangjie01@baidu.com>
JacobZheng0927 pushed a commit to JacobZheng0927/spark that referenced this pull request May 11, 2024
…`JavaModuleOptions`

### What changes were proposed in this pull request?
The pr aims to:
- add `--add-modules=jdk.incubator.vector` to `JavaModuleOptions`
- remove `jdk.incubator.foreign` and `-Dforeign.restricted=warn` from `SparkBuild.scala`

### Why are the changes needed?
1.`jdk.incubator.vector`
First introduction: apache#30810
https://github.com/apache/spark/pull/30810/files#diff-6f545c33f2fcc975200bf208c900a600a593ce6b170180f81e2f93b3efb6cb3e
<img width="1045" alt="image" src="https://github.com/apache/spark/assets/15246973/6ac7919a-5d82-475c-b8a2-7d9de71acacc">

Why should we add `--add-modules=jdk.incubator.vector` to `JavaModuleOptions`,
Because when we only add `--add-modules=jdk.incubator.vector` to `SparkBuild.scala`, it will only take effect when compiling, as follows:
```
build/sbt "mllib-local/Test/runMain org.apache.spark.ml.linalg.BLASBenchmark"
...
```
<img width="619" alt="image" src="https://github.com/apache/spark/assets/15246973/54d5f55f-cefe-4126-b255-69488f8699a6">

However, when we use `spark-submit`, it is as follows:
```
./bin/spark-submit --class org.apache.spark.ml.linalg.BLASBenchmark /Users/panbingkun/Developer/spark/spark-community/mllib-local/target/scala-2.13/spark-mllib-local_2.13-4.0.0-SNAPSHOT-tests.jar
```
<img width="1399" alt="image" src="https://github.com/apache/spark/assets/15246973/8e02fa93-fef4-4cdc-96bd-908b3e9baea1">

Obviously, `--add-modules=jdk.incubator.vector` does not take effect in the `Spark runtime`, so I propose adding `--add-modules=jdk.incubator.vector` to the `JavaModuleOptions`(`Spark runtime options`) so that we can improve `performance` by using `hardware-accelerated BLAS operations` by default.

After this patch(add `--add-modules=jdk.incubator.vector` to the `JavaModuleOptions`), as follows:
<img width="1399" alt="image" src="https://github.com/apache/spark/assets/15246973/da7aa494-0d3c-4c60-9991-e7cd29a1cec5">

2.`jdk.incubator.foreign` and `-Dforeign.restricted=warn`
A.First introduction: apache#32253
https://github.com/apache/spark/pull/32253/files#diff-6f545c33f2fcc975200bf208c900a600a593ce6b170180f81e2f93b3efb6cb3e
<img width="1041" alt="image" src="https://github.com/apache/spark/assets/15246973/3f526019-c389-4e60-ab2a-7777f8e99cfb">
Use `dev.ludovic.netlib:blas:1.3.2`, the class `ForeignLinkerBLAS` uses `jdk.incubator.foreign.*` in this version, so we need to add `jdk.incubator.foreign` and `-Dforeign.restricted=warn` to `SparkBuild.scala`
https://github.com/apache/spark/pull/32253/files#diff-9c5fb3d1b7e3b0f54bc5c4182965c4fe1f9023d449017cece3005d3f90e8e4d8
<img width="497" alt="image" src="https://github.com/apache/spark/assets/15246973/4fd35e96-0da2-4456-a3f6-6b57ad2e9b64">
https://github.com/luhenry/netlib/blob/v1.3.2/blas/src/main/java/dev/ludovic/netlib/blas/ForeignLinkerBLAS.java#L36
<img width="743" alt="image" src="https://github.com/apache/spark/assets/15246973/4b7e3bd1-4650-4c7d-bdb4-c1761d48d478">

However, with the iterative development of `dev.ludovic.netlib`, `ForeignLinkerBLAS` has experienced one `major` change, as following:
luhenry/netlib@48e923c
<img width="452" alt="image" src="https://github.com/apache/spark/assets/15246973/7ba30b19-00c7-4cc4-bea7-a6ab4b326ad8">
From now on (V3.0.0), `jdk.incubator.foreign.*` will not be used in `dev.ludovic.netlib`

Currently, Spark has used the `dev.ludovic.netlib` of version `v3.0.3`. In this version, `ForeignLinkerBLAS` has be removed.
https://github.com/apache/spark/blob/master/pom.xml#L191

Double check (`jdk.incubator.foreign` cannot be found in the `netlib` source code):
<img width="674" alt="image" src="https://github.com/apache/spark/assets/15246973/5c6c6d73-6a5d-427a-9fb4-f626f02335ca">

So we can completely remove options `jdk.incubator.foreign` and `-Dforeign.restricted=warn`.

B.For JDK 21
(PS: This is to explain the historical reasons for the differences between the current code logic and the initial ones)
(Just because `Spark` made changes to support `JDK 21`)
https://issues.apache.org/jira/browse/SPARK-44088
<img width="1350" alt="image" src="https://github.com/apache/spark/assets/15246973/34e7e7e8-4e72-470e-abc0-d79406ad25e5">

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
- Manually test
- Pass GA.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes apache#46246 from panbingkun/test_spark_build.

Authored-by: panbingkun <panbingkun@baidu.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
JacobZheng0927 pushed a commit to JacobZheng0927/spark that referenced this pull request May 11, 2024
…ector` to maven compile args

### What changes were proposed in this pull request?
The pr is following up apache#46246
The pr aims to  add `--add-modules=jdk.incubator.vector` to maven `compile args`.

### Why are the changes needed?
As commented by LuciferYang , we need to be consistent in `maven` compile.
apache#46246 (comment)
<img width="907" alt="image" src="https://github.com/apache/spark/assets/15246973/26163da2-f27d-4ec2-893f-d9282b68aec1">

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
Pass GA.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes apache#46259 from panbingkun/SPARK-48021.

Authored-by: panbingkun <panbingkun@baidu.com>
Signed-off-by: yangjie01 <yangjie01@baidu.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants