[CELEBORN-1413] Support Spark 4.0 #2813
This PR is stale because it has been open 20 days with no activity. Remove stale label or comment or this will be closed in 10 days.
Will update this PR next week.
Force-pushed from 6e99713 to 044f892.
The dependency check CI failure is unrelated to this PR; I'll fix it in another PR.
The Spark-4.0 profile can only be compiled in a JDK 17+ environment.
LGTM.
client-spark/spark-4-columnar-shuffle/src/test/resources/log4j.properties
client-spark/common/src/main/java/org/apache/spark/shuffle/celeborn/SparkCommonUtils.java
cc @pan3793
QQ: Why are we moving spark-3 to spark-3-4?
It is possible that the spark-4 implementation will evolve nontrivially over time, in ways that conflict with spark-3.
Do we want to simply keep spark-3 (which eventually goes away when support for Spark 3.5 is dropped) and start with a new spark-4?
I understand the duplication could be nontrivial; I am wondering whether there is a middle ground for 3.5 vs 4 that minimizes it.
This would also minimize the impact on existing spark-3 users (from a dependency point of view).
Took a quick pass; will try to go over it in more detail later during the holidays.
But please don't block on my review.
worker/src/main/scala/org/apache/celeborn/service/deploy/worker/PushDataHandler.scala
...er/src/main/scala/org/apache/celeborn/service/deploy/master/http/api/v1/WorkerResource.scala
The main reason for adding a module shared between Spark 3.5 and Spark 4 is that the shuffle APIs are stable. The unstable APIs are in the columnar shuffle module, which is separate from the Spark 3.5 module.
What changes were proposed in this pull request?
Support the Spark 4.0.0 preview release.
Why are the changes needed?
Does this PR introduce any user-facing change?
No.
How was this patch tested?
Cluster test.