-
Notifications
You must be signed in to change notification settings - Fork 28.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-48988][ML] Make DefaultParamsReader/Writer
handle metadata with spark session
#47467
Closed
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
zhengruifeng
commented
Jul 24, 2024
zhengruifeng
force-pushed
the
ml_load_with_spark
branch
from
July 24, 2024 04:40
f9d6ee8
to
d829fce
Compare
zhengruifeng
changed the title
[WIP][ML]
[WIP][ML] Make Jul 24, 2024
DefaultParamsReader/Writer
handle metadata with spark sessionDefaultParamsReader/Writer
handle metadata with spark session
HyukjinKwon
approved these changes
Jul 24, 2024
zhengruifeng
changed the title
[WIP][ML] Make
[SPARK-48988][ML] Make Jul 24, 2024
DefaultParamsReader/Writer
handle metadata with spark sessionDefaultParamsReader/Writer
handle metadata with spark session
WeichenXu123
approved these changes
Jul 24, 2024
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Merged to master. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1, LGTM.
Thank you so much, @zhengruifeng , @HyukjinKwon , @WeichenXu123 .
ilicmarkodb
pushed a commit
to ilicmarkodb/spark
that referenced
this pull request
Jul 29, 2024
…ith spark session ### What changes were proposed in this pull request? `DefaultParamsReader/Writer` handle metadata with spark session ### Why are the changes needed? In existing ml implementations, when loading/saving a model, it loads/saves the metadata with `SparkContext` then loads/saves the coefficients with `SparkSession`. This PR aims to also load/save the metadata with `SparkSession`, by introducing new helper functions. - Note I: 3-rd libraries (e.g. [xgboost](https://github.com/dmlc/xgboost/blob/master/jvm-packages/xgboost4j-spark/src/main/scala/org/apache/spark/ml/util/XGBoostReadWrite.scala#L38-L53) ) likely depends on existing implementation of saveMetadata/loadMetadata, so we cannot simply remove them even though they are `private[ml]`. - Note II: this PR only handles `loadMetadata` and `saveMetadata`, there are similar cases for meta algorithms and param read/write, but I want to ignore the remaining part first, to avoid touching too many files in single PR. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? CI ### Was this patch authored or co-authored using generative AI tooling? No Closes apache#47467 from zhengruifeng/ml_load_with_spark. Authored-by: Ruifeng Zheng <ruifengz@apache.org> Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
HyukjinKwon
pushed a commit
that referenced
this pull request
Jul 30, 2024
…t spark session ### What changes were proposed in this pull request? Make model save/load helper functions accept spark session ### Why are the changes needed? 1, avoid unnecessary spark session creations; 2, to be consistent with scala side changes: #47467 and #47477 ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? CI ### Was this patch authored or co-authored using generative AI tooling? No Closes #47527 from zhengruifeng/py_ml_save_metadata_spark. Authored-by: Ruifeng Zheng <ruifengz@apache.org> Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
fusheng-rd
pushed a commit
to fusheng-rd/spark
that referenced
this pull request
Aug 6, 2024
…ith spark session ### What changes were proposed in this pull request? `DefaultParamsReader/Writer` handle metadata with spark session ### Why are the changes needed? In existing ml implementations, when loading/saving a model, it loads/saves the metadata with `SparkContext` then loads/saves the coefficients with `SparkSession`. This PR aims to also load/save the metadata with `SparkSession`, by introducing new helper functions. - Note I: 3-rd libraries (e.g. [xgboost](https://github.com/dmlc/xgboost/blob/master/jvm-packages/xgboost4j-spark/src/main/scala/org/apache/spark/ml/util/XGBoostReadWrite.scala#L38-L53) ) likely depends on existing implementation of saveMetadata/loadMetadata, so we cannot simply remove them even though they are `private[ml]`. - Note II: this PR only handles `loadMetadata` and `saveMetadata`, there are similar cases for meta algorithms and param read/write, but I want to ignore the remaining part first, to avoid touching too many files in single PR. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? CI ### Was this patch authored or co-authored using generative AI tooling? No Closes apache#47467 from zhengruifeng/ml_load_with_spark. Authored-by: Ruifeng Zheng <ruifengz@apache.org> Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
fusheng-rd
pushed a commit
to fusheng-rd/spark
that referenced
this pull request
Aug 6, 2024
…t spark session ### What changes were proposed in this pull request? Make model save/load helper functions accept spark session ### Why are the changes needed? 1, avoid unnecessary spark session creations; 2, to be consistent with scala side changes: apache#47467 and apache#47477 ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? CI ### Was this patch authored or co-authored using generative AI tooling? No Closes apache#47527 from zhengruifeng/py_ml_save_metadata_spark. Authored-by: Ruifeng Zheng <ruifengz@apache.org> Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
attilapiros
pushed a commit
to attilapiros/spark
that referenced
this pull request
Oct 4, 2024
…ith spark session ### What changes were proposed in this pull request? `DefaultParamsReader/Writer` handle metadata with spark session ### Why are the changes needed? In existing ml implementations, when loading/saving a model, it loads/saves the metadata with `SparkContext` then loads/saves the coefficients with `SparkSession`. This PR aims to also load/save the metadata with `SparkSession`, by introducing new helper functions. - Note I: 3-rd libraries (e.g. [xgboost](https://github.com/dmlc/xgboost/blob/master/jvm-packages/xgboost4j-spark/src/main/scala/org/apache/spark/ml/util/XGBoostReadWrite.scala#L38-L53) ) likely depends on existing implementation of saveMetadata/loadMetadata, so we cannot simply remove them even though they are `private[ml]`. - Note II: this PR only handles `loadMetadata` and `saveMetadata`, there are similar cases for meta algorithms and param read/write, but I want to ignore the remaining part first, to avoid touching too many files in single PR. ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? CI ### Was this patch authored or co-authored using generative AI tooling? No Closes apache#47467 from zhengruifeng/ml_load_with_spark. Authored-by: Ruifeng Zheng <ruifengz@apache.org> Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
attilapiros
pushed a commit
to attilapiros/spark
that referenced
this pull request
Oct 4, 2024
…t spark session ### What changes were proposed in this pull request? Make model save/load helper functions accept spark session ### Why are the changes needed? 1, avoid unnecessary spark session creations; 2, to be consistent with scala side changes: apache#47467 and apache#47477 ### Does this PR introduce _any_ user-facing change? No ### How was this patch tested? CI ### Was this patch authored or co-authored using generative AI tooling? No Closes apache#47527 from zhengruifeng/py_ml_save_metadata_spark. Authored-by: Ruifeng Zheng <ruifengz@apache.org> Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
What changes were proposed in this pull request?
DefaultParamsReader/Writer
handle metadata with spark sessionWhy are the changes needed?
In existing ml implementations, when loading/saving a model, it loads/saves the metadata with
SparkContext
then loads/saves the coefficients withSparkSession
.This PR aims to also load/save the metadata with
SparkSession
, by introducing new helper functions.Note I: 3-rd libraries (e.g. xgboost ) likely depends on existing implementation of saveMetadata/loadMetadata, so we cannot simply remove them even though they are
private[ml]
.Note II: this PR only handles
loadMetadata
andsaveMetadata
, there are similar cases for meta algorithms and param read/write, but I want to ignore the remaining part first, to avoid touching too many files in single PR.Does this PR introduce any user-facing change?
No
How was this patch tested?
CI
Was this patch authored or co-authored using generative AI tooling?
No