[SPARK-13036][SPARK-13318][SPARK-13319] Add save/load for feature.py #11203
ok to test
@@ -53,6 +53,18 @@ class Binarizer(JavaTransformer, HasInputCol, HasOutputCol):
>>> params = {binarizer.threshold: -0.5, binarizer.outputCol: "vector"}
>>> binarizer.transform(df, params).head().vector
1.0
>>> import tempfile
Good to know. I'll update mine to match yours after it gets merged.
On Sunday, February 14, 2016, Holden Karau notifications@github.com wrote:
In python/pyspark/ml/feature.py, #11203 (comment):
@@ -53,6 +53,18 @@ class Binarizer(JavaTransformer, HasInputCol, HasOutputCol):
>>> params = {binarizer.threshold: -0.5, binarizer.outputCol: "vector"}
>>> binarizer.transform(df, params).head().vector
1.0
>>> import tempfile
So this ended up being a follow-up to #10999. We might want to simplify
this a bit so it reads more like an example, since doctests are also (ideally)
readable by users. Maybe wait for #11197 and then follow the
pattern in that PR. cc @mengxr
Cheers
Xusen Yin (尹绪森)
LinkedIn: https://cn.linkedin.com/in/xusenyin
Test build #51285 has finished for PR 11203 at commit
retest it please
Test build #51323 has finished for PR 11203 at commit
Test build #51327 has finished for PR 11203 at commit
test it please
Test build #52002 has finished for PR 11203 at commit
@mengxr @yanboliang Ready for review.
>>> loadedHashingTF = HashingTF.load(hashingTFPath)
>>> param = loadedHashingTF.getParam("numFeatures")
>>> loadedHashingTF.getOrDefault(param) == hashingTF.getOrDefault(param)
True
Could you use `getNumFeatures` like the other transformers in the doctest? It will make your test cleaner. `HashingTF` extends `HasNumFeatures`, so it has this method.
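As a rough sketch of the doctest shape this comment seems to be asking for: save to a temporary directory, load, and then compare through the typed getter instead of `getParam`/`getOrDefault`. The helper name `save_and_reload` is hypothetical and not part of PySpark; it assumes only that the stage exposes `save(path)` and that its class exposes `load(path)`, as the doctests in this PR do, and that `load` reads everything it needs before returning.

```python
import os
import shutil
import tempfile


def save_and_reload(stage, stage_cls, name):
    """Save `stage` under a fresh temp directory and load it back.

    Hypothetical helper (not part of PySpark). It assumes only that
    `stage.save(path)` and `stage_cls.load(path)` exist, and that
    `load` is eager, so the directory can be removed afterwards.
    """
    temp_path = tempfile.mkdtemp()
    try:
        path = os.path.join(temp_path, name)
        stage.save(path)
        return stage_cls.load(path)
    finally:
        # Remove the temporary directory whether or not the round trip worked.
        shutil.rmtree(temp_path, ignore_errors=True)


# Intended use with PySpark (needs a running Spark context, so not executed here):
#   hashingTF = HashingTF(numFeatures=10, inputCol="words", outputCol="features")
#   loaded = save_and_reload(hashingTF, HashingTF, "hashing-tf")
#   loaded.getNumFeatures() == hashingTF.getNumFeatures()
```

With the getter, the comparison in the doctest collapses to a single readable line, which is the cleanup the reviewer is pointing at.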
@yinxusen Looks good overall, I left some inline comments. Thanks!
@yanboliang Thanks for reviewing it! I'll change them soon.
@yanboliang I leave doctests of
Test build #52257 has finished for PR 11203 at commit
Test build #52259 has finished for PR 11203 at commit
Test build #52256 has finished for PR 11203 at commit
>>> modelPath = temp_path + "/max-abs-scaler-model"
>>> model.save(modelPath)
>>> loadedModel = MaxAbsScalerModel.load(modelPath)
>>> loadedModel.transform(df).first().scaled == model.transform(df).first().scaled
Here we should check the equality of `maxAbs`, which is a vector.
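The thread does not say why comparing `maxAbs` is preferable, but one plausible reason is that a single transformed row can agree even when the fitted state differs. A minimal pure-Python sketch of that situation follows; it uses no Spark, and `scale_row` is a hypothetical stand-in for max-abs scaling rather than PySpark code.

```python
def scale_row(row, max_abs):
    """Divide each feature by its max-abs value (hypothetical stand-in)."""
    return [x / m for x, m in zip(row, max_abs)]


first_row = [1.0, 0.0]
original_max_abs = [2.0, 3.0]
corrupted_max_abs = [2.0, 5.0]  # pretend a faulty load changed the second entry

# The first scaled row is identical, so a first()-based check would still pass.
same_first_row = (
    scale_row(first_row, original_max_abs)
    == scale_row(first_row, corrupted_max_abs)
)

# Comparing the fitted vectors directly exposes the difference.
same_state = original_max_abs == corrupted_max_abs
```

Here `same_first_row` is true while `same_state` is false, which is the kind of gap a direct check on the model's vector would close.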
@yanboliang fixed and added the interface
test it please
Test build #52381 has finished for PR 11203 at commit
LGTM for me, cc @mengxr
@yinxusen Could you resolve conflicts with master?
@mengxr Solved.
Test build #52423 has finished for PR 11203 at commit
Merged into master. Thanks!
Add save/load for feature.py. Meanwhile, add save/load for `ElementwiseProduct` on the Scala side and fix a bug of missing `setDefault` in `VectorSlicer` and `StopWordsRemover`.

In this PR I ignore `RFormula` and `RFormulaModel` because their Scala implementation is pending in apache#9884. I'll add them in this PR if apache#9884 gets merged first, or add a follow-up JIRA for `RFormula`.

Author: Xusen Yin <yinxusen@gmail.com>

Closes apache#11203 from yinxusen/SPARK-13036.