Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-1406] Mllib pmml model export #3062

Conversation

selvinsource
Copy link
Contributor

See PDF attached to the JIRA issue 1406.

The contribution is my original work and I license the work to the project under the project's open source license.

@selvinsource
Copy link
Contributor Author

@mengxr
Copy link
Contributor

mengxr commented Nov 2, 2014

ok to test

@mengxr
Copy link
Contributor

mengxr commented Nov 2, 2014

test this please

@mengxr
Copy link
Contributor

mengxr commented Nov 2, 2014

@srowen Could you help review this? Thanks!

@SparkQA
Copy link

SparkQA commented Nov 2, 2014

Test build #22763 has finished for PR 3062 at commit aba5ee1.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • trait ModelExport
    • class KMeansPMMLModelExport(model : KMeansModel) extends PMMLModelExport
    • trait PMMLModelExport extends ModelExport

@srowen
Copy link
Member

srowen commented Nov 3, 2014

So, one broad comment: I wonder if it is over-engineering to try to support other formats besides PMML. In practice I can't imagine any other export format that would be commonly used enough to justify continuing to add both it and a PMML export for each new model type. I was just expecting a .pmml subpackage to handle PMML.

I can see some value in creating a very thin wrapper around a PMML to at least not include this type in the API, and instead provide a toPMML(...) factory method that translates accordingly.

import java.io.FileOutputStream
import java.io.File

trait ModelExport {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this conflates the idea of being a model with serializing a model to a particular form or place. For example this limits to files, whereas JPMML has one-liners for serializing to a bunch of things already. I think this could be punted to caller code, or at least an orthogonal helper class, not part of the export type hierarchy.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume you are referring to
JAXBUtil.marshalPMML(pmml, new StreamResult(outputStream));
which takes as param the StreamResult object and for which there are many constructors.

I didn't want to expose the StreamResult as it is xml specific, but if the decision is to only support PMML xml we may add also the StreamResult.

Personally I would still like a simple saveLocalFile(path: String) so that the user doesn't have to think too much and just call:
ModelExportFactory.createModelExport(clusters,ModelExportType.PMML).saveLocalF
ile("/tmp/kmeans.xml") for all models.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, a utility method makes some sense, just not as part of the type hierarchy perhaps. You can expose a utility object that reads/writes PMML objects separately.

@srowen
Copy link
Member

srowen commented Nov 3, 2014

This won't tell you much of anything you don't already know but here's a bit of code also making a K-means model with JPMML: https://github.com/OryxProject/oryx/blob/master/oryx-ml-mllib/src/main/java/com/cloudera/oryx/ml/mllib/kmeans/KMeansUpdate.java#L121

@selvinsource
Copy link
Contributor Author

Hi Sean,
first of all thanks for your time in reviewing this.

If you don't foresee any other export format I agree that the ModelExport is not really needed and can be removed together with the ModelExportType. At that point I would move the ModelExportFactory to the pmml package and call it PMMLModelExportFactory.

Let me know how to proceed.

@srowen
Copy link
Member

srowen commented Nov 4, 2014

@selvinsource Maybe we can ask if anyone else has an idea? I can think of closed-source formats but I doubt they will be worth supporting. I think there's value in perhaps a thin wrapper on top of all this in order to keep lib-specific classes like PMML out of the Spark API.

@mengxr
Copy link
Contributor

mengxr commented Apr 27, 2015

@selvinsource Thanks for testing! If the threshold is set to none, I think we should use 0.5 for logistic regression and 0.0 for SVM, maybe with warning log messages. So they are still encoded as classification models. Your proposal on how to set the intercept for NO in logistic regression looks good to me. Please remember to handle the extreme case (when threshold = 1.0 or 0.0).

@selvinsource
Copy link
Contributor Author

@mengxr please review, it should work as expected now.

@mengxr
Copy link
Contributor

mengxr commented Apr 28, 2015

test this please

@SparkQA
Copy link

SparkQA commented Apr 28, 2015

Test build #31147 has finished for PR 3062 at commit 30165c4.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class KMeansModel (
    • trait PMMLExportable
  • This patch adds the following new dependencies:
    • jaxb-api-2.2.7.jar
    • jaxb-core-2.2.7.jar
    • jaxb-impl-2.2.7.jar
    • pmml-agent-1.1.15.jar
    • pmml-model-1.1.15.jar
    • pmml-schema-1.1.15.jar
  • This patch removes the following dependencies:
    • activation-1.1.jar
    • jaxb-api-2.2.2.jar
    • jaxb-impl-2.2.3-1.jar

@mengxr
Copy link
Contributor

mengxr commented Apr 28, 2015

@selvinsource You can run dev/lint-scala on your local to check Scala style.

var interceptNO = threshold
if (RegressionNormalizationMethodType.LOGIT == normalizationMethod) {
if (threshold <= 0)
interceptNO = -1000
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be Double.MinValue.

@selvinsource
Copy link
Contributor Author

@mengxr updated based on your latest comments.

@mengxr
Copy link
Contributor

mengxr commented Apr 29, 2015

test this please

@SparkQA
Copy link

SparkQA commented Apr 29, 2015

Test build #31313 has finished for PR 3062 at commit 852aac6.

  • This patch fails MiMa tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class KMeansModel (
    • trait PMMLExportable
  • This patch adds the following new dependencies:
    • jaxb-api-2.2.7.jar
    • jaxb-core-2.2.7.jar
    • jaxb-impl-2.2.7.jar
    • pmml-agent-1.1.15.jar
    • pmml-model-1.1.15.jar
    • pmml-schema-1.1.15.jar
    • spark-unsafe_2.10-1.4.0-SNAPSHOT.jar
  • This patch removes the following dependencies:
    • activation-1.1.jar
    • jaxb-api-2.2.2.jar
    • jaxb-impl-2.2.3-1.jar

@mengxr
Copy link
Contributor

mengxr commented Apr 29, 2015

test this please

@SparkQA
Copy link

SparkQA commented Apr 29, 2015

Test build #31339 has finished for PR 3062 at commit 852aac6.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class KMeansModel (
    • trait PMMLExportable
  • This patch adds the following new dependencies:
    • jaxb-api-2.2.7.jar
    • jaxb-core-2.2.7.jar
    • jaxb-impl-2.2.7.jar
    • pmml-agent-1.1.15.jar
    • pmml-model-1.1.15.jar
    • pmml-schema-1.1.15.jar
    • spark-unsafe_2.10-1.4.0-SNAPSHOT.jar
  • This patch removes the following dependencies:
    • activation-1.1.jar
    • jaxb-api-2.2.2.jar
    • jaxb-impl-2.2.3-1.jar

@mengxr
Copy link
Contributor

mengxr commented Apr 30, 2015

test this please

@SparkQA
Copy link

SparkQA commented Apr 30, 2015

Test build #31357 has finished for PR 3062 at commit 852aac6.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class KMeansModel (
    • trait PMMLExportable
  • This patch adds the following new dependencies:
    • jaxb-api-2.2.7.jar
    • jaxb-core-2.2.7.jar
    • jaxb-impl-2.2.7.jar
    • pmml-agent-1.1.15.jar
    • pmml-model-1.1.15.jar
    • pmml-schema-1.1.15.jar
  • This patch removes the following dependencies:
    • activation-1.1.jar
    • jaxb-api-2.2.2.jar
    • jaxb-impl-2.2.3-1.jar

@asfgit asfgit closed this in 254e050 Apr 30, 2015
@mengxr
Copy link
Contributor

mengxr commented Apr 30, 2015

LGTM. Merged into master. Thanks! @selvinsource I created SPARK-7272 for the user guide and assigned it to you. Let me know if you don't have time to do it. Btw, please submit your PMML evaluator code as a Spark package at spark-packages.org. Due to license issues, we cannot include them as example code in the Spark codebase. But PMML model scoring is quite important.

@selvinsource
Copy link
Contributor Author

@mengxr thanks for merging this PR!

I will look into submitting the evaluator at spark-packages if you think it will be useful, I used it as side project to make sure the exported models were working correctly.

On the documentation, please point me in the right direction and I will be happy to help. We can discuss this on SPARK-7272.

@selvinsource
Copy link
Contributor Author

@mengxr jpmml evaluator submitted to http://spark-packages.org

jeanlyn pushed a commit to jeanlyn/spark that referenced this pull request May 28, 2015
See PDF attached to the JIRA issue 1406.

The contribution is my original work and I license the work to the project under the project's open source license.

Author: Vincenzo Selvaggio <vselvaggio@hotmail.it>
Author: Xiangrui Meng <meng@databricks.com>
Author: selvinsource <vselvaggio@hotmail.it>

Closes apache#3062 from selvinsource/mllib_pmml_model_export_SPARK-1406 and squashes the following commits:

852aac6 [Vincenzo Selvaggio] [SPARK-1406] Update JPMML version to 1.1.15 in LICENSE file
085cf42 [Vincenzo Selvaggio] [SPARK-1406] Added Double Min and Max Fixed scala style
30165c4 [Vincenzo Selvaggio] [SPARK-1406] Fixed extreme cases for logit
7a5e0ec [Vincenzo Selvaggio] [SPARK-1406] Binary classification for SVM and Logistic Regression
cfcb596 [Vincenzo Selvaggio] [SPARK-1406] Throw IllegalArgumentException when exporting a multinomial logistic regression
25dce33 [Vincenzo Selvaggio] [SPARK-1406] Update code to latest pmml model
dea98ca [Vincenzo Selvaggio] [SPARK-1406] Exclude transitive dependency for pmml model
66b7c12 [Vincenzo Selvaggio] [SPARK-1406] Updated pmml model lib to 1.1.15, latest Java 6 compatible
a0a55f7 [Vincenzo Selvaggio] Merge pull request apache#2 from mengxr/SPARK-1406
3c22f79 [Xiangrui Meng] more code style
e2313df [Vincenzo Selvaggio] Merge pull request apache#1 from mengxr/SPARK-1406
472d757 [Xiangrui Meng] fix code style
1676e15 [Vincenzo Selvaggio] fixed scala issue
e2ffae8 [Vincenzo Selvaggio] fixed scala style
b8823b0 [Vincenzo Selvaggio] Merge remote-tracking branch 'upstream/master' into mllib_pmml_model_export_SPARK-1406
b25bbf7 [Vincenzo Selvaggio] [SPARK-1406] Added export of pmml to distributed file system using the spark context
7a949d0 [Vincenzo Selvaggio] [SPARK-1406] Fixed scala style
f46c75c [Vincenzo Selvaggio] [SPARK-1406] Added PMMLExportable to supported models
7b33b4e [Vincenzo Selvaggio] [SPARK-1406] Added a PMMLExportable interface Restructured code in a new package mllib.pmml Supported models implements the new PMMLExportable interface: LogisticRegression, SVM, KMeansModel, LinearRegression, RidgeRegression, Lasso
d559ec5 [Vincenzo Selvaggio] Merge remote-tracking branch 'upstream/master' into mllib_pmml_model_export_SPARK-1406
8fe12bb [Vincenzo Selvaggio] [SPARK-1406] Adjusted logistic regression export description and target categories
03bc3a5 [Vincenzo Selvaggio] added logistic regression
da2ec11 [Vincenzo Selvaggio] [SPARK-1406] added linear SVM PMML export
82f2131 [Vincenzo Selvaggio] Merge remote-tracking branch 'upstream/master' into mllib_pmml_model_export_SPARK-1406
19adf29 [Vincenzo Selvaggio] [SPARK-1406] Fixed scala style
1faf985 [Vincenzo Selvaggio] [SPARK-1406] Added target field to the regression model for completeness Adjusted unit test to deal with this change
3ae8ae5 [Vincenzo Selvaggio] [SPARK-1406] Adjusted imported order according to the guidelines
c67ce81 [Vincenzo Selvaggio] Merge remote-tracking branch 'upstream/master' into mllib_pmml_model_export_SPARK-1406
78515ec [Vincenzo Selvaggio] [SPARK-1406] added pmml export for LinearRegressionModel, RidgeRegressionModel and LassoModel
e29dfb9 [Vincenzo Selvaggio] removed version, by default is set to 4.2 (latest from jpmml) removed copyright
ae8b993 [Vincenzo Selvaggio] updated some commented tests to use the new ModelExporter object reordered the imports
df8a89e [Vincenzo Selvaggio] added pmml version to pmml model changed the copyright to spark
a1b4dc3 [Vincenzo Selvaggio] updated imports
834ca44 [Vincenzo Selvaggio] reordered the import accordingly to the guidelines
349a76b [Vincenzo Selvaggio] new helper object to serialize the models to pmml format
c3ef9b8 [Vincenzo Selvaggio] set it to private
6357b98 [Vincenzo Selvaggio] set it to private
e1eb251 [Vincenzo Selvaggio] removed serialization part, this will be part of the ModelExporter helper object
aba5ee1 [Vincenzo Selvaggio] fixed cluster export
cd6c07c [Vincenzo Selvaggio] fixed scala style to run tests
f75b988 [Vincenzo Selvaggio] Merge remote-tracking branch 'origin/master' into mllib_pmml_model_export_SPARK-1406
07a29bf [selvinsource] Update LICENSE
8841439 [Vincenzo Selvaggio] adjust scala style in order to compile
1433b11 [Vincenzo Selvaggio] complete suite tests
8e71b8d [Vincenzo Selvaggio] kmeans pmml export implementation
9bc494f [Vincenzo Selvaggio] added scala suite tests added saveLocalFile to ModelExport trait
226e184 [Vincenzo Selvaggio] added javadoc and export model type in case there is a need to support other types of export (not just PMML)
a0e3679 [Vincenzo Selvaggio] export and pmml export traits kmeans test implementation
jeanlyn pushed a commit to jeanlyn/spark that referenced this pull request Jun 12, 2015
See PDF attached to the JIRA issue 1406.

The contribution is my original work and I license the work to the project under the project's open source license.

Author: Vincenzo Selvaggio <vselvaggio@hotmail.it>
Author: Xiangrui Meng <meng@databricks.com>
Author: selvinsource <vselvaggio@hotmail.it>

Closes apache#3062 from selvinsource/mllib_pmml_model_export_SPARK-1406 and squashes the following commits:

852aac6 [Vincenzo Selvaggio] [SPARK-1406] Update JPMML version to 1.1.15 in LICENSE file
085cf42 [Vincenzo Selvaggio] [SPARK-1406] Added Double Min and Max Fixed scala style
30165c4 [Vincenzo Selvaggio] [SPARK-1406] Fixed extreme cases for logit
7a5e0ec [Vincenzo Selvaggio] [SPARK-1406] Binary classification for SVM and Logistic Regression
cfcb596 [Vincenzo Selvaggio] [SPARK-1406] Throw IllegalArgumentException when exporting a multinomial logistic regression
25dce33 [Vincenzo Selvaggio] [SPARK-1406] Update code to latest pmml model
dea98ca [Vincenzo Selvaggio] [SPARK-1406] Exclude transitive dependency for pmml model
66b7c12 [Vincenzo Selvaggio] [SPARK-1406] Updated pmml model lib to 1.1.15, latest Java 6 compatible
a0a55f7 [Vincenzo Selvaggio] Merge pull request apache#2 from mengxr/SPARK-1406
3c22f79 [Xiangrui Meng] more code style
e2313df [Vincenzo Selvaggio] Merge pull request apache#1 from mengxr/SPARK-1406
472d757 [Xiangrui Meng] fix code style
1676e15 [Vincenzo Selvaggio] fixed scala issue
e2ffae8 [Vincenzo Selvaggio] fixed scala style
b8823b0 [Vincenzo Selvaggio] Merge remote-tracking branch 'upstream/master' into mllib_pmml_model_export_SPARK-1406
b25bbf7 [Vincenzo Selvaggio] [SPARK-1406] Added export of pmml to distributed file system using the spark context
7a949d0 [Vincenzo Selvaggio] [SPARK-1406] Fixed scala style
f46c75c [Vincenzo Selvaggio] [SPARK-1406] Added PMMLExportable to supported models
7b33b4e [Vincenzo Selvaggio] [SPARK-1406] Added a PMMLExportable interface Restructured code in a new package mllib.pmml Supported models implements the new PMMLExportable interface: LogisticRegression, SVM, KMeansModel, LinearRegression, RidgeRegression, Lasso
d559ec5 [Vincenzo Selvaggio] Merge remote-tracking branch 'upstream/master' into mllib_pmml_model_export_SPARK-1406
8fe12bb [Vincenzo Selvaggio] [SPARK-1406] Adjusted logistic regression export description and target categories
03bc3a5 [Vincenzo Selvaggio] added logistic regression
da2ec11 [Vincenzo Selvaggio] [SPARK-1406] added linear SVM PMML export
82f2131 [Vincenzo Selvaggio] Merge remote-tracking branch 'upstream/master' into mllib_pmml_model_export_SPARK-1406
19adf29 [Vincenzo Selvaggio] [SPARK-1406] Fixed scala style
1faf985 [Vincenzo Selvaggio] [SPARK-1406] Added target field to the regression model for completeness Adjusted unit test to deal with this change
3ae8ae5 [Vincenzo Selvaggio] [SPARK-1406] Adjusted imported order according to the guidelines
c67ce81 [Vincenzo Selvaggio] Merge remote-tracking branch 'upstream/master' into mllib_pmml_model_export_SPARK-1406
78515ec [Vincenzo Selvaggio] [SPARK-1406] added pmml export for LinearRegressionModel, RidgeRegressionModel and LassoModel
e29dfb9 [Vincenzo Selvaggio] removed version, by default is set to 4.2 (latest from jpmml) removed copyright
ae8b993 [Vincenzo Selvaggio] updated some commented tests to use the new ModelExporter object reordered the imports
df8a89e [Vincenzo Selvaggio] added pmml version to pmml model changed the copyright to spark
a1b4dc3 [Vincenzo Selvaggio] updated imports
834ca44 [Vincenzo Selvaggio] reordered the import accordingly to the guidelines
349a76b [Vincenzo Selvaggio] new helper object to serialize the models to pmml format
c3ef9b8 [Vincenzo Selvaggio] set it to private
6357b98 [Vincenzo Selvaggio] set it to private
e1eb251 [Vincenzo Selvaggio] removed serialization part, this will be part of the ModelExporter helper object
aba5ee1 [Vincenzo Selvaggio] fixed cluster export
cd6c07c [Vincenzo Selvaggio] fixed scala style to run tests
f75b988 [Vincenzo Selvaggio] Merge remote-tracking branch 'origin/master' into mllib_pmml_model_export_SPARK-1406
07a29bf [selvinsource] Update LICENSE
8841439 [Vincenzo Selvaggio] adjust scala style in order to compile
1433b11 [Vincenzo Selvaggio] complete suite tests
8e71b8d [Vincenzo Selvaggio] kmeans pmml export implementation
9bc494f [Vincenzo Selvaggio] added scala suite tests added saveLocalFile to ModelExport trait
226e184 [Vincenzo Selvaggio] added javadoc and export model type in case there is a need to support other types of export (not just PMML)
a0e3679 [Vincenzo Selvaggio] export and pmml export traits kmeans test implementation
nemccarthy pushed a commit to nemccarthy/spark that referenced this pull request Jun 19, 2015
See PDF attached to the JIRA issue 1406.

The contribution is my original work and I license the work to the project under the project's open source license.

Author: Vincenzo Selvaggio <vselvaggio@hotmail.it>
Author: Xiangrui Meng <meng@databricks.com>
Author: selvinsource <vselvaggio@hotmail.it>

Closes apache#3062 from selvinsource/mllib_pmml_model_export_SPARK-1406 and squashes the following commits:

852aac6 [Vincenzo Selvaggio] [SPARK-1406] Update JPMML version to 1.1.15 in LICENSE file
085cf42 [Vincenzo Selvaggio] [SPARK-1406] Added Double Min and Max Fixed scala style
30165c4 [Vincenzo Selvaggio] [SPARK-1406] Fixed extreme cases for logit
7a5e0ec [Vincenzo Selvaggio] [SPARK-1406] Binary classification for SVM and Logistic Regression
cfcb596 [Vincenzo Selvaggio] [SPARK-1406] Throw IllegalArgumentException when exporting a multinomial logistic regression
25dce33 [Vincenzo Selvaggio] [SPARK-1406] Update code to latest pmml model
dea98ca [Vincenzo Selvaggio] [SPARK-1406] Exclude transitive dependency for pmml model
66b7c12 [Vincenzo Selvaggio] [SPARK-1406] Updated pmml model lib to 1.1.15, latest Java 6 compatible
a0a55f7 [Vincenzo Selvaggio] Merge pull request apache#2 from mengxr/SPARK-1406
3c22f79 [Xiangrui Meng] more code style
e2313df [Vincenzo Selvaggio] Merge pull request apache#1 from mengxr/SPARK-1406
472d757 [Xiangrui Meng] fix code style
1676e15 [Vincenzo Selvaggio] fixed scala issue
e2ffae8 [Vincenzo Selvaggio] fixed scala style
b8823b0 [Vincenzo Selvaggio] Merge remote-tracking branch 'upstream/master' into mllib_pmml_model_export_SPARK-1406
b25bbf7 [Vincenzo Selvaggio] [SPARK-1406] Added export of pmml to distributed file system using the spark context
7a949d0 [Vincenzo Selvaggio] [SPARK-1406] Fixed scala style
f46c75c [Vincenzo Selvaggio] [SPARK-1406] Added PMMLExportable to supported models
7b33b4e [Vincenzo Selvaggio] [SPARK-1406] Added a PMMLExportable interface Restructured code in a new package mllib.pmml Supported models implements the new PMMLExportable interface: LogisticRegression, SVM, KMeansModel, LinearRegression, RidgeRegression, Lasso
d559ec5 [Vincenzo Selvaggio] Merge remote-tracking branch 'upstream/master' into mllib_pmml_model_export_SPARK-1406
8fe12bb [Vincenzo Selvaggio] [SPARK-1406] Adjusted logistic regression export description and target categories
03bc3a5 [Vincenzo Selvaggio] added logistic regression
da2ec11 [Vincenzo Selvaggio] [SPARK-1406] added linear SVM PMML export
82f2131 [Vincenzo Selvaggio] Merge remote-tracking branch 'upstream/master' into mllib_pmml_model_export_SPARK-1406
19adf29 [Vincenzo Selvaggio] [SPARK-1406] Fixed scala style
1faf985 [Vincenzo Selvaggio] [SPARK-1406] Added target field to the regression model for completeness Adjusted unit test to deal with this change
3ae8ae5 [Vincenzo Selvaggio] [SPARK-1406] Adjusted imported order according to the guidelines
c67ce81 [Vincenzo Selvaggio] Merge remote-tracking branch 'upstream/master' into mllib_pmml_model_export_SPARK-1406
78515ec [Vincenzo Selvaggio] [SPARK-1406] added pmml export for LinearRegressionModel, RidgeRegressionModel and LassoModel
e29dfb9 [Vincenzo Selvaggio] removed version, by default is set to 4.2 (latest from jpmml) removed copyright
ae8b993 [Vincenzo Selvaggio] updated some commented tests to use the new ModelExporter object reordered the imports
df8a89e [Vincenzo Selvaggio] added pmml version to pmml model changed the copyright to spark
a1b4dc3 [Vincenzo Selvaggio] updated imports
834ca44 [Vincenzo Selvaggio] reordered the import accordingly to the guidelines
349a76b [Vincenzo Selvaggio] new helper object to serialize the models to pmml format
c3ef9b8 [Vincenzo Selvaggio] set it to private
6357b98 [Vincenzo Selvaggio] set it to private
e1eb251 [Vincenzo Selvaggio] removed serialization part, this will be part of the ModelExporter helper object
aba5ee1 [Vincenzo Selvaggio] fixed cluster export
cd6c07c [Vincenzo Selvaggio] fixed scala style to run tests
f75b988 [Vincenzo Selvaggio] Merge remote-tracking branch 'origin/master' into mllib_pmml_model_export_SPARK-1406
07a29bf [selvinsource] Update LICENSE
8841439 [Vincenzo Selvaggio] adjust scala style in order to compile
1433b11 [Vincenzo Selvaggio] complete suite tests
8e71b8d [Vincenzo Selvaggio] kmeans pmml export implementation
9bc494f [Vincenzo Selvaggio] added scala suite tests added saveLocalFile to ModelExport trait
226e184 [Vincenzo Selvaggio] added javadoc and export model type in case there is a need to support other types of export (not just PMML)
a0e3679 [Vincenzo Selvaggio] export and pmml export traits kmeans test implementation
@tvmanikandan
Copy link

when PMML support for multi class expected to be released ? I am eagerly waiting PMML multiclass support for Logistic Regression , stochastic gradient descent (SGD) and SVM.
Please let me know.

-Mani

@selvinsource
Copy link
Contributor Author

@tvmanikandan
we are adding decision tree PMML support for multiclass classification, see https://issues.apache.org/jira/browse/SPARK-8542.

Linear SVMs supports only binary classification, while logistic regression supports both binary and multiclass classification problems. If you would like logistic regression multiclass classification PMML export and the decision tree is not enough, please create a Jira issue under the https://issues.apache.org/jira/browse/SPARK-8545 PMML improvement umbrella.

@manugarri
Copy link
Contributor

im not sure if this is the right place to ask, but is there any plan to implement PMML export from pyspark? Cant find anything on the pyspark docs.

@vruusmann
Copy link
Contributor

@manugarri You can export fitted pipeline models to PMML using the JPMML-SparkML-Package Apache Spark Package. There's a worked-out PySpark example in its README.md file.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants