[SPARK-1594][MLLIB] Cleaning up MLlib APIs and guide #524

mengxr · 2014-04-24T06:51:25Z

Final pass before the v1.0 release.

* Remove `VectorRDDs`.
* Move `BinaryClassificationMetrics` from `evaluation.binary` to `evaluation`.
* Change the default value of `addIntercept` to false and allow adding an intercept in Ridge and Lasso.
* Clean up the `DecisionTree` package doc and test suite.
* Mark model constructors `private[spark]`.
* Rename `loadLibSVMData` to `loadLibSVMFile` and hide `LabelParser` from users.
* Add `saveAsLibSVMFile`.
* Add `appendBias` to `MLUtils`.
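
For readers unfamiliar with the LIBSVM text format and the bias-appending helper named in the list above, here is a minimal pure-Python sketch of the ideas. It is not the MLlib implementation: the function names `append_bias`, `format_libsvm_line`, and `parse_libsvm_line` are hypothetical, and it assumes dense feature lists. It relies only on the standard LIBSVM convention of `label index:value ...` with one-based indices and omitted zeros.

```python
from typing import List, Tuple


def append_bias(features: List[float]) -> List[float]:
    """Return a new list with a constant 1.0 bias term appended at the end."""
    return list(features) + [1.0]


def format_libsvm_line(label: float, features: List[float]) -> str:
    """Encode one labeled point as 'label index:value ...'.

    Indices are one-based and zero-valued features are omitted.
    """
    pairs = [f"{i + 1}:{v}" for i, v in enumerate(features) if v != 0.0]
    return " ".join([str(label)] + pairs)


def parse_libsvm_line(line: str, num_features: int) -> Tuple[float, List[float]]:
    """Decode one LIBSVM line into (label, dense feature list)."""
    tokens = line.split()
    label = float(tokens[0])
    features = [0.0] * num_features
    for token in tokens[1:]:
        index, value = token.split(":")
        features[int(index) - 1] = float(value)  # convert one-based to zero-based
    return label, features
```

With an intercept disabled by default, a caller who wants one could apply something like `append_bias` to each feature vector before training; that usage is an assumption for illustration, not a statement about how MLlib wires it internally.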

AmplabJenkins · 2014-04-24T06:52:55Z

Merged build triggered.

AmplabJenkins · 2014-04-24T06:53:01Z

Merged build started.

AmplabJenkins · 2014-04-24T06:54:45Z

Merged build finished.

AmplabJenkins · 2014-04-24T06:54:45Z

Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14429/

mateiz · 2014-04-24T07:24:24Z

Hey Xiangrui, can you also add Scaladocs on the main methods? Otherwise it looks weird to have an experimental method. These were really meant to give people an easy way to run the algorithms from the command line, but probably very few people are using them because we didn't document them, so we could also move these methods into examples instead (e.g. in an org.apache.spark.examples.mllib package in the examples project). They are also useful for perf testing during development.

mengxr · 2014-04-25T20:59:24Z

@mateiz My concern is whether we want to encourage users to use those main methods if we know they are definitely going to change in the next release. Right now the main methods take arguments in a predefined order, which is hard to remember, and there is no help message.

I'm testing scopt for parsing options: https://github.com/mengxr/MLlib-QA/blob/master/src/main/scala/mllib/qa/BinaryClassification.scala

The question is whether we want to make the change now or in the next release.
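
To illustrate the difference between positional arguments in a fixed order and named options with a generated help message, here is a small sketch. It uses Python's standard `argparse` rather than scopt, and every option name in it (`--num-iterations`, `--step-size`, `--reg-param`, `--add-intercept`) and the program name are hypothetical, chosen only for illustration.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with named options, defaults, and help text."""
    parser = argparse.ArgumentParser(
        prog="binary_classification",
        description="Hypothetical training entry point with named options.",
    )
    # The only positional argument: where the training data lives.
    parser.add_argument("input", help="path to training data in LIBSVM format")
    # Named options can be given in any order and fall back to defaults.
    parser.add_argument("--num-iterations", type=int, default=100,
                        help="number of iterations (default: 100)")
    parser.add_argument("--step-size", type=float, default=1.0,
                        help="initial step size (default: 1.0)")
    parser.add_argument("--reg-param", type=float, default=0.1,
                        help="regularization parameter (default: 0.1)")
    parser.add_argument("--add-intercept", action="store_true",
                        help="fit an intercept term (off by default)")
    return parser
```

Calling `build_parser().parse_args(["data.txt", "--step-size", "0.5"])` leaves the other options at their defaults, and `build_parser().format_help()` returns the usage text that a fixed positional interface lacks.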

AmplabJenkins · 2014-04-29T09:22:57Z

Merged build triggered.

AmplabJenkins · 2014-04-29T09:23:03Z

Merged build started.

AmplabJenkins · 2014-04-29T09:24:19Z

Merged build finished.

AmplabJenkins · 2014-04-29T09:24:19Z

Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14563/

AmplabJenkins · 2014-04-29T09:27:57Z

Merged build triggered.

AmplabJenkins · 2014-04-29T09:28:03Z

Merged build started.

AmplabJenkins · 2014-04-29T09:59:48Z

Merged build finished.

AmplabJenkins · 2014-04-29T09:59:48Z

Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14564/

AmplabJenkins · 2014-04-29T21:17:57Z

Merged build triggered.

AmplabJenkins · 2014-04-29T21:18:07Z

Merged build started.

AmplabJenkins · 2014-04-29T21:55:12Z

Merged build finished.

AmplabJenkins · 2014-05-01T00:12:57Z

Merged build triggered.

AmplabJenkins · 2014-05-01T00:13:05Z

Merged build started.

AmplabJenkins · 2014-05-01T00:51:52Z

Merged build finished. All automated tests passed.

AmplabJenkins · 2014-05-01T00:51:52Z

All automated tests passed.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14594/

AmplabJenkins · 2014-05-05T18:23:01Z

Merged build triggered.

AmplabJenkins · 2014-05-05T18:23:08Z

Merged build started.

AmplabJenkins · 2014-05-05T19:08:51Z

Merged build finished. All automated tests passed.

AmplabJenkins · 2014-05-05T19:08:52Z

All automated tests passed.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14671/

AmplabJenkins · 2014-05-05T22:57:57Z

Merged build triggered.

AmplabJenkins · 2014-05-05T22:58:03Z

Merged build started.

AmplabJenkins · 2014-05-05T23:32:26Z

Merged build finished. All automated tests passed.

AmplabJenkins · 2014-05-05T23:32:27Z

All automated tests passed.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14684/

mengxr · 2014-05-06T00:16:27Z

@mateiz Could you make another scan and see whether this is okay to merge? Thanks!

mateiz · 2014-05-06T01:33:32Z

Thanks, it looks good. I've merged this in.

Final pass before the v1.0 release.

* Remove `VectorRDDs`
* Move `BinaryClassificationMetrics` from `evaluation.binary` to `evaluation`
* Change default value of `addIntercept` to false and allow to add intercept in Ridge and Lasso.
* Clean `DecisionTree` package doc and test suite.
* Mark model constructors `private[spark]`
* Rename `loadLibSVMData` to `loadLibSVMFile` and hide `LabelParser` from users.
* Add `saveAsLibSVMFile`.
* Add `appendBias` to `MLUtils`.

Author: Xiangrui Meng <meng@databricks.com>

Closes #524 from mengxr/mllib-cleaning and squashes the following commits:

* 295dc8b [Xiangrui Meng] update loadLibSVMFile doc
* 1977ac1 [Xiangrui Meng] fix doc of appendBias
* 649fcf0 [Xiangrui Meng] rename loadLibSVMData to loadLibSVMFile; hide LabelParser from user APIs
* 54b812c [Xiangrui Meng] add appendBias
* a71e7d0 [Xiangrui Meng] add saveAsLibSVMFile
* d976295 [Xiangrui Meng] Merge branch 'master' into mllib-cleaning
* b7e5cec [Xiangrui Meng] remove some experimental annotations and make model constructors private[mllib]
* 9b02b93 [Xiangrui Meng] minor code style update
* a593ddc [Xiangrui Meng] fix python tests
* fc28c18 [Xiangrui Meng] mark more classes experimental
* f6cbbff [Xiangrui Meng] fix Java tests
* 0af70b0 [Xiangrui Meng] minor
* 6e139ef [Xiangrui Meng] Merge branch 'master' into mllib-cleaning
* 94e6dce [Xiangrui Meng] move BinaryLabelCounter and BinaryConfusionMatrixImpl to evaluation.binary
* df34907 [Xiangrui Meng] clean DecisionTreeSuite to use LocalSparkContext
* c81807f [Xiangrui Meng] set the default value of AddIntercept to false
* 03389c0 [Xiangrui Meng] allow to add intercept in Ridge and Lasso
* c66c56f [Xiangrui Meng] move tree md to package object doc
* a2695df [Xiangrui Meng] update guide for BinaryClassificationMetrics
* 9194f4c [Xiangrui Meng] move BinaryClassificationMetrics one level up
* 1c1a0e3 [Xiangrui Meng] remove VectorRDDs because it only contains one function that is not necessary for us to maintain

(cherry picked from commit 98750a7)

Signed-off-by: Matei Zaharia <matei@databricks.com>

mengxr added 4 commits April 23, 2014 14:52

remove VectorRDDs because it only contains one function that is not necessary for us to maintain

1c1a0e3

move BinaryClassificationMetrics one level up

9194f4c

update guide for BinaryClassificationMetrics

a2695df

move tree md to package object doc

c66c56f

mengxr changed the title ~~[WIP][SPARK-1594] Cleaning up MLlib APIs and guide~~ [WIP][SPARK-1594][MLLIB] Cleaning up MLlib APIs and guide Apr 24, 2014

mengxr added 5 commits April 29, 2014 02:21

allow to add intercept in Ridge and Lasso

03389c0

set the default value of AddIntercept to false

c81807f

clean DecisionTreeSuite to use LocalSparkContext

df34907

move BinaryLabelCounter and BinaryConfusionMatrixImpl to evaluation.binary

94e6dce

Merge branch 'master' into mllib-cleaning

6e139ef

minor

0af70b0

fix Java tests

f6cbbff

mengxr changed the title ~~[WIP][SPARK-1594][MLLIB] Cleaning up MLlib APIs and guide~~ [SPARK-1594][MLLIB] Cleaning up MLlib APIs and guide Apr 29, 2014

mengxr added 4 commits May 3, 2014 00:36

add saveAsLibSVMFile

a71e7d0

add appendBias

54b812c

rename loadLibSVMData to loadLibSVMFile; hide LabelParser from user APIs

649fcf0

fix doc of appendBias

1977ac1

update loadLibSVMFile doc

295dc8b

asfgit closed this in 98750a7 May 6, 2014

mengxr deleted the mllib-cleaning branch May 7, 2014 00:06

bzhaoopenstack pushed a commit to bzhaoopenstack/spark that referenced this pull request Sep 11, 2019

Enable bosh cpi test against openstack stein periodic job (apache#524)

afb9486

Run bosh cpi test against openstack stable/stein in periodic pipeline. Closes: theopenlab/openlab#265

arjunshroff pushed a commit to arjunshroff/spark that referenced this pull request Nov 24, 2020

MapR [SPARK-567] Update Hive version for Spark-2.4.3 (apache#524)

784332e
