[SPARK-5995] [ML] Make Prediction dev API public #5913

jkbradley · 2015-05-05T17:30:57Z

Changes:

* Update protected prediction methods, following design doc. **<--most interesting change**
* Changed abstract classes for Estimator and Model to be public. Added DeveloperApi tag. (I kept the traits for Estimator/Model Params private.)
* Changed ProbabilisticClassificationModel method names to use probability instead of probabilities.

CC: @mengxr @shivaram @etrain

SparkQA · 2015-05-05T19:15:47Z

Test build #31890 has finished for PR 5913 at commit 15b9957.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

shivaram · 2015-05-05T22:31:04Z

Thanks @jkbradley - the change mostly looks good to me. I also notice you've changed LogisticRegression to no longer have the custom transform. Is that not required anymore?

jkbradley · 2015-05-05T22:53:20Z

@shivaram The updated ClassificationModel and ProbabilisticClassificationModel protected prediction methods should be more efficient now, so I don't think a specialized one for LogisticRegression is warranted.
Thanks for checking the PR!
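To illustrate the efficiency argument, here is a minimal, self-contained sketch of the idea: compute the raw prediction once and derive the other output columns from it. The trait, the `predictAll` helper, and the toy model are hypothetical stand-ins written for this example; they are not the Spark classes, and the method names only echo the "probability" naming discussed in this PR.

```scala
// Hypothetical, simplified stand-ins for illustration; NOT the Spark API.
object PredictionSketch {
  type Vec = Array[Double]

  trait ProbabilisticModelSketch {
    /** Raw per-class scores (for example, margins) for one feature vector. */
    def predictRaw(features: Vec): Vec

    /** Convert raw scores into class probabilities. */
    def raw2probability(raw: Vec): Vec

    /** Pick the label as the first index holding the largest probability. */
    def probability2prediction(prob: Vec): Double =
      prob.indexOf(prob.max).toDouble

    /** Produce all three outputs while evaluating predictRaw only once. */
    def predictAll(features: Vec): (Vec, Vec, Double) = {
      val raw = predictRaw(features)
      val prob = raw2probability(raw)
      (raw, prob, probability2prediction(prob))
    }
  }

  /** Toy binary logistic model with fixed weights (assumed values, not trained). */
  class ToyLogistic(weights: Vec, intercept: Double) extends ProbabilisticModelSketch {
    def predictRaw(features: Vec): Vec = {
      val margin = weights.zip(features).map { case (w, x) => w * x }.sum + intercept
      Array(-margin, margin)
    }

    def raw2probability(raw: Vec): Vec = {
      val p1 = 1.0 / (1.0 + math.exp(-raw(1)))
      Array(1.0 - p1, p1)
    }
  }
}
```

With this layout a model only supplies the raw score and the raw-to-probability conversion, so a model-specific transform (such as the one LogisticRegression previously had) is less likely to be needed just to avoid recomputing the margin.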

mengxr · 2015-05-06T00:19:52Z

mllib/src/main/scala/org/apache/spark/mllib/linalg/Vectors.scala

+   * Find the index of a maximal element.  Returns the first maximal element in case of a tie.
+   * Returns -1 if vector has length 0.
+   */
+  private[spark] def findMax: Int = {


findMax -> argmax? I think we can put this in a separate PR or move it to DenseVector only. The problem is with sparse vectors: if all of the stored (nonzero) elements are negative, the maximum is an implicit zero, so we need to return the index of an entry whose value is zero.

jkbradley · 2015-05-06

Oh, good point. I'll move it to DenseVector. We've often added helper methods in random places, rather than to linear algebra, utils, or wherever they really belong. I'd prefer that we add these methods directly to the relevant place but make them private.
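The sparse-vector concern can be made concrete with a small sketch. It assumes a sparse vector is given as a length plus parallel arrays of strictly increasing indices and stored values; the object and function names are hypothetical and this is not the code merged in this PR. NaN handling is ignored.

```scala
// Hypothetical helpers for illustration; not the implementation from this PR.
object ArgmaxSketch {
  /** Index of the first maximal element, or -1 for an empty array. */
  def argmaxDense(values: Array[Double]): Int = {
    if (values.isEmpty) return -1
    var best = 0
    var i = 1
    while (i < values.length) {
      if (values(i) > values(best)) best = i
      i += 1
    }
    best
  }

  /**
   * Sparse case. `indices` holds the strictly increasing positions of the
   * stored `values` in a vector of length `size`; all other entries are 0.0.
   */
  def argmaxSparse(size: Int, indices: Array[Int], values: Array[Double]): Int = {
    if (size == 0) return -1
    val k = argmaxDense(values)
    // A positive stored maximum beats every implicit zero.
    if (k >= 0 && values(k) > 0.0) return indices(k)
    // Every position is stored, so there are no implicit zeros to consider.
    if (indices.length == size) return indices(k)
    // Otherwise the maximum is 0.0: return the first position holding a zero,
    // whether it is an implicit (unstored) entry or an explicitly stored 0.0.
    var pos = 0
    var j = 0
    while (j < indices.length) {
      if (indices(j) != pos) return pos // gap in the indices: implicit zero
      if (values(j) == 0.0) return pos  // explicitly stored zero
      pos += 1
      j += 1
    }
    pos // first implicit zero after all stored entries
  }
}
```

For example, a length-4 vector storing -1.0 at index 0 and -2.0 at index 2 has its maximum at index 1, a position that is not stored at all, which is why a naive argmax over the stored values alone gives the wrong answer.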

jkbradley · 2015-05-06T03:33:42Z

Fixed! Any other comments?

SparkQA · 2015-05-06T05:17:04Z

Test build #31945 has finished for PR 5913 at commit e9aa0ea.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

mengxr · 2015-05-06T23:13:08Z

This looks good to me. There are two issues we may need to deal with in the future:

* Where to put multi-label classifiers in the class hierarchy?
* How to support models that are stored in a distributed fashion? Those models could make predictions on individual records if the method call is triggered on the driver node. However, the predict method cannot be used in an RDD closure.

Changes:

* Update protected prediction methods, following design doc. **<--most interesting change**
* Changed abstract classes for Estimator and Model to be public. Added DeveloperApi tag. (I kept the traits for Estimator/Model Params private.)
* Changed ProbabilisticClassificationModel method names to use probability instead of probabilities.

CC: mengxr shivaram etrain

Author: Joseph K. Bradley <joseph@databricks.com>

Closes #5913 from jkbradley/public-dev-api and squashes the following commits:

e9aa0ea [Joseph K. Bradley] moved findMax to DenseVector and renamed to argmax. fixed bug for vector of length 0
15b9957 [Joseph K. Bradley] renamed probabilities to probability in method names
5cda84d [Joseph K. Bradley] regenerated sharedParams
7d1877a [Joseph K. Bradley] Made spark.ml prediction abstractions public. Organized their prediction methods for efficient computation of multiple output columns.

(cherry picked from commit 1ad04da)
Signed-off-by: Xiangrui Meng <meng@databricks.com>

mengxr · 2015-05-06T23:16:22Z

Merged into master and branch-1.4. Thanks!

jkbradley · 2015-05-06T23:35:03Z

Made JIRAs for those 2 items:

multilabel abstractions: [https://issues.apache.org/jira/browse/SPARK-7409]
distributed models: [https://issues.apache.org/jira/browse/SPARK-7412]

jkbradley added 3 commits May 5, 2015 00:56

Made spark.ml prediction abstractions public. Organized their prediction methods for efficient computation of multiple output columns.

7d1877a

regenerated sharedParams

5cda84d

renamed probabilities to probability in method names

15b9957

mengxr reviewed May 6, 2015

moved findMax to DenseVector and renamed to argmax. fixed bug for vector of length 0

e9aa0ea

asfgit closed this in 1ad04da May 6, 2015

jkbradley deleted the public-dev-api branch July 25, 2016 20:35
