[SPARK-24102][ML][MLLIB] ML Evaluators should use weight column - added weight column for regression evaluator #17085

imatiach-msft · 2017-02-27T18:24:14Z

What changes were proposed in this pull request?

The evaluators BinaryClassificationEvaluator, RegressionEvaluator, and MulticlassClassificationEvaluator and the corresponding metrics classes BinaryClassificationMetrics, RegressionMetrics and MulticlassMetrics should use sample weight data.
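Here is roughly what "using sample weight data" means for regression metrics. Each squared or absolute error is multiplied by its instance weight, and the sums are divided by the total weight instead of the row count. The pure-Scala sketch below works under those assumptions and is not Spark's RegressionMetrics code. The object name `WeightedRegressionSketch` and the `(prediction, label, weight)` tuple layout are hypothetical.

```scala
// Hypothetical sketch: weighted regression metrics over (prediction, label, weight) triples.
// Not Spark's RegressionMetrics implementation; all names here are illustrative only.
object WeightedRegressionSketch {

  final case class Metrics(mse: Double, rmse: Double, mae: Double, r2: Double)

  def compute(rows: Seq[(Double, Double, Double)]): Metrics = {
    require(rows.nonEmpty, "need at least one row")
    require(rows.forall(_._3 >= 0.0), "weights must be non-negative")
    val weightSum = rows.map(_._3).sum
    require(weightSum > 0.0, "total weight must be positive")

    // Weighted mean of the labels, used for the total sum of squares.
    val labelMean = rows.map { case (_, y, w) => w * y }.sum / weightSum

    // Every residual term is scaled by its instance weight.
    val ssRes = rows.map { case (p, y, w) => w * (y - p) * (y - p) }.sum
    val absErr = rows.map { case (p, y, w) => w * math.abs(y - p) }.sum
    val ssTot = rows.map { case (_, y, w) => w * (y - labelMean) * (y - labelMean) }.sum

    val mse = ssRes / weightSum
    Metrics(
      mse = mse,
      rmse = math.sqrt(mse),
      mae = absErr / weightSum,
      r2 = 1.0 - ssRes / ssTot
    )
  }
}
```

When every weight is 1.0, these reduce to the ordinary unweighted metrics. In this sketch, an integer weight k behaves like repeating that row k times. R² is not meaningful when all labels are equal, because the total sum of squares is then zero.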

As recommended, I've closed PR #16557 in favor of three pull requests, one for each evaluator (binary/regression/multiclass), to make them easier to review and update.

The updates to the regression metrics were based on the work for https://issues.apache.org/jira/browse/SPARK-11520 ("RegressionMetrics should support instance weights"), with further changes made in response to review comments. The pull request for that issue was closed, and its changes were never checked in.

How was this patch tested?

I added tests to the metrics class.

SparkQA · 2017-02-27T20:57:34Z

Test build #73528 has finished for PR 17085 at commit 48800eb.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

imatiach-msft · 2017-02-27T22:40:08Z

@sethah @Lewuathe @thunterdb @WeichenXu123 @jkbradley @actuaryzhang @srowen would you be able to take a look? I've split the larger pull request into three parts as suggested.

imatiach-msft · 2017-03-16T04:52:21Z

ping @sethah @Lewuathe @thunterdb @WeichenXu123 @jkbradley @actuaryzhang @srowen could you please take a look? Thank you!

SparkQA · 2018-04-16T16:54:11Z

Test build #89406 has finished for PR 17085 at commit d5acd46.

This patch fails Scala style tests.
This patch merges cleanly.
This patch adds no public classes.

imatiach-msft · 2018-04-16T19:03:35Z

Jenkins retest this please

SparkQA · 2018-04-16T23:28:55Z

Test build #89414 has finished for PR 17085 at commit 17c1626.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

imatiach-msft · 2018-12-11T05:27:20Z

ping @sethah @WeichenXu123 @jkbradley @actuaryzhang @srowen could you please take a look? I've updated the PR to latest and made it similar to the multiclass PR that was merged: #17086

SparkQA · 2018-12-11T05:41:13Z

Test build #99947 has finished for PR 17085 at commit 0de3209.

This patch fails to build.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2018-12-11T05:43:30Z

Test build #99948 has finished for PR 17085 at commit 0480721.

This patch fails to build.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2018-12-11T08:05:01Z

Test build #99952 has finished for PR 17085 at commit 0cb2daf.

This patch fails due to an unknown error code, -9.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2018-12-11T08:05:01Z

Test build #99946 has finished for PR 17085 at commit aca6255.

This patch fails due to an unknown error code, -9.
This patch merges cleanly.
This patch adds no public classes.

srowen

It looks OK to me. What about the classification evaluators? Is there the same meaningful notion of weights there? I'd imagine it's possible, but I'm not sure I've seen weighted accuracy, etc.
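Here is one natural definition of weighted accuracy: the weighted fraction of correctly classified instances. The sketch below is an assumption for illustration. It is not how Spark's MulticlassMetrics or the classification evaluator PRs define their weighted metrics. `WeightedAccuracySketch` and the `(prediction, label, weight)` tuple layout are hypothetical.

```scala
// Hypothetical sketch: weighted accuracy = (sum of weights of correct rows) / (sum of all weights).
// Not Spark's implementation; names are illustrative only.
object WeightedAccuracySketch {
  def weightedAccuracy(rows: Seq[(Double, Double, Double)]): Double = {
    require(rows.forall(_._3 >= 0.0), "weights must be non-negative")
    val total = rows.map(_._3).sum
    require(total > 0.0, "total weight must be positive")
    // Only rows whose prediction equals the label contribute to the numerator.
    val correct = rows.collect { case (p, y, w) if p == y => w }.sum
    correct / total
  }
}
```

With unit weights this is plain accuracy. A heavily weighted misclassified row pulls the score down more than a lightly weighted one.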

srowen · 2018-12-11T14:06:28Z

mllib/src/main/scala/org/apache/spark/mllib/stat/MultivariateOnlineSummarizer.scala

@@ -52,7 +52,7 @@ class MultivariateOnlineSummarizer extends MultivariateStatisticalSummary with S
  private var totalCnt: Long = 0
  private var totalWeightSum: Double = 0.0
  private var weightSquareSum: Double = 0.0
-  private var weightSum: Array[Double] = _
+  private var currWeightSum: Array[Double] = _


Nit: I don't think the rename was necessary, but it is OK

Never mind; it looks like the build failed because the private variable conflicts with the public method that is already defined:

  /**
   * Sum of weights.
   */
  override def weightSum: Double = totalWeightSum

I think this may be the best name for the public method, so I would prefer to keep it. The private variable now follows the naming convention of the other private array variables, so I think this makes sense.
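The naming clash can be reproduced with a small summarizer. The class below is hypothetical and heavily simplified. It is not Spark's MultivariateOnlineSummarizer. It keeps a private per-column array (`currWeightSum`) next to a public `weightSum` accessor for the total. Naming the array `weightSum` as well would not compile, because a Scala class cannot define two members with the same name. Tracking the per-column weight of non-zero entries is an assumption made for this sketch only.

```scala
// Hypothetical, simplified summarizer; NOT Spark's MultivariateOnlineSummarizer.
// Shows a private per-column array (currWeightSum) coexisting with a public
// `weightSum` accessor. A private member also named `weightSum` would clash
// with the def below and fail to compile.
class TinyWeightedSummarizer(numCols: Int) {
  require(numCols > 0, "need at least one column")

  // Total weight of all rows added so far.
  private var totalWeightSum: Double = 0.0
  // Assumption for this sketch: per-column weight of rows whose value is non-zero.
  private val currWeightSum: Array[Double] = Array.fill(numCols)(0.0)

  def add(values: Array[Double], weight: Double): this.type = {
    require(values.length == numCols, "dimension mismatch")
    require(weight >= 0.0, "weight must be non-negative")
    totalWeightSum += weight
    var i = 0
    while (i < numCols) {
      if (values(i) != 0.0) currWeightSum(i) += weight
      i += 1
    }
    this
  }

  /** Sum of weights. */
  def weightSum: Double = totalWeightSum

  /** Per-column weight of non-zero entries (returned as a defensive copy). */
  def columnWeightSum: Array[Double] = currWeightSum.clone()
}
```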

imatiach-msft · 2018-12-11T16:09:10Z

@srowen Yes, exactly; there is a third PR for classification: #17084.
However, I still need to update it the same way I just updated this PR (e.g. 2.2.0 -> 3.0.0).

The original PR, #16557, covered all three, but it was recommended that I break it up into three parts, so I closed it and opened three separate PRs.

SparkQA · 2018-12-11T16:24:20Z

Test build #99982 has finished for PR 17085 at commit f708edb.

This patch fails to build.
This patch merges cleanly.
This patch adds no public classes.

SparkQA · 2018-12-11T20:50:44Z

Test build #99984 has finished for PR 17085 at commit 24b66da.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

srowen · 2018-12-12T16:06:46Z

Merged to master

…ed weight column for regression evaluator

## What changes were proposed in this pull request?

The evaluators BinaryClassificationEvaluator, RegressionEvaluator, and MulticlassClassificationEvaluator and the corresponding metrics classes BinaryClassificationMetrics, RegressionMetrics and MulticlassMetrics should use sample weight data.

I've closed the PR: apache#16557 as recommended in favor of creating three pull requests, one for each of the evaluators (binary/regression/multiclass) to make it easier to review/update.

The updates to the regression metrics were based on (and updated with new changes based on comments): https://issues.apache.org/jira/browse/SPARK-11520 ("RegressionMetrics should support instance weights") but the pull request was closed as the changes were never checked in.

## How was this patch tested?

I added tests to the metrics class.

Closes apache#17085 from imatiach-msft/ilmat/regression-evaluate.

Authored-by: Ilya Matiach <ilmat@microsoft.com>
Signed-off-by: Sean Owen <sean.owen@databricks.com>

imatiach-msft mentioned this pull request Feb 27, 2017

[SPARK-18693][ML][MLLIB] ML Evaluators should use weight column #16557

Closed

imatiach-msft force-pushed the ilmat/regression-evaluate branch from 48800eb to d5acd46 on April 16, 2018 16:52

imatiach-msft force-pushed the ilmat/regression-evaluate branch from d5acd46 to 17c1626 on April 16, 2018 19:03

imatiach-msft changed the title from "[SPARK-18693][ML][MLLIB] ML Evaluators should use weight column - added weight column for regression evaluator" to "[SPARK-24102][ML][MLLIB] ML Evaluators should use weight column - added weight column for regression evaluator" on May 14, 2018

Added weight column for regression evaluator

aca6255

imatiach-msft force-pushed the ilmat/regression-evaluate branch from 17c1626 to aca6255 on December 11, 2018 05:12

srowen reviewed Dec 11, 2018


updated based on similar previous PR comments

f708edb

imatiach-msft force-pushed the ilmat/regression-evaluate branch from 0cb2daf to f708edb on December 11, 2018 16:11

renamed variable back

24b66da

srowen approved these changes Dec 11, 2018


srowen closed this in 570b8f3 Dec 12, 2018

imatiach-msft mentioned this pull request Mar 25, 2019

[SPARK-24102][ML][MLLIB][PYSPARK][FOLLOWUP] Added weight column to pyspark API for regression evaluator and metrics #24197

Closed
