[SPARK-24102][ML][MLLIB] ML Evaluators should use weight column - added weight column for regression evaluator #17085

Closed

imatiach-msft
Contributor

What changes were proposed in this pull request?

The evaluators BinaryClassificationEvaluator, RegressionEvaluator, and MulticlassClassificationEvaluator and the corresponding metrics classes BinaryClassificationMetrics, RegressionMetrics and MulticlassMetrics should use sample weight data.

As recommended, I've closed PR #16557 in favor of three separate pull requests, one for each of the evaluators (binary/regression/multiclass), to make them easier to review and update.

The updates to the regression metrics are based on the earlier pull request for
https://issues.apache.org/jira/browse/SPARK-11520
("RegressionMetrics should support instance weights"), updated with new changes based on comments. That pull request was closed and its changes were never checked in.
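
To illustrate what using sample weights means for the regression metrics, here is a minimal plain-Scala sketch. It assumes each row carries a prediction, a label, and a non-negative weight, and that each squared or absolute error is scaled by the row's weight and normalized by the total weight. The names `WeightedRegressionSketch` and `Row` are hypothetical and are not part of Spark; this is an illustration, not the code in this PR.

```scala
// Plain-Scala sketch of weighted regression metrics (hypothetical names, no Spark dependency).
object WeightedRegressionSketch {
  // One evaluated row: model prediction, true label, non-negative sample weight.
  final case class Row(prediction: Double, label: Double, weight: Double = 1.0)

  private def sq(x: Double): Double = x * x

  private def totalWeight(rows: Seq[Row]): Double = rows.map(_.weight).sum

  // Weighted mean of the labels: sum(w * y) / sum(w).
  def weightedLabelMean(rows: Seq[Row]): Double =
    rows.map(r => r.weight * r.label).sum / totalWeight(rows)

  // Weighted MSE: sum(w * (y - yHat)^2) / sum(w).
  def meanSquaredError(rows: Seq[Row]): Double =
    rows.map(r => r.weight * sq(r.label - r.prediction)).sum / totalWeight(rows)

  // Weighted RMSE: square root of the weighted MSE.
  def rootMeanSquaredError(rows: Seq[Row]): Double =
    math.sqrt(meanSquaredError(rows))

  // Weighted MAE: sum(w * |y - yHat|) / sum(w).
  def meanAbsoluteError(rows: Seq[Row]): Double =
    rows.map(r => r.weight * math.abs(r.label - r.prediction)).sum / totalWeight(rows)

  // Weighted R^2: 1 - SSerr / SStot, with both sums weighted.
  def r2(rows: Seq[Row]): Double = {
    val mean = weightedLabelMean(rows)
    val ssErr = rows.map(r => r.weight * sq(r.label - r.prediction)).sum
    val ssTot = rows.map(r => r.weight * sq(r.label - mean)).sum
    1.0 - ssErr / ssTot
  }

  def main(args: Array[String]): Unit = {
    val rows = Seq(Row(2.5, 3.0, 1.0), Row(0.0, -0.5, 2.0), Row(2.0, 2.0, 1.0))
    println(s"weighted MSE  = ${meanSquaredError(rows)}")
    println(s"weighted RMSE = ${rootMeanSquaredError(rows)}")
    println(s"weighted MAE  = ${meanAbsoluteError(rows)}")
    println(s"weighted R2   = ${r2(rows)}")
  }
}
```

With all weights equal to 1.0 these reduce to the usual unweighted metrics, and in this sketch a weight of 2.0 has the same effect as listing the row twice.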

How was this patch tested?

I added tests to the metrics class.

@SparkQA

SparkQA commented Feb 27, 2017

Test build #73528 has finished for PR 17085 at commit 48800eb.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@imatiach-msft
Contributor Author

@sethah @Lewuathe @thunterdb @WeichenXu123 @jkbradley @actuaryzhang @srowen would you be able to take a look? I've split the larger pull request into three parts as suggested.

@imatiach-msft
Contributor Author

ping @sethah @Lewuathe @thunterdb @WeichenXu123 @jkbradley @actuaryzhang @srowen could you please take a look? thank you!

@SparkQA

SparkQA commented Apr 16, 2018

Test build #89406 has finished for PR 17085 at commit d5acd46.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@imatiach-msft
Contributor Author

Jenkins retest this please

@SparkQA

SparkQA commented Apr 16, 2018

Test build #89414 has finished for PR 17085 at commit 17c1626.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@imatiach-msft changed the title from "[SPARK-18693][ML][MLLIB] ML Evaluators should use weight column - added weight column for regression evaluator" to "[SPARK-24102][ML][MLLIB] ML Evaluators should use weight column - added weight column for regression evaluator" on May 14, 2018
@imatiach-msft
Contributor Author

ping @sethah @WeichenXu123 @jkbradley @actuaryzhang @srowen could you please take a look? I've updated the PR to latest and made it similar to the multiclass PR that was merged: #17086

@SparkQA

SparkQA commented Dec 11, 2018

Test build #99947 has finished for PR 17085 at commit 0de3209.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Dec 11, 2018

Test build #99948 has finished for PR 17085 at commit 0480721.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Dec 11, 2018

Test build #99952 has finished for PR 17085 at commit 0cb2daf.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Dec 11, 2018

Test build #99946 has finished for PR 17085 at commit aca6255.

  • This patch fails due to an unknown error code, -9.
  • This patch merges cleanly.
  • This patch adds no public classes.

Member

@srowen srowen left a comment

It looks OK to me. What about the classification evaluators? Is there the same meaningful notion of weights? I'd imagine it's possible, but I'm not sure I've seen weighted accuracy, etc.
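
For context, a minimal sketch of one way a weighted accuracy could be defined: the total weight of correctly predicted rows divided by the total weight of all rows. This is an assumption made for illustration; it is not taken from this PR or from the classification PRs, and `WeightedAccuracySketch` is a hypothetical name.

```scala
// Hypothetical sketch: accuracy where each row counts in proportion to its weight.
object WeightedAccuracySketch {
  // Each row is a (prediction, label, weight) triple with a non-negative weight.
  def weightedAccuracy(rows: Seq[(Double, Double, Double)]): Double = {
    val total = rows.map(_._3).sum
    require(total > 0.0, "total weight must be positive")
    // Sum the weights of the rows whose prediction matches the label.
    val correct = rows.collect { case (p, l, w) if p == l => w }.sum
    correct / total
  }
}
```

With unit weights this is the ordinary fraction of correct predictions.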

@@ -52,7 +52,7 @@ class MultivariateOnlineSummarizer extends MultivariateStatisticalSummary with S
   private var totalCnt: Long = 0
   private var totalWeightSum: Double = 0.0
   private var weightSquareSum: Double = 0.0
-  private var weightSum: Array[Double] = _
+  private var currWeightSum: Array[Double] = _
Member

Nit: I don't think the rename was necessary, but it is OK

Contributor Author

done!

Contributor Author

@imatiach-msft imatiach-msft Dec 11, 2018

Never mind, it looks like the build failed because the private variable conflicts with the public variable that was already defined:

    /**
     * Sum of weights.
     */
    override def weightSum: Double = totalWeightSum

I think this may be the best name for the public variable, so I would prefer to keep it. The private variable now follows the naming convention of the other private array variables, so I think this makes sense.
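
The conflict described above can be reproduced in a small, simplified sketch. `SummarySketch`, `OnlineSummarizerSketch`, and `nonZeroWeight` are hypothetical names used only for illustration, and the bookkeeping is a rough assumption rather than the real `MultivariateOnlineSummarizer` logic. The point is that a Scala class cannot define both a field and a method called `weightSum`, so the private array needs a different name.

```scala
// Hypothetical, simplified stand-in for a summarizer with a public weightSum accessor.
trait SummarySketch {
  // Public accessor: total weight of all rows seen so far.
  def weightSum: Double
}

class OnlineSummarizerSketch(numFeatures: Int) extends SummarySketch {
  private var totalWeightSum: Double = 0.0

  // Per-feature weight of the non-zero entries. Naming this field `weightSum`
  // would not compile, because the class also defines `def weightSum` below.
  private val currWeightSum: Array[Double] = Array.fill(numFeatures)(0.0)

  // Adds one weighted row and returns this instance so calls can be chained.
  def add(features: Array[Double], weight: Double): this.type = {
    require(features.length == numFeatures, "unexpected number of features")
    require(weight >= 0.0, "weight must be non-negative")
    var i = 0
    while (i < numFeatures) {
      if (features(i) != 0.0) currWeightSum(i) += weight
      i += 1
    }
    totalWeightSum += weight
    this
  }

  /** Sum of weights. */
  override def weightSum: Double = totalWeightSum

  // Hypothetical accessor exposing the per-feature weight for inspection.
  def nonZeroWeight(i: Int): Double = currWeightSum(i)
}
```

Renaming the private field rather than the public accessor leaves the public API unchanged for callers.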

@imatiach-msft
Contributor Author

imatiach-msft commented Dec 11, 2018

@srowen yes, exactly, there is a third PR here for classification: #17084
But I need to update it in a similar way to how I just updated this PR (e.g. 2.2.0 -> 3.0.0).

The original PR (#16557) had all three, but it was recommended that I break it up into three parts, so I closed it and opened three separate PRs.

@SparkQA

SparkQA commented Dec 11, 2018

Test build #99982 has finished for PR 17085 at commit f708edb.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Dec 11, 2018

Test build #99984 has finished for PR 17085 at commit 24b66da.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@srowen
Member

srowen commented Dec 12, 2018

Merged to master

@srowen srowen closed this in 570b8f3 Dec 12, 2018
holdenk pushed a commit to holdenk/spark that referenced this pull request Jan 5, 2019
…ed weight column for regression evaluator

## What changes were proposed in this pull request?

The evaluators BinaryClassificationEvaluator, RegressionEvaluator, and MulticlassClassificationEvaluator and the corresponding metrics classes BinaryClassificationMetrics, RegressionMetrics and MulticlassMetrics should use sample weight data.

I've closed the PR: apache#16557
 as recommended in favor of creating three pull requests, one for each of the evaluators (binary/regression/multiclass) to make it easier to review/update.

The updates to the regression metrics were based on (and updated with new changes based on comments):
https://issues.apache.org/jira/browse/SPARK-11520
 ("RegressionMetrics should support instance weights")
 but the pull request was closed as the changes were never checked in.

## How was this patch tested?

I added tests to the metrics class.

Closes apache#17085 from imatiach-msft/ilmat/regression-evaluate.

Authored-by: Ilya Matiach <ilmat@microsoft.com>
Signed-off-by: Sean Owen <sean.owen@databricks.com>
jackylee-ch pushed a commit to jackylee-ch/spark that referenced this pull request Feb 18, 2019