Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-24103][ML][MLLIB] ML Evaluators should use weight column - added weight column for binary classification evaluator #17084

Closed
wants to merge 12 commits into from

Conversation

imatiach-msft
Copy link
Contributor

What changes were proposed in this pull request?

The evaluators BinaryClassificationEvaluator, RegressionEvaluator, and MulticlassClassificationEvaluator and the corresponding metrics classes BinaryClassificationMetrics, RegressionMetrics and MulticlassMetrics should use sample weight data.

I've closed the PR: #16557
as recommended in favor of creating three pull requests, one for each of the evaluators (binary/regression/multiclass) to make it easier to review/update.

How was this patch tested?

I added tests to the metrics and evaluators classes.

@SparkQA
Copy link

SparkQA commented Feb 27, 2017

Test build #73526 has finished for PR 17084 at commit 98652cf.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • class BinaryClassificationMetrics @Since(\"2.2.0\") (

@imatiach-msft
Copy link
Contributor Author

@sethah @Lewuathe @thunterdb @WeichenXu123 @jkbradley @actuaryzhang @srowen would you be able to take a look? I've split the larger pull request into three parts as suggested.

@imatiach-msft
Copy link
Contributor Author

ping @sethah @Lewuathe @thunterdb @WeichenXu123 @jkbradley @actuaryzhang @srowen could you please take a look? thank you!

@actuaryzhang
Copy link
Contributor

Thanks for the PR. I think this is helpful. Will take a look next week. Quite swamped recently.

@HyukjinKwon
Copy link
Member

gentle ping @actuaryzhang

@actuaryzhang
Copy link
Contributor

@imatiach-msft Thanks for the PR. Added a couple of comments. Sorry for the delayed review.

@HyukjinKwon
Copy link
Member

gentle ping @imatiach-msft .

@imatiach-msft
Copy link
Contributor Author

yes, will update the PR, thanks for the ping

@imatiach-msft
Copy link
Contributor Author

Jenkins, test this please

@SparkQA
Copy link

SparkQA commented Jun 26, 2017

Test build #78597 has finished for PR 17084 at commit cf59c62.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • class BinaryClassificationMetrics @Since(\"2.2.0\") (

@SparkQA
Copy link

SparkQA commented Jun 26, 2017

Test build #78598 has finished for PR 17084 at commit 60fc2a7.

  • This patch fails PySpark pip packaging tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@imatiach-msft
Copy link
Contributor Author

the pip packaging failing seems to be unrelated to the code... let me try this again

@imatiach-msft
Copy link
Contributor Author

Jenkins, test this please

@SparkQA
Copy link

SparkQA commented Jun 26, 2017

Test build #78606 has finished for PR 17084 at commit 60fc2a7.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@imatiach-msft
Copy link
Contributor Author

gentle ping @actuaryzhang, thanks!

@imatiach-msft
Copy link
Contributor Author

Jenkins, test this please

@SparkQA
Copy link

SparkQA commented Apr 16, 2018

Test build #89405 has finished for PR 17084 at commit e89a030.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@imatiach-msft
Copy link
Contributor Author

gently ping @sethah @Lewuathe @thunterdb @WeichenXu123 @jkbradley @actuaryzhang @srowen could you please take a look? I've merged with latest code. Thank you!

@imatiach-msft imatiach-msft changed the title [SPARK-18693][ML][MLLIB] ML Evaluators should use weight column - added weight column for binary classification evaluator [SPARK-24103][ML][MLLIB] ML Evaluators should use weight column - added weight column for binary classification evaluator May 14, 2018
@SparkQA
Copy link

SparkQA commented Feb 5, 2019

Test build #102046 has finished for PR 17084 at commit 793a284.

  • This patch fails PySpark pip packaging tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • class BinaryClassificationMetrics @Since(\"3.0.0\") (

@SparkQA
Copy link

SparkQA commented Feb 11, 2019

Test build #102177 has finished for PR 17084 at commit e78076b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • class BinaryClassificationMetrics @Since(\"3.0.0\") (

Copy link
Contributor

@holdenk holdenk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did a quick first pass review, thanks for working on this :)

@SparkQA
Copy link

SparkQA commented Feb 19, 2019

Test build #102510 has finished for PR 17084 at commit 955b022.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@imatiach-msft
Copy link
Contributor Author

@holdenk @srowen @actuaryzhang would you be able to take another look at this PR when you have a chance? I've updated to latest, fixed all failing tests and I think I have fixed all of the comments. Thank you!

Copy link
Member

@srowen srowen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's looking OK; does Pyspark need an update, and does the multiclass evaluator need an update to match?

@imatiach-msft
Copy link
Contributor Author

@srowen multiclass weight columns PR is already merged:
#17086
The pyspark side will need to be updated. Should that be part of this PR or a separate PR?

@srowen
Copy link
Member

srowen commented Feb 22, 2019

Ah right, long since lost track. Yeah it would make sense to update it here too. I don't think R has an evaluator?

@SparkQA
Copy link

SparkQA commented Feb 22, 2019

Test build #102621 has finished for PR 17084 at commit 079e114.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Feb 24, 2019

Test build #102720 has finished for PR 17084 at commit a571adc.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • class BinaryClassificationEvaluator(JavaEvaluator, HasLabelCol, HasRawPredictionCol, HasWeightCol,

@SparkQA
Copy link

SparkQA commented Feb 25, 2019

Test build #102733 has finished for PR 17084 at commit d8a5865.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • class MulticlassMetrics @Since(\"1.1.0\") (predictionAndLabels: RDD[_ <: Product])

@SparkQA
Copy link

SparkQA commented Feb 25, 2019

Test build #102734 has finished for PR 17084 at commit 00bfec1.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • class MulticlassMetrics @Since(\"1.1.0\") (predictionAndLabels: RDD[_ <: Product])

@imatiach-msft
Copy link
Contributor Author

@srowen I think I have fixed all pending comments and the tests are currently passing for this PR. I couldn't find any evaluators in the R code, but then again I am not as familiar with sparkR (however, I've used sparklyr before https://github.com/rstudio/sparklyr). Please let me know if there are any other comments that need to be addressed. Thank you!

@srowen
Copy link
Member

srowen commented Feb 25, 2019

Merged to master

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants