
[SPARK-5099][Mllib] Simplify logistic loss function #3899

Closed
wants to merge 5 commits

Conversation

@viirya (Member) commented Jan 5, 2015

This is a minor PR: I think we can simply negate the margin instead of subtracting it.

Mathematically, the two are equal, but the modified equation is the common form of the logistic loss function, and so is more readable. It also computes a more accurate value, as some quick tests show.
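
In code, the change is roughly the following (a sketch for illustration, not the exact diff; it assumes a ±1 label encoding, though as noted later in the review MLlib's LogisticGradient actually uses 0/1 labels):

// Common form of the logistic loss: L(y, p) = log(1 + exp(-y * p)),
// with y in {-1, +1} and p the margin w . x. Negating the product
// replaces the old "subtract the margin" formulation.
def logisticLoss(y: Double, p: Double): Double =
  math.log1p(math.exp(-y * p)) // log1p(x) = log(1 + x), accurate for small x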

@srowen (Member) commented Jan 5, 2015

+1, this looks like a small but good improvement.

@SparkQA commented Jan 5, 2015

Test build #25053 has finished for PR 3899 at commit 2bc5712.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@mengxr (Contributor) commented Jan 5, 2015

@viirya Thanks for the improvement! We could make this computation even more accurate: we should branch based on sign(label * margin), similar to

https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/mllib/tree/loss/LogLoss.scala#L64

Do you have time to make this change? Otherwise I can merge this PR and create a JIRA as a reminder.
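
In essence, the trick there is to branch on the sign of the exponent so that exp is only ever called with a non-positive argument and cannot overflow. A minimal sketch (not the exact MLlib code):

// Numerically stable log(1 + exp(x)).
// For x > 0, use the identity log(1 + exp(x)) = x + log(1 + exp(-x)).
def log1pExp(x: Double): Double =
  if (x > 0) x + math.log1p(math.exp(-x))
  else math.log1p(math.exp(x))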

@viirya (Member, Author) commented Jan 5, 2015

@mengxr Thanks for that. I will do it later.

@viirya (Member, Author) commented Jan 6, 2015

Hi @mengxr, thanks for the comment. I may be wrong, but I think we should branch based on sign(label) instead of sign(label * margin).

According to the definition of the logistic loss function, log(1 + exp(-y * p)), y = +1 for a positive label and y = -1 for a negative label.

When we branch on sign(label * margin), we cannot tell what the sign of the label is, right? When sign(label * margin) == 1, it could be either (sign(label), sign(margin)) == (1, 1) or (sign(label), sign(margin)) == (-1, -1).

Besides, https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/mllib/tree/loss/LogLoss.scala#L67 and https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/mllib/tree/loss/LogLoss.scala#L69 are mathematically the same equation, so I think this is a bug. As the function's doc comment says, it is not used by the gradient boosting algorithm but only for debugging, so it was not caught before. I added a commit to fix it.

@viirya changed the title from "[Minor][Mllib] Simplify loss function" to "[SPARK-5099][Mllib] Simplify loss function" on Jan 6, 2015
@viirya changed the title from "[SPARK-5099][Mllib] Simplify loss function" to "[SPARK-5099][Mllib] Simplify logistic loss function and fix deviance loss function" on Jan 6, 2015
@SparkQA commented Jan 6, 2015

Test build #25093 has finished for PR 3899 at commit a3f83ca.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@mengxr (Contributor) commented Jan 6, 2015

@viirya They are the same analytically but not numerically. For example:

scala> math.log1p(math.exp(1000))
res2: Double = Infinity

scala> 1000 + math.log1p(math.exp(-1000))
res3: Double = 1000.0
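
The reason is visible directly in the REPL: exp overflows to Infinity for large positive arguments but only underflows to 0.0 for large negative ones, so the second form stays finite.

scala> math.exp(1000)
res4: Double = Infinity

scala> math.exp(-1000)
res5: Double = 0.0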

@viirya (Member, Author) commented Jan 6, 2015

OK, I understand now; I was wrong about LogLoss. I will revert it.

@viirya changed the title from "[SPARK-5099][Mllib] Simplify logistic loss function and fix deviance loss function" to "[SPARK-5099][Mllib] Simplify logistic loss function" on Jan 6, 2015
@viirya (Member, Author) commented Jan 6, 2015

@mengxr I noticed that you filed an issue, SPARK-5101. Should I extract the code in this PR into the place you suggested, mllib.util.MathFunctions?

@SparkQA commented Jan 6, 2015

Test build #25097 has finished for PR 3899 at commit 72a295e.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@viirya (Member, Author) commented Jan 6, 2015

@mengxr I think I now understand what you meant by branching based on sign(label * margin). I have made some modifications.

@SparkQA commented Jan 6, 2015

Test build #25101 has finished for PR 3899 at commit 0aa51e4.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@@ -64,11 +64,17 @@ class LogisticGradient extends Gradient {
      val gradientMultiplier = (1.0 / (1.0 + math.exp(margin))) - label
      val gradient = data.copy
      scal(gradientMultiplier, gradient)
      val minusYP = label * margin
A review comment from a Member on this line:

This looks like you expect labels to be -1, +1. They are actually 0, 1.

Reply from @viirya (Author):

Oh, I thought it was similar to LogLoss, so the labels would be {+1, -1}. Fixed.
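
With 0/1 labels, the fix presumably maps the label to the ±1 convention before forming the product. A sketch of the idea (not necessarily the exact commit; note that margin in this method is already -(w . x), per the gradientMultiplier line above):

// MLlib encodes labels as 0.0 or 1.0; recover y in {-1, +1} first.
// Since margin = -(w . x), the quantity -y * (w . x) equals margin
// when y = +1 (label 1.0) and -margin when y = -1 (label 0.0).
val minusYP = if (label > 0) margin else -margin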

@mengxr (Contributor) commented Jan 7, 2015

LGTM pending Jenkins. I'm going to merge this first. We might want to add more functions in #3915.

@SparkQA commented Jan 7, 2015

Test build #25127 has finished for PR 3899 at commit 91a3860.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@viirya (Member, Author) commented Jan 7, 2015

Thanks.

@mengxr (Contributor) commented Jan 7, 2015

Merged into master. Thanks!

@asfgit closed this in e21acc1 on Jan 7, 2015
@viirya deleted the logit_func branch on December 27, 2023