[SPARK-5099][Mllib] Simplify logistic loss function #3899
Conversation
+1, looks like a small but good improvement.
Test build #25053 has finished for PR 3899 at commit
@viirya Thanks for the improvement! We could make this computation more accurate: we should branch based on the sign of `margin`. Do you have time to make this change? Or I can merge this PR and create a JIRA as a reminder.
@mengxr Thanks for that. I will do it later.
Hi @mengxr, thanks for the comment. I may be wrong, but I think we should branch based on […], because according to the definition of the logistic loss function, […]. Besides, https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/mllib/tree/loss/LogLoss.scala#L67 and https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/mllib/tree/loss/LogLoss.scala#L69 are mathematically the same equation, so I think it is a bug. As the function's note says, it is not used by the gradient boosting algorithm but only for debugging, which is why it was not found before. I have added a commit to fix it.
Test build #25093 has finished for PR 3899 at commit
@viirya They are the same analytically but not numerically, for example:

```scala
scala> math.log1p(math.exp(1000))
res2: Double = Infinity

scala> 1000 + math.log1p(math.exp(-1000))
res3: Double = 1000.0
```
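For readers following along, here is a minimal sketch of the sign-based branching under discussion. The helper name `log1pExp` is illustrative, not code from this PR (a shared helper of this kind is presumably what SPARK-5101, mentioned below, asks for):

```scala
// Numerically stable log(1 + exp(x)).
// For large positive x, math.exp(x) overflows to Infinity, so that branch
// uses the identity log(1 + e^x) = x + log(1 + e^(-x)) instead.
def log1pExp(x: Double): Double = {
  if (x > 0) {
    x + math.log1p(math.exp(-x)) // exp(-x) <= 1, cannot overflow
  } else {
    math.log1p(math.exp(x)) // exp(x) <= 1, safe to evaluate directly
  }
}

// log1pExp(1000.0) == 1000.0, whereas math.log1p(math.exp(1000.0)) == Infinity
```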
OK, I understand it now. I was wrong about LogLoss. Will revert it back.
@mengxr I noticed that you filed an issue, SPARK-5101. Do I need to extract the code in this PR to the place you suggested?
Test build #25097 has finished for PR 3899 at commit
@mengxr I think I now see what you meant by branching based on the sign of `margin`.
Test build #25101 has finished for PR 3899 at commit
```diff
@@ -64,11 +64,17 @@ class LogisticGradient extends Gradient {
     val gradientMultiplier = (1.0 / (1.0 + math.exp(margin))) - label
     val gradient = data.copy
     scal(gradientMultiplier, gradient)
+    val minusYP = label * margin
```
This looks like you expect labels to be {-1, +1}. They are actually {0, 1}.
Oh? I thought it was similar to LogLoss, so the labels would be {+1, -1}. Fixed.
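To make the two label conventions concrete, here is an ad hoc sketch; the function names are hypothetical, and it assumes `margin = -1.0 * dot(data, weights)` as in LogisticGradient:

```scala
// LogisticGradient encodes labels as {0, 1}:
def lossLabel01(label: Double, margin: Double): Double =
  if (label > 0) math.log1p(math.exp(margin)) // label = 1
  else math.log1p(math.exp(margin)) - margin  // label = 0

// LogLoss-style code encodes labels as y in {-1, +1}; with y = 2 * label - 1
// the two definitions agree analytically:
def lossLabelPm1(y: Double, margin: Double): Double =
  math.log1p(math.exp(y * margin))
```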
LGTM pending Jenkins. I'm going to merge this first. We might want to add more functions in #3915.
Test build #25127 has finished for PR 3899 at commit
Thanks.
Merged into master. Thanks!
This is a minor PR where I think we can simply take the minus of `margin` instead of subtracting `margin`. Mathematically, they are equal, but the modified equation is the common form of the logistic loss function and so is more readable. It also computes a more accurate value, as some quick tests show.
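For reference, a quick ad hoc check of the identity behind the change (not from the PR itself):

```scala
// Analytically, log(1 + exp(m)) - m == log(1 + exp(-m)).
val m = 2.0
val subtracted = math.log1p(math.exp(m)) - m // original form: subtract margin
val negated = math.log1p(math.exp(-m))       // modified form: negate margin
// Both evaluate to ~0.126928 for m = 2.0. For large |m| one of the two
// forms overflows, which is what the sign-based branching above addresses.
println(s"$subtracted vs $negated")
```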