[R-package] LightGBM results change between int/num #3094

Closed
Laurae2 opened this issue May 16, 2020 · 3 comments

@Laurae2
Contributor

Laurae2 commented May 16, 2020

How are you using LightGBM?

LightGBM component: R-package

Environment info

(...)

Other: applies to any R version / compiler combination

LightGBM version or commit hash: 0e3509c

Error message and / or logs

Training a model with integer data or labels seems to produce wrong results and/or change LightGBM's behavior.

Training matrix :

  • row 1: variables: 0, 1 => label 0
  • row 2: variables: 1, 0 => label 1

Expected prediction (should predict "0 1"):

  • row 1: predict 0
  • row 2: predict 1
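
As a quick check of how rows and labels pair up, here is a minimal base R sketch that assumes the same `matrix()` and label calls as the reproducible example. R fills matrices column by column, so the flat vector `c(0, 1, 1, 0)` does not read row by row. The column names are illustrative only.

```r
# R fills matrix() column by column (column-major order).
train_mat <- matrix(c(0, 1, 1, 0), nrow = 2, ncol = 2)
train_labels <- c(0, 1)

# Bind features and labels side by side to inspect each training row.
# The column names below are illustrative, not part of the original example.
rows <- cbind(train_mat, train_labels)
colnames(rows) <- c("var1", "var2", "label")
print(rows)
```

Each printed row shows the two variables followed by the label that the model is expected to reproduce.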

Results:

| Matrix Type / Labels Type | Integer | Numeric |
| --- | --- | --- |
| Integer | 0 0 (KO) | 0 1 (KO) |
| Numeric | 0 0 (KO) | 0 1 (OK) |

Increasing the number of iterations does not improve the results.

Changing labels to (1, 2) instead of (0, 1) leads to:

  • OK case: predicting (1, 2) instead of (0, 1) (still correct)
  • KO case: predicting (1, 1) instead of (0, 1) (still wrong)
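
In the full logs, the two runs with a numeric matrix reproduce the labels while the two runs with an integer matrix do not. The base R sketch below shows how the two matrices differ and one possible workaround, assuming the discrepancy stems from integer versus double storage (an assumption, not something confirmed in this issue). `as_double_matrix` is a hypothetical helper, not part of LightGBM.

```r
# Both matrices hold the same values but use different storage modes.
int_mat <- matrix(c(0L, 1L, 1L, 0L), nrow = 2, ncol = 2)
num_mat <- matrix(c(0, 1, 1, 0), nrow = 2, ncol = 2)

storage.mode(int_mat)  # "integer"
storage.mode(num_mat)  # "double"

# Hypothetical helper: coerce a matrix to double storage, keeping its dimensions.
as_double_matrix <- function(x) {
  storage.mode(x) <- "double"
  x
}

# After coercion, the integer matrix is identical to the numeric one.
train_mat <- as_double_matrix(int_mat)
stopifnot(identical(train_mat, num_mat))
```

Passing the coerced matrix to `lgb.Dataset()` would then match the "Data Num" cases in the logs.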

Full logs:

> # Data Int / Labels Int
> train_mat <- matrix(c(0L, 1L, 1L, 0L), nrow = 2, ncol = 2)
> train_labels <- c(0L, 1L)
> dtrain <- lgb.Dataset(train_mat, label = train_labels)
> model <- lgb.train(
+   params = list(objective = "regression", metric = "l2")
+   , data = dtrain
+   , nrounds = 1L
+   , min_data = 1L
+   , learning_rate = 1.0
+   , verbose = -1
+ )
> round(predict(model, train_mat), digits = 10)
[1] 0 0
> round(sum(abs(predict(model, train_mat) - train_labels)), digits = 10)
[1] 1
> 
> # Data Num / Labels Int
> train_mat <- matrix(c(0, 1, 1, 0), nrow = 2, ncol = 2)
> train_labels <- c(0L, 1L)
> dtrain <- lgb.Dataset(train_mat, label = train_labels)
> model <- lgb.train(
+   params = list(objective = "regression", metric = "l2")
+   , data = dtrain
+   , nrounds = 1L
+   , min_data = 1L
+   , learning_rate = 1.0
+   , verbose = -1
+ )
> round(predict(model, train_mat), digits = 10)
[1] 0 1
> round(sum(abs(predict(model, train_mat) - train_labels)), digits = 10)
[1] 0
> 
> # Data Int / Labels Num
> train_mat <- matrix(c(0L, 1L, 1L, 0L), nrow = 2, ncol = 2)
> train_labels <- c(0, 1)
> dtrain <- lgb.Dataset(train_mat, label = train_labels)
> model <- lgb.train(
+   params = list(objective = "regression", metric = "l2")
+   , data = dtrain
+   , nrounds = 1L
+   , min_data = 1L
+   , learning_rate = 1.0
+   , verbose = -1
+ )
> round(predict(model, train_mat), digits = 10)
[1] 0 0
> round(sum(abs(predict(model, train_mat) - train_labels)), digits = 10)
[1] 1
> 
> # Data Num / Labels Num
> train_mat <- matrix(c(0, 1, 1, 0), nrow = 2, ncol = 2)
> train_labels <- c(0, 1)
> dtrain <- lgb.Dataset(train_mat, label = train_labels)
> model <- lgb.train(
+   params = list(objective = "regression", metric = "l2")
+   , data = dtrain
+   , nrounds = 1L
+   , min_data = 1L
+   , learning_rate = 1.0
+   , verbose = -1
+ )
> round(predict(model, train_mat), digits = 10)
[1] 0 1
> round(sum(abs(predict(model, train_mat) - train_labels)), digits = 10)
[1] 0

Reproducible example(s)

library(lightgbm)

# Data Int / Labels Int
train_mat <- matrix(c(0L, 1L, 1L, 0L), nrow = 2, ncol = 2)
train_labels <- c(0L, 1L)
dtrain <- lgb.Dataset(train_mat, label = train_labels)
model <- lgb.train(
  params = list(objective = "regression", metric = "l2")
  , data = dtrain
  , nrounds = 1L
  , min_data = 1L
  , learning_rate = 1.0
  , verbose = -1
)
round(predict(model, train_mat), digits = 10) # Must be 0, 1
round(sum(abs(predict(model, train_mat) - train_labels)), digits = 10) # Must be 0

# Data Num / Labels Int
train_mat <- matrix(c(0, 1, 1, 0), nrow = 2, ncol = 2)
train_labels <- c(0L, 1L)
dtrain <- lgb.Dataset(train_mat, label = train_labels)
model <- lgb.train(
  params = list(objective = "regression", metric = "l2")
  , data = dtrain
  , nrounds = 1L
  , min_data = 1L
  , learning_rate = 1.0
  , verbose = -1
)
round(predict(model, train_mat), digits = 10) # Must be 0, 1
round(sum(abs(predict(model, train_mat) - train_labels)), digits = 10) # Must be 0

# Data Int / Labels Num
train_mat <- matrix(c(0L, 1L, 1L, 0L), nrow = 2, ncol = 2)
train_labels <- c(0, 1)
dtrain <- lgb.Dataset(train_mat, label = train_labels)
model <- lgb.train(
  params = list(objective = "regression", metric = "l2")
  , data = dtrain
  , nrounds = 1L
  , min_data = 1L
  , learning_rate = 1.0
  , verbose = -1
)
round(predict(model, train_mat), digits = 10) # Must be 0, 1
round(sum(abs(predict(model, train_mat) - train_labels)), digits = 10) # Must be 0

# Data Num / Labels Num
train_mat <- matrix(c(0, 1, 1, 0), nrow = 2, ncol = 2)
train_labels <- c(0, 1)
dtrain <- lgb.Dataset(train_mat, label = train_labels)
model <- lgb.train(
  params = list(objective = "regression", metric = "l2")
  , data = dtrain
  , nrounds = 1L
  , min_data = 1L
  , learning_rate = 1.0
  , verbose = -1
)
round(predict(model, train_mat), digits = 10) # Must be 0, 1
round(sum(abs(predict(model, train_mat) - train_labels)), digits = 10) # Must be 0

Steps to reproduce

Run the reproducible example above in R.

@jameslamb
Collaborator

Wow thank you for the detailed write-up! I will look into this.

@jameslamb
Collaborator

Closed by #3140, thanks to @mayer79.

@github-actions

This issue has been automatically locked since there has not been any recent activity since it was closed. To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues including a reference to this.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Aug 23, 2023