[R-package] Fix best_iter and best_score #2159
Conversation
Manual tests done:
- With early stopping + with validation set
- With early stopping + without validation set
- Without early stopping + with validation set
- Without early stopping + without validation set

Also tested with multiple metrics / validation sets.
There are some problems with the metrics' order in Python: #2127. I suppose the same is true for R too.
@StrikerRUS In R lists, the order is fixed: the first element always remains the first element. You may even have two elements with the exact same names without any conflict. However, users may provide the same metric multiple times by mistake; in that case we deduplicate them. Example:

library(lightgbm)
data(agaricus.train, package = "lightgbm")
train <- agaricus.train
dtrain <- lgb.Dataset(train$data, label = train$label)
data(agaricus.test, package = "lightgbm")
test <- agaricus.test
dtest <- lgb.Dataset.create.valid(dtrain, test$data, label = test$label)
# "l2" is passed twice on purpose to demonstrate the deduplication
params <- list(objective = "regression", metric = c("l2", "l1", "l2"))
valids <- list(test = dtest)
model <- lgb.train(params,
                   dtrain,
                   100,
                   valids,
                   min_data = 1,
                   learning_rate = 0.5)
str(model$record_evals$test, max.level = 1)
# List of 2
#  $ l2:List of 2
#  $ l1:List of 2
@Laurae2 would it be better to have a parameter named "first_metric_only" in R as well?
BTW, if both R and Python have a "first_metric_only" option, I think we should have the same option in the CLI version.
@guolinke Yes, I think we should have it. As the handling would be different for each wrapper, R and Python would have their own implementations using callbacks.
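As a rough illustration (hypothetical code, not the actual lightgbm callback API), a "first_metric_only"-style check inside an early-stopping callback could consider only the first recorded metric:

# Hypothetical sketch: decide on improvement using only the first metric.
# `eval_results` stands in for the per-iteration evaluation list a callback
# would receive, e.g. list(l2 = 0.01, l1 = 0.05).
first_metric_improved <- function(eval_results, best_so_far, higher_better = FALSE) {
  current <- eval_results[[1L]]  # only the first metric drives the decision
  if (higher_better) current > best_so_far else current < best_so_far
}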
@StrikerRUS do you know why the Travis MPI/Python jobs are failing?
@Laurae2 is this implemented in this PR? |
Also refer to this:
@guolinke This PR uses all metrics; we can add first_metric_only later. Note that the best score / iteration is taken from the first metric when it was not computed by the model.
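For instance (a minimal sketch reusing dtrain and valids from the example above, with early stopping enabled so that the fields get populated):

# Early stopping makes best_iter / best_score meaningful; "l2" is the
# first metric, so it drives both fields.
params <- list(objective = "regression", metric = c("l2", "l1"))
model <- lgb.train(params,
                   dtrain,
                   100,
                   valids,
                   early_stopping_rounds = 10,
                   min_data = 1,
                   learning_rate = 0.5)
model$best_iter   # iteration selected by early stopping
model$best_score  # value of the first metric at that iteration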
Closing/reopening for CI |
This doesn't work for the branch's CI; it works only for the PR's CI.
@jameslamb do we merge it as is for the moment? |
I've updated the branch. Now all checks should be OK. |
Looks good to me! Thank you @Laurae2, and apologies for my delayed review.
This should fix #2158 and #2029.
"best" rule:
Later, we should enforce the metric used for early stopping should be only the first one, or at worst give the user the ability to choose the metric (best is the first).
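Illustration only (not lightgbm internals), assuming the record_evals layout shown in the earlier snippet:

# Pick the best iteration from the first metric's recorded values.
first_metric_values <- unlist(model$record_evals$test[[1L]]$eval)
best_iter <- which.min(first_metric_values)   # which.max() for maximized metrics such as AUC
best_score <- first_metric_values[best_iter]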
note: @jameslamb there are spaces from RStudio left to fix
Example (change to metric = "auc" and max_depth = 3 to test maximization):
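A minimal sketch of such an example (reusing dtrain and valids from the first snippet; "auc" is a maximized metric, so it exercises the maximization path):

# AUC is maximized, so best_iter should track the highest recorded value.
params <- list(objective = "regression", metric = "auc", max_depth = 3)
model <- lgb.train(params,
                   dtrain,
                   100,
                   valids,
                   early_stopping_rounds = 10,
                   min_data = 1,
                   learning_rate = 0.5)
model$best_iter
model$best_score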