[Docs] Any documentation for boost_from_average? #352

Closed
Laurae2 opened this issue Mar 23, 2017 · 5 comments
@Laurae2
Contributor

Laurae2 commented Mar 23, 2017

I'm just wondering if there is any documentation for boost_from_average (ref commit e179c7c).

I'm seeing this specific change:

  if (models_.empty() && gbdt_config_->boost_from_average && !train_score_updater_->has_init_score()) {
    std::vector<double> sum_per_class(num_class_, 0.0f);
    auto label = train_data_->metadata().label();
    if (num_class_ > 1) {
      for (data_size_t i = 0; i < num_data_; ++i) {
        sum_per_class[static_cast<int>(label[i])] += 1.0f;
      }
    } else {
      for (data_size_t i = 0; i < num_data_; ++i) {
        sum_per_class[0] += label[i];
      }
    }
    for (int curr_class = 0; curr_class < num_class_; ++curr_class) {
      double init_score = sum_per_class[curr_class] / num_data_;
      std::unique_ptr<Tree> new_tree(new Tree(2));
      new_tree->Split(0, 0, BinType::NumericalBin, 0, 0, 0, init_score, init_score, 0, num_data_, 1);
      train_score_updater_->AddScore(init_score, curr_class);
      for (auto& score_updater : valid_score_updater_) {
        score_updater->AddScore(init_score, curr_class);
      }
      models_.push_back(std::move(new_tree));
    }
    boost_from_average_ = true;
  }
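
If I read the snippet right, the initial score it adds is simply the per-class label frequency for multi-class, and the label mean otherwise. A minimal R sketch of that reading (my own helper, not LightGBM code):

init_scores_from_average <- function(label, num_class) {
  # multi-class: count(label == c) / num_data for each class c
  # single output: mean of the labels
  if (num_class > 1) {
    tabulate(as.integer(label) + 1L, nbins = num_class) / length(label)
  } else {
    mean(label)
  }
}

init_scores_from_average(c(0, 0, 1, 2, 2, 2), num_class = 3)
# [1] 0.3333333 0.1666667 0.5000000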

However, when I try it, I see no difference and the trees are identical (or I'm not looking at the right thing).

Test:

library(lightgbm)
data(agaricus.train, package='lightgbm')
train <- agaricus.train
train$label[1:1000] <- 2
dtrain <- lgb.Dataset(train$data, label=train$label)
data(agaricus.test, package='lightgbm')
test <- agaricus.test
test$label[1:400] <- 2
dtest <- lgb.Dataset.create.valid(dtrain, test$data, label=test$label)
valids <- list(test=dtest)

params <- list(objective="multiclass", num_class=3, metric="l2", boost_from_average = FALSE)
model <- lgb.train(params, dtrain, 5, valids, min_data=1, learning_rate=0.05, early_stopping_rounds=5)
[5]:	test's l2:1.28416

params <- list(objective="multiclass", num_class=3, metric="l2", boost_from_average = TRUE)
model <- lgb.train(params, dtrain, 5, valids, min_data=1, learning_rate=0.05, early_stopping_rounds=5)
[5]:	test's l2:1.28416
Laurae2 changed the title from "Any documentation for boost_from_average?" to "[Docs] Any documentation for boost_from_average?" on Mar 23, 2017
@guolinke
Collaborator

@Laurae2
I think the multi-class task cannot produce a correct L2 metric.

@Laurae2
Contributor Author

Laurae2 commented Mar 23, 2017

@guolinke The run below uses multi_logloss and the results are still identical. Is this expected behavior? (I may have misunderstood what it does: start each class from an initial score equal to its label count divided by the sample count?)

> params <- list(objective="multiclass", num_class=3, metric="multi_logloss", boost_from_average = TRUE)
> valids <- list(test=dtest)
> model <- lgb.train(params, dtrain, 5, valids, min_data=1, learning_rate=0.05, early_stopping_rounds=5)
[LightGBM] [Info] Number of data: 6513, number of features: 116
[1]:	test's multi_logloss:1.03786 
[2]:	test's multi_logloss:0.982533 
[3]:	test's multi_logloss:0.932025 
[4]:	test's multi_logloss:0.885889 
[5]:	test's multi_logloss:0.843413 

> params <- list(objective="multiclass", num_class=3, metric="multi_logloss", boost_from_average = FALSE)
> model <- lgb.train(params, dtrain, 5, valids, min_data=1, learning_rate=0.05, early_stopping_rounds=5)
[LightGBM] [Info] Number of data: 6513, number of features: 116
[1]:	test's multi_logloss:1.03786 
[2]:	test's multi_logloss:0.982533 
[3]:	test's multi_logloss:0.932025 
[4]:	test's multi_logloss:0.885889 
[5]:	test's multi_logloss:0.843413
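
One way to compare the two runs more directly than through the metric, assuming predict() in this build accepts rawscore and num_iteration arguments, would be to diff the raw scores after the first iteration (model_true / model_false are just my names for the two boosters trained above):

pred_true  <- predict(model_true,  train$data, num_iteration = 1, rawscore = TRUE)
pred_false <- predict(model_false, train$data, num_iteration = 1, rawscore = TRUE)
summary(as.vector(pred_true) - as.vector(pred_false))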

@wxchan
Contributor

wxchan commented Mar 23, 2017

I think it may depend on your dataset.

With the data in .../examples/multiclass_classification:
boost_from_average = true

[LightGBM] [Info] Iteration:10, training multi_logloss : 1.50462
[LightGBM] [Info] Iteration:10, valid_1 multi_logloss : 1.55805
[LightGBM] [Info] 0.609126 seconds elapsed, finished iteration 10

boost_from_average = false

[LightGBM] [Info] Iteration:10, training multi_logloss : 1.50457
[LightGBM] [Info] Iteration:10, valid_1 multi_logloss : 1.55707
[LightGBM] [Info] 0.574495 seconds elapsed, finished iteration 10

@Laurae2
Contributor Author

Laurae2 commented Mar 24, 2017

@wxchan I tested it with another dataset and now get different results, as you explained.

Laurae2 closed this as completed on Mar 24, 2017
eisber pushed a commit to eisber/LightGBM that referenced this issue Mar 15, 2019
@github-actions

This issue has been automatically locked since there has not been any recent activity since it was closed. To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues including a reference to this.

github-actions bot locked as resolved and limited conversation to collaborators on Aug 24, 2023