Pr3 monotone constraints splits penalization #2939
Conversation
src/io/config_auto.cpp (Outdated)
@@ -419,6 +421,12 @@ void Config::GetMembersFromString(const std::unordered_map<std::string, std::string>& params) {

  GetString(params, "monotone_constraints_method", &monotone_constraints_method);

  GetDouble(params, "monotone_penalty", &monotone_penalty);
  CHECK_GE(monotone_penalty, 0.0);
  if (max_depth > 0) {  // FIXME: Not specified in config.h because I don't know how to specify that, please advise
Please advise on this.
Move this into config.cpp, into the check-conflict function.
@@ -692,12 +692,17 @@ void SerialTreeLearner::ComputeBestSplitForFeature(

    cegb_->DetlaGain(feature_index, real_fidx, leaf_splits->leaf_index(),
                     num_data, new_split);
  }
  if (new_split.monotone_type != 0) {
    double penalty = LeafConstraintsBase::ComputeMonotoneSplitGainPenalty(
        tree->leaf_depth(leaf_splits->leaf_index()), config_->monotone_penalty);  // FIXME: The tree has been passed to all the functions just to be used here. You may not like that. Please advise for a better solution, for example storing depths in the constraints.
Please advise on this.
I think you can pass the tree pointer into the monotone constraints at the beginning of treelearner::train.
  }

 private:
  Tree* tree;
Suggested change:
- Tree* tree;
+ const Tree* tree_;
    return 1. - pow(2, penalization - 1. - depth) + epsilon;
  }

  void ShareTreePointer(Tree* tree) {
Const T*
@guolinke I am not sure what you mean by that; since I am copying the pointer, I don't understand how I can make it constant.
If I just write `void ShareTreePointer(const Tree* tree)`, then I get errors:
`invalid conversion from 'const LightGBM::Tree*' to 'LightGBM::Tree*'`
Never mind, I didn't see the change above.
                     double epsilon = 1e-10) {
    int depth = tree->leaf_depth(leaf_index);
    if (penalization >= depth + 1.) {
      return epsilon;
Maybe you can use kEpsilon from meta.h.
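For reference, the penalty computed by the snippets above behaves as follows (a minimal Python sketch; the function name and signature are illustrative, not the actual C++ API):

```python
def monotone_split_gain_penalty(depth, penalization, epsilon=1e-10):
    """Multiplicative factor applied to the gain of a monotone split at a node
    of the given depth (root = 0), mirroring the C++ snippets above."""
    if penalization >= depth + 1.0:
        # Depths shallower than the penalization are (almost) forbidden.
        return epsilon
    # The factor rises from ~0.5 toward 1 as the node gets deeper.
    return 1.0 - 2.0 ** (penalization - 1.0 - depth) + epsilon

# With penalization=1, the root split is forbidden and deeper
# splits are damped less and less: 1e-10, 0.5, 0.75, 0.875, 0.9375
for depth in range(5):
    print(depth, monotone_split_gain_penalty(depth, penalization=1.0))
```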
@guolinke I answered all your comments. I am not sure why Travis is still pending, because all checks seem green and it has been pending for a while now. Let me know what you think of this PR. Thanks
Hi @guolinke, is there anything else you want to do in this PR? Thanks
@CharlesAuguste Please check some minor comments from me below.
@StrikerRUS I answered your comments above. However, some checks are not passing. Can you advise on the errors? Thanks
EDIT: Actually, I see that this is related to one of your latest PRs. I will wait for this to be fixed. Thanks
Yeah, our docs and latex things stopped working yesterday. I just merged workarounds in.
Co-Authored-By: Guolin Ke <guolin.ke@outlook.com>

force-pushed from 053d93e to 3ab294c
@StrikerRUS Yes, seems to be working fine now! Let me know if there is anything else you would like me to change. Thanks
LGTM from my point of view, except two minor comments below.
Thanks a lot for this PR!
constrained_model = lgb.train(params_constrained_model, trainset_constrained_model, 10)

# Check that a very high penalization is the same as not using the features at all
np.testing.assert_array_equal(constrained_model.predict(x), unconstrained_model_predictions)
There is a special function for float comparison. Or is this a case where we need exact equality, like here?
LightGBM/tests/python_package_test/test_basic.py, lines 60 to 61 in ce08242:

# we need to check the consistency of model file here, so test for exact equal
np.testing.assert_array_equal(pred_from_matr, pred_from_model_file)
Suggested change:
- np.testing.assert_array_equal(constrained_model.predict(x), unconstrained_model_predictions)
+ np.testing.assert_allclose(constrained_model.predict(x), unconstrained_model_predictions)
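For reference, the practical difference between the two assertions (a standalone illustration, not code from the PR):

```python
import numpy as np

a = np.array([1.0, 2.0])
b = a + 1e-9  # tiny float drift

np.testing.assert_allclose(a, b)  # passes: within the default rtol of 1e-7

try:
    np.testing.assert_array_equal(a, b)  # bit-exact comparison
except AssertionError:
    print("assert_array_equal rejects even tiny float differences")
```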
I am actually not sure here. I am comparing a model with 1 variable (the unconstrained one) to a model with 3 variables, 2 of which should not be used at all because of the penalization (this is the constrained model). So in the end, the 2 resulting models should output exactly the same results, I think, because a single variable is used in both to make splits. So I would say that assert_array_equal is what we want, but I am not entirely sure. Given my explanation, which do you think is best?
I agree that the one-feature model should output the same predictions. But I have a doubt that we won't have problems due to internal grad/hess float summation. However, I'm OK with the current code while CI is producing a green status. At least, we are now aware of the place to look if problems appear.
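For context, the test under discussion roughly follows this pattern (a simplified, self-contained sketch; the data and parameter values are illustrative, not the exact test code):

```python
import numpy as np
import lightgbm as lgb

rng = np.random.RandomState(42)
x = rng.rand(1000, 3)
y = x[:, 0] + rng.normal(scale=0.01, size=1000)

# Baseline: a model that can only ever use the single informative feature.
unconstrained_model = lgb.train({"verbose": -1}, lgb.Dataset(x[:, :1], label=y), 10)
unconstrained_model_predictions = unconstrained_model.predict(x[:, :1])

# Constrained model: all three features, but the two extra ones carry monotone
# constraints and the penalty is so large that no monotone split is ever allowed.
params_constrained_model = {
    "verbose": -1,
    "monotone_constraints": [0, 1, 1],
    "monotone_penalty": 1000.0,  # far exceeds any reachable tree depth
}
constrained_model = lgb.train(params_constrained_model, lgb.Dataset(x, label=y), 10)

# Both models should end up splitting only on feature 0, hence identical outputs.
np.testing.assert_array_equal(constrained_model.predict(x), unconstrained_model_predictions)
```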
looks good to me
Thanks for merging. Just one more PR to go now! The final one may be a bit trickier; I will try to open it in a few days. Thanks
* Add the monotone penalty parameter to the config.
* Pass tree in the necessary functions so it can be used in ComputeBestSplitForFeature.
* Add monotone penalty.
* Added link to the original report.
* Add tests.
* Fix GPU.
* Revert "Pass tree in the necessary functions so it can be used in ComputeBestSplitForFeature." This reverts commit 37757e8.
* Revert "Fix GPU." This reverts commit e49eeee.
* Added a shared pointer to the tree so the constraints can use it too.
* Moved check on monotone penalty to config.cpp.
* Python linting.
* Use AssertTrue instead of assert_.
* Fix penalization in test.
* Make GPU deterministic in tests.
* Rename tree to tree_ in monotone constraints.
* Replaced epsilon by kEpsilon.
* Typo.
* Make tree pointer const.
* Update src/treelearner/monotone_constraints.hpp
  Co-Authored-By: Guolin Ke <guolin.ke@outlook.com>
* Update src/treelearner/monotone_constraints.hpp
  Co-Authored-By: Guolin Ke <guolin.ke@outlook.com>
* Added alias for the penalty.
* Remove useless comment.
* Save CI time.
* Refactor test_monotone_penalty_max.
* Update include/LightGBM/config.h
  Co-Authored-By: Nikita Titov <nekit94-08@mail.ru>
* Fix doc to be in line with previous config change commit.

Co-authored-by: Charles Auguste <auguste@dubquantdev801.ire.susq.com>
Co-authored-by: Guolin Ke <guolin.ke@outlook.com>
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
Hello @CharlesAuguste! We want to include the new constraints methods in the 3.0.0 release (we don't have any particular release date, though). So I just want to know how preparation for the last remaining PR is going.
Hi @StrikerRUS, sorry for the delay. I saw the conversation on the release PR, and have been working on this. I should hopefully be able to open a PR in the coming days. I will keep you updated. Thanks
@aldanor @redditur
@guolinke @jameslamb @StrikerRUS
You were all reviewers of PRs #2305, #2770 and #2717. #2305 was judged not merge-able because it is too big. Therefore, in my final comment (#2305 (comment)) I said I would split it into smaller PRs that are easier to merge. This is the third PR related to #2305; the second one (#2770) and the first one (#2717) have already been merged.
The goal of this PR is to introduce a penalization of split gains when a monotone constraint is present. This penalization depends on the depth of the node. The reasoning is that trees are built greedily, yet imposing a strong constraint at the top of the tree on all the children can significantly reduce the splitting options later on. Therefore, we most likely don't want monotone splits happening at the top of the trees. More details are available in the original report: https://github.com/microsoft/LightGBM/files/3457826/PR-monotone-constraints-report.pdf. This allows for a significantly better loss when using monotone constraints.
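For illustration, enabling the new penalty from the Python package would look something like the following (the data and parameter values here are arbitrary examples):

```python
import numpy as np
import lightgbm as lgb

X = np.random.rand(500, 3)
y = X[:, 0] + 0.1 * np.random.rand(500)

params = {
    "objective": "regression",
    "monotone_constraints": [1, 0, 0],  # increasing constraint on feature 0
    # With a penalty of 2, monotone splits are forbidden at depths 0 and 1,
    # and their gains are damped (by 0.5, 0.75, ...) at the depths below.
    "monotone_penalty": 2.0,
}
booster = lgb.train(params, lgb.Dataset(X, label=y), num_boost_round=100)
```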
Feel free to ask for more details! Thanks,