Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pr3 monotone constraints splits penalization #2939

Conversation

CharlesAuguste
Copy link
Contributor

@aldanor @redditur

@guolinke @jameslamb @StrikerRUS
You were all reviewers of the PRs #2305, #2770 and #2717. #2305 was judged to be not merge-able because it is too big. Therefore, in my ultimate comment (#2305 (comment)) I said I would split into smaller PRs easier to merge. This is the third PR in relation with #2305; the second one #2770 and the first one the first one #2717 and having been merged already.

The goal of this PR is to introduce a penalization of split gains when there a monotone constraint is present. This penalization depends on the depth of the node. The reason behind this is that trees are being built greedily. However, imposing a strong constraint at the top of the tree on all the children can significantly reduce the splitting options later on. Therefore, we most likely don't want monotone splits happening at the top of the trees. More details are available in the original report https://github.com/microsoft/LightGBM/files/3457826/PR-monotone-constraints-report.pdf. This allows for a significantly better loss when using monotone constraints.

Feel free to ask for more details! Thanks,

@@ -419,6 +421,12 @@ void Config::GetMembersFromString(const std::unordered_map<std::string, std::str

GetString(params, "monotone_constraints_method", &monotone_constraints_method);

GetDouble(params, "monotone_penalty", &monotone_penalty);
CHECK_GE(monotone_penalty, 0.0);
if (max_depth > 0) { //FIXME Not specified in config.h because I don't know how to specify that, please advise
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please advise on this.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move thin into config.cpp, the check conflict function

@@ -692,12 +692,17 @@ void SerialTreeLearner::ComputeBestSplitForFeature(
cegb_->DetlaGain(feature_index, real_fidx, leaf_splits->leaf_index(),
num_data, new_split);
}
if (new_split.monotone_type != 0) {
double penalty = LeafConstraintsBase::ComputeMonotoneSplitGainPenalty(
tree->leaf_depth(leaf_splits->leaf_index()), config_->monotone_penalty); // FIXME The tree has been passed to all the functions just to be used here. You may not like that. Please advise for a better solution, for example storing depths in the constraints.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please advise on this.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you can pass the pointer of tree into MC at the beginning of treelearner::train

}

private:
Tree* tree;
Copy link
Collaborator

@guolinke guolinke Mar 25, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Tree* tree;
const Tree* tree_;

return 1. - pow(2, penalization - 1. - depth) + epsilon;
}

void ShareTreePointer(Tree* tree) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Const T*

Copy link
Contributor Author

@CharlesAuguste CharlesAuguste Mar 25, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@guolinke I am not sure what you mean by that, since I am copying the pointer, I don't understand how I can having constant.

If I just write void ShareTreePointer(const Tree* tree), then I am getting errors:
invalid conversion from ‘const LightGBM::Tree*’ to ‘LightGBM::Tree*’

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Never mind, I didn't see the change above.

double epsilon = 1e-10) {
int depth = tree->leaf_depth(leaf_index);
if (penalization >= depth + 1.) {
return epsilon;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe you can use kEpsilon from meta.h

@CharlesAuguste
Copy link
Contributor Author

@guolinke I answered all your comments. I am not sure why travis is still pending, because all checks seem green and it has been pending for a while now. Let me know what you think of this PR. Thanks

@CharlesAuguste
Copy link
Contributor Author

Hi @guolinke, is there anything else you want to do in this PR? Thanks

Copy link
Collaborator

@StrikerRUS StrikerRUS left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@CharlesAuguste Please check some minor comments from me below.

include/LightGBM/config.h Outdated Show resolved Hide resolved
tests/python_package_test/test_engine.py Outdated Show resolved Hide resolved
tests/python_package_test/test_engine.py Outdated Show resolved Hide resolved
tests/python_package_test/test_engine.py Outdated Show resolved Hide resolved
tests/python_package_test/test_engine.py Outdated Show resolved Hide resolved
tests/python_package_test/test_engine.py Outdated Show resolved Hide resolved
@CharlesAuguste
Copy link
Contributor Author

CharlesAuguste commented Apr 7, 2020

@StrikerRUS I answered your comments above. However some checks are not passing. Can you advise on the errors? Thanks

  • ! LaTeX Error: File `inconsolata.sty' not found.
  • There was an error in the .travis.yml file from which we could not recover. Unfortunately, we do not know much about this error.

EDIT: Actually I see that this is related to one of your latest PRs. I will wait for this to be fixed. Thanks

@StrikerRUS
Copy link
Collaborator

@CharlesAuguste

EDIT: Actually I see that this is related to one of your latest PRs. I will wait for this to be fixed. Thanks

Yeah, our docs and latex things stopped working yesterday. I just merged workarounds in master, please pull them to make everything work again.

@CharlesAuguste CharlesAuguste force-pushed the PR3-monotone-constraints-splits-penalization branch from 053d93e to 3ab294c Compare April 7, 2020 15:13
@CharlesAuguste
Copy link
Contributor Author

@CharlesAuguste

EDIT: Actually I see that this is related to one of your latest PRs. I will wait for this to be fixed. Thanks

Yeah, our docs and latex things stopped working yesterday. I just merged workarounds in master, please pull them to make everything work again.

@StrikerRUS Yes seems to be working fine now! Let me know if there is anything else you would like me to change. Thanks

Copy link
Collaborator

@StrikerRUS StrikerRUS left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM from my point of view, except two minor comments below.
Thanks a lot for this PR!

include/LightGBM/config.h Outdated Show resolved Hide resolved
constrained_model = lgb.train(params_constrained_model, trainset_constrained_model, 10)

# Check that a very high penalization is the same as not using the features at all
np.testing.assert_array_equal(constrained_model.predict(x), unconstrained_model_predictions)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a special function for floats comparison. Or is it a case when we need exact equality like here?

# we need to check the consistency of model file here, so test for exact equal
np.testing.assert_array_equal(pred_from_matr, pred_from_model_file)

Suggested change
np.testing.assert_array_equal(constrained_model.predict(x), unconstrained_model_predictions)
np.testing.assert_allclose(constrained_model.predict(x), unconstrained_model_predictions)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am actually not sure here. I compare a model with 1 variable (the unconstrained one) to a model with 3 variables, with 2 of the 3 that should not be used at all because of penalization (this is the constrained model here). So in the end, the 2 resulting models should output exactly the same results I think, because a single variable is used in both to make splits. So I would say that assert_array_equal is what we want, but I am not entirely sure. Given my explanation which do you think is best?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with that one-feature model should output the same predictions. But I have a doubt that we won't have problems due to internal grad/hess float summation. However, I'm OK with the current code while CI is producing green status. At least, we are now aware about the place where to look if problems appear.

CharlesAuguste and others added 2 commits April 7, 2020 17:21
@StrikerRUS StrikerRUS requested a review from guolinke April 8, 2020 12:36
Copy link
Collaborator

@guolinke guolinke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good to me

@StrikerRUS StrikerRUS merged commit 505a145 into microsoft:master Apr 9, 2020
@CharlesAuguste
Copy link
Contributor Author

Thanks for merging. Just one more PR to go now! The final one may a bit trickier, I will try to open it in a few days. Thanks

jameslamb pushed a commit to jameslamb/LightGBM that referenced this pull request Apr 10, 2020
* Add the monotone penalty parameter to the config.

* Pass tree in the necessary functions so it can be used in ComputeBestSplitForFeature.

* Add monotone penalty.

* Added link to the original report.

* Add tests.

* Fix GPU.

* Revert "Pass tree in the necessary functions so it can be used in ComputeBestSplitForFeature."

This reverts commit 37757e8.

* Revert "Fix GPU."

This reverts commit e49eeee.

* Added a shared pointer to the tree so the constraints can use it too.

* Moved check on monotone penalty to config.cpp.

* Python linting.

* Use AssertTrue instead of assert_.

* Fix penalization in test.

* Make GPU deterministic in tests.

* Rename tree to tree_ in monotone constraints.

* Replaced epsilon by kEplison.

* Typo.

* Make tree pointer const.

* Update src/treelearner/monotone_constraints.hpp

Co-Authored-By: Guolin Ke <guolin.ke@outlook.com>

* Update src/treelearner/monotone_constraints.hpp

Co-Authored-By: Guolin Ke <guolin.ke@outlook.com>

* Added alias for the penalty.

* Remove useless comment.

* Save CI time.

* Refactor test_monotone_penalty_max.

* Update include/LightGBM/config.h

Co-Authored-By: Nikita Titov <nekit94-08@mail.ru>

* Fix doc to be in line with previous config change commit.

Co-authored-by: Charles Auguste <auguste@dubquantdev801.ire.susq.com>
Co-authored-by: Guolin Ke <guolin.ke@outlook.com>
Co-authored-by: Nikita Titov <nekit94-08@mail.ru>
@StrikerRUS
Copy link
Collaborator

Hello @CharlesAuguste !

We want to include new constraints methods in 3.0.0 release (we don't have any particular release date though). So just want to know how is preparation for the last remaining PR going.

@CharlesAuguste
Copy link
Contributor Author

Hello @CharlesAuguste !

We want to include new constraints methods in 3.0.0 release (we don't have any particular release date though). So just want to know how is preparation for the last remaining PR going.

Hi @StrikerRUS, sorry for the delay. I saw the conversation on the release PR, and have been working on this. I should hopefully be able to open a PR in the coming days. I will keep you updated. Thanks

@github-actions
Copy link

This pull request has been automatically locked since there has not been any recent activity since it was closed. To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues including a reference to this.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Aug 24, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants