Improved python tree plots #2304

Merged

Conversation

CharlesAuguste
Contributor

@CharlesAuguste CharlesAuguste commented Aug 1, 2019

In this pull request, we tried to improve the way trees are plotted by the function plot_tree. Visually digestible trees are essential for interpreting, debugging, and improving tree-based methods. LightGBM's current implementation of plot_tree produces trees that are very hard to process mentally and that, in our opinion, lack some useful information, such as which nodes are constrained and how much data flows down through each node.

This PR makes several changes to the plot_tree function that increase the amount of information displayed in a given tree whilst, at the same time, making the tree more digestible.

Moreover, we add the features' monotone constraints to the model's meta information so that this information is accessible through the model.
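A minimal sketch of how a plotting routine could use these constraints to color split nodes. It assumes the constraints are available as a list with one entry per feature, following the convention of LightGBM's monotone_constraints parameter (1 = increasing, -1 = decreasing, 0 = unconstrained). The helper name node_fill_color and the hex values are illustrative only, not taken from this PR.

```python
from typing import Optional, Sequence

# Illustrative hex values; the PR only describes the colors as light green and light red.
LIGHT_GREEN = "#ddffdd"  # monotone increasing
LIGHT_RED = "#ffdddd"    # monotone decreasing


def node_fill_color(constraints: Optional[Sequence[int]], split_feature: int) -> Optional[str]:
    """Return a fill color for a split node, or None to leave it unfilled.

    Hypothetical helper: `constraints` holds one value per feature,
    1 for increasing, -1 for decreasing, 0 for unconstrained.
    """
    if not constraints:
        # The model carries no monotone constraints at all.
        return None
    # Unconstrained features (0) are not in the mapping, so they get None.
    return {1: LIGHT_GREEN, -1: LIGHT_RED}.get(constraints[split_feature])


constraints = [1, 0, -1]
print(node_fill_color(constraints, 0))  # #ddffdd
print(node_fill_color(constraints, 1))  # None
print(node_fill_color(constraints, 2))  # #ffdddd
```

Returning None rather than "white" lets the caller decide whether to set a fill style at all.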

The figures below compare the output of the current plot_tree function with the output of our proposed plot_tree function, and demonstrate that the proposed output contains significantly more information than the current output yet is much easier to interpret.

Here is a list of the new features from this pull request:

  • Trees are now plotted horizontally rather than vertically, and are more compact, which allows for a better visualization of large trees;
  • Features and thresholds are now written in a more compact yet intelligible way;
  • The default precision of the printed information is now 3;
  • The user can now display the percentage of data in nodes and leaves;
  • Nodes split on a monotone feature are now colored light green for a monotone increasing feature and light red for a monotone decreasing feature, and a legend is added to explain what these colors correspond to;
  • Important information is printed in bold;
  • Other minor changes.
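The label features in the list above (compact feature/threshold text, bold key values, precision 3, data percentage) can be sketched as a small string builder. This assumes Graphviz HTML-like labels, where <B>...</B> renders bold and <br/> breaks a line. The function name split_label, its parameters, and the "% of data" wording are hypothetical, and precision is interpreted here as decimal places.

```python
def split_label(feature_name: str, threshold: float, internal_count: int,
                total_count: int, precision: int = 3,
                show_data_percentage: bool = True) -> str:
    """Build a Graphviz HTML-like label for a split node (hypothetical helper)."""
    # Feature and threshold in bold, joined by the HTML entity for "less than or equal to".
    label = "<B>{0}</B> &#8804; <B>{1:.{2}f}</B>".format(feature_name, threshold, precision)
    if show_data_percentage:
        # Share of all training rows that reach this node.
        percentage = internal_count / total_count * 100
        label += "<br/>{0:.2f}% of data".format(percentage)
    # Surrounding the string with <...> makes Graphviz parse it as HTML-like markup.
    return "<" + label + ">"


print(split_label("age", 30.56789, internal_count=250, total_count=1000))
# <<B>age</B> &#8804; <B>30.568</B><br/>25.00% of data>
```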

Current LightGBM tree:
[image: tree plotted by the current plot_tree]

Tree generated by the code of this pull request:
[image: tree plotted by the proposed plot_tree]

Charles Auguste and others added 25 commits August 1, 2019 11:06
This reverts commit dd8bf14a3ba604b0dfae3b7bb1c64b6784d15e03.
@msftclas

msftclas commented Aug 1, 2019

CLA assistant check
All CLA requirements met.

@aldanor

aldanor commented Aug 1, 2019

+@redditur

@CharlesAuguste
Contributor Author

Initially we only had features and leaf values in bold because we thought they were the most important information. But as you said, it is not consistent. In my opinion, a good compromise would be this (thresholds in bold added).
[image: tree with thresholds also in bold]

I don't think it makes sense to put "leaf x" in bold, because it is not important information. But if you still prefer it that way, I will make the necessary changes.

Collaborator

@StrikerRUS StrikerRUS left a comment

@CharlesAuguste Totally agree with you! Plots look really amazing now! Thanks a lot for your contribution!

Only one remaining thing has some room for improvement, I think. In my opinion, the legend could be clarified somewhat. At the very least, it needs a few words saying that Increasing/Decreasing refers to monotone constraints. Imagine someone uses a tree plot in their slides. To be honest, without any additional info (or code around the picture), it's not very clear what Increasing/Decreasing means...

@CharlesAuguste
Contributor Author

Would you like this better ("Monotone constraints" added on top of the legend)?
[image: legend titled "Monotone constraints"]

@StrikerRUS
Collaborator

@CharlesAuguste Nice, thanks! I think it looks clearer now.
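A titled legend like the one discussed above could be built as a Graphviz HTML-like table. This is a sketch only: the function name monotone_legend_label, the table layout, and the default colors are assumptions, not the PR's code.

```python
def monotone_legend_label(increasing_color: str = "#ddffdd",
                          decreasing_color: str = "#ffdddd") -> str:
    """Return an HTML-like Graphviz label for a monotone-constraints legend (sketch)."""
    rows = [
        # Title row spanning both columns, so the legend explains itself out of context.
        '<TR><TD COLSPAN="2"><B>Monotone constraints</B></TD></TR>',
        '<TR><TD>Increasing</TD><TD BGCOLOR="{0}"></TD></TR>'.format(increasing_color),
        '<TR><TD>Decreasing</TD><TD BGCOLOR="{0}"></TD></TR>'.format(decreasing_color),
    ]
    table = '<TABLE BORDER="0" CELLBORDER="1" CELLSPACING="0" CELLPADDING="4">{0}</TABLE>'.format(
        "".join(rows))
    # The outer <...> marks the whole string as an HTML-like label.
    return "<" + table + ">"
```

With the graphviz Python package, such a label could be attached to a standalone node, for example graph.node("legend", label=monotone_legend_label(), shape="rectangle", color="white").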

@StrikerRUS StrikerRUS mentioned this pull request Aug 17, 2019
@StrikerRUS StrikerRUS requested review from guolinke and jameslamb and removed request for jameslamb and guolinke August 27, 2019 10:24
@StrikerRUS
Collaborator

@henry0312 @chivee @guolinke @jameslamb Can you please help and give a second review to this PR?

Collaborator

@jameslamb jameslamb left a comment

I don't know enough C++ to approve the config.h changes but looked over the rest and left a few minor stylistic comments. Otherwise, this is good from my perspective.

if root['decision_type'] == '<=':
    l_dec, r_dec = '<=', '>'
    operator = "&#8804;"
Collaborator

I think the code would be easier to understand if this was put in a named variable, e.g.

lte_symbol = "&#8804;"
operator = lte_symbol
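A sketch of the named-constant approach, mapping a node's decision_type to the displayed operator. It assumes the JSON dump format in which numerical splits have decision_type "<=" and categorical splits "=="; the function name split_operator is hypothetical. html.unescape is used only to show which character the entity renders as.

```python
import html

# Named constant, as suggested: HTML entity for the "less-than or equal to" sign.
LTE_SYMBOL = "&#8804;"


def split_operator(decision_type: str) -> str:
    """Return the operator shown in a split node's label (hypothetical helper)."""
    if decision_type == "<=":
        return LTE_SYMBOL
    if decision_type == "==":
        # Assumed decision_type for categorical splits.
        return "="
    raise ValueError("Invalid decision type in tree model: {0!r}".format(decision_type))


print(html.unescape(split_operator("<=")))  # ≤
```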

elif info == 'internal_count':
label += r'\n{0}: {1}'.format(info, root[info])
graph.node(name, label=label)
l_dec, r_dec = 'yes', "no"
Collaborator

In my opinion, simple assignments like this should be on their own lines. Could you change this to:

l_dec = 'yes'
r_dec = 'no'

precision : int or None, optional (default=None)
'split_gain', 'internal_value', 'internal_count', 'internal_weight',
'leaf_count', 'leaf_weight', 'data_percentage'.
precision : int or None, optional (default=3)
Collaborator

Could you expand this documentation please? What are the units and what is the implication of setting None?

Collaborator

@StrikerRUS StrikerRUS Sep 2, 2019

@jameslamb This is a nice catch, but I think it is beyond the scope of this PR. precision was introduced in #1424 and is used in several functions (e.g. in plot_importance() (#1777)), so I'll address your comment in a separate PR after merging this one.

Collaborator

makes sense!
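For reference, one plausible reading of precision is the number of digits after the decimal point, with None meaning the value is shown unrounded. A sketch of a formatter with those semantics (format_value is a hypothetical name, and LightGBM's own helper may differ in details):

```python
from typing import Optional, Union


def format_value(value: Union[float, str], precision: Optional[int] = 3) -> str:
    """Format a value for display in a tree plot (sketch).

    precision: digits after the decimal point; None disables rounding.
    Strings are passed through unchanged.
    """
    if precision is None or isinstance(value, str):
        return str(value)
    return "{0:.{1}f}".format(value, precision)


print(format_value(0.123456))        # 0.123
print(format_value(0.123456, None))  # 0.123456
print(format_value(2, 1))            # 2.0
```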

if 'monotone_constraints' in model:
    monotone_constraints = model['monotone_constraints']
else:
    monotone_constraints = None
Collaborator

could you please simplify this?

monotone_constraints = model.get('monotone_constraints', None)
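Both forms return the same value for a plain dict like model here, and because dict.get already defaults to None, the explicit None argument is optional. A quick check with made-up model dicts (the function names are hypothetical):

```python
def constraints_verbose(model: dict):
    # Original form from the diff.
    if 'monotone_constraints' in model:
        monotone_constraints = model['monotone_constraints']
    else:
        monotone_constraints = None
    return monotone_constraints


def constraints_short(model: dict):
    # Suggested form; model.get('monotone_constraints') would behave identically.
    return model.get('monotone_constraints', None)


for model in ({}, {'monotone_constraints': [1, 0, -1]}):
    print(constraints_verbose(model) == constraints_short(model))  # True (for both dicts)
```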

@StrikerRUS
Collaborator

Close-reopen to re-trigger Appveyor.

@StrikerRUS StrikerRUS closed this Sep 5, 2019
@StrikerRUS StrikerRUS reopened this Sep 5, 2019
@StrikerRUS
Collaborator

@jameslamb Can you please review the changes made?

Collaborator

@jameslamb jameslamb left a comment

I took another look, looks good to me! This is an awesome improvement.

@StrikerRUS StrikerRUS merged commit f52be9b into microsoft:master Sep 8, 2019
@aldanor

aldanor commented Sep 8, 2019

Great stuff, thanks guys 👍 (now we just need to figure out what to do with the other big PR to move it forward...)

@CharlesAuguste CharlesAuguste deleted the improved-python-tree-plots/LightGBM branch September 8, 2019 20:02
@CharlesAuguste CharlesAuguste restored the improved-python-tree-plots/LightGBM branch December 19, 2019 15:53
@CharlesAuguste CharlesAuguste deleted the improved-python-tree-plots/LightGBM branch December 19, 2019 15:54
@lock lock bot locked as resolved and limited conversation to collaborators Mar 10, 2020
7 participants