
Modified to speed up UpliftTreeClassifier.growDecisionTreeFrom. #695

Merged
merged 2 commits on Oct 25, 2023

Conversation

peterloleungyau
Contributor

Mainly reduces allocations; on a synthetic dataset of 40,000 rows, about 400 features, and 4 treatments, this gives a speed-up of about 65x over the original version.

The main modifications are in the comments, reproduced below for convenience.

Proposed changes

Modified to speed up UpliftTreeClassifier.growDecisionTreeFrom, which
consequently also speeds up UpliftTreeClassifier.fit and
UpliftRandomForestClassifier.

Modification version 1, on 2023-10-04 by Peter Lo peterloleungyau@gmail.com

  • created divideSet_len from divideSet; instead of splitting the
    whole X into X_l and X_r, it returns only len(X_l) and len(X_r),
    because the full X_l and X_r are not needed while searching for
    the best split. This saves some allocations.
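The idea can be sketched in plain Python (the real code is Cython; the signature and the split convention below are simplified assumptions, not the PR's exact implementation):

```python
import numpy as np

def divideSet_len(X, col, value):
    # Return only the sizes of the two branches instead of
    # materializing X_l and X_r (sketch; numeric split assumed,
    # with rows satisfying X[:, col] >= value counted as "left").
    mask = X[:, col] >= value
    n_left = int(np.count_nonzero(mask))
    return n_left, len(X) - n_left
```

During the split search only these two counts are needed; the full split is performed just once, for the winning column/value pair.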

  • in finding the best split, we mostly use scalar values
    (e.g. best_col and best_value instead of a single tuple
    bestAttribute) to track the current best, which further avoids
    allocating many small objects. After the best split is found, we
    call the original divideSet to get X_l and X_r, and from them
    best_set_left and best_set_right, which are used in the subsequent
    construction of the left and right subtrees; bestAttribute is then
    assembled as a tuple.
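A minimal sketch of the scalar bookkeeping (the candidate list and scoring callback here are hypothetical stand-ins; the real loop iterates over columns and threshold values):

```python
def find_best_split(candidates, score):
    # Track the running best with plain scalars rather than building
    # a tuple per candidate; assemble bestAttribute only once at the end.
    best_col, best_value, best_score = -1, None, float("-inf")
    for col, value in candidates:
        s = score(col, value)
        if s > best_score:
            best_col, best_value, best_score = col, value, s
    bestAttribute = (best_col, best_value)
    return bestAttribute, best_score
```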

  • also, the Cython annotation output suggests that constant values,
    such as the list of percentile values to test as thresholds, would
    be constructed in each iteration of the loop, because the Cython
    compiler cannot prove that the list is not modified by
    np.percentile; since we know it is not, we create the list once
    outside the loop.
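For illustration, the hoisting looks like this (the percentile values below are placeholders, not necessarily those used by UpliftTreeClassifier):

```python
import numpy as np

# Build the constant percentile list once, outside any loop; inside a
# loop, Cython would otherwise re-create it on every iteration, since
# the compiler cannot prove np.percentile leaves it unmodified.
PERCENTILES = [10, 50, 90]

def all_thresholds(X):
    # One array of candidate thresholds per column.
    return [np.percentile(X[:, col], PERCENTILES) for col in range(X.shape[1])]
```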

Modification version 2, on 2023-10-05 by Peter Lo peterloleungyau@gmail.com

  • modified group_uniqueCounts to group_uniqueCounts_to_arr, which
    fills the counts into a pre-allocated flat numpy array instead of
    building a list of lists.
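A plain-Python sketch of the idea, assuming a flat layout of out[2*t] = N(Y=0, T=t) and out[2*t + 1] = N(Y=1, T=t) (the PR's exact layout may differ):

```python
import numpy as np

def group_uniqueCounts_to_arr(treatment_idx, y, out):
    # Fill the pre-allocated flat count array in one pass,
    # avoiding a fresh list-of-lists allocation per call.
    out[:] = 0
    for t, yi in zip(treatment_idx, y):
        out[2 * t + (1 if yi else 0)] += 1
    return out
```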

  • modified tree_node_summary to tree_node_summary_to_arr, which uses
    pre-allocated flat numpy arrays and fills in the node summary in a
    new format consisting of two parallel arrays: out_summary_p for
    [P(Y=1|T=i)...] and out_summary_n for [N(T=i)...], instead of the
    previous list of lists.
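The new summary format might be filled like this (a sketch; in the actual Cython code the output arrays are pre-allocated once and reused across nodes):

```python
import numpy as np

def tree_node_summary_to_arr(treatment_idx, y, out_summary_p, out_summary_n):
    # Parallel arrays: out_summary_p[i] = P(Y=1|T=i), out_summary_n[i] = N(T=i).
    out_summary_p[:] = 0.0
    out_summary_n[:] = 0
    pos = np.zeros_like(out_summary_n)
    for t, yi in zip(treatment_idx, y):
        out_summary_n[t] += 1
        if yi:
            pos[t] += 1
    # Guard against empty treatment groups via the where= mask.
    np.divide(pos, out_summary_n, out=out_summary_p, where=out_summary_n > 0)
    return out_summary_p, out_summary_n
```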

  • for all the evaluation functions, created arr_evaluate_* versions
    that use the new node summary format. Also created arr_normI as
    the counterpart of normI for the new format.
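As an illustration only (this is a simplified Euclidean-distance-style gain, not necessarily any of the library's exact evaluation formulas), such a function reads straight out of the parallel summary arrays:

```python
def arr_evaluate_ED(summary_p):
    # Sum of squared differences between each treatment's P(Y=1|T=i)
    # and the control's P(Y=1|T=0), taking group 0 as control.
    p_control = summary_p[0]
    return sum((p - p_control) ** 2 for p in summary_p[1:])
```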

  • added some cdef type declarations to help speed up some
    calculations.

Modification version 3, on 2023-10-06 by Peter Lo peterloleungyau@gmail.com

  • combined divideSet_len and group_uniqueCounts_to_arr into
    group_counts_by_divide, to reduce the creation of intermediate
    objects. This calculates the group counts for the left branch.
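The fused pass might look like this in plain Python (a sketch; the numeric split and the flat count layout out_left[2*t + y] are assumptions carried over from the earlier sketches):

```python
import numpy as np

def group_counts_by_divide(X, treatment_idx, y, col, value, out_left):
    # One pass: apply the split test and accumulate the left branch's
    # group counts directly, with no intermediate X_l array.
    out_left[:] = 0
    n_left = 0
    for row, t, yi in zip(X, treatment_idx, y):
        if row[col] >= value:
            out_left[2 * t + (1 if yi else 0)] += 1
            n_left += 1
    return n_left
```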

  • created tree_node_summary_from_counts from tree_node_summary_to_arr,
    to use pre-calculated group counts (e.g. by group_counts_by_divide).

  • for the right branch, calculate the group counts by subtracting
    the left branch's group counts from the total counts
    (pre-calculated outside the loop over candidate split points),
    which is more efficient than looping through the data again.
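With the total counts computed once per node, the right-branch counts are a single vectorized subtraction, e.g.:

```python
import numpy as np

# Flat layout [N(Y=0,T=0), N(Y=1,T=0), N(Y=0,T=1), N(Y=1,T=1)];
# the numbers are made up for illustration.
total_counts = np.array([10, 5, 8, 7])  # whole node, computed once
left_counts = np.array([4, 2, 3, 1])    # from the fused left-branch pass
right_counts = total_counts - left_counts
```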

Types of changes

What types of changes does your code introduce to CausalML?
Put an x in the boxes that apply

  • Bugfix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation Update (if none of the other choices apply)

Checklist

Put an x in the boxes that apply. You can also fill these out after creating the PR. If you're unsure about any of them, don't hesitate to ask. We're here to help! This is simply a reminder of what we are going to look for before merging your code.

  • I have read the CONTRIBUTING doc
  • I have signed the CLA
  • Lint and unit tests pass locally with my changes
  • I have added tests that prove my fix is effective or that my feature works
  • I have added necessary documentation (if appropriate)
  • Any dependent changes have been merged and published in downstream modules

Further comments

If this is a relatively large or complex change, kick off the discussion by explaining why you chose the solution you did and what alternatives you considered, etc. This PR template is adopted from appium.

@jeongyoonlee (Collaborator) left a comment

LGTM. Thanks for your contribution, @peterloleungyau!

@jeongyoonlee jeongyoonlee merged commit 0f6de9c into uber:master Oct 25, 2023
6 checks passed
Labels
enhancement New feature or request
2 participants