Commit: Internal change
PiperOrigin-RevId: 461446728
achoum authored and copybara-github committed Jul 17, 2022
1 parent e171898 commit f55a0f3
Showing 3 changed files with 57 additions and 4 deletions.
8 changes: 7 additions & 1 deletion CHANGELOG.md
@@ -1,11 +1,17 @@
# Changelog

-## ???? - ????
+## 0.2.5 - 2022-06-15

### Features

- Multithreading of the oblique splitter for gradient boosted tree models.
- Support for JavaScript + WebAssembly model inference.
- Support for pure serving models, i.e. models containing only serving data.
- Add the "edit_model" CLI tool.

### Fix

- Remove the bias toward low outcomes in uplift modeling.

## 0.2.4 - 2022-05-17

4 changes: 4 additions & 0 deletions documentation/cli.txt
@@ -164,6 +164,10 @@ edit_model: Edits a trained model.
--new_label_name (New label name.); default: "__NO__SET__";
--new_weights_name (New weights name.); default: "__NO__SET__";
--output (Output model directory.); default: "__NO__SET__";
--pure_serving (Clear the model from any information that is not required
for model serving. This includes debugging, model interpretation and other
meta-data. Can significantly reduce the size of the model.);
default: "__NO__SET__";

Try --helpfull to get a list of all flags, or --help=substring to show help for
flags which include the specified substring in either the name, description, or
49 changes: 46 additions & 3 deletions documentation/learners.md
@@ -304,6 +304,16 @@ the gradient of the loss relative to the model output).
- Maximum number of decision trees. The effective number of trained trees can
  be smaller if early stopping is enabled.

#### [pure_serving_model](../yggdrasil_decision_forests/learner/abstract_learner.proto?q=symbol:pure_serving_model)

- **Type:** Categorical **Default:** false **Possible values:** true, false

- Clear the model from any information that is not required for model serving.
  This includes debugging, model interpretation and other meta-data. The size
  of the serialized model can be reduced significantly (a 50% model size
  reduction is common). This parameter has no impact on the quality, serving
  speed, or RAM usage of model serving.

#### [random_seed](../yggdrasil_decision_forests/learner/abstract_learner.proto?q=symbol:random_seed)

- **Type:** Integer **Default:** 123456
@@ -396,7 +406,8 @@ the gradient of the loss relative to the model output).
#### [uplift_split_score](../yggdrasil_decision_forests/learner/decision_tree/decision_tree.proto?q=symbol:uplift_split_score)

- **Type:** Categorical **Default:** KULLBACK_LEIBLER **Possible values:**
-KULLBACK_LEIBLER, KL, EUCLIDEAN_DISTANCE, ED, CHI_SQUARED, CS
+KULLBACK_LEIBLER, KL, EUCLIDEAN_DISTANCE, ED, CHI_SQUARED, CS,
+CONSERVATIVE_EUCLIDEAN_DISTANCE, CED

- For uplift models only. Splitter score, i.e. the score optimized by the splitters. These scores are introduced in "Decision trees for uplift modeling with single and multiple treatments", Rzepakowski et al. Notation: `p` is the probability / average value of the positive outcome, `q` is the probability / average value in the control group.<br>- `KULLBACK_LEIBLER` or `KL`: - p log (p/q)<br>- `EUCLIDEAN_DISTANCE` or `ED`: (p-q)^2<br>- `CHI_SQUARED` or `CS`: (p-q)^2/q<br>
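As a rough sketch (not the library's implementation), the three documented scores can be computed directly from `p` and `q`. This assumes the leading dash in the documented KL entry is a list bullet rather than a minus sign, and it omits `CONSERVATIVE_EUCLIDEAN_DISTANCE`, whose formula is not given here:

```python
import math

def uplift_split_scores(p: float, q: float) -> dict:
    """Uplift splitter scores from Rzepakowski et al., as documented above.

    p: probability / average value of the positive outcome (treatment group).
    q: probability / average value in the control group.
    Assumes p > 0 and q > 0 so the logarithm and division are defined.
    """
    return {
        "KULLBACK_LEIBLER": p * math.log(p / q),   # KL: p log(p/q)
        "EUCLIDEAN_DISTANCE": (p - q) ** 2,        # ED: (p-q)^2
        "CHI_SQUARED": (p - q) ** 2 / q,           # CS: (p-q)^2/q
    }
```

All three scores grow as `p` moves away from `q`, which is what the splitters maximize when searching for splits that separate treatment response from control response.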

@@ -672,6 +683,16 @@ It is probably the most well-known of the Decision Forest training algorithms.
increase the quality of the model at the expense of size, training speed,
and inference latency.

#### [pure_serving_model](../yggdrasil_decision_forests/learner/abstract_learner.proto?q=symbol:pure_serving_model)

- **Type:** Categorical **Default:** false **Possible values:** true, false

- Clear the model from any information that is not required for model serving.
  This includes debugging, model interpretation and other meta-data. The size
  of the serialized model can be reduced significantly (a 50% model size
  reduction is common). This parameter has no impact on the quality, serving
  speed, or RAM usage of model serving.

#### [random_seed](../yggdrasil_decision_forests/learner/abstract_learner.proto?q=symbol:random_seed)

- **Type:** Integer **Default:** 123456
@@ -742,7 +763,8 @@ It is probably the most well-known of the Decision Forest training algorithms.
#### [uplift_split_score](../yggdrasil_decision_forests/learner/decision_tree/decision_tree.proto?q=symbol:uplift_split_score)

- **Type:** Categorical **Default:** KULLBACK_LEIBLER **Possible values:**
-KULLBACK_LEIBLER, KL, EUCLIDEAN_DISTANCE, ED, CHI_SQUARED, CS
+KULLBACK_LEIBLER, KL, EUCLIDEAN_DISTANCE, ED, CHI_SQUARED, CS,
+CONSERVATIVE_EUCLIDEAN_DISTANCE, CED

- For uplift models only. Splitter score, i.e. the score optimized by the splitters. These scores are introduced in "Decision trees for uplift modeling with single and multiple treatments", Rzepakowski et al. Notation: `p` is the probability / average value of the positive outcome, `q` is the probability / average value in the control group.<br>- `KULLBACK_LEIBLER` or `KL`: - p log (p/q)<br>- `EUCLIDEAN_DISTANCE` or `ED`: (p-q)^2<br>- `CHI_SQUARED` or `CS`: (p-q)^2/q<br>

@@ -931,6 +953,16 @@ used to grow the tree while the second is used to prune the tree.
as well as -1. If not set or equal to -1, the `num_candidate_attributes` is
used.

#### [pure_serving_model](../yggdrasil_decision_forests/learner/abstract_learner.proto?q=symbol:pure_serving_model)

- **Type:** Categorical **Default:** false **Possible values:** true, false

- Clear the model from any information that is not required for model serving.
  This includes debugging, model interpretation and other meta-data. The size
  of the serialized model can be reduced significantly (a 50% model size
  reduction is common). This parameter has no impact on the quality, serving
  speed, or RAM usage of model serving.

#### [random_seed](../yggdrasil_decision_forests/learner/abstract_learner.proto?q=symbol:random_seed)

- **Type:** Integer **Default:** 123456
@@ -991,7 +1023,8 @@ used to grow the tree while the second is used to prune the tree.
#### [uplift_split_score](../yggdrasil_decision_forests/learner/decision_tree/decision_tree.proto?q=symbol:uplift_split_score)

- **Type:** Categorical **Default:** KULLBACK_LEIBLER **Possible values:**
-KULLBACK_LEIBLER, KL, EUCLIDEAN_DISTANCE, ED, CHI_SQUARED, CS
+KULLBACK_LEIBLER, KL, EUCLIDEAN_DISTANCE, ED, CHI_SQUARED, CS,
+CONSERVATIVE_EUCLIDEAN_DISTANCE, CED

- For uplift models only. Splitter score, i.e. the score optimized by the splitters. These scores are introduced in "Decision trees for uplift modeling with single and multiple treatments", Rzepakowski et al. Notation: `p` is the probability / average value of the positive outcome, `q` is the probability / average value in the control group.<br>- `KULLBACK_LEIBLER` or `KL`: - p log (p/q)<br>- `EUCLIDEAN_DISTANCE` or `ED`: (p-q)^2<br>- `CHI_SQUARED` or `CS`: (p-q)^2/q<br>

@@ -1107,6 +1140,16 @@ algorithm for an introduction to GBTs.
- Maximum number of decision trees. The effective number of trained trees can
  be smaller if early stopping is enabled.

#### [pure_serving_model](../yggdrasil_decision_forests/learner/abstract_learner.proto?q=symbol:pure_serving_model)

- **Type:** Categorical **Default:** false **Possible values:** true, false

- Clear the model from any information that is not required for model serving.
  This includes debugging, model interpretation and other meta-data. The size
  of the serialized model can be reduced significantly (a 50% model size
  reduction is common). This parameter has no impact on the quality, serving
  speed, or RAM usage of model serving.

#### [random_seed](../yggdrasil_decision_forests/learner/abstract_learner.proto?q=symbol:random_seed)

- **Type:** Integer **Default:** 123456
