diff --git a/CHANGELOG.md b/CHANGELOG.md
index a6c74d8c..781a4951 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -1,11 +1,17 @@
# Changelog

-## ???? - ????
+## 0.2.5 - 2022-06-15

### Features

- Multithreading of the oblique splitter for gradient boosted tree models.
- Support for Javascript + WebAssembly inference of model.
+- Support for pure serving models, i.e. models containing only serving data.
+- Add the "edit_model" CLI tool.
+
+### Fix
+
+- Remove bias toward low outcome in uplift modeling.

## 0.2.4 - 2022-05-17

diff --git a/documentation/cli.txt b/documentation/cli.txt
index 0c2c8d2f..f2fa6627 100644
--- a/documentation/cli.txt
+++ b/documentation/cli.txt
@@ -164,6 +164,10 @@ edit_model: Edits a trained model.
    --new_label_name (New label name.); default: "__NO__SET__";
    --new_weights_name (New weights name.); default: "__NO__SET__";
    --output (Output model directory.); default: "__NO__SET__";
+    --pure_serving (Clear the model from any information that is not required
+      for model serving. This includes debugging, model interpretation and
+      other meta-data. Can significantly reduce the size of the model.);
+      default: "__NO__SET__";

  Try --helpfull to get a list of all flags or --help=substring shows help for
  flags which include specified substring in either in the name, or description or

diff --git a/documentation/learners.md b/documentation/learners.md
index 8d96b36e..b33a5054 100644
--- a/documentation/learners.md
+++ b/documentation/learners.md
@@ -304,6 +304,16 @@ the gradient of the loss relative to the model output).

- Maximum number of decision trees. The effective number of trained tree can
    be smaller if early stopping is enabled.

+#### [pure_serving_model](../yggdrasil_decision_forests/learner/abstract_learner.proto?q=symbol:pure_serving_model)
+
+- **Type:** Categorical **Default:** false **Possible values:** true, false
+
+- Clear the model from any information that is not required for model serving.
+    This includes debugging, model interpretation and other meta-data. The
+    size of the serialized model can be reduced significantly (a 50% model
+    size reduction is common). This parameter has no impact on the quality,
+    serving speed or RAM usage of model serving.
+
#### [random_seed](../yggdrasil_decision_forests/learner/abstract_learner.proto?q=symbol:random_seed)

- **Type:** Integer **Default:** 123456

@@ -396,7 +406,8 @@ the gradient of the loss relative to the model output).

#### [uplift_split_score](../yggdrasil_decision_forests/learner/decision_tree/decision_tree.proto?q=symbol:uplift_split_score)

- **Type:** Categorical **Default:** KULLBACK_LEIBLER **Possible values:**
-    KULLBACK_LEIBLER, KL, EUCLIDEAN_DISTANCE, ED, CHI_SQUARED, CS
+    KULLBACK_LEIBLER, KL, EUCLIDEAN_DISTANCE, ED, CHI_SQUARED, CS,
+    CONSERVATIVE_EUCLIDEAN_DISTANCE, CED

- For uplift models only. Splitter score i.e. score optimized by the
    splitters. The scores are introduced in "Decision trees for uplift modeling
    with single and multiple treatments", Rzepakowski et al. Notation: `p`
    probability / average value of the positive outcome, `q` probability /
    average value in the control group.
- `KULLBACK_LEIBLER` or `KL`: - p log (p/q)
- `EUCLIDEAN_DISTANCE` or `ED`: (p-q)^2
- `CHI_SQUARED` or `CS`: (p-q)^2/q
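
The scores above can be evaluated directly from `p` and `q`. Below is a minimal Python sketch of the three documented formulas, with illustrative function names and values (natural logarithm assumed; the newly added CONSERVATIVE_EUCLIDEAN_DISTANCE / CED variant is omitted because its formula is not given in this excerpt):

```python
import math


def kullback_leibler(p: float, q: float) -> float:
    """KULLBACK_LEIBLER / KL score, as written above: - p log (p/q)."""
    return -p * math.log(p / q)


def euclidean_distance(p: float, q: float) -> float:
    """EUCLIDEAN_DISTANCE / ED score: (p - q)^2."""
    return (p - q) ** 2


def chi_squared(p: float, q: float) -> float:
    """CHI_SQUARED / CS score: (p - q)^2 / q."""
    return (p - q) ** 2 / q


# Illustrative outcome rates: 0.35 in the treatment group, 0.25 in the control
# group.
p, q = 0.35, 0.25
print(kullback_leibler(p, q))    # ~ -0.118
print(euclidean_distance(p, q))  # 0.01
print(chi_squared(p, q))         # 0.04
```
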
@@ -672,6 +683,16 @@ It is probably the most well-known of the Decision Forest training algorithms.

    increase the quality of the model at the expense of size, training speed,
    and inference latency.

+#### [pure_serving_model](../yggdrasil_decision_forests/learner/abstract_learner.proto?q=symbol:pure_serving_model)
+
+- **Type:** Categorical **Default:** false **Possible values:** true, false
+
+- Clear the model from any information that is not required for model serving.
+    This includes debugging, model interpretation and other meta-data. The
+    size of the serialized model can be reduced significantly (a 50% model
+    size reduction is common). This parameter has no impact on the quality,
+    serving speed or RAM usage of model serving.
+
#### [random_seed](../yggdrasil_decision_forests/learner/abstract_learner.proto?q=symbol:random_seed)

- **Type:** Integer **Default:** 123456

@@ -742,7 +763,8 @@ It is probably the most well-known of the Decision Forest training algorithms.

#### [uplift_split_score](../yggdrasil_decision_forests/learner/decision_tree/decision_tree.proto?q=symbol:uplift_split_score)

- **Type:** Categorical **Default:** KULLBACK_LEIBLER **Possible values:**
-    KULLBACK_LEIBLER, KL, EUCLIDEAN_DISTANCE, ED, CHI_SQUARED, CS
+    KULLBACK_LEIBLER, KL, EUCLIDEAN_DISTANCE, ED, CHI_SQUARED, CS,
+    CONSERVATIVE_EUCLIDEAN_DISTANCE, CED

- For uplift models only. Splitter score i.e. score optimized by the
    splitters. The scores are introduced in "Decision trees for uplift modeling
    with single and multiple treatments", Rzepakowski et al. Notation: `p`
    probability / average value of the positive outcome, `q` probability /
    average value in the control group.
- `KULLBACK_LEIBLER` or `KL`: - p log (p/q)
- `EUCLIDEAN_DISTANCE` or `ED`: (p-q)^2
- `CHI_SQUARED` or `CS`: (p-q)^2/q
@@ -931,6 +953,16 @@ used to grow the tree while the second is used to prune the tree.

    as well as -1. If not set or equal to -1, the `num_candidate_attributes` is
    used.

+#### [pure_serving_model](../yggdrasil_decision_forests/learner/abstract_learner.proto?q=symbol:pure_serving_model)
+
+- **Type:** Categorical **Default:** false **Possible values:** true, false
+
+- Clear the model from any information that is not required for model serving.
+    This includes debugging, model interpretation and other meta-data. The
+    size of the serialized model can be reduced significantly (a 50% model
+    size reduction is common). This parameter has no impact on the quality,
+    serving speed or RAM usage of model serving.
+
#### [random_seed](../yggdrasil_decision_forests/learner/abstract_learner.proto?q=symbol:random_seed)

- **Type:** Integer **Default:** 123456

@@ -991,7 +1023,8 @@ used to grow the tree while the second is used to prune the tree.

#### [uplift_split_score](../yggdrasil_decision_forests/learner/decision_tree/decision_tree.proto?q=symbol:uplift_split_score)

- **Type:** Categorical **Default:** KULLBACK_LEIBLER **Possible values:**
-    KULLBACK_LEIBLER, KL, EUCLIDEAN_DISTANCE, ED, CHI_SQUARED, CS
+    KULLBACK_LEIBLER, KL, EUCLIDEAN_DISTANCE, ED, CHI_SQUARED, CS,
+    CONSERVATIVE_EUCLIDEAN_DISTANCE, CED

- For uplift models only. Splitter score i.e. score optimized by the
    splitters. The scores are introduced in "Decision trees for uplift modeling
    with single and multiple treatments", Rzepakowski et al. Notation: `p`
    probability / average value of the positive outcome, `q` probability /
    average value in the control group.
- `KULLBACK_LEIBLER` or `KL`: - p log (p/q)
- `EUCLIDEAN_DISTANCE` or `ED`: (p-q)^2
- `CHI_SQUARED` or `CS`: (p-q)^2/q
@@ -1107,6 +1140,16 @@ algorithm for an introduction to GBTs.

- Maximum number of decision trees. The effective number of trained tree can
    be smaller if early stopping is enabled.

+#### [pure_serving_model](../yggdrasil_decision_forests/learner/abstract_learner.proto?q=symbol:pure_serving_model)
+
+- **Type:** Categorical **Default:** false **Possible values:** true, false
+
+- Clear the model from any information that is not required for model serving.
+    This includes debugging, model interpretation and other meta-data. The
+    size of the serialized model can be reduced significantly (a 50% model
+    size reduction is common). This parameter has no impact on the quality,
+    serving speed or RAM usage of model serving.
+
#### [random_seed](../yggdrasil_decision_forests/learner/abstract_learner.proto?q=symbol:random_seed)

- **Type:** Integer **Default:** 123456
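
The `pure_serving_model` hyper-parameter and the `edit_model` `--pure_serving` flag expose the same feature: stripping a trained model down to the data needed for serving. Below is a minimal sketch of calling the CLI tool from Python, assuming the model input flag is named `--input` and that the boolean flag accepts `true` (neither appears in the cli.txt excerpt above; the paths are placeholders):

```python
import subprocess

# Strip a trained model of debugging, model-interpretation and other meta-data,
# keeping only what is required for serving.
subprocess.run(
    [
        "edit_model",
        "--input=/tmp/my_model",        # assumed flag name, placeholder path
        "--output=/tmp/my_model_pure",  # flag documented above
        "--pure_serving=true",          # flag documented above; "true" assumed
    ],
    check=True,
)
```

The same result can be obtained at training time by setting the `pure_serving_model` learner parameter documented above to `true`.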