Add sigmoid to softmax loss #7616

Merged
merged 61 commits into from
Feb 9, 2021
61 commits
5c3870f
first version
dakshvar22 Dec 19, 2020
8cff4ec
remove extra terms from softmax
dakshvar22 Dec 21, 2020
1d29d00
Merge branch 'master' into sigmoid_loss
dakshvar22 Jan 6, 2021
7d971d8
refactor based on config option. Ready for test
dakshvar22 Jan 6, 2021
bcecf41
add sigmoid based prediction during inference
dakshvar22 Jan 7, 2021
6746bbd
docs, changelog, docstrings
dakshvar22 Jan 7, 2021
b674d73
add tests
dakshvar22 Jan 7, 2021
4d4d52e
review comments
dakshvar22 Jan 12, 2021
8dc36e9
Merge branch 'master' into sigmoid_loss
dakshvar22 Jan 12, 2021
db17411
review comments
dakshvar22 Jan 12, 2021
1d5527a
remove sim_neg_ii to run experiments
dakshvar22 Jan 12, 2021
8bcb4bd
merge main
dakshvar22 Jan 25, 2021
763c2cd
revert back experimental change
dakshvar22 Jan 25, 2021
2b9d532
update similarity computation during prediction, to be tested.
dakshvar22 Jan 31, 2021
e8b5eac
Merge branch 'main' into sigmoid_loss
dakshvar22 Jan 31, 2021
28e8c26
update docs, test various options
dakshvar22 Jan 31, 2021
7eeb251
assertive
dakshvar22 Jan 31, 2021
0d175d4
fix plotting
dakshvar22 Jan 31, 2021
af82d21
fix ted, add line to migration
dakshvar22 Feb 1, 2021
d11ab35
dummy change to trigger tests
dakshvar22 Feb 1, 2021
98ced0b
merge main
dakshvar22 Feb 1, 2021
d5e0199
Merge branch 'main' into sigmoid_loss
dakshvar22 Feb 2, 2021
4cf0750
add changes for autoconfig, defaults
dakshvar22 Feb 3, 2021
65e0ecf
Merge branch 'main' into sigmoid_loss
dakshvar22 Feb 3, 2021
827dc2b
fix test
dakshvar22 Feb 4, 2021
6e44c2f
Apply suggestions from code review
dakshvar22 Feb 5, 2021
f5d26e7
remove parallel iter and complex op
dakshvar22 Feb 5, 2021
789f290
merge other review comments
dakshvar22 Feb 5, 2021
ba7300b
Merge branch 'main' into sigmoid_loss
dakshvar22 Feb 5, 2021
6c71556
Merge branch 'sigmoid_loss' of github.com:RasaHQ/rasa into sigmoid_loss
dakshvar22 Feb 5, 2021
a5286eb
more review comments
dakshvar22 Feb 5, 2021
bdadebf
fix tests
dakshvar22 Feb 5, 2021
3a3b0f3
add conditions
dakshvar22 Feb 5, 2021
cf27ec4
add tests for diet and ted
dakshvar22 Feb 7, 2021
476e598
add types
dakshvar22 Feb 7, 2021
3d554e3
added tests for TED
dakshvar22 Feb 7, 2021
54f9ee4
change plotting strategy, testing
dakshvar22 Feb 7, 2021
5734612
change function call
dakshvar22 Feb 7, 2021
1dc6930
self review, add types, docformats
dakshvar22 Feb 7, 2021
ab1e7b3
revert back plotting changes
dakshvar22 Feb 7, 2021
f2da6bb
final plotting style
dakshvar22 Feb 8, 2021
8fb7ea2
change epochs to 1
dakshvar22 Feb 8, 2021
2724b19
Partial suggestions from code review
dakshvar22 Feb 8, 2021
8c66bd8
Apply doc suggestions from code review
dakshvar22 Feb 8, 2021
06a70ee
refactor loss, add docstrings
dakshvar22 Feb 8, 2021
e3a548e
merge other comments
dakshvar22 Feb 8, 2021
6f9cd90
remove none for similarity_type
dakshvar22 Feb 8, 2021
6697c0d
override defaults during load so that new parameters are filled in be…
dakshvar22 Feb 8, 2021
13d8aa8
change call to deprecated function check
dakshvar22 Feb 8, 2021
9ea25dd
more comments
dakshvar22 Feb 8, 2021
ffcfdd1
add tests for config checks
dakshvar22 Feb 8, 2021
bacc4bc
Merge branch 'main' into sigmoid_loss
dakshvar22 Feb 8, 2021
25abb8d
remove prints
dakshvar22 Feb 8, 2021
c2e74e1
Update docs/docs/migration-guide.mdx
dakshvar22 Feb 8, 2021
c2b9b93
fix test
dakshvar22 Feb 8, 2021
c69fdbb
Update rasa/utils/tensorflow/layers.py
dakshvar22 Feb 9, 2021
fbf9a71
Update docs/docs/migration-guide.mdx
dakshvar22 Feb 9, 2021
b963b52
add tflayerconfigexception
dakshvar22 Feb 9, 2021
e4872c6
add full stop
dakshvar22 Feb 9, 2021
80db144
Merge branch 'main' into sigmoid_loss
dakshvar22 Feb 9, 2021
1f60b8f
last review comments
dakshvar22 Feb 9, 2021
24 changes: 24 additions & 0 deletions changelog/7616.improvement.md
@@ -0,0 +1,24 @@
Added two new parameters `constrain_similarities` and `model_confidence` to machine learning (ML) components - [DIETClassifier](components.mdx#dietclassifier), [ResponseSelector](components.mdx#responseselector) and [TEDPolicy](policies.mdx#ted-policy).

Setting `constrain_similarities=True` adds a sigmoid cross-entropy loss on all similarity values to restrict them to an approximate range in `DotProductLoss`. This should help the models perform better on real-world test sets.
By default, the parameter is set to `False` to preserve the old behaviour, but users are encouraged to set it to `True` and re-train their assistants as it will be set to `True` by default from Rasa Open Source 3.0.0 onwards.

Parameter `model_confidence` affects how the model's confidence for each label is computed during inference. It can take three values:
1. `softmax` - Similarities between input and label embeddings are post-processed with a softmax function, as a result of which the confidences for all labels sum up to 1.
2. `cosine` - Cosine similarity between input and label embeddings. Confidence for each label will be in the range `[-1,1]`.
3. `inner` - Dot product similarity between input and label embeddings. Confidence for each label will be in an unbounded range.

Setting `model_confidence=cosine` should help users tune the fallback thresholds of their assistant better. The default value is `softmax` to preserve the old behaviour, but we recommend using `cosine` as that will be the new default value from Rasa Open Source 3.0.0 onwards. The value of this option does not affect how confidences are computed for entity predictions in `DIETClassifier` and `TEDPolicy`.
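For intuition, the three confidence options can be sketched with plain NumPy. The embeddings below are random stand-ins; in the real components they come from the trained model:

```python
import numpy as np

# Random stand-ins for one input embedding and three label embeddings;
# the real components use the trained model's learned embeddings.
rng = np.random.default_rng(0)
input_emb = rng.normal(size=20)
label_embs = rng.normal(size=(3, 20))

# `inner`: raw dot-product similarities, unbounded.
inner = label_embs @ input_emb

# `cosine`: normalise both sides, so each confidence lies in [-1, 1].
cosine = (label_embs / np.linalg.norm(label_embs, axis=1, keepdims=True)) @ (
    input_emb / np.linalg.norm(input_emb)
)

# `softmax`: post-process the dot-product similarities so that the
# confidences over all labels sum to 1.
exps = np.exp(inner - inner.max())
softmax = exps / exps.sum()

print(inner, cosine, softmax)
```

Note how `softmax` values are always positive and sum to 1 regardless of how small the underlying similarities are, which is why `cosine` can be easier to threshold against.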

With both of the above recommendations, users should configure their ML component, e.g. `DIETClassifier`, as:
```yaml
- name: DIETClassifier
  model_confidence: cosine
  constrain_similarities: True
  ...
```
Once the assistant is re-trained with the above configuration, users should also tune fallback confidence thresholds.

Configuration option `loss_type=softmax` is now deprecated and will be removed in Rasa Open Source 3.0.0. Use `loss_type=cross_entropy` instead.

The default [auto-configuration](model-configuration.mdx#suggested-config) is changed to use `constrain_similarities=True` and `model_confidence=cosine` in ML components so that new users start with the recommended configuration.
6 changes: 6 additions & 0 deletions data/test_config/config_empty_en_after_dumping.yml
@@ -13,9 +13,13 @@ pipeline:
# max_ngram: 4
# - name: DIETClassifier
# epochs: 100
# constrain_similarities: true
# model_confidence: cosine
# - name: EntitySynonymMapper
# - name: ResponseSelector
# epochs: 100
# constrain_similarities: true
# model_confidence: cosine
# - name: FallbackClassifier
# threshold: 0.3
# ambiguity_threshold: 0.1
@@ -27,4 +31,6 @@ policies:
# - name: TEDPolicy
# max_history: 5
# epochs: 100
# constrain_similarities: true
# model_confidence: cosine
# - name: RulePolicy
2 changes: 2 additions & 0 deletions data/test_config/config_empty_en_after_dumping_core.yml
@@ -8,4 +8,6 @@ policies:
# - name: TEDPolicy
# max_history: 5
# epochs: 100
# constrain_similarities: true
# model_confidence: cosine
# - name: RulePolicy
4 changes: 4 additions & 0 deletions data/test_config/config_empty_en_after_dumping_nlu.yml
@@ -13,9 +13,13 @@ pipeline:
# max_ngram: 4
# - name: DIETClassifier
# epochs: 100
# constrain_similarities: true
# model_confidence: cosine
# - name: EntitySynonymMapper
# - name: ResponseSelector
# epochs: 100
# constrain_similarities: true
# model_confidence: cosine
# - name: FallbackClassifier
# threshold: 0.3
# ambiguity_threshold: 0.1
6 changes: 6 additions & 0 deletions data/test_config/config_empty_fr_after_dumping.yml
@@ -13,9 +13,13 @@ pipeline:
# max_ngram: 4
# - name: DIETClassifier
# epochs: 100
# constrain_similarities: true
# model_confidence: cosine
# - name: EntitySynonymMapper
# - name: ResponseSelector
# epochs: 100
# constrain_similarities: true
# model_confidence: cosine
# - name: FallbackClassifier
# threshold: 0.3
# ambiguity_threshold: 0.1
@@ -27,4 +31,6 @@ policies:
# - name: TEDPolicy
# max_history: 5
# epochs: 100
# constrain_similarities: true
# model_confidence: cosine
# - name: RulePolicy
2 changes: 2 additions & 0 deletions data/test_config/config_with_comments_after_dumping.yml
@@ -27,6 +27,8 @@ policies: # even here
# - name: TEDPolicy
# max_history: 5
# epochs: 100
# constrain_similarities: true
# model_confidence: cosine
# - name: RulePolicy

# comments everywhere
50 changes: 44 additions & 6 deletions docs/docs/components.mdx
@@ -1531,10 +1531,12 @@ However, additional parameters exist that can be adapted.
| similarity_type | "auto" | Type of similarity measure to use, either 'auto' or 'cosine' |
| | | or 'inner'. |
+---------------------------------+------------------+--------------------------------------------------------------+
| loss_type | "softmax" | The type of the loss function, either 'softmax' or 'margin'. |
| loss_type | "cross_entropy" | The type of the loss function, either 'cross_entropy' |
| | | or 'margin'. |
+---------------------------------+------------------+--------------------------------------------------------------+
| ranking_length | 10 | Number of top actions to normalize scores for loss type |
| | | 'softmax'. Set to 0 to turn off normalization. |
| ranking_length | 10 | Number of top intents to normalize scores for. Applicable |
| | | only with loss type 'cross_entropy' and 'softmax' |
| | | confidences. Set to 0 to disable normalization. |
+---------------------------------+------------------+--------------------------------------------------------------+
| maximum_positive_similarity | 0.8 | Indicates how similar the algorithm should try to make |
| | | embedding vectors for correct labels. |
@@ -1616,6 +1618,24 @@ However, additional parameters exist that can be adapted.
| | | ... |
| | | ``` |
+---------------------------------+------------------+--------------------------------------------------------------+
| constrain_similarities | False | If `True`, applies sigmoid on all similarity terms and adds |
| | | it to the loss function to ensure that similarity values are |
| | | approximately bounded. Used only if `loss_type=cross_entropy`|
+---------------------------------+------------------+--------------------------------------------------------------+
| model_confidence | "softmax" | Affects how model's confidence for each intent |
| | | is computed. It can take three values |
| | | 1. `softmax` - Similarities between input and intent |
| | | embeddings are post-processed with a softmax function, |
| | | as a result of which confidence for all intents sum up to 1. |
| | | 2. `cosine` - Cosine similarity between input and intent |
| | | embeddings. Confidence for each intent is in the |
| | | range `[-1,1]`. |
| | | 3. `inner` - Dot product similarity between input and intent |
| | | embeddings. Confidence for each intent is in an unbounded |
| | | range. |
| | | This parameter does not affect the confidence for entity |
| | | prediction. |
+---------------------------------+------------------+--------------------------------------------------------------+
```

:::note
@@ -2742,10 +2762,12 @@ However, additional parameters exist that can be adapted.
| similarity_type | "auto" | Type of similarity measure to use, either 'auto' or 'cosine' |
| | | or 'inner'. |
+---------------------------------+-------------------+--------------------------------------------------------------+
| loss_type | "softmax" | The type of the loss function, either 'softmax' or 'margin'. |
| loss_type | "cross_entropy" | The type of the loss function, either 'cross_entropy' |
| | | or 'margin'. |
+---------------------------------+-------------------+--------------------------------------------------------------+
| ranking_length | 10 | Number of top actions to normalize scores for loss type |
| | | 'softmax'. Set to 0 to turn off normalization. |
| ranking_length | 10 | Number of top responses to normalize scores for. Applicable |
| | | only with loss type 'cross_entropy' and 'softmax' |
| | | confidences. Set to 0 to disable normalization. |
+---------------------------------+-------------------+--------------------------------------------------------------+
| maximum_positive_similarity | 0.8 | Indicates how similar the algorithm should try to make |
| | | embedding vectors for correct labels. |
@@ -2814,6 +2836,22 @@ However, additional parameters exist that can be adapted.
| | | Requires `evaluate_on_number_of_examples > 0` and |
| | | `evaluate_every_number_of_epochs > 0` |
+---------------------------------+-------------------+--------------------------------------------------------------+
| constrain_similarities | False | If `True`, applies sigmoid on all similarity terms and adds |
| | | it to the loss function to ensure that similarity values are |
| | | approximately bounded. Used only if `loss_type=cross_entropy`|
+---------------------------------+-------------------+--------------------------------------------------------------+
| model_confidence | "softmax" | Affects how model's confidence for each response label |
| | | is computed. It can take three values |
| | | 1. `softmax` - Similarities between input and response label |
| | | embeddings are post-processed with a softmax function, |
| | | as a result of which confidence for all labels sum up to 1. |
| | | 2. `cosine` - Cosine similarity between input and response |
| | | label embeddings. Confidence for each label is in the |
| | | range `[-1,1]`. |
| | | 3. `inner` - Dot product similarity between input and |
| | | response label embeddings. Confidence for each label is in an|
| | | unbounded range. |
+---------------------------------+-------------------+--------------------------------------------------------------+
```

:::note
27 changes: 27 additions & 0 deletions docs/docs/migration-guide.mdx
@@ -10,6 +10,33 @@ description: |
This page contains information about changes between major versions and
how you can migrate from one version to another.

## Rasa 2.2 to Rasa 2.3

### Machine Learning Components

A few changes have been made to the loss function inside machine learning (ML)
components `DIETClassifier`, `ResponseSelector` and `TEDPolicy`. These include:
1. Configuration option `loss_type=softmax` is now deprecated and will be removed in Rasa Open Source 3.0.0. Use `loss_type=cross_entropy` instead.
2. The default loss function (`loss_type=cross_entropy`) can add an optional sigmoid cross-entropy loss over all similarity values to constrain
them to an approximate range. You can turn on this option by setting `constrain_similarities=True`. This should help the models perform better on real-world test sets.
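This behaviour can be illustrated with a minimal NumPy sketch. The function below is a hypothetical simplification for intuition only, not the actual loss implemented in `rasa/utils/tensorflow/layers.py`:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_similarity_loss(sim_pos, sim_negs):
    """Sigmoid cross-entropy over raw similarities (illustrative only).

    Pushes sigmoid(sim_pos) towards 1 and sigmoid(sim_neg) towards 0,
    so any similarity that drifts far outside a small range around zero
    is penalised, instead of being driven arbitrarily large the way a
    pure softmax loss allows.
    """
    loss_pos = -np.log(sigmoid(sim_pos))
    loss_neg = -np.log(1.0 - sigmoid(sim_negs)).sum()
    return loss_pos + loss_neg

# Moderate similarities are cheap, while one runaway negative
# similarity dominates the loss:
print(sigmoid_similarity_loss(5.0, np.array([-5.0, -5.0])))
print(sigmoid_similarity_loss(5.0, np.array([-5.0, 20.0])))
```

The second call is far more expensive than the first even though the ranking (positive above negatives) is identical in both cases, which is the sense in which the extra term keeps similarity values approximately bounded.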

Also, a new option `model_confidence` has been added to each ML component. It affects how the model's confidence for each label is computed during inference. It can take one of three values:
1. `softmax` - Similarities between input and label embeddings are post-processed with a softmax function, as a result of which the confidences for all labels sum up to 1.
2. `cosine` - Cosine similarity between input and label embeddings. Confidence for each label will be in the range `[-1,1]`.
3. `inner` - Dot product similarity between input and label embeddings. Confidence for each label will be in an unbounded range.

The default value is `softmax`, but we recommend using `cosine` as that will be the new default value from Rasa Open Source 3.0.0 onwards.
The value of this option does not affect how confidences are computed for entity predictions in `DIETClassifier` and `TEDPolicy`.

With both the above recommendations, users should configure their ML component, e.g. `DIETClassifier`, as:
```yaml
- name: DIETClassifier
  model_confidence: cosine
  constrain_similarities: True
  ...
```
Once the assistant is re-trained with the above configuration, users should also tune fallback confidence thresholds.
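Threshold tuning can be as simple as scanning candidate values over confidences collected on a held-out set. The confidence values below are hypothetical:

```python
import numpy as np

# Hypothetical cosine confidences of the top predicted label on a held-out
# set, split by whether the top prediction was actually correct.
correct_conf = np.array([0.91, 0.85, 0.78, 0.88, 0.73])
wrong_conf = np.array([0.62, 0.55, 0.70, 0.48])

# Scan candidate thresholds and report how many correct predictions
# survive versus how many wrong ones get routed to the fallback.
for threshold in np.arange(0.40, 0.95, 0.05):
    kept = (correct_conf >= threshold).mean()
    rejected = (wrong_conf < threshold).mean()
    print(f"threshold={threshold:.2f}  keep {kept:.0%} correct, reject {rejected:.0%} wrong")
```

The chosen value then goes into the assistant's fallback configuration (e.g. the `FallbackClassifier` threshold for NLU).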


## Rasa 2.1 to Rasa 2.2

### General
24 changes: 21 additions & 3 deletions docs/docs/policies.mdx
@@ -268,10 +268,12 @@ However, additional parameters exist that can be adapted.
| similarity_type | "auto" | Type of similarity measure to use, either 'auto' or 'cosine' |
| | | or 'inner'. |
+---------------------------------------+------------------------+--------------------------------------------------------------+
| loss_type | "softmax" | The type of the loss function, either 'softmax' or 'margin'. |
| loss_type | "cross_entropy" | The type of the loss function, either 'cross_entropy' |
| | | or 'margin'. |
+---------------------------------------+------------------------+--------------------------------------------------------------+
| ranking_length | 10 | Number of top actions to normalize scores for loss type |
| | | 'softmax'. Set to 0 to turn off normalization. |
| ranking_length | 10 | Number of top actions to normalize scores for. Applicable |
| | | only with loss type 'cross_entropy' and 'softmax' |
| | | confidences. Set to 0 to disable normalization. |
+---------------------------------------+------------------------+--------------------------------------------------------------+
| maximum_positive_similarity | 0.8 | Indicates how similar the algorithm should try to make |
| | | embedding vectors for correct labels. |
@@ -344,6 +346,22 @@ However, additional parameters exist that can be adapted.
| entity_recognition | True | If 'True' entity recognition is trained and entities are |
| | | extracted. |
+---------------------------------------+------------------------+--------------------------------------------------------------+
| constrain_similarities | False | If `True`, applies sigmoid on all similarity terms and adds |
| | | it to the loss function to ensure that similarity values are |
| | | approximately bounded. Used only if `loss_type=cross_entropy`|
+---------------------------------------+------------------------+--------------------------------------------------------------+
| model_confidence | "softmax" | Affects how model's confidence for each action |
| | | is computed. It can take three values |
| | | 1. `softmax` - Similarities between input and action |
| | | embeddings are post-processed with a softmax function, |
| | | as a result of which confidence for all labels sum up to 1. |
| | | 2. `cosine` - Cosine similarity between input and action |
| | | embeddings. Confidence for each label is in the |
| | | range `[-1,1]`. |
| | | 3. `inner` - Dot product similarity between input and action |
| | | embeddings. Confidence for each label is in an |
| | | unbounded range. |
+---------------------------------------+------------------------+--------------------------------------------------------------+
| BILOU_flag | True | If 'True', additional BILOU tags are added to entity labels. |
+---------------------------------------+------------------------+--------------------------------------------------------------+
| split_entities_by_comma | True | Splits a list of extracted entities by comma to treat each |