
Correctly calibrate confidence values for non-ML policies #7969

Closed
dakshvar22 opened this issue Feb 16, 2021 · 9 comments

Labels: area:rasa-oss/ml, research:feature-performance-improvement, type:enhancement

Comments

dakshvar22 commented Feb 16, 2021

#7616 introduced a bug with model_confidence set to cosine & inner in TEDPolicy. Since those confidences are not guaranteed to stay in the range [0, 1], while other non-ML policies like FormPolicy, MappingPolicy, etc. still predict a confidence value in the range [0, 1], it breaks the logic of picking the best prediction from different policies. We see two solutions:


1. Changing the confidence values of non-ML policies to be either +infinity (match) or -infinity (no match).


2. Changing the confidence values of non-ML policies based on the model_confidence set in TEDPolicy, i.e. [0, 1] for softmax, [-1, 1] for cosine, and (-infinity, +infinity) for inner. (A short sketch of how these ranges arise follows below.)
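
For reference, a minimal NumPy sketch (with made-up embedding values; this is not TEDPolicy's actual code) of why the three model_confidence options land in different ranges:

```python
import numpy as np

# Hypothetical dialogue and action-label embeddings (illustration only).
dialogue = np.array([0.3, -1.2, 0.8])
actions = np.array([[0.5, -0.9, 1.1],    # e.g. action_listen
                    [-0.4, 0.2, -0.7]])  # e.g. utter_greet

# "inner": raw dot products, unbounded in (-inf, +inf).
inner = actions @ dialogue

# "cosine": normalised dot products, bounded in [-1, 1].
cosine = inner / (np.linalg.norm(actions, axis=1) * np.linalg.norm(dialogue))

# "softmax": normalised over all candidate actions, each value in [0, 1].
softmax = np.exp(inner) / np.exp(inner).sum()

print(inner, cosine, softmax)
```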



Technically, option (1) seems better because it keeps things simple and logical (confidences should not take a finite value for deterministic policies), but then users may see potentially confusing debug messages like


DEBUG rasa.core.processor - Predicted next action 'action_listen' with confidence +infinity.

 But +infinity could be replaced with something more logical?
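
To make option (1) concrete, here is a hypothetical sketch (the policy tuples and selection logic below are simplified stand-ins, not the real ensemble code) of deterministic policies reporting ±infinity while the ensemble still just picks the maximum:

```python
import math

# Hypothetical, simplified stand-ins for illustration only.
predictions = [
    ("TEDPolicy", "utter_greet", 7.3),           # unbounded "inner" confidence
    ("RulePolicy", "action_listen", math.inf),   # deterministic match -> +infinity
    ("MappingPolicy", "utter_bye", -math.inf),   # no match -> -infinity
]

policy, action, confidence = max(predictions, key=lambda p: p[2])

# A matching deterministic policy always beats an ML score, but the log line reads oddly:
print(f"Predicted next action '{action}' with confidence {confidence}.")
# -> Predicted next action 'action_listen' with confidence inf.
```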

dakshvar22 added the area:rasa-oss/ml and type:bug labels on Feb 16, 2021
dakshvar22 commented Feb 19, 2021


From discussions we decided to do the following -

We create a distinction between policies where confidence calculation is part of the algorithm (like TEDPolicy) and others where it is chosen artificially by us (like RulePolicy). Only TEDPolicy and FallbackPolicy fall in the first bucket, whereas all others fall in the second. Let's call the first bucket ML-based policies and the second bucket rule-based policies.

In addition, we move the logic of fallback action prediction outside of RulePolicy and place it inside FallbackPolicy.

With this refactor in place, we plan to follow these steps to pick the best action prediction:

  1. Pick the best prediction from all rule-based policies based on priorities assigned to them.
  2. If none of the rule-based policies predicted an action, look at the confidences of the ML-based policies and pick the prediction with the higher confidence. (For now, this will just be a competition between FallbackPolicy and TEDPolicy.)

This removes the need for ranking policy predictions from rule-based policies based on confidences.

The only exception to the steps above is when TEDPolicy makes a prediction using the text of the input rather than its intent. In that case, TEDPolicy's prediction always wins. (This already happens in the current code as part of end-to-end prediction.)
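
A rough sketch of the proposed selection flow (the Prediction class and pick_best helper are hypothetical names for illustration, not the actual ensemble implementation):

```python
from typing import List, Optional, NamedTuple

class Prediction(NamedTuple):
    policy: str
    action: Optional[str]        # None means the policy made no prediction
    priority: int                # only meaningful for rule-based policies
    confidence: float            # only meaningful for ML-based policies
    is_end_to_end: bool = False  # TED predicted from text rather than intent

def pick_best(rule_based: List[Prediction], ml_based: List[Prediction]) -> Prediction:
    # Exception: an end-to-end TED prediction (based on text) always wins.
    for pred in ml_based:
        if pred.is_end_to_end and pred.action is not None:
            return pred

    # Step 1: among rule-based policies that predicted something, pick by priority.
    matching_rules = [p for p in rule_based if p.action is not None]
    if matching_rules:
        return max(matching_rules, key=lambda p: p.priority)

    # Step 2: otherwise, compare ML-based policies by confidence
    # (currently a competition between FallbackPolicy and TEDPolicy).
    return max(ml_based, key=lambda p: p.confidence)
```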

dakshvar22 commented Mar 18, 2021


This is no longer a bug for releases after 2.4.0, since the inner option for model confidence was removed. However, there is still merit in re-introducing it with proper support, as described above.

dakshvar22 removed the type:bug label on Mar 18, 2021
alopez commented Apr 28, 2021


@dakshvar22 is there a definition of done for this? It seems like it would really benefit from having policy evaluation in place first.

dakshvar22 commented Apr 28, 2021


What kind of policy evaluation do you mean? Anything other than what rasa test core does?

dakshvar22 commented Apr 28, 2021


The issue should be considered done when the selection flow described above is how an action is picked in the policy ensemble.

alopez commented Apr 28, 2021


I mean an evaluation that gives more insight into how the policies in the ensemble interact; that's one of the directions for measuring success.

dakshvar22 commented Apr 28, 2021


I don't have a strong opinion, but I don't see how these are related. This issue's purpose was to stop using "confidences" for rule-based policies in our code, so that the code reflects our understanding of how the ensemble currently works. Once that is done, trying different confidence measures (like unbounded dot-product similarities) will become possible.

dakshvar22 commented Apr 28, 2021


I just mean that it's not necessary for the two to be linked. The upcoming model regression tests for core are a good enough evaluation for it.

dakshvar22 added the research:feature-performance-improvement label on May 4, 2021
m-vdb added the type:enhancement label on Oct 10, 2022
sync-by-unito bot commented Dec 19, 2022

➤ Maxime Verger commented:

💡 Heads up! We're moving issues to Jira: https://rasa-open-source.atlassian.net/browse/OSS.

From now on, this Jira board is the place where you can browse (without an account) and create issues (you'll need a free Jira account for that). This GitHub issue has already been migrated to Jira and will be closed on January 9th, 2023. Do not forget to subscribe to the corresponding Jira issue!

➡️ More information in the forum: https://forum.rasa.com/t/migration-of-rasa-oss-issues-to-jira/56569.

m-vdb closed this as completed on Jan 9, 2023