
Correctly calibrate confidence values for non-ML policies #7969

Closed
dakshvar22 opened this issue Feb 16, 2021 · 9 comments

Labels: area:rasa-oss/ml, research:feature-performance-improvement, type:enhancement

Comments

dakshvar22 commented Feb 16, 2021

#7616 introduced a bug with model_confidence set to cosine & inner in TEDPolicy. Since those confidences are not guaranteed to stay in the range [0, 1], while other non-ML policies like FormPolicy, MappingPolicy, etc. still predict a confidence value in the range [0, 1], it breaks the logic of picking the best prediction from different policies. We see two solutions:


1. Changing the confidence values of non-ML policies to be either +infinity (match) or -infinity (no match).


2. Changing the confidence values of non-ML policies based on the model_confidence set in TEDPolicy, i.e. [0, 1] for softmax, [-1, 1] for cosine, and (-infinity, +infinity) for inner. (A short sketch of how these ranges arise follows below.)
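
For reference, a minimal NumPy sketch (with made-up embedding values; this is not TEDPolicy's actual code) of why the three model_confidence options land in different ranges:

```python
import numpy as np

# Hypothetical dialogue and action-label embeddings (illustration only).
dialogue = np.array([0.3, -1.2, 0.8])
actions = np.array([[0.5, -0.9, 1.1],    # e.g. action_listen
                    [-0.4, 0.2, -0.7]])  # e.g. utter_greet

# "inner": raw dot products, unbounded in (-inf, +inf).
inner = actions @ dialogue

# "cosine": normalised dot products, bounded in [-1, 1].
cosine = inner / (np.linalg.norm(actions, axis=1) * np.linalg.norm(dialogue))

# "softmax": normalised over all candidate actions, each value in [0, 1].
softmax = np.exp(inner) / np.exp(inner).sum()

print(inner, cosine, softmax)
```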



Technically, option (1) seems better because it keeps things simple and logical (confidences should not take a finite value for deterministic policies), but then users may see potentially confusing debug messages like


DEBUG rasa.core.processor - Predicted next action 'action_listen' with confidence +infinity.

 But +infinity could be replaced with something more logical?
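
To make option (1) concrete, here is a hypothetical sketch (the policy tuples and selection logic below are simplified stand-ins, not the real ensemble code) of deterministic policies reporting ±infinity while the ensemble still just picks the maximum:

```python
import math

# Hypothetical, simplified stand-ins for illustration only.
predictions = [
    ("TEDPolicy", "utter_greet", 7.3),           # unbounded "inner" confidence
    ("RulePolicy", "action_listen", math.inf),   # deterministic match -> +infinity
    ("MappingPolicy", "utter_bye", -math.inf),   # no match -> -infinity
]

policy, action, confidence = max(predictions, key=lambda p: p[2])

# A matching deterministic policy always beats an ML score, but the log line reads oddly:
print(f"Predicted next action '{action}' with confidence {confidence}.")
# -> Predicted next action 'action_listen' with confidence inf.
```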

dakshvar22 added the area:rasa-oss/ml and type:bug labels on Feb 16, 2021
dakshvar22 commented Feb 19, 2021


From discussions we decided to do the following -

We create a distinction between policies where confidence calculation is part of the algorithm (like TEDPolicy) and others where it is chosen artificially by us (like RulePolicy). Only TEDPolicy and FallbackPolicy fall in the first bucket, whereas all others fall in the second. Let's call the first bucket ML-based policies and the second bucket rule-based policies.

In addition, we move the logic of fallback action prediction outside of RulePolicy and place it inside FallbackPolicy.

With this refactor in place, we plan to follow these steps to pick the best action prediction:

  1. Pick the best prediction from all rule-based policies based on priorities assigned to them.
  2. If none of the rule-based policies predicted an action, look at the confidences of the ML-based policies and pick the prediction with the higher confidence. (For now, this will just be a competition between FallbackPolicy and TEDPolicy.)

This removes the need for ranking policy predictions from rule-based policies based on confidences.

The only exception to the steps above is when TEDPolicy makes a prediction using the text of the input rather than its intent. In that case, TEDPolicy's prediction always wins. (This already happens in the current code as part of end-to-end prediction.)
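
A rough sketch of the proposed selection flow (the Prediction class and pick_best helper are hypothetical names for illustration, not the actual ensemble implementation):

```python
from typing import List, Optional, NamedTuple

class Prediction(NamedTuple):
    policy: str
    action: Optional[str]        # None means the policy made no prediction
    priority: int                # only meaningful for rule-based policies
    confidence: float            # only meaningful for ML-based policies
    is_end_to_end: bool = False  # TED predicted from text rather than intent

def pick_best(rule_based: List[Prediction], ml_based: List[Prediction]) -> Prediction:
    # Exception: an end-to-end TED prediction (based on text) always wins.
    for pred in ml_based:
        if pred.is_end_to_end and pred.action is not None:
            return pred

    # Step 1: among rule-based policies that predicted something, pick by priority.
    matching_rules = [p for p in rule_based if p.action is not None]
    if matching_rules:
        return max(matching_rules, key=lambda p: p.priority)

    # Step 2: otherwise, compare ML-based policies by confidence
    # (currently a competition between FallbackPolicy and TEDPolicy).
    return max(ml_based, key=lambda p: p.confidence)
```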

dakshvar22 commented Mar 18, 2021


This is no longer a bug for releases after 2.4.0, since the inner option for model confidence was removed. However, there is still merit in re-introducing it with proper support, as described above.

dakshvar22 removed the type:bug label on Mar 18, 2021
alopez commented Apr 28, 2021


@dakshvar22 is there a definition of done for this? It seems like it would really benefit from having policy evaluation in place first.

dakshvar22 commented Apr 28, 2021


What kind of policy evaluation do you mean? Anything other than what rasa test core does?

dakshvar22 commented Apr 28, 2021


The issue should be considered done when the selection flow described above is how an action is picked in the policy ensemble.

alopez commented Apr 28, 2021


I mean an evaluation that gives more insight into how the policies in the ensemble interact; that's one of the directions for measuring success.

dakshvar22 commented Apr 28, 2021


I don't have a strong opinion, but I don't see how these are related. This issue's purpose was to stop using "confidences" for rule-based policies in our code, so that the code reflects our understanding of how the ensemble currently works. Once that is done, trying different confidence measures (like unbounded dot-product similarities) will become possible.

dakshvar22 commented Apr 28, 2021


I just mean that it's not necessary for the two to be linked. The upcoming model regression tests for core are a good enough evaluation for it.

dakshvar22 added the research:feature-performance-improvement label on May 4, 2021
m-vdb added the type:enhancement label on Oct 10, 2022
sync-by-unito bot commented Dec 19, 2022

➤ Maxime Verger commented:

💡 Heads up! We're moving issues to Jira: https://rasa-open-source.atlassian.net/browse/OSS.

From now on, this Jira board is the place where you can browse (without an account) and create issues (you'll need a free Jira account for that). This GitHub issue has already been migrated to Jira and will be closed on January 9th, 2023. Do not forget to subscribe to the corresponding Jira issue!

➡️ More information in the forum: https://forum.rasa.com/t/migration-of-rasa-oss-issues-to-jira/56569.

m-vdb closed this as completed on Jan 9, 2023