Rasa splitting algorithm does not give precise number of training samples. #6582
Labels
area:rasa-oss 🎡
Anything related to the open source Rasa framework
type:bug 🐛
Inconsistencies or issues which will cause an issue or problem for users or implementors.
Description of Problem: Some examples may be missed by Rasa's splitting algorithm. The issue is described in detail in the forum thread.
Overview of the Solution:
`rasa data split` does not give the precise number of training samples. Say we have X samples overall (x1 samples of label l1, x2 samples of label l2, …) and `training-fraction` is 0.8. (Note: x1 + x2 + … = X.)
In the Rasa code, the number of training samples is A = int(0.8 * x1) + int(0.8 * x2) + …
Mathematically, A ≤ int(0.8 * X), because each per-label product is truncated before the sum is taken.
So the number of missing samples is int(0.8 * X) - A.
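The arithmetic above can be checked with a small standalone sketch. This is not Rasa's actual code: the function names `per_label_train_count` and `overall_train_count` and the label counts are made up for illustration, and only the two formulas from the description are reproduced.

```python
def per_label_train_count(label_counts, fraction):
    """A = int(fraction * x1) + int(fraction * x2) + ... (truncate per label, then sum)."""
    return sum(int(fraction * n) for n in label_counts.values())


def overall_train_count(label_counts, fraction):
    """int(fraction * X), where X is the total number of samples."""
    return int(fraction * sum(label_counts.values()))


# Hypothetical data set: 20 samples spread over three labels.
label_counts = {"l1": 7, "l2": 9, "l3": 4}
fraction = 0.8

a = per_label_train_count(label_counts, fraction)       # 5 + 7 + 3
expected = overall_train_count(label_counts, fraction)  # int(16.0)
missing = expected - a

print(a, expected, missing)  # 15 16 1
```

With these counts, one sample that a 0.8 split of the whole data set would put into training ends up outside it, because each label loses its fractional part separately.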