How to train and evaluate policy models with unified dataset format? #191
Another question: I found that my trained PPO policy gives tens of system act outputs in every turn. Is this expected?
@ChrisGeishauser, could you give some guidance?
Hi @JamesCao2048, thanks a lot for all your questions! I hope I can answer them sufficiently for you:
> Another question: I found that my trained PPO policy gives tens of system act outputs in every turn. Is this expected?

This is an indicator that you used a random policy, in which case the output is expected: the architecture of the policy has an output dimension equal to the number of "atomic actions" (e.g. hotel-inform-phone or restaurant-request-price). For every atomic action there is a binary decision whether to use it or not. With a random policy, each atomic action is taken with a chance of roughly 50%, which leads to a lot of actions.

I hope I could help you with the answers! Let me know if something is unclear.
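To make the 50% intuition concrete, here is a small illustrative sketch (not ConvLab code; the action-space size is assumed purely for illustration) of why a random multi-binary policy ends up emitting tens of acts per turn:

```python
import numpy as np

# Illustrative only: assume the policy head has one binary output per
# "atomic action" (e.g. hotel-inform-phone, restaurant-request-price, ...).
NUM_ATOMIC_ACTIONS = 200  # assumed action-space size, not ConvLab's actual value

# A random (untrained) policy takes each atomic action with ~50% probability,
# independently of the others.
rng = np.random.default_rng(0)
decisions = rng.random(NUM_ATOMIC_ACTIONS) < 0.5

# On average about half of all atomic actions fire in every turn,
# which is exactly the "tens of system acts per turn" behaviour observed.
print(f"actions taken this turn: {decisions.sum()} / {NUM_ATOMIC_ACTIONS}")
```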
@ChrisGeishauser, sorry to bother you, but do you have any estimate of when the evaluator class will be ready? Another thing: the vectorizers seem to work only on the MultiWOZ dataset.
Hi there, I noticed that there are APIs to load NLU, DST, Policy, and NLG data in the unified data format. I also found the training and evaluation guides for NLU/DST/NLG with unified data in $model/README.md or NLU/DST/NLG/evaluate_unified_datasets.py. However, I did not find a guide for how to train and evaluate policy models with the unified data format. Specifically, I have the following questions (a data-loading sketch follows below):
- My results for the BERTNLU | RuleDST | PPOPolicy | TemplateNLG pipeline differ from those reported in the ConvLab-2 README (75.5 completion rate and 71.7 success rate). Where does this gap come from?
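For reference, a minimal sketch of what loading policy data in the unified format might look like, assuming ConvLab-3 exposes `load_dataset` and `load_policy_data` in `convlab.util` (the exact function names and returned fields may differ between versions):

```python
# Sketch under the assumption that convlab.util provides these unified-data
# loaders; check convlab/util/unified_datasets_util.py in your version.
from convlab.util import load_dataset, load_policy_data

dataset = load_dataset('multiwoz21')  # load a unified-format dataset by name
data = load_policy_data(dataset, data_split='train', speaker='system')

# Each sample should pair the dialogue context with the system's dialogue
# acts, which is what a policy model is trained to predict.
sample = data['train'][0]
print(sample.keys())  # inspect the available fields before training
```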
Looking forward to your reply,
James Cao