Unstable reinforce with baseline model #192

Open
Jacobi93 opened this issue Feb 21, 2019 · 2 comments

@Jacobi93

Hi, thank you for your wonderful code. It has helped me a lot.
With the REINFORCE with baseline example for cliff_walking, I cannot get stable results. The best reward should be -15, as in your plot. But sometimes, when I run the code without any changes, it converges to -100, which is very odd.
Could anyone run the code several times and find out why that happens?
Thank you so much.

@JaySiu

JaySiu commented Feb 22, 2019

Same here: the algorithm does not converge the way the example does. That said, off-policy Q-learning with linear function approximation is not guaranteed to converge, according to page 32 of David Silver's Lecture 6 notes. It is interesting that the original example converges at all.
My result: (attached plot, not preserved here)

@Jacobi93
Author

"Not guaranteed" means it may still converge; convergence is just not assured. Different initializers and random policies may lead to different results, but it might be better for the author to mention this.
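One way to see how often the bad outcome occurs is to repeat the run over several seeds and compare the final rewards. Below is a minimal sketch, not the repository's code: `summarize_runs` and `toy_train` are hypothetical names, and `toy_train` is only a placeholder that returns -15 or -100 with an arbitrary split. To use it for real, it would be replaced by a function that seeds the environment and the initializers, trains once, and returns the final episode reward.

```python
import random
import statistics
from typing import Callable, Dict, Iterable, List


def summarize_runs(train: Callable[[int], float], seeds: Iterable[int]) -> Dict[str, float]:
    """Run `train` once per seed and summarize the final episode rewards."""
    finals: List[float] = [train(seed) for seed in seeds]
    return {
        "runs": len(finals),
        "best": max(finals),
        "worst": min(finals),
        "median": statistics.median(finals),
    }


def toy_train(seed: int) -> float:
    """Hypothetical stand-in for one training run (an assumption, not real results).

    It returns either -15 or -100, the two outcomes reported in this thread.
    The 0.7 threshold is arbitrary and is not a measured frequency.
    """
    rng = random.Random(seed)  # seeded, so the same seed gives the same outcome
    return -15.0 if rng.random() < 0.7 else -100.0


if __name__ == "__main__":
    # Swap toy_train for a real, seedable training function to get meaningful numbers.
    print(summarize_runs(toy_train, seeds=range(10)))
```

If the spread between `best` and `worst` stays large across seeds with a real training function, the instability comes from the randomness in initialization and sampling and not from a one-off bad run.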
Thanks.
