Hello,

Is there any benefit to having a vanilla REINFORCE implementation for people trying to learn the concepts? REINFORCE with Baseline includes a value function approximator, which has a lot in common with the Actor-Critic.

I think being able to see a pure policy gradient method could be useful as a learning tool; otherwise people may assume that policy gradient methods always require some kind of value function approximation.
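For reference, here is a minimal sketch of what such a vanilla REINFORCE example might look like: the return of each episode directly weights the log-probabilities, with no learned baseline or critic anywhere. This is not tied to this repo's framework; PyTorch, Gymnasium, and CartPole-v1 are just assumptions for illustration.

```python
# Vanilla REINFORCE sketch: pure Monte Carlo policy gradient, no value function.
# Framework/environment choices (PyTorch, Gymnasium, CartPole-v1) are illustrative.
import gymnasium as gym
import torch
import torch.nn as nn
from torch.distributions import Categorical

env = gym.make("CartPole-v1")
obs_dim = env.observation_space.shape[0]
n_actions = env.action_space.n

# Policy network only: observations -> action logits. There is no critic.
policy = nn.Sequential(nn.Linear(obs_dim, 64), nn.Tanh(), nn.Linear(64, n_actions))
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-2)
gamma = 0.99

for episode in range(500):
    obs, _ = env.reset()
    log_probs, rewards = [], []
    done = False
    while not done:
        dist = Categorical(logits=policy(torch.as_tensor(obs, dtype=torch.float32)))
        action = dist.sample()
        log_probs.append(dist.log_prob(action))
        obs, reward, terminated, truncated, _ = env.step(action.item())
        rewards.append(reward)
        done = terminated or truncated

    # Monte Carlo returns G_t = r_t + gamma * G_{t+1}. The raw return is the
    # only weight on the log-probabilities; no baseline is subtracted.
    returns, G = [], 0.0
    for r in reversed(rewards):
        G = r + gamma * G
        returns.insert(0, G)
    returns = torch.tensor(returns)

    loss = -(torch.stack(log_probs) * returns).sum()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

Contrasting this with the REINFORCE with Baseline / Actor-Critic examples would make it obvious exactly where the value function approximator enters (as a subtracted baseline or bootstrapped target) and that it is optional for a policy gradient method.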