dai-dao/PPO-Gluon

  1. Add entropy term to encourage exploration

  2. Generalized Advantage Estimation (GAE)

  3. Distributional

  4. Other environments

  5. Bigger -> Slower nets

  6. The exploration noise causes NaN gradients, and thus NaN outputs

  7. Need experience replay because the agent is clearly forgetting what it learned from past experience.

  8. Use OpenAI examples

  9. Combine 2 nets into one -> Works -> Learns a bit slower I think

  10. Tuned hyper-parameters, specifically the size of roll-outs, number of updates and batch size

  11. Next step -> Try GAE

  12. After -> Train in distributed setting with harder environments

  13. Compare to OpenAI baseline

  14. Incorporate into StarCraft
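
Two of the items above (the entropy term and GAE) can be sketched in a few lines. The block below is a minimal illustration only, not the code in this repository: it uses plain Python lists instead of Gluon / MXNet arrays so that it runs standalone, and the function names (`compute_gae`, `categorical_entropy`, `ppo_loss`) and the default values for `gamma`, `lam`, `clip_eps` and `entropy_coef` are assumptions chosen for the example.

```python
import math


def compute_gae(rewards, values, dones, last_value, gamma=0.99, lam=0.95):
    """Generalized Advantage Estimation over one roll-out.

    rewards, values, dones: per-step lists of equal length.
    last_value: value estimate for the state after the final step.
    Returns (advantages, returns), both lists of floats.
    """
    advantages = [0.0] * len(rewards)
    gae = 0.0
    next_value = last_value
    # Walk the roll-out backwards, accumulating discounted TD errors.
    for t in reversed(range(len(rewards))):
        mask = 0.0 if dones[t] else 1.0  # no bootstrapping across episode ends
        delta = rewards[t] + gamma * next_value * mask - values[t]
        gae = delta + gamma * lam * mask * gae
        advantages[t] = gae
        next_value = values[t]
    returns = [a + v for a, v in zip(advantages, values)]
    return advantages, returns


def categorical_entropy(probs):
    """Entropy of a discrete action distribution (zero-probability terms skipped)."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def ppo_loss(ratio, advantage, entropy, clip_eps=0.2, entropy_coef=0.01):
    """Per-sample clipped PPO policy loss with an entropy bonus.

    ratio: new_policy_prob / old_policy_prob for the taken action.
    The entropy term is subtracted so that minimizing the loss
    rewards higher-entropy (more exploratory) policies.
    """
    clipped_ratio = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
    surrogate = min(ratio * advantage, clipped_ratio * advantage)
    return -surrogate - entropy_coef * entropy


if __name__ == "__main__":
    adv, ret = compute_gae(
        rewards=[1.0, 1.0, 1.0],
        values=[0.0, 0.0, 0.0],
        dones=[False, False, True],
        last_value=0.0,
        gamma=1.0,
        lam=1.0,
    )
    print(adv, ret)
    print(ppo_loss(ratio=1.5, advantage=1.0, entropy=categorical_entropy([0.5, 0.5])))
```

In a Gluon version, the same arithmetic would be applied to batched arrays inside the training loop, with the per-sample loss averaged over a mini-batch before the backward pass.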

About

Implementation of PPO in Gluon / MXNet
