- Add entropy term to encourage exploration
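  A minimal sketch of what that could look like in a PyTorch actor-critic loss. `ENTROPY_COEF`, the 0.5 value-loss weight, and the function signature are illustrative assumptions, not names from this codebase:

  ```python
  from torch.distributions import Categorical

  ENTROPY_COEF = 0.01  # hypothetical weight; would need tuning per environment

  def actor_critic_loss(logits, actions, advantages, values, returns):
      """Policy-gradient loss with an entropy bonus to encourage exploration."""
      dist = Categorical(logits=logits)
      # Standard policy-gradient term; advantages are treated as constants.
      policy_loss = -(dist.log_prob(actions) * advantages.detach()).mean()
      # Critic regression toward the empirical returns.
      value_loss = (returns - values).pow(2).mean()
      # Subtracting mean entropy rewards higher-entropy (more exploratory) policies.
      entropy = dist.entropy().mean()
      return policy_loss + 0.5 * value_loss - ENTROPY_COEF * entropy
  ```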
- GAE (generalized advantage estimation)
- Distributional RL
- Other environments
- Bigger -> slower nets
- The exploration noise causes NaN gradients, and thus NaN outputs
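  One common cause is an unbounded log-std on a Gaussian exploration policy. A sketch of two standard guards, assuming a PyTorch continuous-action setup (the bounds and class name are placeholders):

  ```python
  import torch
  import torch.nn as nn

  LOG_STD_MIN, LOG_STD_MAX = -20.0, 2.0  # placeholder bounds

  class GaussianHead(nn.Module):
      """Policy head that keeps the exploration noise numerically safe."""
      def __init__(self, hidden_dim: int, action_dim: int):
          super().__init__()
          self.mu = nn.Linear(hidden_dim, action_dim)
          self.log_std = nn.Linear(hidden_dim, action_dim)

      def forward(self, h):
          # Clamping log-std stops the std from collapsing to 0 or exploding,
          # both of which produce NaN log-probs and hence NaN gradients.
          log_std = self.log_std(h).clamp(LOG_STD_MIN, LOG_STD_MAX)
          return torch.distributions.Normal(self.mu(h), log_std.exp())

  # Clipping gradient norms before the optimizer step is the other usual guard:
  # torch.nn.utils.clip_grad_norm_(net.parameters(), max_norm=0.5)
  ```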
- Need experience replay because the agent is clearly forgetting past experience.
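  A minimal sketch of the kind of replay buffer this would need; the capacity and the transition tuple layout are assumptions:

  ```python
  import random
  from collections import deque

  class ReplayBuffer:
      """Fixed-size FIFO buffer of transitions. Sampling uniformly breaks
      temporal correlation and lets old experience be revisited
      instead of forgotten."""
      def __init__(self, capacity: int = 100_000):
          self.buffer = deque(maxlen=capacity)

      def push(self, state, action, reward, next_state, done):
          self.buffer.append((state, action, reward, next_state, done))

      def sample(self, batch_size: int):
          batch = random.sample(self.buffer, batch_size)
          return tuple(zip(*batch))  # (states, actions, rewards, next_states, dones)

      def __len__(self):
          return len(self.buffer)
  ```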
- Use OpenAI examples
- Combine the 2 nets into one -> works -> seems to learn a bit slower
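  Roughly what the combined network might look like, assuming the two nets were the actor and the critic (a PyTorch sketch; layer sizes are placeholders):

  ```python
  import torch.nn as nn

  class SharedActorCritic(nn.Module):
      """One trunk feeding both policy and value heads, replacing two
      separate networks."""
      def __init__(self, obs_dim: int, action_dim: int, hidden: int = 128):
          super().__init__()
          self.trunk = nn.Sequential(
              nn.Linear(obs_dim, hidden), nn.Tanh(),
              nn.Linear(hidden, hidden), nn.Tanh(),
          )
          self.policy_head = nn.Linear(hidden, action_dim)  # action logits
          self.value_head = nn.Linear(hidden, 1)            # state value

      def forward(self, obs):
          h = self.trunk(obs)
          return self.policy_head(h), self.value_head(h)
  ```

  Sharing the trunk roughly halves the parameter count, which may explain the slightly slower learning noted above: the two heads now compete for the same features.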
- Tuned hyper-parameters, specifically roll-out size, number of updates, and batch size
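  The tuned values themselves aren't recorded in these notes; the snippet below only names the knobs involved, with illustrative placeholder numbers:

  ```python
  # Illustrative values only; the actual tuned numbers are not recorded here.
  HPARAMS = {
      "rollout_length": 128,   # steps collected per environment before an update
      "num_updates": 10_000,   # total optimizer updates over training
      "batch_size": 256,       # transitions per gradient step
  }
  ```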
- Next step -> try GAE
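  A sketch of GAE (Schulman et al.) for reference; the function name and array layout are assumptions:

  ```python
  import numpy as np

  def compute_gae(rewards, values, dones, gamma=0.99, lam=0.95):
      """Generalized advantage estimation over one roll-out.

      `values` has one extra entry: the bootstrap value of the final state.
      `gamma` and `lam` are the usual discount and GAE smoothing parameters.
      """
      advantages = np.zeros(len(rewards), dtype=np.float32)
      gae = 0.0
      for t in reversed(range(len(rewards))):
          mask = 1.0 - dones[t]  # zero out the bootstrap across episode ends
          delta = rewards[t] + gamma * values[t + 1] * mask - values[t]
          gae = delta + gamma * lam * mask * gae
          advantages[t] = gae
      return advantages
  ```

  Setting `lam=1.0` recovers plain Monte Carlo advantages; `lam=0.0` gives one-step TD.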
- After that -> train in a distributed setting on harder environments
- Compare to the OpenAI baseline
- Incorporate into StarCraft