Clean baseline implementation of PPO using an episodic TransformerXL memory
Updated Jun 18, 2024 - Python
Deep reinforcement learning using Truly Proximal Policy Optimization in TensorFlow 2 and PyTorch
Deep reinforcement learning using an on-policy adaptation of Maximum a Posteriori Policy Optimization (MPO)
Monte Carlo Tree Search for training a shared actor-critic network on the game Hex 🏋️
PyTorch implementation of V-MPO
Reinforcement learning, Policy Gradient, Actor-Critic, AC, Agent-based Simulation, Simple-world
My coursework for the CS294 Deep Reinforcement Learning course, taught by Sergey Levine at UC Berkeley.
On-policy MCTS combined with deep learning to train an actor-critic neural network that plays Hex (Con-tac-tix).