- Rainbow [code] [tests]
  - DQN, combined with any of:
    - DDQN
    - Dueling DQN
    - Noisy Networks
    - Prioritized Experience Replay
    - N-Step DQN
    - Categorical DQN / Quantile Regression
- DDPG [code] [tests]
- TD3 [code] [tests]
- SAC [code] [tests]
- ACER [code] [tests]
- A2C [code] [tests]
- TRPO [code] [tests]
- ACKTR [code] [tests]
- PPO [code] [tests]
- GAIL [code] [tests]
- HER [code] [tests]
- MADDPG [code] [tests]
Note: Not thoroughly tested. Do not use for anything serious.
- pytorch == 1.5
- gym == 0.15.6
- multiagent particle environment (for MADDPG)
- numpy == 1.18
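
A minimal sketch for checking an environment against the pins above. The names `PINS`, `matches_pin` and `check_pins` are hypothetical helpers, not part of this repository. It assumes that PyTorch is installed under the distribution name `torch`, and that a pin such as `1.18` is meant to accept any patch release (`1.18.x`), which the list above does not state explicitly.

```python
from importlib import metadata

# Pins copied from the list above. Assumption: PyTorch's distribution name is "torch".
PINS = {"torch": "1.5", "gym": "0.15.6", "numpy": "1.18"}


def matches_pin(installed: str, pinned: str) -> bool:
    """Return True if `installed` equals `pinned` or is a sub-release of it.

    Assumption: "1.18" is read as "any 1.18.x", so 1.18.5 matches but 1.180 does not.
    """
    installed = installed.split("+")[0]  # drop local build tags such as "+cu101"
    return installed == pinned or installed.startswith(pinned + ".")


def check_pins(pins=PINS, get_version=metadata.version):
    """Return {package: (pinned, installed_or_None, ok)} for every pin."""
    report = {}
    for name, pinned in pins.items():
        try:
            installed = get_version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed at all
        ok = installed is not None and matches_pin(installed, pinned)
        report[name] = (pinned, installed, ok)
    return report


if __name__ == "__main__":
    for name, (pinned, installed, ok) in check_pins().items():
        status = "ok" if ok else "MISMATCH"
        print(f"{name}: pinned {pinned}, installed {installed}, {status}")
```

The multiagent particle environment is left out of the check because the list gives no version for it.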
- ACKTR: uses the KFAC implementation by @ikostrikov.
- TRPO: the line search uses @ikostrikov's implementation as a reference.
- Noisy linear layers: @higgsfield
- Parallel environment: @dolhana
- PER: segment tree by @hill-a
- policy_heads: uses the distribution modules by @ikostrikov.