Lunar Lander with Reinforcement Learning

Soft Actor-Critic (SAC)

Hardware: Google Colab T4

Model Type	Discrete Actions	Average Reward	Training Time (h:mm:ss)	Total Training Steps
PPO	No	266.01	1:35:29	501,747
PPO	Yes	223.38	2:07:30	501,721
SAC	No	278.36	1:21:13	299,998
DQN	Yes	155.64	1:59:15	999,999

Set ent_coef for PPO to a nonzero value, since the entropy bonus encourages the agent to explore other actions. Stable Baselines3 defaults it to 0.0.
Do not set eval_freq too low (e.g. keep it >= 10,000), since frequent interruptions for evaluation can sometimes cause instability during learning.
Stable Baselines3's DQN parameters exploration_initial_eps and exploration_final_eps set how exploratory your model is at the beginning and end of training, respectively.
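
The three tips above can be sketched as follows. This is a minimal sketch under stated assumptions, not the configuration used in the notebooks: the values (ent_coef=0.01, eval_freq=10,000, epsilon decaying from 1.0 to 0.05 over the first 10% of training) are illustrative, the helper names epsilon_at and build_models are hypothetical, and the environment ID "LunarLander-v2" depends on the installed Gymnasium version (newer releases may expect a different version suffix). The epsilon helper assumes the linear decay that Stable Baselines3 applies to DQN exploration.

```python
# Illustrative hyperparameters (assumed values, not the notebooks' settings).
PPO_KWARGS = {"ent_coef": 0.01}  # nonzero entropy bonus; the SB3 default is 0.0
DQN_KWARGS = {
    "exploration_initial_eps": 1.0,   # epsilon at the start of training
    "exploration_final_eps": 0.05,    # epsilon once the decay has finished
    "exploration_fraction": 0.1,      # share of training spent decaying epsilon
}
EVAL_FREQ = 10_000  # evaluate every 10,000 steps rather than more often


def epsilon_at(step, total_steps, initial_eps, final_eps, fraction):
    """Linearly decay epsilon over the first `fraction` of training, then hold it."""
    progress = min(step / (fraction * total_steps), 1.0)
    return initial_eps + progress * (final_eps - initial_eps)


def build_models(env_id="LunarLander-v2"):
    """Hypothetical helper: create PPO and DQN models plus an evaluation callback.

    Requires stable-baselines3 and gymnasium with Box2D support installed.
    """
    import gymnasium as gym
    from stable_baselines3 import DQN, PPO
    from stable_baselines3.common.callbacks import EvalCallback
    from stable_baselines3.common.monitor import Monitor

    ppo = PPO("MlpPolicy", env_id, **PPO_KWARGS)
    dqn = DQN("MlpPolicy", env_id, **DQN_KWARGS)
    # A separate, monitored environment is used only for evaluation.
    eval_callback = EvalCallback(Monitor(gym.make(env_id)), eval_freq=EVAL_FREQ)
    return ppo, dqn, eval_callback
```

With these assumed values and a 1,000,000-step run, epsilon_at(50_000, 1_000_000, 1.0, 0.05, 0.1) lands halfway between the initial and final epsilon, and every step past 100,000 stays at 0.05. The callback returned by build_models would be passed to learn, for example ppo.learn(total_timesteps=500_000, callback=eval_callback).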
