Lunar Lander with Reinforcement Learning

Repository containing code and notebooks exploring how to solve Gymnasium's Lunar Lander through Reinforcement Learning.

Algorithms implemented (a minimal training sketch follows the list):

  • Soft Actor-Critic (SAC)
  • Deep Q-Learning (DQN)
  • Proximal Policy Optimization (PPO)
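
All three follow Stable Baselines3's standard training pattern. Below is a minimal sketch for the discrete PPO run; the environment id (LunarLander-v2, renamed LunarLander-v3 in recent Gymnasium releases), the step budget, and the save path are illustrative assumptions, not values taken from the notebooks.

```python
# Minimal training sketch (assumes Gymnasium's LunarLander-v2 and Stable Baselines3).
import gymnasium as gym
from stable_baselines3 import PPO

# Discrete action space by default; pass continuous=True for the
# continuous variant used by SAC and the non-discrete PPO/SAC runs.
env = gym.make("LunarLander-v2")

model = PPO("MlpPolicy", env, verbose=1)
model.learn(total_timesteps=500_000)   # step budget assumed from the results table
model.save("ppo_lunar_lander")         # hypothetical save path
```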

Results

Hardware: Google Colab (NVIDIA T4 GPU)

| Model Type | Discrete | Average Reward | Training Time (h:mm:ss) | Total Training Steps |
|------------|----------|----------------|-------------------------|----------------------|
| PPO        | No       | 266.01         | 1:35:29                 | 501,747              |
| PPO        | Yes      | 223.38         | 2:07:30                 | 501,721              |
| SAC        | No       | 278.36         | 1:21:13                 | 299,998              |
| DQN        | Yes      | 155.64         | 1:59:15                 | 999,999              |
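
For context on the Average Reward column: Stable Baselines3 provides evaluate_policy for measuring mean episode reward. The sketch below is an assumption about how such a number could be computed; the repository may use a different episode count or evaluation setup, and the checkpoint name carries over from the hypothetical training sketch above.

```python
# Sketch of computing an average reward with Stable Baselines3
# (n_eval_episodes=100 is an assumption, not the repo's setting).
import gymnasium as gym
from stable_baselines3 import PPO
from stable_baselines3.common.evaluation import evaluate_policy

env = gym.make("LunarLander-v2")
model = PPO.load("ppo_lunar_lander")  # hypothetical checkpoint path

mean_reward, std_reward = evaluate_policy(model, env, n_eval_episodes=100)
print(f"Average reward: {mean_reward:.2f} +/- {std_reward:.2f}")
```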

Training Notes

  • Set ent_coef for PPO, as entropy regularization encourages exploration of other actions; Stable Baselines3 defaults the value to 0.0.
  • Do not set eval_freq too low (e.g. keep it >= 10,000 steps), as overly frequent evaluation interruptions can sometimes destabilize learning.
  • Stable Baselines3's DQN parameters exploration_initial_eps and exploration_final_eps determine how exploratory the model is at the beginning and end of training. All three settings are illustrated in the sketch after this list.
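
The sketch below shows where each of these settings lives in the Stable Baselines3 API. The specific values (ent_coef=0.01, eval_freq=10,000, and the epsilon schedule endpoints) are illustrative assumptions, not the hyperparameters behind the results above.

```python
# Illustrative hyperparameter placement only; the values are assumptions.
import gymnasium as gym
from stable_baselines3 import DQN, PPO
from stable_baselines3.common.callbacks import EvalCallback

env = gym.make("LunarLander-v2")
eval_env = gym.make("LunarLander-v2")

# PPO: a small positive ent_coef encourages exploration (SB3 default is 0.0).
ppo_model = PPO("MlpPolicy", env, ent_coef=0.01)

# Keep eval_freq high enough (e.g. >= 10,000 steps) that evaluation
# does not interrupt learning too often.
eval_callback = EvalCallback(eval_env, eval_freq=10_000, n_eval_episodes=5)

# DQN: exploration_initial_eps / exploration_final_eps control how
# exploratory the epsilon-greedy policy is at the start and end of training.
dqn_model = DQN(
    "MlpPolicy",
    env,
    exploration_initial_eps=1.0,
    exploration_final_eps=0.05,
)

ppo_model.learn(total_timesteps=500_000, callback=eval_callback)
# dqn_model.learn(...) would follow the same pattern.
```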

Finding Theta Blog Posts:
