This project tackles the LunarLander-v2 problem using Deep Reinforcement Learning (DRL) techniques. The primary focus is on evaluating and improving the performance of the Deep Q-Network (DQN) by using a Dueling Double DQN (D3QN) architecture. This work was completed as part of a reinforcement learning course assignment.
LunarLanderV2.mp4
DQN is a popular reinforcement learning algorithm that combines Q-learning with deep neural networks. In the Lunar Lander environment, DQN uses a deep network to approximate the Q-value function, which estimates the expected return of taking each action in a given state.
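As a rough illustration (not the project's exact code), the sketch below shows a Q-network mapping a LunarLander-v2 observation (8 dimensions) to Q-values for its 4 discrete actions, and how a greedy action would be selected; the layer sizes here are placeholders.

```python
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    """Maps an 8-dimensional LunarLander-v2 state to Q-values for its 4 actions."""
    def __init__(self, state_dim=8, n_actions=4, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, state):
        return self.net(state)  # shape: (batch, n_actions)

# Greedy action selection: pick the action with the highest estimated Q-value.
q_net = QNetwork()
state = torch.randn(1, 8)              # stand-in for an environment observation
action = q_net(state).argmax(dim=1)    # the action the agent would exploit
```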
However, DQN suffers from:
- Overestimation Bias: It tends to overestimate action values (illustrated numerically below).
- Instability: Training can become unstable due to correlated updates.
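The overestimation effect is easy to see numerically: when action values are estimated with independent noise, taking a max over the noisy estimates is biased upward even if the true values are identical. A small, self-contained illustration (not taken from the project):

```python
import numpy as np

rng = np.random.default_rng(0)
true_q = np.zeros(4)                                         # all four actions are equally good (true Q = 0)
noisy_q = true_q + rng.normal(scale=1.0, size=(10_000, 4))   # noisy Q-value estimates

print(noisy_q.max(axis=1).mean())   # roughly +1.03: max over noisy estimates is biased upward
print(true_q.max())                 # 0.0: the true maximum
```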
D3QN further enhances performance by incorporating the dueling architecture, which splits the Q-value function into two streams (a code sketch follows this list):
- State Value Function (V(s)): How good it is to be in a state, regardless of action.
- Advantage Function (A(s, a)): The benefit of taking a specific action compared to others.
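In code, the dueling split is typically implemented as two heads on a shared feature extractor, recombined with a mean-subtracted advantage so that V and A remain identifiable. A minimal sketch, with illustrative layer sizes rather than the project's exact configuration:

```python
import torch
import torch.nn as nn

class DuelingQNetwork(nn.Module):
    def __init__(self, state_dim=8, n_actions=4, hidden=64):
        super().__init__()
        self.features = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU())
        self.value = nn.Linear(hidden, 1)              # V(s): value of the state
        self.advantage = nn.Linear(hidden, n_actions)  # A(s, a): per-action advantage

    def forward(self, state):
        h = self.features(state)
        v = self.value(h)                              # shape: (batch, 1)
        a = self.advantage(h)                          # shape: (batch, n_actions)
        # Q(s, a) = V(s) + A(s, a) - mean_a' A(s, a') keeps V and A identifiable.
        return v + a - a.mean(dim=1, keepdim=True)
```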
This helps the agent learn more efficiently in the Lunar Lander environment by better differentiating between valuable states and actions, improving both stability and performance.
By combining Double DQN and Dueling Networks, D3QN offers significant improvements in solving the Lunar Lander problem (a sketch of the Double DQN target follows this list). It results in:
- Reduced overestimation of Q-values.
- Improved state value approximation, making the agent more adept at landing in difficult scenarios.
- Faster convergence and better reward maximization than standard DQN.
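The Double DQN part of D3QN changes only the target computation: the online network chooses the next action, while the target network evaluates it, which removes the max-based source of overestimation. A hedged sketch of that target (variable names are illustrative, not the project's code; `dones` is assumed to be a 0/1 float tensor):

```python
import torch

def double_dqn_target(online_net, target_net, rewards, next_states, dones, gamma):
    """TD target where action selection and evaluation use different networks."""
    with torch.no_grad():
        # The online network selects the greedy next action...
        next_actions = online_net(next_states).argmax(dim=1, keepdim=True)
        # ...but the target network evaluates its value.
        next_q = target_net(next_states).gather(1, next_actions).squeeze(1)
    return rewards + gamma * (1.0 - dones) * next_q
```

For comparison, the vanilla DQN target takes the maximum of the target network's own estimates, `target_net(next_states).max(dim=1).values`, which is what introduces the upward bias.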
- DQN: The baseline algorithm, implemented with a simple feedforward network and epsilon-greedy exploration.
- D3QN: An advanced version incorporating Double Q-learning and Dueling Network architectures, which leads to better stability and faster convergence.
- Network architecture:
  - 3 fully connected layers with 64 neurons each
  - Activation: ReLU
- Loss Function: Mean Squared Error (MSE)
- Exploration Strategy: Epsilon-greedy with early stopping
- Optimizer: Adam
- Discount Factor: Varying γ over time as described in the report
(For a more comprehensive list of hyperparameters, refer to the report; a rough sketch of a training step using these settings follows.)
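Putting the listed settings together, a single training step looks roughly like the sketch below. It reuses the networks and the `double_dqn_target` function sketched above; replay-buffer handling and the exact γ schedule from the report are omitted, and values such as the learning rate and the epsilon handling are illustrative assumptions, not the project's exact choices.

```python
import random
import torch
import torch.nn as nn

# q_net / target_net would be instances of the networks sketched earlier
# (QNetwork for the DQN baseline, DuelingQNetwork for D3QN).
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)   # learning rate is an assumed value
loss_fn = nn.MSELoss()

def epsilon_greedy(state, epsilon, n_actions=4):
    """Epsilon-greedy exploration: random action with probability epsilon."""
    if random.random() < epsilon:
        return random.randrange(n_actions)
    with torch.no_grad():
        return q_net(state.unsqueeze(0)).argmax(dim=1).item()

def train_step(states, actions, rewards, next_states, dones, gamma):
    """One gradient step on the MSE between predicted Q-values and TD targets."""
    q_pred = q_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)
    targets = double_dqn_target(q_net, target_net, rewards, next_states, dones, gamma)
    loss = loss_fn(q_pred, targets)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```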
The D3QN model significantly outperformed the DQN in terms of stability and reward maximization. By introducing a dynamic gamma strategy and optimizing the training process, we achieved consistent improvements in solving the Lunar Lander environment.
DQN: episode reward 180.14

DQN.mp4

D3QN: episode reward 249.5

D3QN.mp4
This project demonstrated the effectiveness of advanced DRL techniques in solving the Lunar Lander problem. The D3QN model, in particular, offered substantial improvements over the baseline DQN, and future work may explore further enhancements such as prioritized experience replay or multi-agent setups.