
DeepLearning

Reinforcement Learning

03. Q-Learning with epsilon decay; the average score is 7.76 over 100 episodes

Screenshot (2023-08-17): Q-Learning with epsilon decay, results over 100 episodes
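As a reference for the epsilon-decay schedule mentioned above, here is a minimal sketch; the start, minimum, and decay values are hypothetical placeholders rather than the settings used in the notebook.

```python
import numpy as np

# Minimal sketch of epsilon decay for tabular Q-Learning.
# eps_start, eps_min and eps_decay are placeholder values to tune.
eps_start, eps_min, eps_decay = 1.0, 0.01, 0.995

def epsilon(episode):
    """Exponentially decay epsilon towards eps_min."""
    return max(eps_min, eps_start * eps_decay ** episode)

def select_action(q_table, state, episode, n_actions, rng=np.random.default_rng()):
    """Epsilon-greedy action selection over a tabular Q-function."""
    if rng.random() < epsilon(episode):
        return int(rng.integers(n_actions))   # explore
    return int(np.argmax(q_table[state]))     # exploit
```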

04. Deep Q-Learning tested on the CartPole environment; training is still not stable

Screenshot (2023-08-17): Deep Q-Learning on CartPole, training results
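For reference, a minimal sketch of the one-step DQN update target (not the exact code in this repo); q_net and target_net are assumed to be PyTorch modules mapping states to per-action Q-values, and the batch tensors are assumed to be pre-stacked.

```python
import torch
import torch.nn.functional as F

def dqn_loss(q_net, target_net, batch, gamma=0.99):
    """One-step TD loss for DQN on a sampled minibatch."""
    states, actions, rewards, next_states, dones = batch
    # Q(s, a) for the actions that were actually taken
    q_sa = q_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        # Bootstrapped target from the periodically synced target network
        next_q = target_net(next_states).max(dim=1).values
        target = rewards + gamma * (1.0 - dones) * next_q
    return F.mse_loss(q_sa, target)
```

Instability like the run above often traces back to a target network that is synced too often, a replay buffer that is too small, or a learning rate that is too large.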

05. DDPG tested on Pendulum

Screenshot (2023-08-17): DDPG on Pendulum, training results

In the Pendulum-v1 environment, the best possible reward is 0 (per-step rewards are never positive). Here are some ideas to improve the performance of DDPG on Pendulum-v1:

  1. More Training: DDPG typically requires a lot of training to converge stably. Try increasing the number of training episodes, perhaps to 1000 or more.

  2. Network Architecture & Hyperparameters: Experiment with different network architectures or adjust hyperparameters, like learning rates, optimizers, discount factors, soft update parameters, etc. Hyperparameter tuning is common and often necessary in deep reinforcement learning.

  3. Exploration: DDPG uses a deterministic policy, so we added noise to the actions to encourage exploration. Try different types and magnitudes of noise (e.g. Gaussian or Ornstein-Uhlenbeck) to improve performance; see the noise sketch after this list.

  4. Replay Buffer: Ensure the replay buffer is sufficiently large and uses uniform sampling. Consider prioritized experience replay, where more important transitions have a higher chance of being sampled.

  5. Target Network Update Frequency: Try updating the target networks more or less frequently, or adjust the soft-update coefficient tau (see the soft-update sketch after this list).

  6. Learning Rate Scheduling: Using different learning rates at different stages of training can be beneficial. For instance, start with a higher learning rate and decrease it gradually over time (a scheduling example is included in the sketch after this list).
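Regarding point 3, a minimal sketch of adding Gaussian exploration noise to DDPG's deterministic action; noise_scale is a placeholder to tune, and the clipping bounds match Pendulum-v1's torque range.

```python
import numpy as np

def noisy_action(actor, state, noise_scale=0.1, low=-2.0, high=2.0,
                 rng=np.random.default_rng()):
    """Deterministic actor output plus Gaussian exploration noise, clipped to the action bounds."""
    action = np.asarray(actor(state))                      # deterministic policy output
    action = action + rng.normal(0.0, noise_scale, size=action.shape)
    return np.clip(action, low, high)
```

Ornstein-Uhlenbeck noise (temporally correlated) is the classic choice for DDPG, but uncorrelated Gaussian noise is often just as effective and simpler to tune.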
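Regarding points 5 and 6, a sketch of the soft (Polyak) target update and a simple step learning-rate schedule, assuming PyTorch networks and optimizers; tau, the step size, and the decay factor are placeholders.

```python
import torch

def soft_update(target_net, online_net, tau=0.005):
    """Polyak averaging: theta_target <- tau * theta_online + (1 - tau) * theta_target."""
    with torch.no_grad():
        for t_param, o_param in zip(target_net.parameters(), online_net.parameters()):
            t_param.mul_(1.0 - tau).add_(tau * o_param)

# Simple learning-rate schedule for the actor (the critic can be handled the same way):
# actor_optimizer = torch.optim.Adam(actor.parameters(), lr=1e-3)
# scheduler = torch.optim.lr_scheduler.StepLR(actor_optimizer, step_size=100, gamma=0.9)
# ...then call scheduler.step() once per episode to decay the learning rate.
```

Calling soft_update after every gradient step with a small tau is the standard DDPG setup; a larger tau or infrequent hard copies make the targets move faster but can destabilize training.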

When tweaking the model, change only one parameter at a time and then re-evaluate performance, so the impact of each adjustment is easier to attribute.

Next, I will try TD3 and SAC.
