The Double Inverted Pendulum consists of two jointed pendulums mounted on a cart that moves along a track. The agent must keep the Double Inverted Pendulum balanced by interacting with the environment, applying a horizontal force to the cart. At each transition, in addition to the reward given by the environment, the agent receives a constant positive signal. A terminal state is reached when the distance between the upright position and the current state exceeds a given threshold.
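The shaped reward and the termination rule described above can be sketched as follows. This is a minimal illustration under assumptions: the bonus value, the threshold, and measuring "distance to upright" as the Euclidean norm of the two pole angles (upright = 0 rad) are hypothetical choices, not taken from the project code.

```python
import math
from typing import Sequence, Tuple

# Hypothetical values: the README does not state the bonus or the threshold.
ALIVE_BONUS = 1.0
DISTANCE_THRESHOLD = 0.5


def distance_from_upright(angles: Sequence[float]) -> float:
    """Euclidean distance between the pole angles and the upright pose (all zeros).

    Assumption: the state is summarised by the two pole angles in radians.
    """
    return math.sqrt(sum(a * a for a in angles))


def shaped_step(env_reward: float, angles: Sequence[float]) -> Tuple[float, bool]:
    """Return (shaped_reward, done) for one transition."""
    # Constant positive signal added on top of the environment reward.
    reward = env_reward + ALIVE_BONUS
    # Episode ends once the poles drift too far from upright.
    done = distance_from_upright(angles) > DISTANCE_THRESHOLD
    return reward, done
```

For example, `shaped_step(0.5, [0.1, -0.05])` keeps the episode running with a shaped reward of 1.5, while angles of `[0.6, 0.0]` end it.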
Agent trained for 500 epochs using the DDPG algorithm with gamma set to 0.99.
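For reference, here is a minimal sketch of the two core DDPG updates in their textbook form, not this project's code. The critic regresses onto y = r + gamma * (1 - done) * Q'(s', mu'(s')), computed with the target actor mu' and target critic Q', and the target parameters track the online ones by Polyak averaging. Neural networks are replaced by plain Python callables and parameter lists, and tau = 0.005 is an assumed value that the README does not give.

```python
from typing import Callable, List, Sequence


def ddpg_targets(
    rewards: Sequence[float],
    next_states: Sequence[Sequence[float]],
    dones: Sequence[bool],
    target_actor: Callable[[Sequence[float]], float],
    target_critic: Callable[[Sequence[float], float], float],
    gamma: float = 0.99,
) -> List[float]:
    """Critic regression targets y = r + gamma * (1 - done) * Q'(s', mu'(s'))."""
    targets = []
    for r, s_next, done in zip(rewards, next_states, dones):
        # No bootstrapping from terminal transitions.
        bootstrap = 0.0 if done else target_critic(s_next, target_actor(s_next))
        targets.append(r + gamma * bootstrap)
    return targets


def soft_update(target: List[float], online: Sequence[float], tau: float = 0.005) -> None:
    """Polyak averaging in place: theta' <- tau * theta + (1 - tau) * theta'."""
    for i, w in enumerate(online):
        target[i] = tau * w + (1.0 - tau) * target[i]
```

With gamma = 0.99, a non-terminal transition of reward 1 whose target critic returns 1 gets a target of 1.99, while a terminal one keeps its raw reward.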
More information on the implementation and the methods can be found in the report.
- Fitted Q-iteration: fqi
- Deep Q-learning: dql
- Deep Deterministic Policy Gradient: ddpg
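Of the three methods, fitted Q-iteration is the easiest to show in isolation. The sketch below follows the standard algorithm: starting from Q_0 = 0, each iteration builds regression targets r + gamma * max_a' Q_{N-1}(s', a') from a fixed batch of transitions and fits a new Q_N to them. The lookup-table averaging "regressor" (`fit_table`) and all names here are hypothetical stand-ins; the function approximator the project actually uses is described in the report.

```python
from collections import defaultdict
from typing import Callable, Dict, Hashable, List, Sequence, Tuple

State = Hashable
Transition = Tuple[State, int, float, State, bool]  # (s, a, r, s_next, done)
QFunction = Callable[[State, int], float]


def fit_table(inputs: List[Tuple[State, int]], targets: List[float]) -> QFunction:
    """Trivial stand-in regressor: average the targets seen for each (s, a) pair."""
    sums: Dict[Tuple[State, int], float] = defaultdict(float)
    counts: Dict[Tuple[State, int], int] = defaultdict(int)
    for x, y in zip(inputs, targets):
        sums[x] += y
        counts[x] += 1

    def q(s: State, a: int) -> float:
        n = counts.get((s, a), 0)
        return sums[(s, a)] / n if n else 0.0  # unseen pairs default to 0

    return q


def fitted_q_iteration(
    transitions: Sequence[Transition],
    actions: Sequence[int],
    gamma: float = 0.99,
    iterations: int = 50,
) -> QFunction:
    q: QFunction = lambda s, a: 0.0  # Q_0 is identically zero
    inputs = [(s, a) for s, a, _, _, _ in transitions]
    for _ in range(iterations):
        # Targets use the previous iterate; terminal transitions do not bootstrap.
        targets = [
            r if done else r + gamma * max(q(s_next, b) for b in actions)
            for _, _, r, s_next, done in transitions
        ]
        q = fit_table(inputs, targets)
    return q
```

On a toy batch where action 1 in state 0 leads to state 1 and action 1 in state 1 ends the episode with reward 1, the values settle at Q(0, 1) = 0.9 and Q(0, 0) = 0.81 for gamma = 0.9, so the greedy action in state 0 is 1.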
Make sure pybullet-gym is installed before running the program.
python main.py [-h] [--ddpg] [--fqi] [--dql] [--batchnorm] [--render RENDER] [--gamma GAMMA] [--samples SAMPLES] [--actions ACTIONS] [--seed SEED]
RENDER
path to a saved model file for either dql or ddpg
→ This will render the double pendulum with the given pretrained model.
GAMMA
is the discount factor
→ 0.99 gives the best results
SAMPLES
number of samples used when training fqi
→ More samples are better, but 200k already gives reasonably good results (computationally expensive)
ACTIONS
number of discrete actions when using either dql or fqi
→ Should be an odd number
SEED
the random seed to use
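The usage line above maps naturally onto Python's `argparse`. The sketch below is a hypothetical reconstruction, not the project's `main.py`: the flag names come from the usage line, while the types, help strings, and defaults (0.99, 200k samples, and 11 actions, taken from the recommendations and examples in this README) are assumptions. `argparse` adds `-h` automatically, and rejecting an even `--actions` value is an illustrative choice.

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Double inverted pendulum control")
    # Algorithm selection flags (names from the usage line).
    parser.add_argument("--ddpg", action="store_true", help="train with DDPG")
    parser.add_argument("--fqi", action="store_true", help="train with fitted Q-iteration")
    parser.add_argument("--dql", action="store_true", help="train with deep Q-learning")
    parser.add_argument("--batchnorm", action="store_true", help="assumed: enable batch normalization")
    # Valued options; every default below is a guess.
    parser.add_argument("--render", type=str, default=None, help="path to a saved dql or ddpg model")
    parser.add_argument("--gamma", type=float, default=0.99, help="discount factor")
    parser.add_argument("--samples", type=int, default=200_000, help="number of samples for fqi")
    parser.add_argument("--actions", type=int, default=11, help="number of discrete actions (odd)")
    parser.add_argument("--seed", type=int, default=None, help="random seed")
    return parser


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    parser = build_parser()
    args = parser.parse_args(argv)
    if args.actions % 2 == 0:
        # parser.error prints the usage message and exits with status 2.
        parser.error("--actions should be an odd number")
    return args
```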
python main.py --ddpg --gamma 0.99 --render saved_models/DDPG
python main.py --dql --gamma 0.99 --actions 11
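One plausible reading of the odd-number advice for ACTIONS: if the N discrete actions are N evenly spaced forces in a symmetric range, an odd N places zero force exactly in the middle, so the agent can choose not to push the cart. The sketch below assumes this symmetric discretization and a hypothetical `max_force` of 1.0; neither is confirmed by the README.

```python
from typing import List


def discrete_forces(n_actions: int, max_force: float = 1.0) -> List[float]:
    """Evenly spaced horizontal forces in [-max_force, max_force].

    Assumption: actions are discretized symmetrically around zero force.
    """
    if n_actions < 2:
        raise ValueError("need at least two actions")
    # 2*i/(n-1) - 1 runs from -1 to 1; for odd n the middle index gives exactly 0.
    return [max_force * (2.0 * i / (n_actions - 1) - 1.0) for i in range(n_actions)]
```

With `--actions 11` this yields forces from -1.0 to 1.0 in steps of 0.2, including 0.0; with 10 actions, zero force is not available.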