Reinforcement learning agents and a Gym-style environment for the Solo legged robot.
- Numpy
- Gym
- Pybullet
- PyTorch
- solo.py: Robot class contains the basic description of the robot. Contains the state and action spaces.
- simulation.py: Scene class to define the contents of the scene and some visual effects.
- baseEnv.py: Gym-style wrapper for the robot for easier interface with the RL agents. Contains the reward definition.
- Stand: Balance the robot in a standing posture above a given height.
- Walk: The robot should move along a specified direction.
- PointGoal: The robot should navigate towards a specific (x,y) location on the map.
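
The three tasks above can be read as three reward signals. The following is only a minimal sketch of what such rewards could look like, not the reward defined in baseEnv.py: the function names, the arguments, and the exponential shaping constant are all hypothetical and chosen for illustration.

```python
import math


def stand_reward(base_height, target_height):
    """Hypothetical Stand reward: 1.0 at or above the target height,
    decaying exponentially with the height deficit below it."""
    deficit = max(0.0, target_height - base_height)
    return math.exp(-10.0 * deficit)  # 10.0 is an arbitrary shaping constant


def walk_reward(base_velocity_xy, direction_xy):
    """Hypothetical Walk reward: base velocity projected onto the
    desired direction (normalized to unit length first)."""
    norm = math.hypot(direction_xy[0], direction_xy[1])
    dx, dy = direction_xy[0] / norm, direction_xy[1] / norm
    return base_velocity_xy[0] * dx + base_velocity_xy[1] * dy


def point_goal_reward(prev_xy, curr_xy, goal_xy):
    """Hypothetical PointGoal reward: how much closer the robot got
    to the (x, y) goal during one step."""
    return math.dist(prev_xy, goal_xy) - math.dist(curr_xy, goal_xy)
```

Rewarding progress toward the goal, rather than raw distance, is a common choice because it keeps the per-step signal on a similar scale wherever the goal is placed.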
Run a toy instance:
# Requires Pudb debugger; pip install pudb
$ python main.py --mode gui --config-file ./configs/basic.yaml
Configuration files in the configs directory are stored in YAML format and define the task, the robot (solo8 or solo12), scene information, episode length, control method, and other settings.
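
As a rough illustration of the kind of settings such a file carries, the sketch below checks a configuration that has already been parsed into a dictionary (for example with PyYAML's `yaml.safe_load`). The key names `task`, `robot` and `episode_length` are assumptions made for this example; the actual keys in ./configs/basic.yaml may differ.

```python
# Values taken from the task and robot names mentioned in this README.
ALLOWED_TASKS = {"stand", "walk", "pointGoal"}
ALLOWED_ROBOTS = {"solo8", "solo12"}


def validate_config(cfg):
    """Return a list of problems found in a parsed config dict.

    The keys checked here ("task", "robot", "episode_length") are
    hypothetical and only illustrate the idea.
    """
    errors = []
    if cfg.get("task") not in ALLOWED_TASKS:
        errors.append(f"unknown task: {cfg.get('task')!r}")
    if cfg.get("robot") not in ALLOWED_ROBOTS:
        errors.append(f"unknown robot: {cfg.get('robot')!r}")
    length = cfg.get("episode_length")
    # bool is a subclass of int, so reject it explicitly
    if isinstance(length, bool) or not isinstance(length, int) or length <= 0:
        errors.append("episode_length must be a positive integer")
    return errors


example = {"task": "walk", "robot": "solo8", "episode_length": 500}
problems = validate_config(example)  # an empty list means the config passed
```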
Three baseline RL agents are implemented for training: PPO, TD3, and SAC (SAC does not yet train reliably).
Train PPO agent
$ python -u ./training/train_ppo.py --num-agents 64 --logdir $LOGDIR/ --log-interval 5 --save-interval 50 --lr 0.00025 --entropy-coef 0.01 --clip-param 0.1 --ppo-epoch 5 --mini-batch-size 512 --clip-value-loss --config-file ./configs/basic.yaml --use-gae --use-linear-lr-decay --seed 1
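
For readers unfamiliar with the PPO flags used above, here is a minimal sketch of what `--clip-param` and `--use-linear-lr-decay` conventionally mean in PPO implementations. It assumes the training script follows the standard formulation; the function names are hypothetical and this is not code from train_ppo.py.

```python
def linear_lr(initial_lr, update, total_updates):
    """Learning rate decayed linearly from initial_lr toward 0
    as `update` goes from 0 to `total_updates`."""
    return initial_lr * (1.0 - update / float(total_updates))


def clipped_surrogate(ratio, advantage, clip_param):
    """PPO clipped surrogate objective for a single sample.

    `ratio` is pi_new(a|s) / pi_old(a|s). Clipping the ratio to
    [1 - clip_param, 1 + clip_param] and taking the minimum removes
    the incentive to move the policy far from the old one.
    """
    clipped_ratio = max(1.0 - clip_param, min(ratio, 1.0 + clip_param))
    return min(ratio * advantage, clipped_ratio * advantage)


# With --lr 0.00025, halfway through training the rate is halved.
lr_midway = linear_lr(0.00025, update=50, total_updates=100)

# With --clip-param 0.1, a ratio of 1.5 is clipped to 1.1 for a positive advantage.
objective = clipped_surrogate(ratio=1.5, advantage=2.0, clip_param=0.1)
```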
Train TD3 agent
$ python -u ./training/train_td3.py --num-agents 64 --logdir $LOGDIR/ --log-interval 1000 --save-interval 2000 --config-file ./configs/basic.yaml --seed 1
The $LOGDIR variable contains the directory in which checkpoints of the trained models and TensorBoard visualizations are stored.
Run trained models
# test ppo agents
$ python ./testing/test_ppo.py --checkpoint-dir $LOGDIR --mode gui --config-file ./configs/basic.yaml
# test td3 agents
$ python ./testing/test_td3.py --checkpoint-dir $LOGDIR --mode gui --config-file ./configs/basic.yaml
PPO agents:
stand | walk | pointGoal |
---|---|---|
TD3 agents:
stand | walk | pointGoal |
---|---|---|