This is the official GitHub repository for the paper "Stage-Wise Reward Shaping for Acrobatic Robots: A Constrained Multi-Objective Reinforcement Learning Approach".
- python==3.7
- torch==1.12.1
- numpy==1.21.5
- isaacgym (https://developer.nvidia.com/isaac-gym)
- IsaacGymEnvs (https://github.com/isaac-sim/IsaacGymEnvs)
- ruamel.yaml
- requests
- pandas
- scipy
- wandb
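Because `torch` and `numpy` are pinned to exact versions, it can help to confirm what is actually installed before training. The following is a minimal sketch, not part of the repository: the names `PINNED`, `installed_version`, and `check_pins` are hypothetical, and it assumes each package exposes a `__version__` attribute (true for `torch` and `numpy`). Local build suffixes such as `+cu113` are ignored when comparing.

```python
import importlib

# Pins copied from the requirements list above (module name -> exact version).
PINNED = {"torch": "1.12.1", "numpy": "1.21.5"}


def installed_version(module_name):
    """Return the module's __version__ string, or None if it cannot be imported."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    return getattr(module, "__version__", None)


def check_pins(pins, lookup=installed_version):
    """Return a list of readable problems; an empty list means every pin matches."""
    problems = []
    for name, wanted in pins.items():
        found = lookup(name)
        if found is None:
            problems.append(f"{name}: not installed (want {wanted})")
        elif found.split("+")[0] != wanted:
            # Strip a local build tag such as "+cu113" before comparing.
            problems.append(f"{name}: found {found}, want {wanted}")
    return problems
```

Calling `check_pins(PINNED)` inside the training environment returns an empty list when both pins are satisfied. The `lookup` argument exists only so the comparison logic can be exercised without the real packages installed.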
Stage-Wise-CMORL/
├── algos/
│   ├── common/
│   ├── comoppo/
│   └── student/
├── assets/
│   ├── go1/
│   └── h1/
├── tasks/
├── utils/
├── main_student.py
└── main_teacher.py
- algos/: contains the implementation of the proposed algorithm
- assets/: contains the assets of the robots
- tasks/: contains the implementation of the tasks
- utils/: contains the utility functions
- GO1 Robot (Quadruped from Unitree)
  - Back-Flip
  - Side-Flip
  - Side-Roll
  - Two-Hand Walk
- H1 Robot (Humanoid from Unitree)
  - Back-Flip
  - Two-Hand Walk
Train a teacher policy first, then train a student policy using the trained teacher policy.
- Teacher policy
  - training:
    python main_teacher.py --task_cfg_path tasks/{task_name}.yaml --algo_cfg_path algos/comoppo/{task_name}.yaml --wandb --seed 1
  - test:
    python main_teacher.py --task_cfg_path tasks/{task_name}.yaml --algo_cfg_path algos/comoppo/{task_name}.yaml --test --render --seed 1 --model_num {saved_model_num}
- Student policy
  - training:
    python main_student.py --task_cfg_path tasks/{task_name}.yaml --algo_cfg_path algos/student/{task_name}.yaml --wandb --seed 1
  - test:
    python main_student.py --task_cfg_path tasks/{task_name}.yaml --algo_cfg_path algos/student/{task_name}.yaml --test --render --seed 1 --model_num {saved_model_num}
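The `main_teacher.py` and `main_student.py` commands above differ only in the script name, the algorithm config directory (`algos/comoppo` versus `algos/student`), and the mode flags. As a rough sketch of how the `{task_name}` and `{saved_model_num}` placeholders expand, the helper below assembles the argument list for either script. `build_command` is a hypothetical helper, not part of the repository, and `my_task` in the usage note is a made-up name; substitute the name of an actual YAML file under `tasks/`.

```python
def build_command(role, task_name, test=False, seed=1, model_num=None):
    """Assemble the argument list for main_teacher.py or main_student.py."""
    if role not in ("teacher", "student"):
        raise ValueError("role must be 'teacher' or 'student'")
    # The teacher reads its algorithm config from algos/comoppo,
    # the student from algos/student.
    algo_dir = "comoppo" if role == "teacher" else "student"
    cmd = [
        "python", f"main_{role}.py",
        "--task_cfg_path", f"tasks/{task_name}.yaml",
        "--algo_cfg_path", f"algos/{algo_dir}/{task_name}.yaml",
    ]
    if test:
        # Test mode needs the number of a saved model to load.
        if model_num is None:
            raise ValueError("model_num is required in test mode")
        cmd += ["--test", "--render", "--seed", str(seed),
                "--model_num", str(model_num)]
    else:
        # Training mode logs to Weights & Biases, as in the commands above.
        cmd += ["--wandb", "--seed", str(seed)]
    return cmd
```

For example, `" ".join(build_command("student", "my_task"))` reproduces the student training command with `my_task` in place of `{task_name}`. The returned list can also be passed directly to `subprocess.run` to launch the teacher and student runs in sequence.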