This repository provides executable codes for the paper A Scalable Approach to Control Diverse Behaviors for Physically Simulated Characters, which was published in SIGGRAPH 2020. Click here to see the paper. The name of the project originates from the two keywords in the name of the paper, which are SCAlable and DIVERse.
Please note that we are not able to guarantee that the codes will work if it is run with settings that we did not intend when we developed for the project. We would appreciate it if users could report the problems when they find it.
@article{
ScaDiver,
author = {Won, Jungdam and Gopinath, Deepak and Hodgins, Jessica},
title = {A Scalable Approach to Control Diverse Behaviors for Physically Simulated Characters},
year = {2020},
issue_date = {July 2020},
volume = {39},
number = {4},
url = {https://doi.org/10.1145/3386569.3392381},
journal = {ACM Trans. Graph.},
articleno = {33},
}
Although we only confirmed that the codes work correctly in Ubuntu (18.04) environment, probably it will work in other environments without much troubles if it is installed in a fresh Python virtual environment, which we recommend users to use to prevent from mixing different pakage versions.
git clone https://github.com/fairinternal/ScaDiver
fairmotion provides functionalities to process motion capture data and, to compute kinematics, to visualize simulated characters and environments.
pip install fairmotion
We use PyBullet for physics simulation and rllib for reinforcement learning.
pip install pybullet==2.7.3 ray[rllib]==0.8.7 pandas requests
In this example, a humanoid character will be loaded into space where only the ground plane exists, and it will be simulated without any control. Please press a to simulate and press r to reset.
python env_humanoid_tracking.py
In this example, a humanoid character will be loaded into space where only the ground plane exists, and its initial states will be set by some motion capture data specified in the specification file (test_env_humanoid_imiation.yaml). Please note that the character will fall immediately because the current controller is the one not trained at all.
python rllib_driver.py --mode load --spec data/spec/test_env_humanoid_imitation.yaml --project_dir ./
To learn expert controllers for a set of heterogeneous motion capture data, we first need to cluster the data according to some motion features. For this, please refer to an example in fairmotion about motion clustering. After clustering the data, each expert controller can be learned by using only data involved in each cluster.
python rllib_driver.py --mode train --spec data/spec/spec_env_humanoid_imitation_expert0.yaml --project_dir ./ --local_dir ./data/learning/experts/
python rllib_driver.py --mode train --spec data/spec/spec_env_humanoid_imitation_expert1.yaml --project_dir ./ --local_dir ./data/learning/experts/
...
python rllib_driver.py --mode train --spec data/spec/spec_env_humanoid_imitation_expert7.yaml --project_dir ./ --local_dir ./data/learning/experts/
After expert controllers are learned, they can be combined as a single mixture-of-experts controller. In our framework, this can be simply achieved by setting the type of policy model as moe as below.
python rllib_driver.py --mode train --spec data/spec/spec_env_humanoid_imitation_moe.yaml --project_dir ./ --local_dir ./data/learning/moe/
When the training process is performed, it will automatically create logs by using Tensorboard in the local_dir/project_name. The status can be checked by the script below.
tensorboard --logdir="directory_where_tensorboard_log_exists"
According to the iteration number specified in the specification file, it will automatically save intermediate results in the local_dir/project_name/checkpoint/. Testing the controllers can be done as below.
python rllib_driver.py --mode load --spec "some_specification_file" --project_dir ./ --checkpoint "some_saved_checkpoint_file"
Every experiment requires a unique specification file (yaml) that includes all the settings about individual experiments. Most of the elements are easy to understand by its name and it is based on rllib's configuration file. We will explain only our environment-specific settings below, please refer to the documentation of rllib for other settings.
Because the optimality of learned policy is defined by the definition of reward function in reinforcement learning, changing the function is the only way for users to design the policy so that it has desirable properties/behaviors. As a result, testing many reward functions by combining various terms with different combinations is a critical process in many researches including ours. We implemented a flexible way to test various functions by simply modifying a specification file.
Our reward function is defined in a tree-like manner, where each node could be a operation among child nodes or a leaf that defines a term. For example, the function below is a multiplicative reward function composed of the five terms (pose_pos, pose_vel, ee, root, and com), where each term has its own kernel function. Currently, we only support gaussian and none kernel functions. By simply changing the operation of the root node from mul into add, we can test an additive reward function. If we want to change weight values for the terms, we can simply change weight in the leaf nodes.
We can also define multiple reward functions in the same specification files (please note that their names should be unique), then choose one of them according to experiments by setting the name which we want to test in fn_map.
reward:
fn_def:
default:
name: total
op: mul
child_nodes:
- name: pose_pos
op: leaf
weight: 1.0
kernel:
type: gaussian
scale: 40.0
- name: pose_vel
op: leaf
weight: 1.0
kernel:
type: gaussian
scale: 1.0
- name: ee
op: leaf
weight: 1.0
kernel:
type: gaussian
scale: 5.0
- name: root
op: leaf
weight: 1.0
kernel:
type: gaussian
scale: 2.5
- name: com
op: leaf
weight: 1.0
kernel:
type: gaussian
scale: 2.5
fn_map:
- default
We can use various early termination strategies. For example, we can terminate the current episode when the character falls down, or the average reward for 1s is below a specific threshold as used in our paper.
early_term:
choices: # 'sim_div', 'sim_window', task_end', 'falldown', 'low_reward'
- task_end
- low_reward
low_reward_thres: 0.1
eoe_margin: 0.5
The character fields include character-specific information.
character:
name:
- humanoid
char_info_module:
- amass_char_info.py
sim_char_file:
- data/character/amass.urdf
ref_motion_scale:
- 1.0
base_motion_file:
- data/motion/amass_hierarchy.bvh
ref_motion_db:
-
data:
file:
- data/motion/test.bvh
actuation:
- spd
self_collision:
- true
environment_file: []
If true, the environment will be fully created when reset is called. This is useful when the creation of the environment is expensive. Please search with the keyword Expensive Environments here for more details.
lazy_creation: false
When the codes run, we assume that the current working directory (cwd) is the root directory where the project is originally saved. However, there exist some cases when the cwd could be changed by other external libraries such as rllib, so specifying project_dir is necessary to keep the root directory of the project.
project_dir: /home/jungdam/Research/opensource/ScaDiver/
Time steps for physics simulation
fps_sim: 480
Time steps for control policies
fps_con: 30
If true, small noise will be added to th the initial state of the character
add_noise: false
If true, some additional information will be printed. Please use this only for testing environments.
verbose: false
This defines which information will be added to the state of the character. 'body' will include the physical state of the simulated character such as joint angles and joint angular velocities, 'task' will include the difference between the simulated character and the currently playing reference motion.
state:
choices: ['body', 'task'] # 'body', 'imitation', 'interaction', 'task'
This includes information on the action space for our environment. Basically, the action space is a target posture that includes a set of target joint angles except for the root joint. 'range_min/range_max' defines the range of joint angles in radian for which we will use, and 'range_min_pol/range_max_pol' defines the range of output values of the policy (controller). In other words, there exists a linear mapping from (range_min_pol, range_max_pol) to (range_min, range_max). For example in the setting below, the full range of motion of our simulated character is between -3.0 radian and 3.0 radians from its base posture (t-pose), and if the policy outputs 3.0 for a specific joint, the environment will interpret the value as 3.0 radians. This feature is especially useful when we want to restrict the range of motion of the simulated character (range_min/range_max will be useful), or the policy has special output layers such as tanh (range_min_pol/range_max_pol will be useful).
action:
type: "absolute" # 'absolute', 'relative'
range_min: -3.0
range_max: 3.0
range_min_pol: -3.0
range_max_pol: 3.0
Because our framework is based on rllib, it can run not only on a single machine with multiple cores but also on clusters such as AWS, Azure, or customized one managed by Slurm. Please refer to this for more detail.
To be updated.
ScaDiver is released under the BSD-3-Clause License.