Use OpenAI Baselines with Dart Env
Tested with Python 3 on Ubuntu 14.04 and OSX 10.12.
To keep Python packages manageable, it is recommended to use a virtual environment, either through virtualenv or Anaconda.
- Install virtualenv:
pip install virtualenv
- Create a virtual environment:
virtualenv /path/to/venv --python=python3
- Activate the virtual environment:
. /path/to/venv/bin/activate
Anaconda manages virtual environments, packages, notebooks, and more. However, it may conflict with Homebrew on macOS, so be careful if you intend to install both. To set up a virtual environment with Anaconda, follow these steps:
1. Download and install Anaconda for Python 3.6 from: https://www.continuum.io/downloads
2. Create a virtual environment:
conda create --name ENV_NAME python=3.6
3. Activate the virtual environment:
source activate ENV_NAME
Please refer to https://github.com/DartEnv/dart-env/wiki for instructions on installing Dart Env.
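After installing Dart Env, a quick optional sanity check (a minimal sketch; it assumes dart-env's fork of gym registers DartHopper-v1, the environment used in the example below) is to construct an environment from Python:
import gym

# Creating and resetting a Dart environment confirms the installation works end to end.
env = gym.make('DartHopper-v1')
obs = env.reset()
print('observation space:', env.observation_space)
print('action space:', env.action_space)
env.close()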
Detailed instructions can be found in the original Baselines repository. The key commands are listed below.
On OSX, install the prerequisites with Homebrew:
brew install cmake openmpi
On Ubuntu, install them with apt-get:
sudo apt-get update && sudo apt-get install cmake libopenmpi-dev python3-dev zlib1g-dev
Then clone and install Baselines:
git clone https://github.com/openai/baselines.git
cd baselines
pip install -e .
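To quickly verify the install (an optional sanity check, not part of the original Baselines instructions), confirm that the PPO1 modules used in the training script below can be imported:
import baselines
from baselines.ppo1 import mlp_policy, pposgd_simple

# If these imports succeed, the editable install and its core dependencies are in place.
print('baselines imported from', baselines.__file__)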
We provide an example of training a single-legged robot to move forward using the Proximal Policy Optimization (PPO) algorithm. To perform training, first create a new file under the baselines root directory, e.g. run_dart.py, then copy the following code into it:
from baselines.common.cmd_util import make_mujoco_env, mujoco_arg_parser
from baselines.common import tf_util as U
from baselines import logger


def callback(localv, globalv):
    # Save the policy parameters every 10 iterations.
    import joblib
    if localv['iters_so_far'] % 10 != 0:
        return
    save_dict = {}
    variables = localv['pi'].get_variables()
    for i in range(len(variables)):
        cur_val = variables[i].eval()
        save_dict[variables[i].name] = cur_val
    joblib.dump(save_dict, logger.get_dir() + '/policy_params_' + str(localv['iters_so_far']) + '.pkl', compress=True)
    joblib.dump(save_dict, logger.get_dir() + '/policy_params.pkl', compress=True)


def train(env_id, num_timesteps, seed):
    from baselines.ppo1 import mlp_policy, pposgd_simple
    U.make_session(num_cpu=1).__enter__()

    def policy_fn(name, ob_space, ac_space):
        # MLP policy with two hidden layers of 64 units each.
        return mlp_policy.MlpPolicy(name=name, ob_space=ob_space, ac_space=ac_space,
                                    hid_size=64, num_hid_layers=2)

    env = make_mujoco_env(env_id, seed)
    pposgd_simple.learn(env, policy_fn,
                        max_timesteps=num_timesteps,
                        timesteps_per_actorbatch=4000,
                        clip_param=0.2, entcoeff=0.0,
                        optim_epochs=10, optim_stepsize=3e-4, optim_batchsize=64,
                        gamma=0.99, lam=0.95, schedule='linear', callback=callback,
                        )
    env.close()


def main():
    args = mujoco_arg_parser().parse_args()
    logger.configure('data/ppo_' + args.env + '_results')
    train(args.env, num_timesteps=args.num_timesteps, seed=args.seed)


if __name__ == '__main__':
    main()
Then run:
mpirun -np 2 python run_dart.py --env DartHopper-v1 --seed 0
You should find a folder named ppo_DartHopper-v1_results inside the data folder. The learning progress is logged in progress.csv, and the policies at different learning iterations are saved as policy_params_ITER.pkl, where ITER is the iteration number. For this example, the policy should reach a total reward (EpRewMean) of 2000+ by the end of training.
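For example, the learning curve can be plotted directly from progress.csv. A minimal sketch, assuming pandas and matplotlib are installed and using the EpRewMean column that Baselines logs:
import pandas as pd
import matplotlib.pyplot as plt

# Plot mean episode reward per iteration from the Baselines progress log.
progress = pd.read_csv('data/ppo_DartHopper-v1_results/progress.csv')
plt.plot(progress['EpRewMean'])
plt.xlabel('iteration')
plt.ylabel('EpRewMean')
plt.title('PPO on DartHopper-v1')
plt.show()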
Finally, to visualize the policy controlling the simulated robot, first create a new file for the testing code, then copy the following code into the file:
import gym, sys, joblib, numpy as np, tensorflow as tf
from baselines.common import set_global_seeds, tf_util as U
from baselines.ppo1 import mlp_policy

if __name__ == '__main__':
    env = gym.make(sys.argv[1])
    # Re-enable the viewer, which Dart Env disables by default during training.
    if hasattr(env.env, 'disableViewer'):
        env.env.disableViewer = False
    sess = tf.InteractiveSession()

    # If a policy file is given, build the policy network and load the saved parameters;
    # otherwise fall back to random actions.
    policy = None
    if len(sys.argv) > 2:
        policy_params = joblib.load(sys.argv[2])
        policy = mlp_policy.MlpPolicy(name="pi", ob_space=env.observation_space, ac_space=env.action_space,
                                      hid_size=64, num_hid_layers=2)
        U.initialize()
        # The saved variables may live under a different scope name, so remap it.
        cur_scope = policy.get_variables()[0].name[0:policy.get_variables()[0].name.find('/')]
        orig_scope = list(policy_params.keys())[0][0:list(policy_params.keys())[0].find('/')]
        for i in range(len(policy.get_variables())):
            assign_op = policy.get_variables()[i].assign(
                policy_params[policy.get_variables()[i].name.replace(cur_scope, orig_scope, 1)])
            sess.run(assign_op)

    traj_num, rew, ct, d = 1, 0, 0, False
    o = env.reset()
    while ct < traj_num:
        if policy is not None:
            ac, vpred = policy.act(False, o)
            act = ac
        else:
            act = env.action_space.sample()
        o, r, d, env_info = env.step(act)
        rew += r
        env.render()
        if d:
            ct += 1
            print('reward: ', rew)
            o = env.reset()
    print('avg rew ', rew / traj_num)
Assuming that you named the file test_policy.py and put it under the baselines directory, you can then run the following command to visualize the hopper policy you just trained:
python test_policy.py DartHopper-v1 data/ppo_DartHopper-v1_results/policy_params.pkl
You should be able to see the hopper hopping forward. Here is a video of what it might look like (with a different camera angle).
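If you only want to inspect the saved parameters without opening the viewer, the policy file written by the training callback can be loaded directly with joblib (a minimal sketch using the path from the command above):
import joblib

# The training callback stores a dict mapping TensorFlow variable names to numpy arrays.
params = joblib.load('data/ppo_DartHopper-v1_results/policy_params.pkl')
for name, value in params.items():
    print(name, value.shape)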
If you see an error similar to the following:
*** An error occurred in MPI_Init_thread
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
*** and potentially your MPI job)
try running:
pip install mpi4py==2.0.0
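To confirm which version ended up installed (a small optional check), print it from Python:
# Should print 2.0.0 after the downgrade above.
import mpi4py
print(mpi4py.__version__)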