Gymnasium<-WERDNA RL GYM->PPO
- Fix observation space
To Run Training
train.py
train_advanced.py #recommended
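For orientation, the training entry point roughly amounts to the following, assuming the agents are trained with stable-baselines3 PPO on a registered Gymnasium environment (the environment id, hyperparameters, and save path below are illustrative placeholders, not the scripts' actual contents):

# Minimal PPO training sketch (assumes stable-baselines3; names are placeholders).
import gymnasium as gym
from stable_baselines3 import PPO

env = gym.make("werdna_advanced-v0")              # hypothetical environment id
model = PPO(
    "MlpPolicy",
    env,
    ent_coef=0.01,                                # mirrors the "ec" config key
    tensorboard_log="logs/",
    device="cpu",
    verbose=1,
)
model.learn(total_timesteps=500_000)
model.save("results/werdna_advanced_v2")          # hypothetical save path
env.close()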
To Run Test Model
test_model.py
test_model_advanced.py #recommended
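As a reference, evaluating a trained agent follows the usual load-and-rollout pattern, again assuming stable-baselines3 PPO (file and environment names are placeholders):

# Sketch of a single evaluation episode with a saved PPO agent (names are placeholders).
import gymnasium as gym
from stable_baselines3 import PPO

env = gym.make("werdna_advanced-v0")
model = PPO.load("results/werdna_advanced_v2")

obs, info = env.reset()
done = False
total_reward = 0.0
while not done:
    action, _ = model.predict(obs, deterministic=True)
    obs, reward, terminated, truncated, info = env.step(action)
    total_reward += reward
    done = terminated or truncated
print("episode reward:", total_reward)
env.close()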
To Run a Simple Teleoperation with a Trained Agent
model_teleop.py
model_teleop_advanced.py #recommended
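A hedged sketch of what policy-driven teleoperation can look like, assuming a PyBullet backend with a GUI connection and stable-baselines3 PPO; the target_x attribute is purely hypothetical and stands in for whatever command interface the actual environment exposes:

# Teleoperation sketch: the operator nudges a command from the keyboard while
# the trained policy keeps producing the low-level actions (names are placeholders).
import gymnasium as gym
import pybullet as p
from stable_baselines3 import PPO

env = gym.make("werdna_advanced-v0")
model = PPO.load("results/werdna_advanced_v2")

obs, info = env.reset()
while True:
    keys = p.getKeyboardEvents()                      # requires a GUI connection
    if keys.get(p.B3G_UP_ARROW, 0) & p.KEY_IS_DOWN:
        env.unwrapped.target_x += 0.01                # hypothetical command attribute
    if keys.get(p.B3G_DOWN_ARROW, 0) & p.KEY_IS_DOWN:
        env.unwrapped.target_x -= 0.01
    action, _ = model.predict(obs, deterministic=True)
    obs, reward, terminated, truncated, info = env.step(action)
    if terminated or truncated:
        obs, info = env.reset()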
TensorBoard Viewing
To view the training logs in TensorBoard, simply run:
tensorboard --logdir logs/xxx
Adding a Custom Environment
You can simply add one more environment under the env directory. Once added, make sure to update the train.py or train_advanced.py script to include your environment. To run the training script with it, also add a <custom_configuration>.yaml under the config directory.
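For illustration, a new environment module under env might look roughly like the following (class name, spaces, and reward are placeholders to adapt):

# Skeleton of a custom Gymnasium environment (shapes and reward are placeholders).
import gymnasium as gym
import numpy as np
from gymnasium import spaces

class WerdnaCustomEnv(gym.Env):
    def __init__(self, connect_type="DIRECT"):
        super().__init__()
        # Observation/action sizes are placeholders; match them to your robot.
        self.observation_space = spaces.Box(-np.inf, np.inf, shape=(8,), dtype=np.float32)
        self.action_space = spaces.Box(-1.0, 1.0, shape=(2,), dtype=np.float32)

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        return np.zeros(8, dtype=np.float32), {}

    def step(self, action):
        obs = np.zeros(8, dtype=np.float32)
        reward = 0.0                       # replace with the weighted bias terms
        terminated, truncated = False, False
        return obs, reward, terminated, truncated, {}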
Configuration for Advanced Training
The configuration file specifies:
- Robot Model: Path to the URDF file
- Environment: Custom environment name
- Connect_Type: The simulation connection mode (e.g. DIRECT)
- Device: Whether to train on CPU or GPU
- Biases: The types of reward biases to prioritize during training
- ec: The entropy coefficient. Since the agents are trained using PPO, the entropy coefficient stabilizes training and encourages exploration, helping avoid overfitting/underfitting or overtrained/undertrained conditions
- Filename: The name of the trained agent's file
- Timesteps: Number of timesteps per training session
- Record Video: Whether to record video in the kernel_pca_evaluation script; the recorded video will be saved to the video directory

An example configuration:
robot_model: "models/werdna_revised_bullet.urdf"
environment: "werdna_advanced"
connect_type: "DIRECT"
device: "cpu"
biases:
  r_bias: 0.0
  p_bias: 0.3
  y_bias: 0.25
  dR_bias: 0.0
  dP_bias: 0.2
  dY_bias: 0.0
  x_bias: 0.25
  v_bias: 0.0
ec: 0.01
filename: "werdna_advanced_v2"
timesteps: 500000
record_video: false
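For reference, a sketch of how train_advanced.py might consume such a file (the loading code and the config path are assumptions, not the script's actual contents):

# Sketch of reading the advanced configuration (assumes PyYAML; path is hypothetical).
import yaml

with open("config/werdna_advanced.yaml") as f:
    cfg = yaml.safe_load(f)

print(cfg["robot_model"])        # "models/werdna_revised_bullet.urdf"
print(cfg["biases"]["p_bias"])   # 0.3
print(cfg["ec"])                 # entropy coefficient, passed to PPO as ent_coef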
Results
The results (i.e. the trained agent) are saved in the results directory, under a new subdirectory named after the environment's name and the specified biases.