
FASE MIE DORL-Undergrad project for Team Robotic Control. Modified OPOLO code implementing the GARAT approach, which trains an action transformation function.


kentwhf/opolo-code

 
 


GARAT: Generative Adversarial Reinforced Action Transformation

Research code based on the paper: An Imitation from Observation Approach to Transfer Learning with Dynamics Mismatch.

Project Scope:

We build a Sim2Real/Sim2Sim interface to resolve dynamics mismatch in find-and-touch tasks. Specifically, we use the GARAT framework to learn an action transformation policy (ATP) via an imitation-learning method (TRPO-GAIfO) from target-environment samples, and then update the target policy, to be deployed in the target environment, through RL algorithms (DDPG + HER).
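The grounding idea above can be sketched as an environment wrapper that routes every agent action through the ATP before the simulator sees it. This is a minimal illustration only; `GroundedEnv`, `DummySim`, and the lambda ATP are made-up names, not this repository's actual API:

```python
import numpy as np

class GroundedEnv:
    """Wrap a source (simulated) environment so that every agent action is
    first corrected by a learned action transformation policy (ATP).
    Illustrative sketch; names do not match the repository's code."""

    def __init__(self, sim_env, atp):
        self.sim_env = sim_env
        self.atp = atp  # callable mapping (state, action) -> grounded action

    def reset(self):
        self.state = self.sim_env.reset()
        return self.state

    def step(self, action):
        grounded = self.atp(self.state, action)  # transform before stepping
        self.state, reward, done = self.sim_env.step(grounded)
        return self.state, reward, done

# Dummy linear dynamics, just to show the wrapper in use.
class DummySim:
    def reset(self):
        self.s = np.zeros(2)
        return self.s

    def step(self, a):
        self.s = self.s + a
        return self.s, -float(np.linalg.norm(self.s)), False

# A toy "ATP" that halves every action.
env = GroundedEnv(DummySim(), atp=lambda s, a: 0.5 * a)
s = env.reset()
s, r, d = env.step(np.array([1.0, 0.0]))  # the simulator receives [0.5, 0.0]
```

Training the target policy inside such a grounded environment is what lets a simulator-trained policy transfer to the target dynamics.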


Branches:

  • master: all of our changes since OPOLO
  • temp: temporary branch with miscellaneous updates that are not consistent with the current git log but may make our immediate changes clearer

Training GARAT:

  • Example: run on the FetchReach-v1 task
  • Obtain a source policy by running DDPG+HER on FetchReach-v1 (opolo-baselines/run/test.zip in our case)
  • Collect demonstration trajectories with the function generate_target_traj(rollout_policy_path, env, save_path, n_episodes, n_transitions, seed)
    • The function may be called and executed while training an ATP
    • One can reduce the dimensionality of the samples used
  • Train an ATP with opolo-baselines/run/train_agent_custom.py
  • Update the target policy, to be deployed in the target environment, with opolo-baselines/simulation_grounding/train_target_policy.py
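The trajectory-collection step amounts to rolling out the source policy and saving (state, action, next state) transitions as a numpy dictionary. Below is an illustrative stand-in with the same parameter names as above; the repository's real function loads the policy from `rollout_policy_path`, whereas this sketch takes a policy callable and a toy environment:

```python
import os
import tempfile
import numpy as np

def generate_target_traj(policy, env, save_path, n_episodes, n_transitions, seed):
    """Illustrative sketch: roll out `policy` in `env` and save transitions
    as a numpy dictionary (.npz). Not the repository's actual implementation."""
    rng = np.random.default_rng(seed)
    obs_buf, act_buf, next_obs_buf = [], [], []
    for _ in range(n_episodes):
        s = env.reset()
        for _ in range(n_transitions):
            a = policy(s, rng)
            s_next, _, done = env.step(a)
            obs_buf.append(s.copy())
            act_buf.append(a)
            next_obs_buf.append(s_next.copy())
            s = s_next
            if done:
                break
    traj = {"obs": np.array(obs_buf),
            "acs": np.array(act_buf),
            "next_obs": np.array(next_obs_buf)}
    np.savez(save_path, **traj)
    return traj

# Toy environment and random policy, just to exercise the function.
class DummyEnv:
    def reset(self):
        self.s = np.zeros(2)
        return self.s

    def step(self, a):
        self.s = self.s + a
        return self.s, 0.0, False

path = os.path.join(tempfile.mkdtemp(), "target_traj.npz")
traj = generate_target_traj(lambda s, rng: rng.uniform(-1, 1, 2),
                            DummyEnv(), path,
                            n_episodes=2, n_transitions=5, seed=0)
```

With 2 episodes of 5 transitions each, `traj["obs"]` holds 10 states of dimension 2.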

Evaluating ATP:

python opolo-baselines/simulation_grounding/plot_state_distributions.py

Results can be found at: opolo-baselines/atp_plots/
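Comparing visited-state distributions is one way to judge how well an ATP grounds the simulator. A rough, self-contained sketch of such a comparison (the actual plotting script may compute something different) is a per-dimension histogram distance:

```python
import numpy as np

def state_distribution_distance(states_a, states_b, bins=20):
    """Per-dimension total variation distance between normalized histograms
    of two sets of visited states. Illustrative sketch only; not the metric
    used by plot_state_distributions.py."""
    states_a, states_b = np.asarray(states_a), np.asarray(states_b)
    dists = []
    for d in range(states_a.shape[1]):
        lo = min(states_a[:, d].min(), states_b[:, d].min())
        hi = max(states_a[:, d].max(), states_b[:, d].max())
        ha, _ = np.histogram(states_a[:, d], bins=bins, range=(lo, hi))
        hb, _ = np.histogram(states_b[:, d], bins=bins, range=(lo, hi))
        ha = ha / ha.sum()  # normalize counts to probabilities
        hb = hb / hb.sum()
        dists.append(0.5 * np.abs(ha - hb).sum())  # total variation
    return np.array(dists)
```

A well-grounded simulator should drive this distance toward zero between grounded-sim rollouts and target-environment rollouts.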


Reminders:

  • Please contact us or read our final report for more details

  • For Windows users, WSL2 is recommended for using the mujoco library

  • One can tune hyperparameters for the grounding algorithm in use at:

opolo-baselines/hyperparams/
  • Due to the unstable connection with our UArm Swift robotic arm, we use a script to convert raw text into the desired trajectory format, a numpy dictionary:
python opolo-baselines/sim_2_real/data_processing.py
  • ATPs can be found at:
opolo-baselines/run/test/logs/trpo-gaifo/trpogaifo/FetchReach-v1

where rank0 uses full-length samples with gamma = 0.95, rank1 uses reduced-length samples with gamma = 0.95, and rank2 uses reduced-length samples with gamma = 0.1
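The raw-text conversion mentioned above can be as simple as parsing whitespace-separated state values, one state per line, into consecutive (obs, next_obs) pairs. This sketch assumes that input shape; the real parser in sim_2_real/data_processing.py may expect a different format:

```python
import numpy as np

def raw_text_to_traj(lines):
    """Illustrative conversion of raw text logs (one whitespace-separated
    state per line, blank lines ignored) into a numpy-dictionary trajectory.
    Not the repository's actual parser."""
    states = np.array([[float(v) for v in ln.split()]
                       for ln in lines if ln.strip()])
    # Consecutive states form (obs, next_obs) transition pairs.
    return {"obs": states[:-1], "next_obs": states[1:]}

raw = ["0.0 0.0 0.1",
       "0.1 0.0 0.1",
       "",              # blank lines from the flaky connection are skipped
       "0.2 0.1 0.1"]
traj = raw_text_to_traj(raw)
```

Three valid lines yield two transitions, each a 3-dimensional state pair.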
