A novel approach to Guided Domain Randomization, implemented through an Adversarial Agent trained to confuse the Task Agent in order to produce more robust policies. Read the paper discussing the ideas, implementation, and results here. Team members: Simone Carena, Francesco Paolo Carmone, Ludovica Mazzucco
Official assignment at Google Doc.
Create and activate a virtual environment with Python 3.10.0, then install all the dependencies with
pip install -r requirements.txt
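If you need to create the virtual environment first, a minimal sketch, assuming `python3.10` is on your PATH (any other way of creating a Python 3.10.0 environment works just as well):

```sh
python3.10 -m venv .venv       # create the virtual environment
source .venv/bin/activate      # activate it, then install the dependencies
```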
This code has been developed and tested on Arch Linux x86_64 (kernel 6.7.6-arch1-2) under Wayland. Alternatively, the .ipynb notebooks are platform agnostic and can be used interchangeably.
Available gym environments
- Source: the hopper, whose torso mass is shifted by one.
- Target: the hopper, with its original masses.
- UDR: the hopper, whose torso mass is shifted by one. Each time the reset function is called, the hopper is assigned new masses, generated uniformly at random (see the sketch after this list).
- Deceptor: the hopper, whose torso mass is shifted by one. Each time the reset function is called, the hopper is likewise assigned new masses, but these are generated by the Adversarial Agent.
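The UDR idea in a nutshell, as a minimal sketch: the wrapper name, the sampling range, and the use of a mujoco-py style `sim.model.body_mass` are assumptions for illustration, not necessarily this repo's code.

```python
import gym
import numpy as np

# Illustrative sketch: on every reset, redraw the hopper's link masses
# uniformly around their nominal values (Uniform Domain Randomization).
class UDRWrapper(gym.Wrapper):
    def __init__(self, env, low=0.5, high=1.5):
        super().__init__(env)
        self.low, self.high = low, high
        # Assumes a mujoco-py based env exposing sim.model.body_mass
        self.nominal_masses = env.sim.model.body_mass.copy()

    def reset(self, **kwargs):
        scales = np.random.uniform(self.low, self.high, size=self.nominal_masses.shape)
        # Leave body 0 (the world) untouched; randomize the remaining bodies
        self.env.sim.model.body_mass[1:] = self.nominal_masses[1:] * scales[1:]
        return self.env.reset(**kwargs)
```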
An example of the standard plot nomenclature is source -> target, meaning that the agent has been trained on source and is later tested on target.
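For instance, a source -> target run could look like the following sketch, assuming stable-baselines3 and hypothetical environment ids (the repo's actual ids and algorithm may differ):

```python
import gym
from stable_baselines3 import PPO
from stable_baselines3.common.evaluation import evaluate_policy

# source -> target: train on the source env, evaluate on the target env.
source_env = gym.make("CustomHopper-source-v0")  # hypothetical env id
target_env = gym.make("CustomHopper-target-v0")  # hypothetical env id

model = PPO("MlpPolicy", source_env, verbose=0)
model.learn(total_timesteps=100_000)

mean_reward, std_reward = evaluate_policy(model, target_env, n_eval_episodes=50)
print(f"source -> target: {mean_reward:.1f} +/- {std_reward:.1f}")
```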
File description
- task2.py: trains source.mdl.
- task3.py: trains target.mdl, and tests source -> source, source -> target, and target -> target. Please note: if there's already a file named target.mdl, it just tests it.
- task4.py: trains dr_model.mdl, and tests drsource -> target. Please note: if there's already a file named dr_model.mdl, it just tests it.
- train_adversarial_agent.py: trains the adversarial agent and saves it as deception_model_agent_dr. Run with --help to check all the available commands (a conceptual sketch of the adversarial loop follows this list).
- train_multi_adversarial_agents.py: trains 5 different models, with seeds ranging from 1 to 5, and saves them. Run with --help to check all the available commands.
- test_models: tests the following cases: source -> target, target -> target, drsource -> target, and deceptorsource -> target.