A Deep Deterministic Policy Gradient (DDPG) actor-critic reinforcement learning solution to the Unity ML-Agents (Udacity) Reacher environment.
Reacher is an environment in which 20 agents each control a double-jointed arm to reach a target location. The target (goal location) moves, and an agent receives a reward of +0.1 for every step its hand is inside the goal location. The goal of each agent is therefore to keep its hand at the target location for as many time steps as possible.
- Set-up: Double-jointed arm which can move to target locations.
- Goal: Each agent must move its hand to the goal location and keep it there.
- Agents: The environment contains 20 agents with the same Behavior Parameters.
- Agent Reward Function (independent per agent): +0.1 for each step the agent's hand is in the goal location.
- Vector Observation space (State Space): 26 variables corresponding to the position, rotation, velocity, and angular velocities of the two arm rigid bodies.
- Actions (Action Space): 4 continuous actions, corresponding to the torque applicable to the two joints.
- Benchmark Mean Reward: 30
- Turns: An episode completes after 1000 frames.
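As an illustration of this interaction loop, the sketch below drives the 20 arms with random torques using the unityagents package from the Udacity project. The executable path is just an example; it depends on your platform and on where you place the environment downloaded in the steps below.

import numpy as np
from unityagents import UnityEnvironment

# path is an example; see the download/unzip instructions below
env = UnityEnvironment(file_name='./Reacher_Windows_x86_64/Reacher.exe')
brain_name = env.brain_names[0]

env_info = env.reset(train_mode=True)[brain_name]
num_agents = len(env_info.agents)                        # 20 parallel arms
scores = np.zeros(num_agents)

while True:
    actions = np.clip(np.random.randn(num_agents, 4), -1, 1)  # random torques in [-1, 1]
    env_info = env.step(actions)[brain_name]
    scores += env_info.rewards                           # +0.1 per step the hand is in the goal
    if np.any(env_info.local_done):                      # episode ends after 1000 frames
        break

print('Average score over the 20 agents:', scores.mean())
env.close()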
To set up your Python environment and run the code in this repository, follow the instructions below.
Create (and activate) a new environment with Python 3.6.
- Linux or Mac:
conda create --name ddpg-rl python=3.6
source activate ddpg-rl
- Windows:
conda create --name ddpg-rl python=3.6
activate ddpg-rl
Clone the repository and install dependencies
git clone https://github.com/kotsonis/ddpg-reacher.git
cd ddpg-reacher
pip install -r requirements.txt
- Download the environment from one of the links below. You need only select the environment that matches your operating system:
- Linux: click here
- Mac OSX: click here
- Windows (32-bit): click here
- Windows (64-bit): click here
(For Windows users) Check out this link if you need help determining whether your computer is running a 32-bit or 64-bit version of the Windows operating system.
- Place the file in the ddpg-reacher folder, and unzip (or decompress) the file.
- Edit config.py and set the self.reacher_location entry to point to the right location. Example:
  self.reacher_location = './Reacher_Windows_x86_64/Reacher.exe'
  Alternatively, you can pass the environment location to train.py with the reacher_location= argument.
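For orientation, a hypothetical sketch of how that entry could appear inside config.py is shown below; the actual Config class in this repository holds many more hyperparameters than this.

class Config:
    def __init__(self):
        # path to the unzipped Unity Reacher executable (adjust for your OS)
        self.reacher_location = './Reacher_Windows_x86_64/Reacher.exe'
        # ... remaining hyperparameters (batch size, learning rates, etc.) ...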
To train an agent, train.py reads the hyperparameters from config.py and accepts command-line options to modify parameters and/or set saving options. You can list the CLI options by running
python train.py -h
To run training with the parameters that produced a solution, you can run:
python train.py --save-model_dir=model --output-image=reacher_v3.png --episodes=200 --batch-size=256 --eps-decay=0.99 --n_step=7
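For reference, the core of a DDPG learning step with n-step bootstrapped targets looks roughly like the sketch below. Network sizes, optimizer settings, and names here are illustrative assumptions rather than the exact implementation in this repository; see Report.md for the actual details.

import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

STATE_SIZE, ACTION_SIZE = 26, 4     # Reacher observation / action sizes
GAMMA, TAU, N_STEP = 0.99, 1e-3, 7  # discount, soft-update rate, n-step horizon

def make_actor():
    # maps a state to a torque vector in [-1, 1]
    return nn.Sequential(nn.Linear(STATE_SIZE, 128), nn.ReLU(),
                         nn.Linear(128, ACTION_SIZE), nn.Tanh())

def make_critic():
    # maps a (state, action) pair to a scalar Q-value
    return nn.Sequential(nn.Linear(STATE_SIZE + ACTION_SIZE, 128), nn.ReLU(),
                         nn.Linear(128, 1))

actor, critic = make_actor(), make_critic()
actor_target, critic_target = copy.deepcopy(actor), copy.deepcopy(critic)
actor_opt = torch.optim.Adam(actor.parameters(), lr=1e-4)
critic_opt = torch.optim.Adam(critic.parameters(), lr=1e-3)

def ddpg_update(states, actions, n_step_returns, next_states, dones):
    """One DDPG learning step on a sampled mini-batch.

    `n_step_returns` is assumed to already hold the discounted sum of the next
    N_STEP rewards, and `next_states` the state N_STEP steps ahead."""
    # critic update: regress Q(s, a) toward the bootstrapped n-step target
    with torch.no_grad():
        next_actions = actor_target(next_states)
        q_next = critic_target(torch.cat([next_states, next_actions], dim=1))
        q_target = n_step_returns + (GAMMA ** N_STEP) * q_next * (1 - dones)
    q_expected = critic(torch.cat([states, actions], dim=1))
    critic_loss = F.mse_loss(q_expected, q_target)
    critic_opt.zero_grad(); critic_loss.backward(); critic_opt.step()

    # actor update: maximize the critic's value of the actor's own actions
    actor_loss = -critic(torch.cat([states, actor(states)], dim=1)).mean()
    actor_opt.zero_grad(); actor_loss.backward(); actor_opt.step()

    # soft-update both target networks toward the local networks
    for target, local in ((actor_target, actor), (critic_target, critic)):
        for t_param, l_param in zip(target.parameters(), local.parameters()):
            t_param.data.copy_(TAU * l_param.data + (1.0 - TAU) * t_param.data)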
You can watch the agent play with the trained model as follows:
python play.py
You can also specify the number of episodes you want the agent to play, as well as a non-default trained model location, as follows:
python play.py --episodes 20 --save-model_dir=./new_reacher
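Conceptually, playing amounts to loading the trained actor weights and running the policy greedily, without exploration noise. The sketch below is illustrative only: the checkpoint name, network architecture, and executable path are assumptions, not the exact contents of play.py.

import numpy as np
import torch
import torch.nn as nn
from unityagents import UnityEnvironment

env = UnityEnvironment(file_name='./Reacher_Windows_x86_64/Reacher.exe')
brain_name = env.brain_names[0]

# architecture must match whatever was saved during training (hypothetical here)
actor = nn.Sequential(nn.Linear(26, 128), nn.ReLU(), nn.Linear(128, 4), nn.Tanh())
actor.load_state_dict(torch.load('model/actor.pth'))  # hypothetical checkpoint name
actor.eval()

env_info = env.reset(train_mode=False)[brain_name]    # train_mode=False plays in real time
scores = np.zeros(len(env_info.agents))
while True:
    states = torch.from_numpy(env_info.vector_observations).float()
    with torch.no_grad():
        actions = actor(states).numpy()               # deterministic policy, no noise added
    env_info = env.step(actions)[brain_name]
    scores += env_info.rewards
    if np.any(env_info.local_done):
        break

print('Episode score (mean over the 20 agents): {:.2f}'.format(scores.mean()))
env.close()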
You can read about the implementation details and the results obtained in Report.md