
jezzarax/drlnd_p2_continuous


Solution to the second project of the Deep Reinforcement Learning Nanodegree at Udacity.

Problem definition

In this environment, a double-jointed arm can move to target locations in 3D space. A reward of +0.1 is provided for each step that the agent's hand is in the goal location. Thus, the goal of your agent is to maintain its position at the target location for as many time steps as possible.

The observation space consists of 33 variables corresponding to position, rotation, velocity, and angular velocities of the arm. Each action is a vector with four numbers, corresponding to torque applicable to two joints. Every entry in the action vector should be a number between -1 and 1.
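
For illustration, a random policy that respects this action space could be expressed as follows (this snippet is only a sketch and is not part of the repository code):

import numpy as np

# Each action is a 4-dimensional torque vector with every entry clipped to [-1, 1].
def random_action(action_size: int = 4) -> np.ndarray:
    action = np.random.randn(action_size)  # sample unbounded torques
    return np.clip(action, -1.0, 1.0)      # keep every entry within the allowed range

print(random_action())  # e.g. array([ 0.42, -1.  ,  0.07,  0.93])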

Usage

Preparation

Please ensure you have Pipenv installed. Clone the repository and run pipenv --three install to create an environment to run the code in. Alternatively, just install the packages listed in the Pipfile.

Due to the transitive dependency on tensorflow that comes from Unity ML-Agents, and a bug that makes it incompatible with jupyter, you might want to either drop jupyter from the list of dependencies or run pipenv --three install --skip-lock to work around it.

To activate the virtual environment with pipenv, run pipenv shell from the root directory of the repository.

After creating and entering the virtual environment you need to set a DRLUD_P2_ENV shell environment variable that points to the Unity environment binary. For example, for the macOS version of the binary it might be

DRLUD_P2_ENV=../deep-reinforcement-learning/p2_navigation/Reacher.app; export DRLUD_P2_ENV

Details of downloading and setting up the environment are described in the Udacity nanodegree materials.
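
For reference, here is a minimal sketch of how the variable might be consumed, assuming the unityagents package shipped with the nanodegree materials (the exact code in this repository may differ):

import os
from unityagents import UnityEnvironment

# DRLUD_P2_ENV must be exported before running the trainer or the notebooks.
env = UnityEnvironment(file_name=os.environ["DRLUD_P2_ENV"])

brain_name = env.brain_names[0]                   # Reacher exposes a single brain
env_info = env.reset(train_mode=True)[brain_name]
print(len(env_info.vector_observations[0]))       # 33-dimensional observation
env.close()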

Training (easy way)

Just follow the training notebook.

Training (proper way)

The executable part of the code is built as a three-stage pipeline consisting of:

  • training pipeline
  • analysis notebook
  • demo notebook

The training pipeline was created to help the researcher keep track of the experimentation process as well as to persist the running results. The training process is spawned by executing the trainer.py script and is expected to be idempotent with respect to the training results, i.e. if the result for a specific set of hyperparameters already exists and has been persisted, the trainer will skip launching a training session for that set of hyperparameters.
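
A minimal sketch of such an idempotence check is shown below; the file naming scheme and the helper names are illustrative assumptions rather than the exact trainer.py code:

from pathlib import Path

RESULTS_DIR = Path("./hp_search_results")

def already_trained(hp_id: int) -> bool:
    # Assumed naming scheme: one scores file per hyperparameter id.
    return (RESULTS_DIR / f"scores_{hp_id}.csv").exists()

def run_pipeline(hyperparameter_reference: dict, train_fn) -> None:
    for hp_id, params in hyperparameter_reference.items():
        if already_trained(hp_id):
            print(f"Skipping hyperparameter set {hp_id}: results already persisted.")
            continue
        train_fn(hp_id, params)  # delegate the actual training run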

The sets of hyperparameters for training are defined inside trainer.py in the simulation_hyperparameter_reference dictionary, which is supposed to be append-only in order to keep the training result data consistent. Each hyperparameter set produces a file with the scores of a number of agent runs, stored inside the ./hp_search_results directory under an id referring to the key from the simulation_hyperparameter_reference dictionary. The neural network weights for every agent training run are stored in the same directory, tagged with the relevant hyperparameter key as well as the random seed used.
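
For example, the scores for a given hyperparameter id could then be loaded in the analysis notebook along these lines (the file name pattern is again an assumption):

import numpy as np
from pathlib import Path

def load_scores(hp_id: int, results_dir: str = "./hp_search_results") -> np.ndarray:
    # Assumed pattern: one CSV of per-episode scores per hyperparameter id.
    return np.loadtxt(Path(results_dir) / f"scores_{hp_id}.csv", delimiter=",")

scores = load_scores(24)
print(f"{len(scores)} episodes, mean over last 100: {scores[-100:].mean():.2f}")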

To train an agent with a new set of hyperparameters, just add an entry to the simulation_hyperparameter_reference dictionary. Here's an example of adding an entry with id 25 after the existing hyperparameter set with id 24:

simulation_hyperparameter_reference = {
###
###  Skipped some items here
###

    24: launch_parm("ddpg",  1, ddpg_parm(int(1e5), 256,  0.99, 1e-3, 1e-4, 1e-5, 0,    False, False, 1)),
    25: launch_parm("ddpg",  1, ddpg_parm(int(1e5), 256,  0.99, 1e-3, 1e-4, 2e-5, 0,    False, False, 1)),
}

The set of hyperparameters is represented as an instance of the launch_parm namedtuple, which has the following fields:

  • algorithm
  • times
  • algorithm-specific set of hyperparameters

The algorithm field selects the agent implementation; currently only the ddpg algorithm is supported.

DDPG-related hyperparameters are passed using ddpg_parm namedtuple and contain the following fields:

  • memory_size - number of experiences to persist
  • batch_size - number of experiences to use during a learning session
  • gamma - discount factor
  • tau - soft-update coefficient for the target networks
  • lr_actor - learning rate of the actor network
  • lr_critic - learning rate of the critic network
  • weight_decay - optimizer weight decay
  • noise_enabled - whether exploration noise is added to actions
  • gradient_clipping - whether gradients are clipped during learning sessions
  • learn_every - number of steps between learning sessions

The meaning and effects of different values for these fields are discussed in the hyperparameter search notebook.
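
For reference, the two namedtuples could be declared roughly as follows; this is a sketch reconstructed from the field lists above, not a copy of trainer.py:

from collections import namedtuple

launch_parm = namedtuple("launch_parm", ["algorithm", "times", "algorithm_params"])
ddpg_parm = namedtuple("ddpg_parm", [
    "memory_size", "batch_size", "gamma", "tau", "lr_actor", "lr_critic",
    "weight_decay", "noise_enabled", "gradient_clipping", "learn_every",
])

# The entry with id 25 from the example above, spelled out field by field:
entry_25 = launch_parm(
    algorithm="ddpg",
    times=1,
    algorithm_params=ddpg_parm(
        memory_size=int(1e5), batch_size=256, gamma=0.99, tau=1e-3,
        lr_actor=1e-4, lr_critic=2e-5, weight_decay=0,
        noise_enabled=False, gradient_clipping=False, learn_every=1,
    ),
)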

Implementation details

Two neural network architectures are defined in the models.py file.

  • Actor class implements a three-layer neural network that learns to map the state to an action.
  • Critic class implements a three-layer neural network that learns to map a state and an action to a Q-value.
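
As an illustration of these two architectures (the layer sizes below are placeholders, not necessarily the values used in models.py), a minimal PyTorch sketch might look like this:

import torch
import torch.nn as nn
import torch.nn.functional as F

class Actor(nn.Module):
    """Three fully connected layers mapping a 33-dim state to a 4-dim action in [-1, 1]."""
    def __init__(self, state_size=33, action_size=4, fc1=128, fc2=128):
        super().__init__()
        self.fc1 = nn.Linear(state_size, fc1)
        self.fc2 = nn.Linear(fc1, fc2)
        self.fc3 = nn.Linear(fc2, action_size)

    def forward(self, state):
        x = F.relu(self.fc1(state))
        x = F.relu(self.fc2(x))
        return torch.tanh(self.fc3(x))  # tanh keeps every action entry in [-1, 1]

class Critic(nn.Module):
    """Three fully connected layers mapping a (state, action) pair to a scalar Q-value."""
    def __init__(self, state_size=33, action_size=4, fc1=128, fc2=128):
        super().__init__()
        self.fc1 = nn.Linear(state_size, fc1)
        self.fc2 = nn.Linear(fc1 + action_size, fc2)  # action joins after the first layer
        self.fc3 = nn.Linear(fc2, 1)

    def forward(self, state, action):
        x = F.relu(self.fc1(state))
        x = F.relu(self.fc2(torch.cat([x, action], dim=1)))
        return self.fc3(x)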

Results

Please check the following notebook for the best set of hyperparameters I managed to identify.