This is a simple policy auto-encoder model that learns approximations of the state transition and policy functions. These functions are deterministic versions of the corresponding MDP probability distributions. The state transition function can be used in model-based reinforcement learning for planning complex behaviors. The policy function provides the elementary action that takes the agent from the current state to the next desired state: the agent just "imagines" the next state it wants to be in and applies this function to get there.
The test environment is a grid world with a dot agent and 9 actions: up, up-right, right, down-right, down, down-left, left, up-left, and stop. The training data set consists of randomly generated triples of initial state, action, and next state, e.g. a sample for a 4x4 grid world:
| Initial state | Action | Next state |
|---|---|---|
| (grid image) | Move right | (grid image) |
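As a rough illustration, the sketch below shows how such a dataset could be generated for an N x N grid world with the 9 actions listed above. The one-hot state encoding, the action ordering, and the function name are assumptions for this example, not the repository's actual data format.

```python
import numpy as np

# The 9 actions as (dx, dy) offsets: up, up-right, right, down-right,
# down, down-left, left, up-left, stop (assumed ordering).
ACTIONS = [(0, -1), (1, -1), (1, 0), (1, 1), (0, 1), (-1, 1), (-1, 0), (-1, -1), (0, 0)]

def generate_samples(grid_size=4, n_samples=1000, seed=0):
    """Return (initial_states, actions, next_states) as one-hot arrays."""
    rng = np.random.default_rng(seed)
    states, actions, next_states = [], [], []
    for _ in range(n_samples):
        x, y = rng.integers(0, grid_size, size=2)
        a = rng.integers(0, len(ACTIONS))
        dx, dy = ACTIONS[a]
        # Clip to the grid so moves off the edge keep the agent in place.
        nx = np.clip(x + dx, 0, grid_size - 1)
        ny = np.clip(y + dy, 0, grid_size - 1)
        s = np.zeros(grid_size * grid_size); s[y * grid_size + x] = 1.0
        s2 = np.zeros(grid_size * grid_size); s2[ny * grid_size + nx] = 1.0
        a_onehot = np.zeros(len(ACTIONS)); a_onehot[a] = 1.0
        states.append(s); actions.append(a_onehot); next_states.append(s2)
    return np.array(states), np.array(actions), np.array(next_states)
```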
The model consists of two modules (a sketch follows the list):
- Encoder: accepts the initial state and an action and outputs the next state;
- Decoder: takes the initial state and the next state and outputs an action.
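A minimal sketch of the two modules in PyTorch; the layer sizes, class names, and the use of plain fully connected layers are assumptions for illustration and may differ from the actual implementation.

```python
import torch
import torch.nn as nn

class Encoder(nn.Module):
    """Maps (initial state, action) -> predicted next state."""
    def __init__(self, state_dim, action_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, state_dim),
        )

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1))

class Decoder(nn.Module):
    """Maps (initial state, next state) -> action logits."""
    def __init__(self, state_dim, action_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim * 2, hidden), nn.ReLU(),
            nn.Linear(hidden, action_dim),
        )

    def forward(self, state, next_state):
        return self.net(torch.cat([state, next_state], dim=-1))
```

During training the two modules are chained as an auto-encoder: the encoder predicts the next state from the initial state and action, and the decoder reconstructs the action from the initial and next states.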
The trained model can be decoupled into:
- the Encoder module, used to predict the next state - the state transition function;
- the Decoder module, used to provide elementary actions that achieve the desired state - the policy function.
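After training, the two modules can be used independently, roughly as below. This builds on the hypothetical sketch above for a 4x4 grid (state_dim=16, action_dim=9); in practice the trained weights would be loaded instead of using freshly constructed modules.

```python
import torch

encoder = Encoder(state_dim=16, action_dim=9)
decoder = Decoder(state_dim=16, action_dim=9)
# (Trained weights would be loaded here.)

current_state = torch.zeros(16); current_state[0] = 1.0   # agent in the top-left cell
desired_state = torch.zeros(16); desired_state[1] = 1.0   # one cell to the right

# State transition function: predict the next state for a given action.
action = torch.zeros(9); action[2] = 1.0                  # "right" in the assumed ordering
predicted_next = encoder(current_state, action)

# Policy function: get the elementary action toward the desired state.
action_logits = decoder(current_state, desired_state)
chosen_action = action_logits.argmax(dim=-1)
```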