MDP-DynamicProg

Policy Iteration, Value Iteration and Prioritized Sweeping for simple grid world MDP control.

Grid World:

Rewards:

00  00 00 ... 00  00  00
00 -10 00 ... 00 -10  00
00  00 00 ... 00  00  00
          ...
00  00 00 ... 00  00  00
00  00 00 100 00  00  00
00  00 00 ... 00  00  00
          ...
00  00 00 ... 00  00  00
00 -10 00 ... 00 -10  00
00  00 00 ... 00  00  00

5 terminal states: 4 negative corners, 1 positive center

Actions:

Left(0) | Right(1) | Up(2) | Down(3)

Usage:

python main.py <algorithm> <OPTIONAL_FLAGS>

<OPTIONAL_FLAGS> are:

--gamma (default 0.9, must be in (0.0, 1.0) )
--width (default 5, must be odd integer in (5, 101) )

<algorithms> are:

Policy Iteration algorithm: python main.py policy_iteration <OPTIONAL_FLAGS>
Value Iteration algorithm: python main.py value_iteration <OPTIONAL_FLAGS>
Prioritized Sweeping algorithm: python main.py prioritized_sweeping <OPTIONAL_FLAGS>

Name		Name	Last commit message	Last commit date
Latest commit History 11 Commits
README.md		README.md
main.py		main.py
presentation2.pdf		presentation2.pdf

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

MDP-DynamicProg

Grid World:

Usage:

About

Releases

Packages

Languages

NicolasAG/MDP-DynamicProg

Folders and files

Latest commit

History

Repository files navigation

MDP-DynamicProg

Grid World:

Usage:

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages