This repository contains tabular RL algorithms from Monte Carlo to Q-learning, most of which are implemented on MiniGrid environments.
This repo was made to learn and implement the different classical/tabular algorithms taught in David Silver's (DeepMind) reinforcement learning course.
- Empty
- Dynamic obstacles
- FourRooms
- Empty (Empty grid)
MiniGrid-Empty-5x5-v0
MiniGrid-Empty-8x8-v0
- The reward is 0 everywhere except at the goal position, which gives a reward of 1.
- The total reward received in an episode is `1 - 0.9 * (steps / max_steps)`, as sketched below.
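A minimal sketch of that return computation (the 0.9 scaling and the step bookkeeping mirror MiniGrid's default sparse reward; the function name here is only illustrative):

```python
def goal_reward(step_count: int, max_steps: int) -> float:
    """Return received on reaching the goal; every other step yields 0."""
    # The more of the step budget the agent uses, the smaller the reward.
    return 1 - 0.9 * (step_count / max_steps)

# e.g. reaching the goal in 25 of the 100 allowed steps gives 0.775
print(goal_reward(25, 100))
```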
states |
---|
env.agent_pos (position of the agent in the grid) |
env.agent_dir (direction the agent is facing) (0-3) |

actions |
---|
turn_left (0) |
turn_right (1) |
move_forward (2) |
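Below is a minimal sketch of how this state/action representation can back a tabular Q-function, assuming the gymnasium-based `minigrid` package; the `state_key` helper and the `defaultdict` Q-table are illustrative, not necessarily how this repo implements it:

```python
from collections import defaultdict

import gymnasium as gym
import minigrid  # noqa: F401 -- registers the MiniGrid-* environment IDs

N_ACTIONS = 3  # the three actions listed above

def state_key(env):
    # Discrete state: the agent's grid cell plus the direction it is facing.
    base = env.unwrapped
    return (tuple(base.agent_pos), base.agent_dir)

# Tabular action-value function: unseen states start at zero for every action.
Q = defaultdict(lambda: [0.0] * N_ACTIONS)

env = gym.make("MiniGrid-Empty-5x5-v0")
obs, info = env.reset(seed=0)
print(state_key(env))  # e.g. ((1, 1), 0)
```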
- Episodes: all algorithms are initially run for 150 episodes to train the policy (this may be altered depending on how quickly a particular algorithm converges).
- All algorithms follow an ε-greedy policy, except Q-learning, where the behaviour policy is ε-greedy and the target policy is greedy (sketched below).
- ε decreases by 0.01 every episode to balance exploration and exploitation (the decay may be larger for Monte Carlo).
- The learning rate α is set to 0.3 and works well for all algorithms.
- The discount factor γ is set to 0.9 and works well for all algorithms.
- The trace-decay parameter λ is set to 0.9 for SARSA(λ) and backward-view SARSA.
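A sketch of the ε-greedy behaviour policy and the Q-learning update built from the hyperparameters above (the initial ε of 1.0 is an assumption, since the README only states the decay; `Q` is the table from the previous sketch):

```python
import random

ALPHA, GAMMA = 0.3, 0.9   # learning rate and discount factor listed above
EPSILON_DECAY = 0.01      # per-episode decay used for the Empty grids
N_ACTIONS = 3

def epsilon_greedy(Q, state, epsilon):
    # Behaviour policy: explore with probability epsilon, otherwise act greedily.
    if random.random() < epsilon:
        return random.randrange(N_ACTIONS)
    return max(range(N_ACTIONS), key=lambda a: Q[state][a])

def q_learning_update(Q, s, a, r, s_next, terminated):
    # Target policy is greedy: bootstrap from the best next-state action.
    target = r if terminated else r + GAMMA * max(Q[s_next])
    Q[s][a] += ALPHA * (target - Q[s][a])

# Assumed epsilon schedule: start at 1.0 and decay once per episode.
# epsilon = max(0.0, 1.0 - EPSILON_DECAY * episode)
```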
MiniGrid-Empty-8x8-v0
SARSA (backward view) converges to the optimal policy with much less training than the other algorithms due to its online updates.
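For reference, a hedged sketch of the backward-view SARSA(λ) update behind that observation: a single online TD error updates every recently visited state-action pair through eligibility traces (the accumulating-trace variant and the names here are assumptions, not necessarily the repo's exact code):

```python
from collections import defaultdict

def sarsa_lambda_step(Q, E, s, a, r, s_next, a_next, terminated,
                      alpha=0.3, gamma=0.9, lam=0.9):
    """One online backward-view SARSA(lambda) update."""
    target = r if terminated else r + gamma * Q[s_next][a_next]
    delta = target - Q[s][a]
    E[s][a] += 1.0  # accumulating eligibility trace for the current pair
    for state, traces in list(E.items()):
        for action, e in enumerate(traces):
            if e > 0.0:
                Q[state][action] += alpha * delta * e   # spread the TD error
                traces[action] = gamma * lam * e        # decay the trace

# Traces are reset at the start of every episode:
# E = defaultdict(lambda: [0.0] * 3)
```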
- Dynamic obstacles
MiniGrid-Dynamic-Obstacles-8x8-v0
MiniGrid-Dynamic-Obstacles-5x5-v0
MiniGrid-Dynamic-Obstacles-Random-5x5-v0
MiniGrid-Dynamic-Obstacles-Random-6x6-v0
- The reward is 0 everywhere except at obstacles and the goal position.
- If the agent runs into an obstacle, it gets a reward of -1.
- The goal position gives a reward of 1.
- The total reward received in an episode is `1 - 0.9 * (steps / max_steps)`.
states |
---|
env.agent_pos (position of the agent in the grid) |
env.agent_dir (direction the agent is facing) (0-3) |
env.grid.get(*env.front_pos) (0-1) (whether an obstacle is directly in front of the agent) |

actions |
---|
turn_left (0) |
turn_right (1) |
move_forward (2) |
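A sketch of how the extra state component might be derived, assuming the raw MiniGrid API where `grid.get` returns `None` for an empty cell; treating any object in front (wall or ball) as blocked is an assumption about the exact 0/1 encoding:

```python
def state_key_dynamic(env):
    # State = agent cell + heading + whether the cell directly in front is occupied.
    base = env.unwrapped
    front_obj = base.grid.get(*base.front_pos)   # None if the cell is empty
    front_blocked = 0 if front_obj is None else 1
    return (tuple(base.agent_pos), base.agent_dir, front_blocked)
```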
- Episodes: all algorithms are initially run for 600 episodes to train the policy (this may be altered depending on how quickly a particular algorithm converges).
- All algorithms follow an ε-greedy policy, except Q-learning, where the behaviour policy is ε-greedy and the target policy is greedy.
- ε decreases by 0.002 every episode to balance exploration and exploitation.
- All other parameters are kept the same as in the previous environment (Empty).
MiniGrid-Dynamic-Obstacles-Random-6x6-v0
MiniGrid-Dynamic-Obstacles-8x8-v0
Four rooms
states |
---|
env.agent_pos (position of the agent in the grid) |
env.agent_dir (direction the agent is facing) (0-3) |

actions |
---|
turn_left (0) |
turn_right (1) |
move_forward (2) |
- Both the walls and the goal position are fixed in order to keep the Q-table small.
- Will try a random goal position, which will increase the state space by 3x.
- Random policy:
- Will update this when I find an interesting environment to work with (no more MiniGrid :)).
Future work will be in a different repo:
- Deep Q-Network (action-value function approximation)
- Actor-Critic (policy approximation)