This project utilizes Markov Decision Process (MDP) principles to implement a custom "CliffWalking" environment in Gym, employing policy iteration to find an optimal policy for agent navigation.

Reinforcement Learning Project: CliffWalking

This project implements a reinforcement learning environment called "CliffWalking," which is a variation of the classic Cliff Walking problem. The environment is designed as a subclass of CliffWalkingEnv from the Gym library. The project includes functionalities for policy evaluation and policy iteration within the Markov Decision Process (MDP) framework.

MDP

MDP stands for Markov Decision Process. It is a mathematical framework used to model decision-making problems in situations where outcomes are partly random and partly under the control of a decision-maker.

In an MDP, the decision-making problem is represented as a tuple (S, A, P, R), where:

  • S is the set of possible states in the environment.
  • A is the set of possible actions that the decision-maker can take.
  • P is the state transition probability matrix, which defines the probability of transitioning from one state to another when a particular action is taken.
  • R is the reward function, which assigns a numerical reward to each state-action pair.

The goal is to find an optimal policy that maximizes the expected cumulative reward over time.
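The (S, A, P, R) tuple can be made concrete with a small sketch. The two-state MDP below is a hypothetical toy example (it is not the CliffWalking environment itself); it shows how the four components fit together and how a one-step lookahead combines R and P:

```python
# Toy 2-state MDP (hypothetical, for illustration only).
# P[s][a] is a list of (probability, next_state) pairs; R maps (s, a) to a reward.
S = ["s0", "s1"]
A = ["stay", "move"]
P = {
    "s0": {"stay": [(1.0, "s0")], "move": [(0.8, "s1"), (0.2, "s0")]},
    "s1": {"stay": [(1.0, "s1")], "move": [(1.0, "s0")]},
}
R = {("s0", "stay"): 0.0, ("s0", "move"): 1.0,
     ("s1", "stay"): 0.5, ("s1", "move"): 0.0}

def expected_value(s, a, V, gamma=0.9):
    """One-step lookahead: E[r + gamma * V(s')] for taking action a in state s."""
    return R[(s, a)] + gamma * sum(p * V[s2] for p, s2 in P[s][a])

V = {s: 0.0 for s in S}       # value estimates, initialized to zero
print(expected_value("s0", "move", V))  # → 1.0 (only the immediate reward, since V is zero)
```

The discount factor gamma weights future value against immediate reward; with V initialized to zero, the lookahead reduces to the immediate reward R(s, a).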

Policy Evaluation and Policy Iteration

The project implements policy evaluation and policy iteration algorithms for solving the CliffWalking environment. Policy evaluation estimates the value function for a given policy, while policy iteration alternates between policy evaluation and improvement to find the optimal policy in an MDP.
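The alternation described above can be sketched as follows. This is a generic, self-contained version of policy iteration on a hypothetical three-state chain (not the project's CliffWalking grid); the structure — iterative evaluation until the value change falls below a threshold, then greedy improvement until the policy is stable — is the standard algorithm:

```python
# Policy iteration on a hypothetical 3-state MDP (illustration, not the project's code).
GAMMA = 0.9     # discount factor
THETA = 1e-8    # convergence threshold for policy evaluation

S = [0, 1, 2]   # state 2 is terminal (absorbing, zero reward)
A = [0, 1]
# P[s][a] = list of (probability, next_state, reward) triples
P = {
    0: {0: [(1.0, 0, 0.0)], 1: [(1.0, 1, 0.0)]},
    1: {0: [(1.0, 0, 0.0)], 1: [(1.0, 2, 1.0)]},
    2: {0: [(1.0, 2, 0.0)], 1: [(1.0, 2, 0.0)]},
}

def q_value(s, a, V):
    """Expected return of taking action a in state s under value estimate V."""
    return sum(p * (r + GAMMA * V[s2]) for p, s2, r in P[s][a])

def policy_evaluation(policy, V):
    """Sweep until the largest value change is below THETA."""
    while True:
        delta = 0.0
        for s in S:
            v = q_value(s, policy[s], V)
            delta = max(delta, abs(v - V[s]))
            V[s] = v
        if delta < THETA:
            return V

def policy_iteration():
    policy = {s: 0 for s in S}
    V = {s: 0.0 for s in S}
    while True:
        V = policy_evaluation(policy, V)
        stable = True
        for s in S:
            q = {a: q_value(s, a, V) for a in A}
            best = max(q, key=q.get)
            # Switch only on strict improvement, so ties keep the current action.
            if q[best] > q[policy[s]] + 1e-12:
                policy[s] = best
                stable = False
        if stable:
            return policy, V

policy, V = policy_iteration()
```

On this toy chain the result is the intuitive one: both non-terminal states choose action 1 (toward the reward), with V(1) = 1.0 and V(0) = 0.9 (the reward discounted by one step).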

Environment: CliffWalking

The environment implemented in this project, called "CliffWalking," is a variation of the classic Cliff Walking problem. It is implemented as a subclass of CliffWalkingEnv from the Gym library.

Attributes

  • UP, RIGHT, DOWN, LEFT: Constants representing possible actions.

Methods

  • __init__(self, is_hardmode=True, num_cliffs=10, *args, **kwargs): Constructor method initializing the environment.
  • _calculate_transition_prob(self, current, delta): Helper method for calculating transition probabilities.
  • is_valid(self): Depth-first search (DFS) method to check for a valid path.
  • step(self, action): Overrides the step method for taking actions and returning state, reward, and termination status.
  • _render_gui(self, mode): Method for rendering the environment using the pygame library.
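To illustrate what a helper like _calculate_transition_prob typically computes, here is a standalone sketch of deterministic transition logic on the standard 4x12 cliff grid. The function body is an assumption for illustration (the project's actual implementation may differ, e.g. in hard mode or with a custom number of cliffs); it avoids importing Gym so it runs on its own:

```python
# Hypothetical sketch of cliff-grid transition logic (assumed, not the project's code).
UP, RIGHT, DOWN, LEFT = 0, 1, 2, 3          # action constants, as in the README
N_ROWS, N_COLS = 4, 12                      # the classic 4x12 grid
DELTAS = {UP: (-1, 0), RIGHT: (0, 1), DOWN: (1, 0), LEFT: (0, -1)}

def calculate_transition(current, action):
    """Return (prob, next_state, reward, terminated) for a deterministic move."""
    row, col = divmod(current, N_COLS)
    dr, dc = DELTAS[action]
    # Clamp to the grid so the agent cannot step off the edge.
    new_row = min(max(row + dr, 0), N_ROWS - 1)
    new_col = min(max(col + dc, 0), N_COLS - 1)
    new_state = new_row * N_COLS + new_col
    # Bottom row between start (3, 0) and goal (3, 11) is the cliff:
    # falling in sends the agent back to the start with a large penalty.
    if new_row == N_ROWS - 1 and 0 < new_col < N_COLS - 1:
        return (1.0, (N_ROWS - 1) * N_COLS, -100.0, False)
    terminated = new_state == N_ROWS * N_COLS - 1   # goal cell (3, 11)
    return (1.0, new_state, -1.0, terminated)
```

For example, stepping RIGHT from the start state (36, i.e. row 3, column 0) lands in the cliff and resets to the start with reward -100, while stepping DOWN from state 35 (row 2, column 11) reaches the goal and terminates the episode.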

How to Run

  1. Clone the repository:
     git clone https://github.com/SheidaAbedpour/MDP-CliffWalking.git
  2. Install dependencies:
     pip install -r requirement.txt
  3. Run the project:
     python main.py
  4. View the results, including the optimal policy and the corresponding state values.

Acknowledgments

This project is based on the CliffWalking environment from the Gym library. The project structure and documentation follow best practices and guidelines.
