
# CancerRadiationMDP

A Markov Decision Process (MDP) with a large number of states and its solvers (value iteration, policy iteration, and Q-learning).

## Introduction

This problem is partially inspired by Kim M et al. In radiation therapy for cancer, the outcome is evaluated by two criteria: eradication of the tumor(s) and damage to the surrounding tissues, known as Organs at Risk (OARs). OARs are more resistant to radiation than tumors, so an optimal treatment plan must eradicate the tumors while keeping OAR damage minimal. Choosing the radiation dose at each stage of treatment, in response to the tumor shrinkage and tissue damage observed so far, is key to the success of radiation therapy.

In our model, the patient can have 1-9 tumors and 1-9 OARs, and each tumor and OAR has a size from 0 to 9. The sizes of the tumors and OARs define the states of the MDP. A problem with 1 tumor of size 6 and 3 OARs of size 4 has ~200 unique states; a problem with 2 tumors of size 8 and 5 OARs of size 5 has ~11,000.
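These counts are consistent with treating same-sized tumors (and same-sized OARs) as interchangeable, i.e. counting multisets of sizes rather than ordered tuples. A minimal sketch of that count, assuming this symmetric state representation (the helper `unique_states` is illustrative, not part of the repo):

```python
from math import comb

def unique_states(n_tumors, tumor_size, n_oars, oar_size):
    # n identical items whose sizes range over 0..s have
    # C(s + n, n) distinct multisets of sizes.
    tumors = comb(tumor_size + n_tumors, n_tumors)
    oars = comb(oar_size + n_oars, n_oars)
    return tumors * oars

print(unique_states(1, 6, 3, 4))  # 245    (~200)
print(unique_states(2, 8, 5, 5))  # 11340  (~11,000)
```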

The radiation has 3 levels by default: 0 (low), 1 (medium), and 2 (high). These levels define the action space.

The transition function T is defined by the probability that each tumor or OAR either keeps its size or shrinks by 1 during a treatment session (one step), under each level of radiation. There is a negative reward when an OAR reaches size 0 (tissue loss) and a positive reward when a tumor reaches size 0 (tumor removal). The model is illustrated in the figure below.

[Figure: the CancerRadiationMDP model]
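To make the dynamics concrete, here is a minimal stochastic step function for this model. The shrink probabilities and reward magnitudes below are placeholder assumptions; the actual values live in CancerRadiationMDP.py:

```python
import random

# Placeholder per-step shrink probabilities, indexed by radiation level
# 0 (low), 1 (medium), 2 (high). Tumors shrink more readily than OARs.
P_SHRINK_TUMOR = {0: 0.1, 1: 0.3, 2: 0.6}
P_SHRINK_OAR = {0: 0.05, 1: 0.15, 2: 0.4}

R_TUMOR_REMOVED = +10  # a tumor reaches size 0 (tumor removal)
R_OAR_LOST = -20       # an OAR reaches size 0 (tissue loss)

def step(tumors, oars, action):
    """One treatment session: each tumor and OAR independently either
    keeps its size or shrinks by 1; rewards fire on reaching size 0."""
    reward = 0

    def shrink(sizes, p, r):
        nonlocal reward
        out = []
        for s in sizes:
            if s > 0 and random.random() < p:
                s -= 1
                if s == 0:
                    reward += r
            out.append(s)
        return out

    tumors = shrink(tumors, P_SHRINK_TUMOR[action], R_TUMOR_REMOVED)
    oars = shrink(oars, P_SHRINK_OAR[action], R_OAR_LOST)
    return tumors, oars, reward
```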

## Package content

| file | description |
| --- | --- |
| CancerRadiationMDP.py | Simulates the MDP problem |
| CancerRadiationUtility.py | A utility class used by the others |
| CancerRadiationValIter.py | A solver of the MDP using value iteration |
| CancerRadiationPolIter.py | A solver of the MDP using policy iteration |
| CancerRadiationQLearn.py | A solver of the MDP using Q-learning |
| CancerRadiationAnalyzer.py | Analyzes and compares the different solvers |

## Note

This model can be extended to other problems. In general, we define some targets (tumors in our case) and some off-targets (OARs in our case). The "sizes" of the targets and off-targets track their progress, and the transition probabilities define how likely each target and off-target is to progress at each step. There is a positive reward for removing each target (its size becomes 0) and a negative reward for removing each off-target. It is also possible to add a small penalty or cost at each step.
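One way such a generalized model might be parameterized (all names and default values here are illustrative, not the repo's API):

```python
from dataclasses import dataclass

@dataclass
class GeneralizedConfig:
    target_sizes: list[int]        # e.g. [6] for one target of size 6
    offtarget_sizes: list[int]     # e.g. [4, 4, 4] for three off-targets
    p_progress_target: dict[int, float]     # action -> per-step probability
    p_progress_offtarget: dict[int, float]  # action -> per-step probability
    reward_target_removed: float = 10.0     # paid when a size becomes 0
    reward_offtarget_removed: float = -20.0 # paid when a size becomes 0
    step_penalty: float = 0.0               # optional small cost per step
```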
