Constructed a 10-armed testbed as described in Section 2.3 of the book Reinforcement Learning: An Introduction by Richard S. Sutton and Andrew G. Barto (2nd edition).
Compared the following three methods for balancing exploration and exploitation when selecting actions:
- Epsilon-greedy action selection
- Optimistic initial values
- Upper-Confidence-Bound (UCB) action selection
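
The three methods listed above can be sketched as follows. This is a minimal illustration, not the code used for the report: the function names (`epsilon_greedy`, `ucb`, `optimistic_estimates`) and their signatures are hypothetical, and only the Python standard library is assumed. The UCB rule follows the form given in the book, selecting the action that maximises Q(a) + c * sqrt(ln t / N(a)).

```python
import math
import random


def epsilon_greedy(q, epsilon, rng):
    """With probability epsilon pick a random arm, otherwise a greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q))
    best = max(q)
    # Break ties between equally good arms at random.
    return rng.choice([a for a, v in enumerate(q) if v == best])


def ucb(q, counts, t, c):
    """Pick the arm maximising q[a] + c * sqrt(ln(t) / counts[a]).

    Arms that have never been tried are selected first.
    """
    for a, n in enumerate(counts):
        if n == 0:
            return a
    scores = [q[a] + c * math.sqrt(math.log(t) / counts[a])
              for a in range(len(q))]
    return scores.index(max(scores))


def optimistic_estimates(n_arms, q0):
    """Initial estimates for the optimistic method: every arm starts at q0."""
    return [q0] * n_arms
```

With optimistic initial values, the selection rule itself can stay greedy (epsilon = 0); the exploration comes from starting every estimate at a value such as `q0 = 5.0`, well above the rewards an arm is likely to return.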
Varied the parameters of each of the three methods and analysed their effect on the percentage of time steps on which the optimal action is selected and on the average reward. The report contains a complete analysis of the comparisons and performance, along with the generated graphs.
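
As a rough sketch of how the two metrics can be measured, the snippet below runs epsilon-greedy on a 10-armed testbed and sweeps epsilon. It is an illustration under stated assumptions rather than the report's implementation: the function name `run_epsilon_greedy`, the seed, and the default of 200 runs (reduced for speed; the book averages over 2000) are all hypothetical choices. True action values are drawn from a unit normal distribution and rewards from a unit-variance normal around the true value, as in Section 2.3.

```python
import random


def run_epsilon_greedy(epsilon, n_arms=10, steps=1000, runs=200, seed=0):
    """Return (average reward, % optimal action) over all runs and steps."""
    rng = random.Random(seed)
    total_reward = 0.0
    optimal_picks = 0
    for _ in range(runs):
        # A fresh bandit problem for every run.
        true_values = [rng.gauss(0.0, 1.0) for _ in range(n_arms)]
        optimal = true_values.index(max(true_values))
        q = [0.0] * n_arms      # sample-average estimates
        counts = [0] * n_arms   # number of pulls per arm
        for _ in range(steps):
            if rng.random() < epsilon:
                a = rng.randrange(n_arms)
            else:
                a = q.index(max(q))  # ties go to the lowest index
            reward = rng.gauss(true_values[a], 1.0)
            counts[a] += 1
            q[a] += (reward - q[a]) / counts[a]  # incremental mean
            total_reward += reward
            optimal_picks += (a == optimal)
    n = runs * steps
    return total_reward / n, 100.0 * optimal_picks / n


for eps in (0.0, 0.01, 0.1):
    avg, pct = run_epsilon_greedy(eps)
    print(f"epsilon={eps}: average reward {avg:.3f}, optimal action {pct:.1f}%")
```

The same loop can be reused for the other two methods by changing the initial contents of `q` (optimistic initial values) or the selection line (UCB).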