Implementation of the 5-armed bandit problem with greedy and ɛ-greedy action selection algorithms, comparing the results of the ɛ-greedy method (ɛ = 0.4) with the greedy one.
In this experiment, we implement the classical 5-armed bandit problem with two action selection algorithms: greedy and ɛ-greedy. The goal is to identify the bandit machine (arm) with the highest reward and exploit it. The greedy algorithm always exploits its current knowledge and never explores. The ɛ-greedy algorithm keeps exploring: with probability ɛ it picks a random arm, and otherwise it picks the arm with the highest estimated reward, so it tends to perform better over time. We implemented both algorithms in Python. Comparing the two illustrates the trade-off between exploration and exploitation.
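The two selection rules can be sketched as follows. This is a minimal illustration, not the project's actual code. It assumes Gaussian rewards with unit variance, sample-average value estimates, and random tie-breaking. The function name `run_bandit`, the values in `TRUE_MEANS`, the step count, and the seed are all hypothetical choices made for the example.

```python
import random


def run_bandit(true_means, epsilon, steps, seed=0):
    """Play a k-armed bandit for `steps` pulls (hypothetical helper).

    epsilon = 0.0 gives pure greedy selection; epsilon > 0 gives ɛ-greedy.
    Returns (value estimates, pull counts, average reward per step).
    """
    rng = random.Random(seed)
    k = len(true_means)
    estimates = [0.0] * k  # estimated value of each arm
    counts = [0] * k       # number of times each arm was pulled
    total_reward = 0.0

    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: pick any arm uniformly at random.
            arm = rng.randrange(k)
        else:
            # Exploit: pick an arm with the highest estimate, ties broken at random.
            best = max(estimates)
            arm = rng.choice([a for a in range(k) if estimates[a] == best])

        # Assumed reward model: Gaussian noise around the arm's true mean.
        reward = rng.gauss(true_means[arm], 1.0)

        # Incremental sample-average update of the pulled arm's estimate.
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total_reward += reward

    return estimates, counts, total_reward / steps


# Illustrative true mean rewards for the 5 arms (assumed values).
TRUE_MEANS = [0.2, -0.5, 1.5, 0.8, 0.0]

greedy = run_bandit(TRUE_MEANS, epsilon=0.0, steps=2000)
eps_greedy = run_bandit(TRUE_MEANS, epsilon=0.4, steps=2000)

print("greedy     pulls per arm:", greedy[1], "avg reward:", round(greedy[2], 3))
print("eps-greedy pulls per arm:", eps_greedy[1], "avg reward:", round(eps_greedy[2], 3))
```

The pull counts show where each method spent its time: the ɛ-greedy run samples every arm repeatedly, while the greedy run can lock onto whichever arm looked best early on. Results depend on the seed and the assumed reward means, so averaging over many independent runs gives a fairer comparison.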