A simple tabular Q-learning agent using an epsilon-greedy policy on the FrozenLake OpenAI Gym environment.
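
To make the setup concrete, here is a minimal sketch of tabular Q-learning with epsilon-greedy action selection. It is not the original code. To stay self-contained it assumes a hand-written, deterministic (non-slippery) 4x4 grid instead of the real Gym environment, and the names `step`, `epsilon_greedy`, `train`, and `greedy_rollout`, as well as every hyperparameter value (learning rate, discount, epsilon decay schedule, step cap), are illustrative assumptions.

```python
import random

# Assumed stand-in for the 4x4 map: S start, F frozen, H hole, G goal.
GRID = ["SFFF", "FHFH", "FFFH", "HFFG"]
N = 4
# Actions as (d_row, d_col): left, down, right, up.
ACTIONS = [(0, -1), (1, 0), (0, 1), (-1, 0)]


def step(state, action):
    """Deterministic transition; returns (next_state, reward, done)."""
    row, col = divmod(state, N)
    d_row, d_col = ACTIONS[action]
    row = min(max(row + d_row, 0), N - 1)  # moving into a wall keeps the agent in place
    col = min(max(col + d_col, 0), N - 1)
    cell = GRID[row][col]
    return row * N + col, float(cell == "G"), cell in "HG"


def epsilon_greedy(q_row, epsilon, rng):
    """Random action with probability epsilon, else a greedy one (ties broken at random)."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_row))
    best = max(q_row)
    return rng.choice([a for a, v in enumerate(q_row) if v == best])


def train(episodes=2000, alpha=0.1, gamma=0.95, eps_decay=0.995, eps_min=0.01, seed=0):
    """Run tabular Q-learning and return the learned Q-table (hyperparameters are assumptions)."""
    rng = random.Random(seed)
    q = [[0.0] * len(ACTIONS) for _ in range(N * N)]
    epsilon = 1.0
    for _ in range(episodes):
        state = 0
        for _ in range(100):  # cap on steps per episode
            action = epsilon_greedy(q[state], epsilon, rng)
            nxt, reward, done = step(state, action)
            # Q-learning target: bootstrap from the best next action unless the state is terminal.
            target = reward + (0.0 if done else gamma * max(q[nxt]))
            q[state][action] += alpha * (target - q[state][action])
            state = nxt
            if done:
                break
        epsilon = max(eps_min, epsilon * eps_decay)  # decay epsilon once per episode
    return q


def greedy_rollout(q, max_steps=100):
    """Follow the greedy policy from the start; True if the goal is reached."""
    state = 0
    for _ in range(max_steps):
        action = max(range(len(ACTIONS)), key=lambda a: q[state][a])
        state, reward, done = step(state, action)
        if done:
            return reward == 1.0
    return False
```

Calling `greedy_rollout(train())` checks whether the learned table leads from the start to the goal. On the real, slippery environment the transitions are stochastic, so the learned policy would not succeed on every episode.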
The red line represents the evolution of the epsilon value over time. The blue line represents the average success rate on the goal-reaching task over the last 20 episodes. The x-axis represents the episode id + 20.
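
The 20-episode average behind the blue line could be computed as in the sketch below. The function name `rolling_success_rate` is hypothetical. The sketch also assumes one plausible reading of the x-axis offset: the first average only exists once 20 episodes have finished, so position 0 in the returned list corresponds to the window ending at the 20th episode.

```python
from collections import deque


def rolling_success_rate(successes, window=20):
    """Mean of the last `window` episode outcomes, emitted once the window is full."""
    recent = deque(maxlen=window)  # old outcomes fall off automatically
    rates = []
    for outcome in successes:
        recent.append(1.0 if outcome else 0.0)
        if len(recent) == window:
            rates.append(sum(recent) / window)
    return rates


# Example: 20 failed episodes followed by 20 successful ones.
rates = rolling_success_rate([False] * 20 + [True] * 20)
```

Here `rates[k]` summarizes episodes `k` through `k + 19`, which is why the curve starts 20 episodes after training begins.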