Reinforcement Learning Algorithms

Introduction

This repository includes implementations of the following algorithms:

Deep Q-Learning (DQN): Q-learning with a neural-network value function, stabilized by experience replay and a target network.
Multi-Armed Bandits: Action-selection strategies such as epsilon-greedy and Upper Confidence Bound (UCB).
N-step Tree Backup: An n-step bootstrapping method that learns off-policy without importance sampling.
Off-Policy Learning: Algorithms such as Q-learning, which learn about a target policy that differs from the behavior policy generating the data.
On-Policy Learning: Methods such as SARSA, which evaluate and improve the same policy that selects the actions.
Thompson Sampling: A Bayesian approach that balances exploration and exploitation by sampling from the posterior over each action's reward and acting greedily on the sample.
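Expected SARSA: A variant of SARSA that bootstraps from the expected value of the next state's actions under the policy, rather than from a single sampled next action.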
Gradient Preference-Based Methods: Policy gradient algorithms that learn numerical action preferences by stochastic gradient ascent.
Policy Iteration: A classical dynamic programming algorithm that solves Markov decision processes (MDPs) by alternating policy evaluation and policy improvement.
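
SARSA, Q-learning, and Expected SARSA from the list above differ mainly in the target used for the one-step temporal-difference update. The following is a minimal sketch of those three targets for a tabular setting, assuming an epsilon-greedy policy. All function and parameter names here are hypothetical and chosen for illustration; they are not taken from this repository's code.

```python
# Illustrative sketch only: names are hypothetical, not this repository's API.

def epsilon_greedy_probs(q_values, epsilon):
    """Action probabilities of an epsilon-greedy policy over q_values."""
    n = len(q_values)
    # Ties are broken in favor of the lowest action index.
    greedy = max(range(n), key=lambda a: q_values[a])
    probs = [epsilon / n] * n
    probs[greedy] += 1.0 - epsilon
    return probs


def sarsa_target(reward, gamma, next_q, next_action):
    """On-policy: bootstrap from the action actually taken in the next state."""
    return reward + gamma * next_q[next_action]


def q_learning_target(reward, gamma, next_q):
    """Off-policy: bootstrap from the greedy (maximum) next-state value."""
    return reward + gamma * max(next_q)


def expected_sarsa_target(reward, gamma, next_q, epsilon):
    """Bootstrap from the policy-weighted average of next-state values."""
    probs = epsilon_greedy_probs(next_q, epsilon)
    expected = sum(p * q for p, q in zip(probs, next_q))
    return reward + gamma * expected


def td_update(q_sa, target, alpha):
    """Move the current estimate Q(s, a) a step of size alpha toward the target."""
    return q_sa + alpha * (target - q_sa)
```

For example, `td_update(q_sa, expected_sarsa_target(r, gamma, next_q, epsilon), alpha)` would give the new estimate for a single transition. Terminal states are not handled in this sketch; a full implementation would drop the bootstrap term when the episode ends.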