This repository includes implementations of the following algorithms:
- Deep Q-Learning: Uses experience replay and a target network to stabilize training.
- Multi-Armed Bandits: Includes strategies such as epsilon-greedy and Upper Confidence Bound (UCB).
- N-step Tree Backup: Off-policy n-step bootstrapping that backs up expected action values under the target policy, without importance sampling.
- Off-Policy Learning: Algorithms such as Q-learning.
- On-Policy Learning: Methods like SARSA.
- Thompson Sampling: Bayesian approach for balancing exploration and exploitation.
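- Expected SARSA: A variant of SARSA that bootstraps from the expected value of the next state's action values under the current policy, rather than from the single sampled next action, which reduces the variance of updates.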
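- Gradient Preference-Based Methods: Policy gradient algorithms that learn numerical action preferences, typically mapped to action probabilities with a softmax, and update them by stochastic gradient ascent on expected reward.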
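- Policy Iteration: Classical dynamic programming algorithm for solving MDPs with a known model, alternating policy evaluation and greedy policy improvement until the policy stops changing.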
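As an illustration of the bandit strategies listed above, here is a minimal, self-contained sketch of epsilon-greedy and UCB action selection on a simulated Bernoulli bandit. It is not the repository's code: the function names (`epsilon_greedy`, `ucb`, `run_bandit`), the exploration constant `c=2.0`, and the Bernoulli reward model are all assumptions chosen for the example.

```python
import math
import random


def epsilon_greedy(q_values, epsilon, rng=random):
    """With probability epsilon pick a uniformly random arm, else a greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    best = max(q_values)
    # Break ties among greedy arms uniformly at random.
    return rng.choice([a for a, q in enumerate(q_values) if q == best])


def ucb(q_values, counts, t, c=2.0):
    """Upper Confidence Bound: argmax over a of Q(a) + c * sqrt(ln t / N(a)).

    Arms never pulled (N(a) == 0) are treated as maximizing, so each arm is tried once.
    """
    for a, n in enumerate(counts):
        if n == 0:
            return a
    scores = [q + c * math.sqrt(math.log(t) / n) for q, n in zip(q_values, counts)]
    return max(range(len(scores)), key=scores.__getitem__)


def run_bandit(true_means, steps, select, seed=0):
    """Simulate a Bernoulli bandit (hypothetical setup) with sample-average estimates."""
    rng = random.Random(seed)
    k = len(true_means)
    q = [0.0] * k
    n = [0] * k
    for t in range(1, steps + 1):
        a = select(q, n, t, rng)
        reward = 1.0 if rng.random() < true_means[a] else 0.0
        n[a] += 1
        q[a] += (reward - q[a]) / n[a]  # incremental mean update
    return q, n


if __name__ == "__main__":
    means = [0.2, 0.5, 0.8]
    _, eg_counts = run_bandit(means, 2000, lambda q, n, t, rng: epsilon_greedy(q, 0.1, rng))
    _, ucb_counts = run_bandit(means, 2000, lambda q, n, t, rng: ucb(q, n, t))
    print("epsilon-greedy pulls:", eg_counts)
    print("UCB pulls:", ucb_counts)
```

Both selectors share the same incremental sample-average update, so the only difference between them is how they trade off exploration: a fixed random-action rate versus a confidence bonus that shrinks as an arm is pulled more often.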
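For Thompson sampling, a common textbook setting is Bernoulli rewards with a conjugate Beta prior. The sketch below assumes that setting (a uniform `Beta(1, 1)` prior and a simulated bandit with hypothetical arm means). It is one possible implementation, not necessarily the one in this repository.

```python
import random


def thompson_sampling(true_means, steps, seed=0):
    """Beta-Bernoulli Thompson sampling.

    Each arm keeps a Beta(alpha, beta) posterior over its success probability,
    starting from a uniform Beta(1, 1) prior. At every step one sample is drawn
    per arm from its posterior, and the arm with the largest sample is pulled.
    """
    rng = random.Random(seed)
    k = len(true_means)
    alpha = [1.0] * k  # 1 + number of observed successes
    beta = [1.0] * k   # 1 + number of observed failures
    pulls = [0] * k
    for _ in range(steps):
        samples = [rng.betavariate(alpha[a], beta[a]) for a in range(k)]
        a = max(range(k), key=samples.__getitem__)
        reward = 1 if rng.random() < true_means[a] else 0
        # Conjugate update: a success increments alpha, a failure increments beta.
        alpha[a] += reward
        beta[a] += 1 - reward
        pulls[a] += 1
    return alpha, beta, pulls


if __name__ == "__main__":
    alpha, beta, pulls = thompson_sampling([0.2, 0.5, 0.8], 2000)
    print("pulls per arm:", pulls)
    print("posterior means:", [a / (a + b) for a, b in zip(alpha, beta)])
```

Exploration here comes entirely from posterior uncertainty: arms with few observations have wide posteriors and occasionally produce large samples, while well-understood poor arms are rarely chosen.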
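The off-policy, on-policy, and Expected SARSA entries differ mainly in the one-step target they bootstrap from. The sketch below contrasts the three tabular targets side by side; the helper names and the epsilon-greedy policy used for the expectation are illustrative assumptions, not this repository's API.

```python
def q_learning_target(reward, next_q, gamma, terminal=False):
    """Off-policy target: bootstrap from the greedy (max) next action value."""
    return reward if terminal else reward + gamma * max(next_q)


def sarsa_target(reward, next_q, next_action, gamma, terminal=False):
    """On-policy target: bootstrap from the action actually taken next."""
    return reward if terminal else reward + gamma * next_q[next_action]


def epsilon_greedy_probs(q_values, epsilon):
    """Action probabilities of an epsilon-greedy policy (ties go to the first max)."""
    k = len(q_values)
    probs = [epsilon / k] * k
    probs[q_values.index(max(q_values))] += 1.0 - epsilon
    return probs


def expected_sarsa_target(reward, next_q, next_probs, gamma, terminal=False):
    """Bootstrap from the expectation of Q(s', .) under the policy pi(. | s')."""
    if terminal:
        return reward
    return reward + gamma * sum(p * q for p, q in zip(next_probs, next_q))


def td_update(q_sa, target, alpha):
    """Move Q(s, a) a step of size alpha toward the chosen target."""
    return q_sa + alpha * (target - q_sa)


if __name__ == "__main__":
    next_q = [1.0, 2.0, 4.0]
    probs = epsilon_greedy_probs(next_q, epsilon=0.3)  # about [0.1, 0.1, 0.8]
    print(q_learning_target(1.0, next_q, gamma=0.5))             # 3.0
    print(sarsa_target(1.0, next_q, next_action=0, gamma=0.5))   # 1.5
    print(expected_sarsa_target(1.0, next_q, probs, gamma=0.5))  # about 2.75
```

With the same transition, Q-learning assumes the greedy action, SARSA uses whichever action was sampled, and Expected SARSA averages over the policy, which is why its target sits between the other two here.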
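The n-step tree-backup return can be computed backwards from the end of the n-step window, following the recursive definition in Chapter 7 of Sutton and Barto's *Reinforcement Learning: An Introduction*. The sketch below assumes a tabular setting where, for each visited state, the caller supplies the action values and the target policy's action probabilities as plain lists; the function name and argument layout are hypothetical.

```python
def tree_backup_return(rewards, next_q, next_pi, actions, gamma, terminal):
    """n-step tree-backup return G_{t:t+n} (hypothetical argument layout).

    rewards[k] = R_{t+k+1}                    for k = 0..n-1
    next_q[k]  = list of Q(S_{t+k+1}, a)      for k = 0..n-1
    next_pi[k] = list of pi(a | S_{t+k+1})    for k = 0..n-1
    actions[k] = A_{t+k+1}                    for k = 0..n-2 (the taken "spine")
    terminal   = True if S_{t+n} is terminal (last step then does not bootstrap)
    """
    n = len(rewards)
    # Last step: bootstrap from the full expectation over actions in S_{t+n}.
    if terminal:
        g = rewards[-1]
    else:
        g = rewards[-1] + gamma * sum(p * q for p, q in zip(next_pi[-1], next_q[-1]))
    # Walk backwards: untaken actions contribute pi * Q as leaves,
    # the taken action contributes pi * (return from the next step).
    for k in range(n - 2, -1, -1):
        a = actions[k]
        leaves = sum(
            p * q for i, (p, q) in enumerate(zip(next_pi[k], next_q[k])) if i != a
        )
        g = rewards[k] + gamma * (leaves + next_pi[k][a] * g)
    return g


if __name__ == "__main__":
    g = tree_backup_return(
        rewards=[1.0, 2.0],
        next_q=[[1.0, 3.0], [0.0, 4.0]],
        next_pi=[[0.25, 0.75], [0.5, 0.5]],
        actions=[1],
        gamma=1.0,
        terminal=False,
    )
    print(g)  # 4.25
```

Because every term is weighted by target-policy probabilities rather than importance ratios, no behavior-policy probabilities are needed; with `n = 1` the return reduces to the Expected SARSA target.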
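One widely used preference-based method is the gradient bandit algorithm from Sutton and Barto's *Reinforcement Learning: An Introduction*: preferences `H` are turned into probabilities with a softmax and nudged by stochastic gradient ascent, using the average of past rewards as a baseline. Whether this repository implements exactly this variant is an assumption; the Gaussian reward model and step size below are also illustrative choices.

```python
import math
import random


def softmax(prefs):
    """Map preferences to probabilities; subtracting the max avoids overflow."""
    m = max(prefs)
    exps = [math.exp(h - m) for h in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def gradient_bandit(true_means, steps, alpha=0.1, seed=0):
    """Gradient bandit with an average-reward baseline (illustrative sketch)."""
    rng = random.Random(seed)
    k = len(true_means)
    H = [0.0] * k
    baseline = 0.0  # average of rewards received before the current step
    for t in range(1, steps + 1):
        pi = softmax(H)
        a = rng.choices(range(k), weights=pi)[0]
        reward = rng.gauss(true_means[a], 1.0)  # hypothetical unit-variance rewards
        advantage = reward - baseline if t > 1 else 0.0
        # H(b) += alpha * (R - baseline) * (1[b == a] - pi(b))
        for b in range(k):
            indicator = 1.0 if b == a else 0.0
            H[b] += alpha * advantage * (indicator - pi[b])
        baseline += (reward - baseline) / t  # update the running mean afterwards
    return H, softmax(H)


if __name__ == "__main__":
    prefs, probs = gradient_bandit([0.0, 0.5, 1.5], 2000)
    print("action probabilities:", [round(p, 3) for p in probs])
```

Computing the baseline from earlier rewards only keeps it independent of the current action, which is what lets the baseline reduce variance without biasing the gradient estimate.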
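Policy iteration alternates iterative policy evaluation with greedy policy improvement until the policy is stable. The sketch below assumes a small finite MDP given as transition lists `P[s][a] = [(prob, next_state), ...]` and expected rewards `R[s][a]`; this representation and the two-state example are hypothetical, not taken from the repository.

```python
def policy_iteration(P, R, gamma=0.9, theta=1e-10):
    """Policy iteration for a finite MDP with a known model."""
    n_states = len(P)
    policy = [0] * n_states
    V = [0.0] * n_states

    def q_value(s, a):
        # One-step lookahead using the current value estimates.
        return R[s][a] + gamma * sum(p * V[s2] for p, s2 in P[s][a])

    while True:
        # Policy evaluation: in-place sweeps of Bellman expectation backups.
        while True:
            delta = 0.0
            for s in range(n_states):
                v_new = q_value(s, policy[s])
                delta = max(delta, abs(v_new - V[s]))
                V[s] = v_new
            if delta < theta:
                break
        # Policy improvement: switch to a greedy action only if it is strictly better,
        # which prevents endless flip-flopping between equally good actions.
        stable = True
        for s in range(n_states):
            old = policy[s]
            best = max(range(len(P[s])), key=lambda a: q_value(s, a))
            if q_value(s, best) > q_value(s, old) + 1e-12:
                policy[s] = best
                stable = False
        if stable:
            return policy, V


if __name__ == "__main__":
    # State 0: action 0 stays (reward 0), action 1 moves to state 1 (reward 0).
    # State 1: action 0 stays (reward 1), action 1 moves back to state 0 (reward 0).
    P = [[[(1.0, 0)], [(1.0, 1)]], [[(1.0, 1)], [(1.0, 0)]]]
    R = [[0.0, 0.0], [1.0, 0.0]]
    policy, V = policy_iteration(P, R)
    print(policy, [round(v, 3) for v in V])  # [1, 0] [9.0, 10.0]
```

In this toy MDP the optimal behavior is to move to state 1 and stay there, giving `V(1) = 1 / (1 - 0.9) = 10` and `V(0) = 0.9 * 10 = 9`.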