This project presents our implementation of Deep Recurrent Q-Learning (DRQL) that incorporates transfer learning for feature extraction, a customized LSTM for temporal recurrence, and a domain-informed reward function. This tailored approach aims to speed up convergence relative to the vanilla implementation outlined in the original paper. Performance is evaluated on two adaptive Atari 2600 games, Assault-v5 and Bowling, in which game difficulty scales with player proficiency. We compare the convergence of our optimized reward function against the vanilla version under the StepLR and CosineAnnealingLR learning rate schedulers, with accompanying theoretical explanations. Additionally, we propose an efficient windowed episodic memory employing bootstrapped sequential updates to reduce GPU memory usage.
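As a rough illustration of how these pieces fit together, the sketch below combines a pretrained CNN backbone (frozen, for transfer learning) with an LSTM Q-head, and sets up the two schedulers named above, which are the standard `torch.optim.lr_scheduler` classes. The ResNet-18 backbone, layer sizes, learning rate, and scheduler hyperparameters are illustrative assumptions, not the repo's exact configuration; see the notebook for the actual setup.

```python
import torch
import torch.nn as nn
from torchvision import models

class RecurrentQNet(nn.Module):
    def __init__(self, n_actions, hidden_size=256):
        super().__init__()
        backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
        # Transfer learning: reuse the pretrained convolutional stack as a
        # frozen feature extractor (the final FC classification head is dropped).
        self.features = nn.Sequential(*list(backbone.children())[:-1])
        for p in self.features.parameters():
            p.requires_grad = False
        # LSTM supplies temporal recurrence over the per-frame features.
        self.lstm = nn.LSTM(input_size=512, hidden_size=hidden_size, batch_first=True)
        self.q_head = nn.Linear(hidden_size, n_actions)

    def forward(self, frames, hidden=None):
        # frames: (batch, time, 3, H, W)
        b, t = frames.shape[:2]
        feats = self.features(frames.flatten(0, 1)).flatten(1)  # (b*t, 512)
        out, hidden = self.lstm(feats.view(b, t, -1), hidden)   # (b, t, hidden_size)
        return self.q_head(out), hidden                         # per-step Q-values

net = RecurrentQNet(n_actions=7)  # Assault-v5 exposes a 7-action space
trainable = [p for p in net.parameters() if p.requires_grad]
optimizer = torch.optim.Adam(trainable, lr=1e-3)

# The two schedulers compared in the report (one per training run):
step_sched = torch.optim.lr_scheduler.StepLR(optimizer, step_size=100, gamma=0.5)
cosine_sched = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=1000)
```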
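The proposed windowed episodic memory could look roughly like the following minimal sketch: a fixed-capacity buffer that evicts old transitions (bounding memory use) and samples contiguous windows so the recurrent network trains on real temporal order, with the window's last transition supplying the bootstrap target. The class name, capacity, and window length here are hypothetical placeholders; the repo's implementation may, for instance, handle episode boundaries differently.

```python
import random
from collections import deque

class WindowedEpisodicMemory:
    """Fixed-capacity buffer serving contiguous windows of transitions, so
    old experience is discarded instead of accumulating in memory."""

    def __init__(self, capacity=10_000, window=8):
        self.buffer = deque(maxlen=capacity)  # oldest transitions evicted first
        self.window = window

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Each sample is a contiguous window; its final transition supplies
        # the bootstrap target r + gamma * max_a Q(s', a).
        assert len(self.buffer) >= self.window, "not enough transitions yet"
        starts = [random.randrange(len(self.buffer) - self.window + 1)
                  for _ in range(batch_size)]
        return [[self.buffer[s + i] for i in range(self.window)]
                for s in starts]
```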
| Assault-v5 | Bowling |
| --- | --- |
```bash
python3 -m venv mlproj
source mlproj/bin/activate
pip install -r requirements.txt
```
A detailed report with code, experimentation, and results is available in the project's Jupyter notebook.
- Rohan Kalbag
- Vansh Kapoor
- Sankalp Bhamare