Assignment 1
- Implementation of Bernoulli and Gaussian Bandit environments using the Gymnasium library, and simulation of both for different combinations of hyperparameters
- Implementation of different learning strategies like pureExploitation, pureExploration, epsilonGreedyExploration, decayingEpsilonGreedyExploration, softmaxExploration and UCBExploration, their corresponding simulations on both environments, and hyperparameter tuning for each environment
- Implementation of the Random Walk Environment and creation of trajectories using the generateTrajectory function for simulation
- Implementation of MonteCarloPrediction (both FVMC and EVMC) and TemporalDifferencePrediction for calculation of state values in the environment
- Plotting the evolution of state values over episodes (linear and log scale), with seed-averaged plots for effective noise removal
- Analysing the variation of target values for a particular state for the case of both environments
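
As an illustration of the exploration strategies listed above, here is a minimal sketch of epsilon-greedy action selection on a two-armed Bernoulli bandit. It is not the assignment's code: the function name `epsilon_greedy_bandit`, its parameters, and the arm probabilities are assumptions made for this example, and it uses only the standard library rather than a Gymnasium environment.

```python
import random


def epsilon_greedy_bandit(arm_probs, n_steps=5000, epsilon=0.1, seed=0):
    """Run epsilon-greedy on a Bernoulli bandit (hypothetical helper).

    Returns (estimated mean reward per arm, pull count per arm).
    """
    rng = random.Random(seed)
    n_arms = len(arm_probs)
    q = [0.0] * n_arms  # running estimate of each arm's mean reward
    n = [0] * n_arms    # number of times each arm was pulled

    for _ in range(n_steps):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)                   # explore: uniform random arm
        else:
            arm = max(range(n_arms), key=lambda a: q[a])  # exploit: current best estimate
        # Bernoulli reward: 1 with probability arm_probs[arm], else 0
        reward = 1.0 if rng.random() < arm_probs[arm] else 0.0
        n[arm] += 1
        q[arm] += (reward - q[arm]) / n[arm]              # incremental sample mean
    return q, n


# Assumed arm success probabilities, chosen only for the example
q_values, pulls = epsilon_greedy_bandit([0.2, 0.8])
print("estimates:", q_values)
print("pulls:", pulls)
```

In this sketch, setting `epsilon=0.0` gives pure exploitation and `epsilon=1.0` gives pure exploration; a decaying variant would shrink `epsilon` as the step count grows.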
Assignment 2
- Implementation of control algorithms like MonteCarloControl, SARSAControl, Q learning, double Q learning, SARSA($\lambda$) with eligibility traces and Q($\lambda$) with traces
- Implementation of model based algorithms like Dyna-Q and Trajectory Sampling for optimal policy calculation and values for each of the states in Random Maze Environment
- Comparison between different off-policy and on-policy control algorithms for this environment
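
The on-policy versus off-policy distinction mentioned above comes down to how the bootstrap target is formed. The sketch below contrasts one tabular SARSA update with one Q-learning update; the function names, the nested-list Q-table layout, and the numbers are assumptions for illustration, not the assignment's implementation.

```python
def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.99, done=False):
    """On-policy: bootstrap from the action the agent will actually take next."""
    target = r if done else r + gamma * Q[s_next][a_next]
    Q[s][a] += alpha * (target - Q[s][a])


def q_learning_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.99, done=False):
    """Off-policy: bootstrap from the greedy action in the next state."""
    target = r if done else r + gamma * max(Q[s_next])
    Q[s][a] += alpha * (target - Q[s][a])


# Two states, two actions; values are made up for the example
Q_sarsa = [[0.0, 0.0], [1.0, 5.0]]
Q_qlearn = [[0.0, 0.0], [1.0, 5.0]]

# Same transition (s=0, a=0, r=1, s'=1); SARSA's next action is a'=0
sarsa_update(Q_sarsa, 0, 0, 1.0, 1, 0, alpha=0.5, gamma=0.9)
q_learning_update(Q_qlearn, 0, 0, 1.0, 1, alpha=0.5, gamma=0.9)

print(Q_sarsa[0][0])   # 0.5 * (1 + 0.9 * 1.0) = 0.95
print(Q_qlearn[0][0])  # 0.5 * (1 + 0.9 * 5.0) = 2.75
```

The two updates differ only in the target: SARSA uses the value of the sampled next action, while Q-learning uses the maximum over next actions regardless of what the behaviour policy does.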
Assignment 3
This assignment primarily includes the implementation of five value-based deep RL models, namely:
- Neural Fitted Q Iteration (NFQ)
- Deep Q Network (DQN)
- Double Deep Q Network (DDQN)
- Dueling Double Deep Q Network (D3QN)
- Dueling Double Deep Q Network with Prioritized Experience Replay (D3QN-PER)

and two policy-based deep RL models, namely:
- REINFORCE
- Vanilla Policy Gradient (VPG)

on two OpenAI Gym environments, CartPole-v0 and MountainCar-v1, respectively.
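
To make the difference between DQN and Double DQN concrete, the following sketch computes the bootstrap target for a single transition under each scheme. It uses plain Python lists in place of network outputs; the function names and the Q-value numbers are assumptions for this example and do not come from the assignment code.

```python
def dqn_target(reward, done, q_target_next, gamma=0.99):
    """DQN: the target network both selects and evaluates the next action."""
    if done:
        return reward
    return reward + gamma * max(q_target_next)


def ddqn_target(reward, done, q_online_next, q_target_next, gamma=0.99):
    """Double DQN: the online network selects the next action,
    the target network evaluates it."""
    if done:
        return reward
    best_action = max(range(len(q_online_next)), key=lambda a: q_online_next[a])
    return reward + gamma * q_target_next[best_action]


# Stand-ins for the two networks' outputs at the next state (made-up values)
q_online_next = [1.0, 2.0]
q_target_next = [3.0, 0.5]

y_dqn = dqn_target(1.0, False, q_target_next, gamma=0.9)
y_ddqn = ddqn_target(1.0, False, q_online_next, q_target_next, gamma=0.9)

print(y_dqn)   # 1 + 0.9 * 3.0 = 3.7
print(y_ddqn)  # 1 + 0.9 * 0.5 = 1.45
```

Decoupling action selection from action evaluation is what lets Double DQN reduce the overestimation that the plain `max` introduces.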
Assignment 4
This assignment primarily includes the implementation of three deep RL models for continuous action spaces, namely:
- Deep Deterministic Policy Gradient (DDPG)
- Twin Delayed Deep Deterministic Policy Gradient (TD3)
- Proximal Policy Optimization (PPO)

on three OpenAI Gym environments, Pendulum-v1, Hopper-v4 and HalfCheetah-v1, respectively.
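
Of the three algorithms above, PPO's clipped surrogate objective is perhaps the easiest to show in isolation. The sketch below evaluates it over a small batch using only the standard library; the function name `ppo_clipped_objective`, the `clip_eps` default, and the sample values are assumptions for illustration, and a real implementation would operate on tensors from a deep learning framework.

```python
import math


def ppo_clipped_objective(new_log_probs, old_log_probs, advantages, clip_eps=0.2):
    """Mean clipped surrogate objective (to be maximised) over a batch."""
    total = 0.0
    for new_lp, old_lp, adv in zip(new_log_probs, old_log_probs, advantages):
        ratio = math.exp(new_lp - old_lp)  # pi_new(a|s) / pi_old(a|s)
        clipped_ratio = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
        # Pessimistic bound: take the smaller of the unclipped and clipped terms
        total += min(ratio * adv, clipped_ratio * adv)
    return total / len(advantages)


# Two made-up samples: ratio 1.5 with advantage +1, ratio 0.5 with advantage -1
new_lp = [math.log(1.5), math.log(0.5)]
old_lp = [0.0, 0.0]
adv = [1.0, -1.0]

objective = ppo_clipped_objective(new_lp, old_lp, adv)
print(objective)  # approximately (1.2 + (-0.8)) / 2 = 0.2
```

In both samples the clipped term wins, so the policy gets no extra credit for moving the probability ratio further than `clip_eps` away from 1.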
Midsem
- Implementation of Random Maze Environment and its simulations
- Implementation of Policy Iteration and Value Iteration for calculating the optimal policy and the value of each state in the environment, with a comparative analysis
- Implementation of the Monte Carlo, n-step Temporal Difference and TD($\lambda$) algorithms for calculating the value of each state under the optimal policy, with a comparative analysis
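
As a compact illustration of the dynamic-programming step above, here is a sketch of Value Iteration on a three-state chain rather than the Random Maze itself. The transition table `P`, its `(prob, next_state, reward, done)` tuple layout, and the function name are assumptions chosen for this example.

```python
def value_iteration(P, gamma=0.9, theta=1e-8):
    """P[s][a] is a list of (prob, next_state, reward, done) tuples.

    Terminal states have an empty action list. Returns (V, greedy policy).
    """
    V = [0.0] * len(P)

    def q_value(s, a):
        # Expected one-step return of taking action a in state s
        return sum(p * (r + (0.0 if done else gamma * V[s2]))
                   for p, s2, r, done in P[s][a])

    while True:
        delta = 0.0
        for s in range(len(P)):
            if not P[s]:          # terminal state: value stays 0
                continue
            best = max(q_value(s, a) for a in range(len(P[s])))
            delta = max(delta, abs(best - V[s]))
            V[s] = best           # in-place Bellman optimality backup
        if delta < theta:
            break

    policy = [max(range(len(P[s])), key=lambda a: q_value(s, a)) if P[s] else None
              for s in range(len(P))]
    return V, policy


# Chain 0 <-> 1 -> 2 (terminal); action 0 = left, action 1 = right
P = [
    [[(1.0, 0, 0.0, False)], [(1.0, 1, 0.0, False)]],
    [[(1.0, 0, 0.0, False)], [(1.0, 2, 1.0, True)]],
    [],
]

V, policy = value_iteration(P)
print(V)       # approximately [0.9, 1.0, 0.0]
print(policy)  # [1, 1, None]
```

Policy Iteration would instead alternate a full policy evaluation with a greedy improvement step; on a problem this small both arrive at the same policy.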