Skip to content

sharavsambuu/learning-drl

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 
 
 
 
 
 
 
 
 

Repository files navigation

# About
  This repo is just my learning journey and may contain a buggy naive implementations.

# Tasks

  DONE - Implement C51 aka Categorical DQN with Jax
  - Implement QR-DQN which is improvements over C51
  - Implement IQN which is improvements over previous C51 and QR-DQN
  - Implement FQF which is improvements overs C51, QR-DQN and IQN
  - Implement N-step DQN with Jax
  - Implement Rainbow
  ON GOING - Implement continuous Soft Actor Critics with Jax
  ON GOING - Implement discrete Soft Actor Critics with Jax
  - Implement Hierarchical DQN
  - Implement DDPG aka Deep Deterministic Policy Gradient with Jax
  - Implement TD3 aka Twin Delayed Deep Deterministic Policy Gradient with Jax
  - Implement PPO aka Proximal Policy Optimization
  - Implement TRPO aka Trust Region Policy Optimization
  - Implement SimCLRv2 with Jax
  - Implement CURL with and compare results
  DONE - Implement A2C plus entropy bonus with Jax
  DONE - Implement SQL with Jax, aka Soft Q-Learning
  DONE - Implement A3C with Multiprocessing and Jax
  DONE - Implement A3C with Jax
  DONE - Implement online Advantage Actor Critics A2C with Jax
  DONE - Implement episodic Advantage Actor Critics A2C with Jax
  DONE - Implement Policy Gradient with Jax
  DONE - Implement vanilla DQN with Jax
  DONE - Implement vanilla DQN with Jax + PER
  DONE - Implement Double DQN with Jax
  DONE - Implement Double DQN with Jax + PER
  DONE - Implement Dueling DQN with Jax
  DONE - Implement Dueling DQN with Jax + PER
  DONE - Implement Dueling Double DQN with PER in jax
  DONE - Implement eGreedy Noisy Dueling Double DQN + PER


# Other things I will learn in the near future

  - What is SimCLRv2?
  - What is CURL?
  - What is MARL aka Multi-Agent RL?
  - What is Concurrent Experience Replay Trajectories?
  - What is Dec-HDRQN, Decentralized Hysteretic DQN?
  - What is PPO-RNN?
  - What is DQN-RNN?
  - What is General Advantage Estimation (GAE) Buffer?


# Dependencies
  sudo apt install libsdl2-dev swig python3-tk
  sudo apt install python-numpy cmake zlib1g-dev libjpeg-dev libboost-all-dev gcc libsdl2-dev wget unzip

# Prepare
  virtualenv -p python3 env && source env/bin/activate && pip install -r requirements.txt

# Additional Rocket Lander Gym extension
  git clone https://github.com/Jeetu95/Rocket_Lander_Gym.git

  change CONTINUOUS variable in Rocket_Lander_Gym/rocket_lander_gym/envs/rocket_lander.py to False

  cd Rocket_Lander_Gym && pip install .

# Google's Jax and Flax
  https://github.com/google/jax
  https://github.com/google/flax

  Variables can be vary, change those variables according to your machine specs

`
	PYTHON_VERSION=cp38  # alternatives: cp36, cp37, cp38
	CUDA_VERSION=cuda101  # alternatives: cuda100, cuda101, cuda102, cuda110
	PLATFORM=manylinux2010_x86_64  # alternatives: manylinux2010_x86_64
	BASE_URL='https://storage.googleapis.com/jax-releases'
	pip install --upgrade $BASE_URL/$CUDA_VERSION/jaxlib-0.1.51-$PYTHON_VERSION-none-$PLATFORM.whl
	pip install --upgrade jax  # install jax
	pip install --upgrade flax
`

# When on-deman GPU resource utilization needed
`
    export XLA_PYTHON_CLIENT_ALLOCATOR=platform
`

# References
  https://github.com/joaogui1/RL-JAX/tree/master/DQN

About

I'm learning a deep reinforcement learning with jax and flax which is numpy on steroids.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published