
wonseokjung/ReinforcementLearning_byWonseok


Theory and implementation of reinforcement learning algorithms, from classical reinforcement learning to DQN

Author: Wonseok Jung

The code is written in Python.

TensorFlow or Keras is used as the deep learning framework.


Table of Contents

1. What Is Reinforcement Learning?

Theory: https://wonseokjung.github.io//reinforcementlearning/update/RL-RL1/

1. Introduction 2. Reinforcement Learning 3. Examples of Reinforcement Learning 4. Elements of Reinforcement Learning

https://wonseokjung.github.io//reinforcementlearning/update/RL-RL2/

Multi-armed Bandits: 1. A k-armed Bandit Problem 2. Action-value Methods 3. The 10-armed Testbed 4. Incremental Implementation 5. Tracking a Nonstationary Problem 6. Optimistic Initial Values 7. Upper-Confidence-Bound Action Selection 8. Gradient Bandit Algorithms
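As a rough illustration of the "Action-value Methods" and "Incremental Implementation" topics above, the sketch below runs an epsilon-greedy agent on a k-armed bandit and keeps sample-average estimates with the incremental update Q <- Q + (R - Q) / n. It is not taken from the linked posts: the function names `update_estimate` and `run_bandit`, the unit-variance Gaussian rewards, and the default parameters are all assumptions made for this example.

```python
import random


def update_estimate(q, n, reward):
    """Incremental sample average: new Q = Q + (R - Q) / n, where n counts this reward."""
    return q + (reward - q) / n


def run_bandit(true_means, steps=1000, epsilon=0.1, seed=0):
    """Epsilon-greedy on a k-armed bandit (hypothetical setup: Gaussian rewards, variance 1)."""
    rng = random.Random(seed)
    k = len(true_means)
    q = [0.0] * k   # current action-value estimates
    n = [0] * k     # number of times each arm was pulled
    total = 0.0
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.randrange(k)                      # explore
        else:
            action = max(range(k), key=lambda i: q[i])     # exploit (first arm wins ties)
        reward = rng.gauss(true_means[action], 1.0)
        n[action] += 1
        q[action] = update_estimate(q[action], n[action], reward)
        total += reward
    return q, n, total / steps


if __name__ == "__main__":
    estimates, counts, avg_reward = run_bandit([0.0, 0.5, 2.0])
    print(estimates, counts, avg_reward)
```

The incremental form avoids storing every past reward: each estimate needs only its current value and its pull count.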

Practice:

  • OpenAI Gym tutorial

https://wonseokjung.github.io//reinforcementlearning/update/openai-gym/

2. Markov Decision Process

Theory:

Practice:

3. Dynamic programming

Theory:

  • Dynamic programming Policy Evaluation

  • Dynamic programming Policy Iteration

  • Dynamic programming Value Iteration

Practice:

  • policy iteration - grid world
  • value iteration - grid world
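To give a feel for the value iteration exercise, here is a minimal sketch on a small grid world. The environment is an assumption for this example, not the repository's own code: a 4x4 grid with terminal states in the top-left and bottom-right corners, a reward of -1 per step, deterministic moves, and moves off the grid leaving the agent in place. The function name `value_iteration` and its parameters are likewise hypothetical.

```python
def value_iteration(size=4, gamma=1.0, theta=1e-8):
    """In-place value iteration on an assumed size x size grid world."""
    terminals = {(0, 0), (size - 1, size - 1)}
    moves = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right
    values = {(r, c): 0.0 for r in range(size) for c in range(size)}
    while True:
        delta = 0.0
        for state in values:
            if state in terminals:
                continue  # terminal values stay at 0
            candidates = []
            for dr, dc in moves:
                nr, nc = state[0] + dr, state[1] + dc
                if not (0 <= nr < size and 0 <= nc < size):
                    nr, nc = state  # moving off the grid leaves the agent in place
                # Bellman optimality backup: reward of -1 plus discounted next value
                candidates.append(-1.0 + gamma * values[(nr, nc)])
            best = max(candidates)
            delta = max(delta, abs(best - values[state]))
            values[state] = best
        if delta < theta:
            return values


if __name__ == "__main__":
    v = value_iteration()
    for r in range(4):
        print([v[(r, c)] for c in range(4)])
```

Under these assumptions each optimal value is the negative of the number of steps to the nearest terminal state, so a greedy policy with respect to the result walks straight to a corner.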

4. Monte Carlo

Theory:

Practice:

  • Get familiar with the Blackjack environment (Blackjack-v0)

  • Monte Carlo Prediction to estimate state-action values

  • on-policy first-visit Monte Carlo Control algorithm

  • off-policy every-visit Monte Carlo Control using Weighted Importance Sampling algorithm
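The first-visit idea used in the Monte Carlo exercises above can be sketched as follows. This is an illustrative helper, not the repository's implementation: the name `first_visit_returns` and the episode format, a list of (state, action, reward) tuples where the reward follows the action, are assumptions. The state and action labels in the usage example are placeholders rather than real Blackjack observations.

```python
def first_visit_returns(episode, gamma=1.0):
    """Return the discounted return following the FIRST visit of each (state, action) pair.

    episode: list of (state, action, reward) tuples in time order (assumed format).
    """
    g = 0.0
    returns = {}
    # Walk backwards so the return accumulates as G_t = R_{t+1} + gamma * G_{t+1}.
    for state, action, reward in reversed(episode):
        g = reward + gamma * g
        # Earlier visits are processed later in this loop and overwrite later ones,
        # so the value left in the dict belongs to the first visit.
        returns[(state, action)] = g
    return returns


if __name__ == "__main__":
    episode = [("A", "hit", 0.0), ("B", "stick", 0.0), ("A", "hit", 1.0)]
    print(first_visit_returns(episode, gamma=0.5))
```

Averaging these per-episode returns over many episodes gives a Monte Carlo estimate of the state-action values.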

5. Temporal-Difference Learning

Theory:

Practice:

  • Get familiar with the Windy Gridworld Playground

  • Implement SARSA

  • Get familiar with the Cliff Environment Playground

  • Implement Q-Learning in Python
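The difference between the SARSA and Q-Learning exercises comes down to the TD target, which the sketch below makes explicit for a tabular setting. The function names, the dictionary-based Q-table keyed by (state, action), and the default step size and discount are assumptions for illustration only.

```python
from collections import defaultdict


def sarsa_update(q, state, action, reward, next_state, next_action, alpha=0.5, gamma=1.0):
    """On-policy TD control: bootstrap from the action actually taken next."""
    td_target = reward + gamma * q[(next_state, next_action)]
    q[(state, action)] += alpha * (td_target - q[(state, action)])


def q_learning_update(q, state, action, reward, next_state, actions, alpha=0.5, gamma=1.0):
    """Off-policy TD control: bootstrap from the greedy action in the next state."""
    best_next = max(q[(next_state, a)] for a in actions)
    td_target = reward + gamma * best_next
    q[(state, action)] += alpha * (td_target - q[(state, action)])


if __name__ == "__main__":
    # Unvisited pairs default to 0.0, which also keeps terminal-state values at zero.
    q = defaultdict(float)
    q[("s1", "left")] = 1.0
    q[("s1", "right")] = 3.0
    q_learning_update(q, "s0", "right", 1.0, "s1", ["left", "right"])
    sarsa_update(q, "s0", "left", 1.0, "s1", "left")
    print(q[("s0", "right")], q[("s0", "left")])
```

With the same transition, Q-learning bootstraps from the larger next-state value while SARSA uses the value of the action the behavior policy chose, which is why the two can learn different paths in an environment such as the Cliff one.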

6. Function Approximation

Theory:

  • On-policy Prediction with Approximation

  • On-policy Control with Approximation

Practice:

  • Get familiar with the Mountain Car Playground

  • Q-Learning with Value Function Approximation

7. Deep Q-Learning

Theory:

  • DQN

  • DDQN

  • Prioritized Experience Replay
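A compact way to see how DQN and DDQN differ is to compare their bootstrap targets for a single transition. The sketch below uses plain Python lists of next-state Q-values in place of network outputs; the function names and argument layout are assumptions for this example and do not mirror any TensorFlow or Keras API.

```python
def dqn_target(reward, done, target_q_next, gamma=0.99):
    """DQN: the target network both selects and evaluates the next action."""
    if done:
        return reward  # no bootstrapping from a terminal state
    return reward + gamma * max(target_q_next)


def double_dqn_target(reward, done, online_q_next, target_q_next, gamma=0.99):
    """Double DQN: the online network selects the action, the target network evaluates it."""
    if done:
        return reward
    best_action = max(range(len(online_q_next)), key=lambda a: online_q_next[a])
    return reward + gamma * target_q_next[best_action]


if __name__ == "__main__":
    online = [1.0, 2.0]   # hypothetical online-network Q-values for the next state
    target = [5.0, 3.0]   # hypothetical target-network Q-values for the next state
    print(dqn_target(1.0, False, target, gamma=0.5))
    print(double_dqn_target(1.0, False, online, target, gamma=0.5))
```

Decoupling action selection from evaluation is what lets Double DQN reduce the overestimation that the plain max operator tends to introduce.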

Practice:

  • Get familiar with the OpenAI Gym Atari Environment Playground

  • Deep Q-Learning for Atari Games

  • Double Q-Learning

  • Prioritized Experience Replay

  • SuperMario-DQN

  • Using Keras and Deep Q-Network to Play FlappyBird

https://yanpanlau.github.io/2016/07/10/FlappyBird-Keras.html

8. Policy Gradient Methods

Theory:

Practice:

  • REINFORCE with Baseline

  • Actor-Critic with Baseline

  • Actor-Critic with Baseline for Continuous Action Spaces

  • Deterministic Policy Gradients for Continuous Action Spaces (WIP)

  • Deep Deterministic Policy Gradients (WIP)

  • Asynchronous Advantage Actor-Critic (A3C)

Reference

  • Richard S. Sutton and Andrew G. Barto, Reinforcement Learning: An Introduction, Second Edition (in progress), MIT Press, Cambridge, MA, 2017

  • Denny Britz: https://github.com/dennybritz

About

Almost everything about reinforcement learning
