
📢 2019 Microsoft Student Partners (MSP) Evangelism Seminar - 2019.03.31


gunh0/reinforcement-learning-cartpole-balancing


2019 Microsoft Student Partners (MSP) Evangelism Seminar

์ฒ˜์Œ ์‹œ์ž‘ํ•˜๋Š” ๊ฐ•ํ™”ํ•™์Šต with OpenAI Gym

2019. 03. 31



The Cart Pole balancing problem is a standard problem in the field of control strategies, tackled with genetic algorithms, artificial neural networks, reinforcement learning, and similar methods.
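To make the problem concrete, here is a minimal sketch of the cart-pole dynamics in plain Python. It assumes the classic cart-pole equations of motion with constants that match Gymnasium's CartPole environment (treat the exact values as assumptions). The state is cart position, cart velocity, pole angle, and pole angular velocity; each step pushes the cart left or right with a fixed force, and the episode ends when the cart leaves the track or the pole tilts past 12 degrees. `step` and `run_episode` are hypothetical helpers for illustration, not part of this repository or the Gymnasium API.

```python
import math
import random

# Constants of the classic cart-pole setup (assumed to match Gymnasium's CartPole).
GRAVITY = 9.8                   # m/s^2
MASS_CART = 1.0                 # kg
MASS_POLE = 0.1                 # kg
TOTAL_MASS = MASS_CART + MASS_POLE
HALF_LENGTH = 0.5               # half the pole length, m
POLE_MASS_LENGTH = MASS_POLE * HALF_LENGTH
FORCE_MAG = 10.0                # N, applied to the left or right
TAU = 0.02                      # s per step (explicit Euler integration)
X_LIMIT = 2.4                   # m, cart position limit
THETA_LIMIT = math.radians(12)  # pole angle limit


def step(state, action):
    """Advance (x, x_dot, theta, theta_dot) by one step.

    Action 0 pushes the cart left, action 1 pushes it right.
    Returns (next_state, reward, done).
    """
    x, x_dot, theta, theta_dot = state
    force = FORCE_MAG if action == 1 else -FORCE_MAG
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    temp = (force + POLE_MASS_LENGTH * theta_dot ** 2 * sin_t) / TOTAL_MASS
    theta_acc = (GRAVITY * sin_t - cos_t * temp) / (
        HALF_LENGTH * (4.0 / 3.0 - MASS_POLE * cos_t ** 2 / TOTAL_MASS)
    )
    x_acc = temp - POLE_MASS_LENGTH * theta_acc * cos_t / TOTAL_MASS
    # Explicit Euler update: positions advance using the old velocities.
    x += TAU * x_dot
    x_dot += TAU * x_acc
    theta += TAU * theta_dot
    theta_dot += TAU * theta_acc
    done = abs(x) > X_LIMIT or abs(theta) > THETA_LIMIT
    # +1 for every step taken, including the one that ends the episode.
    return (x, x_dot, theta, theta_dot), 1.0, done


def run_episode(policy, max_steps=500, seed=0):
    """Run one episode from a small random start and return the total reward."""
    rng = random.Random(seed)
    state = tuple(rng.uniform(-0.05, 0.05) for _ in range(4))
    total = 0.0
    for _ in range(max_steps):
        state, reward, done = step(state, policy(state))
        total += reward
        if done:
            break
    return total


if __name__ == "__main__":
    print("random pushes:", run_episode(lambda s: random.randint(0, 1)))
    print("push toward lean:", run_episode(lambda s: 1 if s[2] + s[3] > 0 else 0))
```

Swapping in different `policy` functions is a quick way to compare hand-written controllers before training one with reinforcement learning.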

cartpole-task.gif

Result (legacy)

result-old.png


Last Updated (2024. 01.)

https://pytorch.org/tutorials/intermediate/reinforcement_q_learning.html

  • Python 3.11.9

This tutorial shows how to use PyTorch to train a Deep Q Learning (DQN) agent on the CartPole-v1 task from Gymnasium.
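A DQN agent learns an action-value function Q(s, a) by regressing the Q value of the action it actually took toward a Bellman target, r + γ · max over a′ of Q(s′, a′), where the next-state values come from a separate, slowly updated target network and terminal transitions are not bootstrapped. The plain-Python sketch below computes those targets and a Huber loss over a small batch. The function names, the discount `gamma=0.99`, and the Huber threshold `delta=1.0` are illustrative assumptions; a real agent would compute the same quantities on PyTorch tensors.

```python
def td_targets(rewards, next_q_values, dones, gamma=0.99):
    """Bellman targets y = r + gamma * max_a' Q_target(s', a').

    next_q_values[i] holds the target network's Q values for next state i;
    terminal transitions (done=True) are not bootstrapped.
    """
    targets = []
    for reward, q_next, done in zip(rewards, next_q_values, dones):
        bootstrap = 0.0 if done else gamma * max(q_next)
        targets.append(reward + bootstrap)
    return targets


def huber(error, delta=1.0):
    """Quadratic for small errors, linear for large ones (less sensitive to outliers)."""
    a = abs(error)
    if a <= delta:
        return 0.5 * error ** 2
    return delta * (a - 0.5 * delta)


def dqn_loss(q_taken, targets, delta=1.0):
    """Mean Huber loss between Q(s, a) for the taken actions and their targets."""
    return sum(huber(q - y, delta) for q, y in zip(q_taken, targets)) / len(targets)


# Two transitions with two actions each; the second one ends the episode.
targets = td_targets(rewards=[1.0, 1.0],
                     next_q_values=[[0.5, 2.0], [3.0, 1.0]],
                     dones=[False, True], gamma=0.9)
print(targets)                        # approximately [2.8, 1.0]
print(dqn_loss([2.3, 4.0], targets))  # approximately 1.3125
```

The Huber loss behaves like squared error near zero but grows only linearly for large errors, so a few badly estimated targets early in training do not dominate the gradient.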

output.png

Diagram

diagram.png

Actions are chosen either randomly or based on the policy, and each action yields the next step sample from the Gymnasium environment. The result is recorded in the replay memory, and an optimization step runs on every iteration. Optimization draws a random batch from the replay memory to train the new policy. The "older" target_net is also used during optimization to compute the expected Q values, and a soft update of its weights is performed at every step.
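Besides the networks themselves, the loop described above has three small moving parts: a replay memory, epsilon-greedy action selection, and the soft update of the target weights, θ′ ← τθ + (1 − τ)θ′. The sketch below shows each in plain Python, with network weights stood in for by flat lists of floats. The class and function names and the defaults (`eps_start=0.9`, `eps_end=0.05`, `eps_decay=1000`, `tau=0.005`) are assumptions for illustration, not values taken from this repository.

```python
import math
import random
from collections import deque, namedtuple

Transition = namedtuple("Transition", ("state", "action", "next_state", "reward"))


class ReplayMemory:
    """Bounded buffer of transitions; the oldest entries are evicted when full."""

    def __init__(self, capacity):
        self.memory = deque(maxlen=capacity)

    def push(self, *args):
        self.memory.append(Transition(*args))

    def sample(self, batch_size):
        # A uniformly random batch breaks the correlation between consecutive steps.
        return random.sample(self.memory, batch_size)

    def __len__(self):
        return len(self.memory)


def epsilon(steps_done, eps_start=0.9, eps_end=0.05, eps_decay=1000):
    """Exploration rate that decays exponentially from eps_start toward eps_end."""
    return eps_end + (eps_start - eps_end) * math.exp(-steps_done / eps_decay)


def select_action(q_values, steps_done, rng=random, **eps_kwargs):
    """Epsilon-greedy: explore with probability epsilon, otherwise take argmax Q."""
    if rng.random() > epsilon(steps_done, **eps_kwargs):
        return max(range(len(q_values)), key=q_values.__getitem__)
    return rng.randrange(len(q_values))


def soft_update(target_weights, policy_weights, tau=0.005):
    """theta_target <- tau * theta_policy + (1 - tau) * theta_target, element-wise."""
    return [tau * p + (1.0 - tau) * t for t, p in zip(target_weights, policy_weights)]


memory = ReplayMemory(capacity=3)
for i in range(5):
    memory.push(i, i % 2, i + 1, 1.0)
print(len(memory), [t.state for t in memory.memory])  # 3 [2, 3, 4]
print(select_action([0.1, 0.7], steps_done=10**6, rng=random.Random(0), eps_end=0.0))  # 1
print(soft_update([0.0, 1.0], [1.0, 1.0], tau=0.5))    # [0.5, 1.0]
```

As `steps_done` grows, the exploration rate decays toward `eps_end`, so the agent acts greedily most of the time. A small `tau` makes the target network trail the policy network slowly, which keeps the expected Q values stable between optimization steps.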