Reinforcement Learning for Beginners with OpenAI Gym
2019. 03. 31
The Cart Pole balancing problem is a standard benchmark in the field of control strategies based on genetic algorithms, artificial neural networks, reinforcement learning, and similar methods.
https://pytorch.org/tutorials/intermediate/reinforcement_q_learning.html
- Python 3.11.9
This tutorial shows how to use PyTorch to train a Deep Q Learning (DQN) agent on the CartPole-v1 task from Gymnasium.
Diagram
Actions are chosen either randomly or based on a policy, and each action yields the next step sample from the gym environment. We record the results in the replay memory and also run an optimization step on every iteration. Optimization picks a random batch from the replay memory to train the new policy. The "older" target_net is also used in optimization to compute the expected Q values. A soft update of its weights is performed at every step.
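The three pieces described above (the replay memory, the random-or-policy action choice, and the soft update of the target weights) can be sketched without any deep learning library. This is a minimal sketch, not the tutorial's implementation. It assumes plain Python lists and dicts of floats in place of PyTorch tensors and networks. The names `epsilon_at`, `select_action`, and `soft_update` are hypothetical helpers, and the values `eps_start=0.9`, `eps_end=0.05`, `eps_decay=1000`, and `tau=0.005` are illustrative assumptions.

```python
import math
import random
from collections import deque, namedtuple

# One environment step as stored in the replay memory.
Transition = namedtuple("Transition", ("state", "action", "next_state", "reward"))


class ReplayMemory:
    """Fixed-size buffer; the oldest transitions are dropped once it is full."""

    def __init__(self, capacity):
        self.memory = deque(maxlen=capacity)

    def push(self, *args):
        self.memory.append(Transition(*args))

    def sample(self, batch_size):
        # Uniform random batch, drawn without replacement.
        return random.sample(self.memory, batch_size)

    def __len__(self):
        return len(self.memory)


def epsilon_at(step, eps_start=0.9, eps_end=0.05, eps_decay=1000):
    """Exploration rate decaying exponentially with the step count (assumed schedule)."""
    return eps_end + (eps_start - eps_end) * math.exp(-step / eps_decay)


def select_action(q_values, step, rng=random):
    """Epsilon-greedy: a random action with probability epsilon, else the greedy one."""
    if rng.random() < epsilon_at(step):
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


def soft_update(target, policy, tau=0.005):
    """Blend policy weights into the target: target <- tau * policy + (1 - tau) * target."""
    for name, value in policy.items():
        target[name] = tau * value + (1.0 - tau) * target[name]
    return target
```

In a real DQN loop, `q_values` would come from the policy network's output for the current state, and `soft_update` would be applied to the networks' parameter dictionaries after each optimization step. A small `tau` keeps the target network changing slowly, which is the reason for calling it the "older" network.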