demo-q-learning

some toy demos of Q-learning with neural network function approximators

files:

└── src
    ├── envs
    │   └── GridWorld.py          # a grid world
    ├── agent
    │   ├── Linear.py             # a linear network/regression
    │   └── MLP.py                # a feed-forward network
    ├── run_lqn_agent_minimal.py  # run a linear Q-network, updating weights by hand (no autodiff)
    ├── run_lqn_agent.py          # run a linear Q-network
    ├── run_mlp_agent.py          # run a feed-forward Q-network
    ├── run_rnn_agent.py          # run an LSTM Q-network
    └── utils.py
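
for orientation, a hand-written weight update of the kind `run_lqn_agent_minimal.py` describes (a linear Q-network, no autodiff) might look roughly like the sketch below. This is not the repository's code: the function name `linear_q_update`, the weight layout `(n_actions, n_features)`, and the default `lr` and `gamma` values are all assumptions made for illustration.

```python
import numpy as np

def linear_q_update(W, s, a, r, s_next, done, lr=0.1, gamma=0.9):
    """One semi-gradient Q-learning step for Q(s, .) = W @ s, without autodiff.

    W is a hypothetical (n_actions, n_features) weight matrix, updated in place.
    Returns the TD error computed before the update.
    """
    q_sa = W[a] @ s
    # bootstrap from the greedy next-state value unless the episode has ended
    target = r if done else r + gamma * np.max(W @ s_next)
    td_error = target - q_sa
    # with the target held fixed, the gradient of 0.5 * td_error**2
    # with respect to W[a] is -td_error * s, so step in the opposite direction
    W[a] += lr * td_error * s
    return td_error

# usage: two actions, three one-hot state features
W = np.zeros((2, 3))
s = np.array([1.0, 0.0, 0.0])
s_next = np.array([0.0, 1.0, 0.0])
delta = linear_q_update(W, s, a=1, r=1.0, s_next=s_next, done=True, lr=0.5)
```

Only the row of `W` for the action actually taken changes, which is what makes the update easy to write by hand for a linear model.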

results:

here's the Q-learning update rule; the agent is also epsilon-greedy:

[figure: Q-learning update rule]
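
the figure is not reproduced in this text; as a rough guide, the standard Q-learning rule with epsilon-greedy action selection can be sketched as follows. The helper names (`epsilon_greedy`, `td_target`, `q_update`) and the default `gamma` and `lr` values are assumptions for illustration and may not match what the scripts in this repo use.

```python
import numpy as np

def epsilon_greedy(q_values, epsilon, rng):
    """With probability epsilon pick a uniformly random action, else the greedy one."""
    if rng.random() < epsilon:
        return int(rng.integers(len(q_values)))
    return int(np.argmax(q_values))

def td_target(r, q_next, done, gamma=0.9):
    """Q-learning target: r + gamma * max_a' Q(s', a'), or just r at a terminal step."""
    return r if done else r + gamma * float(np.max(q_next))

def q_update(q_sa, target, lr=0.1):
    """Q(s, a) <- Q(s, a) + lr * (target - Q(s, a))."""
    return q_sa + lr * (target - q_sa)

# usage: one update for a single (s, a) pair
rng = np.random.default_rng(0)
action = epsilon_greedy(np.array([0.1, 0.9, 0.3]), epsilon=0.1, rng=rng)
new_q = q_update(q_sa=0.0, target=td_target(1.0, np.array([0.5, 2.0]), done=False))
```

The target takes the max over next-state action values regardless of which action the epsilon-greedy policy picks next, which is what makes Q-learning off-policy.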

here's the learning curve from one agent:

[figure: learning curve]


here's a sample path from a trained agent (red dot = reward, black dot = bomb):

[figure: sample path]