# AIMA3e
    function Q-LEARNING-AGENT(percept) returns an action
      inputs: percept, a percept indicating the current state s' and reward signal r'
      persistent: Q, a table of action values indexed by state and action, initially zero
                  N_sa, a table of frequencies for state-action pairs, initially zero
                  s, a, r, the previous state, action, and reward, initially null

      if TERMINAL?(s) then Q[s, None] ← r'
      if s is not null then
          increment N_sa[s, a]
          Q[s, a] ← Q[s, a] + α(N_sa[s, a]) (r + γ max_a' Q[s', a'] − Q[s, a])
      s, a, r ← s', argmax_a' f(Q[s', a'], N_sa[s', a']), r'
      return a
Figure ?? An exploratory Q-learning agent. It is an active learner that learns the value Q(s, a) of each action in each situation. It uses the same exploration function f as the exploratory ADP agent, but avoids having to learn the transition model because the Q-value of a state can be related directly to those of its neighbors.
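The pseudocode above can be sketched in Python as follows. This is an illustrative implementation, not a definitive one: the class name, the learning-rate schedule α(n) = 60/(59 + n), the optimistic exploration function f with parameters R+ and Ne, and the assumption of a fixed action set in every state are all choices made for this sketch. The terminal check is applied to the incoming state s', which is a common reading of the pseudocode's Terminal?(s) test.

```python
import random
from collections import defaultdict


class QLearningAgent:
    """Sketch of the exploratory Q-learning agent.

    Hypothetical parameters (not fixed by the pseudocode):
    gamma (discount), Rplus (optimistic reward for f), Ne (trial
    threshold for f), and a single action set shared by all states.
    """

    def __init__(self, actions, gamma=0.9, Rplus=1.0, Ne=5):
        self.actions = actions           # actions available in every state (assumed)
        self.gamma = gamma               # discount factor gamma
        self.Rplus = Rplus               # optimistic estimate used by f
        self.Ne = Ne                     # try each (s, a) at least Ne times
        self.Q = defaultdict(float)      # Q[s, a], initially zero
        self.Nsa = defaultdict(int)      # visit counts N_sa, initially zero
        self.s = self.a = self.r = None  # previous state, action, reward

    def alpha(self, n):
        # One learning-rate schedule that decays with the visit count.
        return 60.0 / (59.0 + n)

    def f(self, q, n):
        # Exploration function: optimistic until (s, a) is tried Ne times.
        return self.Rplus if n < self.Ne else q

    def __call__(self, s1, r1, terminal=False):
        # The percept supplies the current state s' (s1) and reward r' (r1).
        if terminal:
            self.Q[s1, None] = r1
        if self.s is not None:
            s, a, r = self.s, self.a, self.r
            self.Nsa[s, a] += 1
            # TD target uses max over a' of Q[s', a']; in a terminal
            # state the only "action" is None.
            best = self.Q[s1, None] if terminal else max(
                self.Q[s1, a1] for a1 in self.actions)
            self.Q[s, a] += self.alpha(self.Nsa[s, a]) * (
                r + self.gamma * best - self.Q[s, a])
        if terminal:
            self.s = self.a = self.r = None
        else:
            self.s = s1
            self.a = max(self.actions,
                         key=lambda a1: self.f(self.Q[s1, a1], self.Nsa[s1, a1]))
            self.r = r1
        return self.a
```

One design point worth noting: because the agent only remembers the previous (s, a, r), each percept triggers exactly one backup, so no transition model P(s' | s, a) is ever stored, matching the caption's claim.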