PASSIVE-ADP-AGENT

AIMA3e

function Passive-ADP-Agent(percept) returns an action
inputs: percept, a percept indicating the current state s' and reward signal r'
persistent: π, a fixed policy
       mdp, an MDP with model P, rewards R, discount γ
       U, a table of utilities, initially empty
       Nsa, a table of frequencies for state-action pairs, initially zero
       Ns'|sa, a table of outcome frequencies given state-action pairs, initially zero
       s, a, the previous state and action, initially null
if s' is new then U[s'] ← r'; R[s'] ← r'
if s is not null then
   increment Nsa[s, a] and Ns'|sa[s', s, a]
   for each t such that Ns'|sa[t, s, a] is nonzero do
     P(t | s, a) ← Ns'|sa[t, s, a] / Nsa[s, a]
U ← Policy-Evaluation(π, U, mdp)
if s'.Terminal? then s, a ← null else s, a ← s', π[s']
return a


Figure ?? A passive reinforcement learning agent based on adaptive dynamic programming. The Policy-Evaluation function solves the fixed-policy Bellman equations, as described on page ??.
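The pseudocode above can be sketched as a small Python class. This is a minimal illustration, not the book's reference implementation: the class name, the dictionary-based count tables, and the simple iterative Policy-Evaluation (repeated sweeps of the fixed-policy Bellman equation rather than an exact linear solve) are all assumptions made for the sketch. Terminal states are flagged by the caller.

```python
from collections import defaultdict

class PassiveADPAgent:
    """Passive ADP agent (sketch): learns the transition model P and
    rewards R from experience while following a fixed policy pi, then
    evaluates pi by iterating the fixed-policy Bellman equation."""

    def __init__(self, pi, gamma=0.9):
        self.pi = pi                    # fixed policy: state -> action
        self.gamma = gamma              # discount factor
        self.U = {}                     # utility table, initially empty
        self.R = {}                     # learned reward table
        self.Nsa = defaultdict(int)     # frequencies N[s, a]
        self.Nts_a = defaultdict(int)   # outcome frequencies N[t, s, a]
        self.terminals = set()
        self.s = self.a = None          # previous state and action

    def policy_evaluation(self, iterations=50):
        # Iterative stand-in for Policy-Evaluation:
        # U(s) <- R(s) + gamma * sum_t P(t | s, pi[s]) * U(t)
        for _ in range(iterations):
            for s in list(self.U):
                if s in self.terminals:
                    self.U[s] = self.R[s]
                    continue
                a = self.pi[s]
                n = self.Nsa[(s, a)]
                if n == 0:
                    continue            # no data for this pair yet
                expected_u = sum(self.Nts_a[(t, s, a)] / n * self.U[t]
                                 for t in self.U)
                self.U[s] = self.R[s] + self.gamma * expected_u

    def __call__(self, s1, r1, terminal=False):
        """Process one percept (s', r') and return the next action."""
        if s1 not in self.U:            # if s' is new: U[s'] <- r'; R[s'] <- r'
            self.U[s1] = r1
            self.R[s1] = r1
        if terminal:
            self.terminals.add(s1)
        if self.s is not None:          # update the counts, hence the model
            self.Nsa[(self.s, self.a)] += 1
            self.Nts_a[(s1, self.s, self.a)] += 1
        self.policy_evaluation()
        if terminal:
            self.s = self.a = None
        else:
            self.s, self.a = s1, self.pi[s1]
        return self.a
```

A usage sketch on a hypothetical three-state chain (0 → 1 → 2, with 2 terminal, step reward −0.04 and exit reward +1) recovers the expected utilities U(1) = −0.04 + 0.9·1 = 0.86 and U(0) = −0.04 + 0.9·0.86 = 0.734 after a few trials.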