graph LR;
QCog --> MessageClient;
MessageClient --> PCogMessageHandler;
PCogMessageHandler --> ModelLearner;
ModelLearner --> PerceptionProcessor;
PerceptionProcessor --> ModelLearner;
ModelLearner --> PCogMessageHandler;
PCogMessageHandler --> MessageClient;
MessageClient --> QCog;
QCog --> TestBed;
TestBed --> QCog;
PCog is a POMDP model learning framework that interoperates with the QCog environment.
The basic idea of model learning is that the agent will spend some time "exploring" its environment. In the case of PCog, "exploration" is some policy
Once the agent has perceived enough of its environment, it will derive a POMDP -
In other words
- $r \in [0, 1]$ is a random number sampled from the uniform distribution $U$
- $B : H \rightarrow (S \rightarrow \mathbb{R})$ is a function that produces a belief state from some history $H$ of observations
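To make the belief function $B$ a bit more concrete, here is a minimal sketch of a standard Bayes-filter belief update. It is not PCog's actual code. It assumes the history is a list of `(action, observation)` pairs and that transition and observation probabilities are available as plain functions; `update_belief`, `belief_from_history`, `transition` and `observation_model` are all hypothetical names.

```python
from typing import Callable, Dict, Hashable, Iterable, Tuple

State = Hashable
Belief = Dict[State, float]  # maps each state s to b(s)


def update_belief(
    belief: Belief,
    action: Hashable,
    observation: Hashable,
    transition: Callable[[State, Hashable, State], float],
    observation_model: Callable[[State, Hashable, Hashable], float],
) -> Belief:
    """One step: b'(s') is proportional to O(o | s', a) * sum_s T(s' | s, a) * b(s)."""
    states = list(belief)
    unnormalised = {}
    for s_next in states:
        predicted = sum(transition(s, action, s_next) * belief[s] for s in states)
        unnormalised[s_next] = observation_model(s_next, action, observation) * predicted
    total = sum(unnormalised.values())
    if total == 0.0:
        # Assumption: if the model says the observation is impossible, keep the prior.
        return dict(belief)
    return {s: p / total for s, p in unnormalised.items()}


def belief_from_history(
    prior: Belief,
    history: Iterable[Tuple[Hashable, Hashable]],
    transition: Callable[[State, Hashable, State], float],
    observation_model: Callable[[State, Hashable, Hashable], float],
) -> Belief:
    """Hypothetical B: fold a history of (action, observation) pairs into a belief."""
    belief = dict(prior)
    for action, observation in history:
        belief = update_belief(belief, action, observation, transition, observation_model)
    return belief


# Toy two-state example: the agent never moves and its sensor is right 80% of the time.
def stay_put(s: State, a: Hashable, s_next: State) -> float:
    return 1.0 if s == s_next else 0.0


def noisy_sensor(s_next: State, a: Hashable, o: Hashable) -> float:
    return 0.8 if o == s_next else 0.2


example_belief = belief_from_history(
    {"left": 0.5, "right": 0.5}, [("wait", "left")], stay_put, noisy_sensor
)
```

In the toy example a single "left" observation shifts the uniform prior towards the `left` state, and the resulting belief still sums to one.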
The diagram above explains the components that PCog makes use of.
- QCog - the Java code base
- MessageClient - the PCog client in the code base
- TestBed - the Unity test application
- PCogMessageHandler - Python module that interprets messages from the PCogClient
- ModelLearner - a state machine that dictates the exploration and exploitation strategy of PCog. It also maintains the Utile Suffix Memory model
- PerceptionProcessor - this module takes raw perceptions coming from the MessageClient and discretises them into a form that is compatible with the reinforcement learning module. It also contains methods for scoring perceptions (i.e. a reward function) and an exploration strategy.
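As a rough illustration of what the PerceptionProcessor's job involves, the sketch below bins raw floating-point perceptions into integer symbols and scores them with a toy reward function. The function names, the bin edges and the goal-matching reward are all assumptions made for the example, not PCog's real interface.

```python
from bisect import bisect_right
from typing import Sequence, Tuple


def discretise(value: float, bin_edges: Sequence[float]) -> int:
    """Return the index of the bin that `value` falls into (edges must be sorted)."""
    return bisect_right(bin_edges, value)


def discretise_perception(raw: Sequence[float], bin_edges: Sequence[float]) -> Tuple[int, ...]:
    """Turn a raw perception vector into a hashable tuple of bin indices."""
    return tuple(discretise(v, bin_edges) for v in raw)


def score_perception(symbol: Tuple[int, ...], goal: Tuple[int, ...]) -> float:
    """Hypothetical reward: fraction of components that match the goal symbol."""
    if not goal:
        return 0.0
    matches = sum(1 for got, want in zip(symbol, goal) if got == want)
    return matches / len(goal)


# Assumed edges: four bins covering [0, 1] in quarters.
EDGES = [0.25, 0.5, 0.75]
symbol = discretise_perception((0.1, 0.6, 0.9), EDGES)
reward = score_perception(symbol, goal=(0, 2, 0))
```

Returning a tuple keeps the discretised perception hashable, so it can be used directly as a dictionary key or as a symbol in a suffix trie.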
graph TD
Root[root] -->0
Root --> 2
0 --> a1{a}
0 --> b1{b}
b1 --> 01[0]
b1 --> 11[1]
2 -.-> a21{a}
2 -.-> b21{b}
a21 -.-> 02[0]
a21 -.-> 12[1]
b21 -.-> 03[0]
b21 -.-> 13[1]
a1 -.-> 00[0]
01 -.-> a30{a}
11 -.-> a31{a}
11 -.-> b30{b}
This is quite an involved topic, so I don't really think it's necessary to write about it here. Check out this paper which describes the utile suffix memory method in depth. TL;DR: it's basically a suffix trie which records previous sequences of actions and uses the leaves of the trie as aliases for states.
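For a feel of the data structure, here is a heavily simplified sketch of a suffix trie over histories. It only covers the "record suffixes and alias them to states" part. The statistical test that decides when to grow the tree in utile suffix memory is left out entirely, and `SuffixNode`, `SuffixTrie`, `insert` and `state_for` are hypothetical names rather than PCog's API.

```python
from typing import Dict, Optional, Sequence, Tuple


class SuffixNode:
    def __init__(self, symbol: Optional[str] = None, depth: int = 0) -> None:
        self.symbol = symbol
        self.depth = depth
        self.visits = 0
        self.children: Dict[str, "SuffixNode"] = {}


class SuffixTrie:
    """Stores history suffixes newest symbol first, up to `max_depth` symbols (>= 1)."""

    def __init__(self, max_depth: int) -> None:
        self.root = SuffixNode()
        self.max_depth = max_depth

    def insert(self, history: Sequence[str]) -> SuffixNode:
        """Record the last `max_depth` symbols of `history`, walking backwards in time."""
        node = self.root
        for symbol in reversed(history[-self.max_depth:]):
            if symbol not in node.children:
                node.children[symbol] = SuffixNode(symbol, node.depth + 1)
            node = node.children[symbol]
            node.visits += 1
        return node

    def state_for(self, history: Sequence[str]) -> Tuple[str, ...]:
        """Follow `history` backwards as deep as the trie goes.

        The path reached is the state label, so histories that share a recent
        suffix are aliased to the same state.
        """
        node = self.root
        path = []
        for symbol in reversed(history):
            child = node.children.get(symbol)
            if child is None:
                break
            node = child
            path.append(symbol)
        return tuple(path)


# Symbols alternate observation / action, as in the tree diagram above.
trie = SuffixTrie(max_depth=2)
trie.insert(["0", "a", "1"])
trie.insert(["0", "b", "1"])
state_a = trie.state_for(["2", "a", "1"])
state_b = trie.state_for(["0", "b", "1"])
```

Here two histories that differ only in their oldest symbol end up at the same node, which is the aliasing behaviour described above.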