An implementation of the Q-learning algorithm with adaptive learning.
In Q-learning the agent stores Q-values (quality values) for each state-action pair that it encounters. Q-values are determined by the following update rule:
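    Q(s, a) ← Q(s, a) + α * (r + γ * max_a' Q(s', a') - Q(s, a))

where α is the learning rate, γ is the discount factor, r is the reward received for taking action a in state s, and s' is the resulting state.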
Q-learning is an off-policy form of temporal-difference learning. The agent learns by storing a quality value for each state it encounters, combining the reward it received for the action taken with the discounted future reward. Because the environment is treated as a Markov decision process, repeating this value iteration converges toward the Q-values that yield the highest expected reward.
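As a rough illustration of that update, here is a minimal tabular Q-learning sketch in Go; the qTable type and update function are hypothetical and are not part of this package's API:

```go
package main

import "fmt"

// qTable is a hypothetical table mapping a discrete state to the Q-value of
// each available action.
type qTable map[int][]float64

// update applies the Q-learning rule:
//   Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
func (q qTable) update(state, action int, reward float64, nextState int, alpha, gamma float64) {
	// Find the highest Q-value reachable from the next state.
	best := 0.0
	for i, v := range q[nextState] {
		if i == 0 || v > best {
			best = v
		}
	}
	q[state][action] += alpha * (reward + gamma*best - q[state][action])
}

func main() {
	q := qTable{0: {0, 0}, 1: {0, 0}}
	q.update(0, 1, 1.0, 1, 0.1, 0.99)
	fmt.Println(q[0]) // [0 0.1]
}
```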
An agent will explore or exploit the Q-values based on the epsilon hyperparameter.
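A minimal sketch of epsilon-greedy selection, assuming a hypothetical selectAction helper over the Q-values of the current state (not this package's actual API):

```go
package main

import (
	"fmt"
	"math/rand"
)

// selectAction returns a random action with probability epsilon (explore) and
// the action with the highest Q-value otherwise (exploit).
func selectAction(qValues []float64, epsilon float64) int {
	if rand.Float64() < epsilon {
		return rand.Intn(len(qValues))
	}
	best := 0
	for i, v := range qValues {
		if v > qValues[best] {
			best = i
		}
	}
	return best
}

func main() {
	qValues := []float64{0.2, 0.8, 0.5}
	fmt.Println(selectAction(qValues, 0.1)) // usually 1, occasionally a random action
}
```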
The implemented agent also employs adaptive learning, by which the alpha and epsilon hyperparameters are dynamically tuned based on the timestep and an ada divisor parameter.
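One common way such a schedule works is a logarithmic decay scaled by the ada divisor; the adapt function below is only an assumption for illustration and may not match this agent's exact formula:

```go
package main

import (
	"fmt"
	"math"
)

// adapt shrinks a hyperparameter from 1.0 toward min as the timestep grows,
// with adaDivisor controlling how quickly the decay kicks in.
func adapt(timestep int, adaDivisor, min float64) float64 {
	v := 1.0 - math.Log10(float64(timestep+1)/adaDivisor)
	return math.Max(min, math.Min(1.0, v))
}

func main() {
	for _, t := range []int{0, 10, 100, 1000} {
		fmt.Printf("t=%d -> %.2f\n", t, adapt(t, 25, 0.1))
	}
}
```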
Q-learning doesn't work well in continuous environments, so the pkg/v1/env package provides normalization adapters. One of these adapters performs discretization and can be used to make continuous states discrete.
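To illustrate what discretization does, here is a hypothetical discretize function that buckets a continuous value into bins; it is not the actual pkg/v1/env adapter API:

```go
package main

import "fmt"

// discretize maps a continuous value in [low, high] into one of n discrete bins.
func discretize(value, low, high float64, n int) int {
	if value <= low {
		return 0
	}
	if value >= high {
		return n - 1
	}
	return int(float64(n) * (value - low) / (high - low))
}

func main() {
	// e.g. a cart position in [-2.4, 2.4] mapped into 10 bins
	fmt.Println(discretize(0.3, -2.4, 2.4, 10)) // 5
}
```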
See the experiments folder for example implementations.