GroverQLearning is a reinforcement learning project with a quantum agent that implements Grover's algorithm. In this project, we adopt the widely used Q-learning algorithm, which has the following architecture:
The key step in Q-learning is to update the so-called Q-function, which takes a state-action pair as its argument, namely
Given a Q-function, the agent observes the environment, chooses an action to perform, and then updates the Q-function based on the reward it receives. There are many ways to implement such an agent. A simple classical agent would directly search for the action that maximizes the Q-function for a given state. In this project, we instead use the quantum search algorithm known as Grover's algorithm to do the job. The basic idea is to first encode the actions for a given state
The Grover length
We adopt the algorithm from Ref. [1] which reads
but with some modifications. The update of the V-values requires a search for the maximal values of the Q-function. To avoid that, we directly update
Our Groverlearner code was partly inspired by Ref. [2].
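The amplitude amplification at the heart of Grover's algorithm can be illustrated with a small classical simulation. The sketch below is not the code used in qrllearner; it is a minimal NumPy state-vector model, and the function name `grover_amplify` and its parameters are hypothetical. It assumes the actions of one state are encoded as basis states in a uniform superposition, that a single "good" action is marked, and that the number of iterations plays the role of the Grover length.

```python
import numpy as np

def grover_amplify(n_actions, marked_action, n_iterations):
    """Return the action probabilities after n_iterations Grover iterations.

    Hypothetical helper for illustration only: it simulates the amplitudes
    directly instead of building a quantum circuit.
    """
    # Start from the uniform superposition over all actions.
    state = np.full(n_actions, 1.0 / np.sqrt(n_actions))
    for _ in range(n_iterations):
        # Oracle: flip the sign of the marked action's amplitude.
        state[marked_action] *= -1.0
        # Diffusion: reflect every amplitude about the mean amplitude.
        state = 2.0 * state.mean() - state
    # Measurement probabilities are the squared amplitudes.
    return state ** 2

# FrozenLake has four actions; amplify action 2 with one iteration.
probs = grover_amplify(n_actions=4, marked_action=2, n_iterations=1)
print(probs)
```

With four actions, a single iteration already concentrates the measurement probability on the marked action, while zero iterations leave the agent choosing uniformly at random. Scaling the number of iterations with the reward is one way such an agent can favor actions that paid off before.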
More information can be found in the project report (an alternative copy of the project report is available if the first link doesn't work).
In a terminal, type:
pip install qrllearner
pip install numpy
pip install matplotlib
pip install qiskit
Source code: QQL_learner_trainer.py
Install the gym package by typing the following command in a terminal:
pip install gym
We will use the FrozenLake environment, which is a global environment, i.e., the agent can see the whole environment (in contrast to a local environment, where the agent can only see its neighboring sites, which we discuss in the next section).
Run the following code in a Python notebook:
import numpy as np
import matplotlib.pyplot as plt
import gym
from qrllearner import GroverQlearner
# setup FrozenLake environment
env = gym.make("FrozenLake-v1", is_slippery=False, render_mode="ansi")
# create GroverQlearner object as the agent
QuanAgent = GroverQlearner(env,env_type='global')
# set hyperparameters for training
hyperp = {'k': 0.1,
          'alpha': 0.1,
          'gamma': 0.99,
          'eps': 0.01,
          'max_epochs': 3000,
          'max_steps': 15}
QuanAgent.set_hyperparams(hyperp)
# train model
steps_vs_epochs,target_reached_vs_epochs,_ = QuanAgent.train()
# plot step vs epoch
plt.plot(steps_vs_epochs)
plt.xlabel('Epoch')
plt.ylabel('Steps')
plt.show()
# plot target reached vs epoch
plt.scatter(range(len(target_reached_vs_epochs)),target_reached_vs_epochs)
plt.xlabel('Epoch')
plt.ylabel('Target Reached')
plt.show()
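The per-epoch step counts can be noisy, so a moving average often makes the training trend easier to read. The helper below is a generic sketch and not part of qrllearner; the name `moving_average` and the window size are hypothetical, and it assumes `steps_vs_epochs` is a non-empty one-dimensional sequence of numbers (which is how it is plotted above).

```python
import numpy as np

def moving_average(values, window=50):
    """Smooth a 1-D sequence with a simple moving average (illustrative helper)."""
    values = np.asarray(values, dtype=float)
    # Clamp the window so it is at least 1 and never longer than the data.
    window = max(1, min(window, len(values)))
    kernel = np.ones(window) / window
    # 'valid' keeps only positions where the window fully overlaps the data.
    return np.convolve(values, kernel, mode="valid")

# Toy data standing in for steps_vs_epochs.
smoothed = moving_average([15, 15, 9, 7, 6, 6], window=3)
print(smoothed)
```

To use it with the training output, pass the result to the plot call, for example `plt.plot(moving_average(steps_vs_epochs))`.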
Install the custom sidewalk environment:
pip install sidewalkdemo
Source code: SideWalkEnv.py
This is a local environment where the agent can only see its four neighboring sites, and the environment changes after each action. For more information, please see the project report (or the alternative copy of the project report if the link doesn't work).
Run the following code in a Python notebook:
import matplotlib.pyplot as plt
import numpy as np
from qrllearner import GroverQlearner
from sidewalkdemo import *
# set up and visualize the road map
env = side_walk_env_with_obstacle(50,15,12,2,0.2)
# env.plot_roadmap()
QuanAgent = GroverQlearner(env,env_type='local')
hyperp = {'k': 0.1,
          'alpha': 0.1,
          'gamma': 0.8,
          'eps': 0.01,
          'max_epochs': 800,
          'max_steps': 300}
# set hyperparameters
QuanAgent.set_hyperparams(hyperp)
# TRAIN
QuanAgent.train()
# plot the trajectory after training
env_with_obstacle_test = side_walk_env_with_obstacle(p_obstacle=0.15)
trajectory = env_with_obstacle_test.trajectory(QuanAgent.Q_values)
env_with_obstacle_test.plot_roadmap_with_trajectory('avoiding obstacles_Quantum agent',trajectory)
The above example shows a specific task: avoiding obstacles while going along the sidewalk. We also provide another task, picking up litter along the sidewalk. Examples can be found in the file QLearning_sidewalk_picking_up_litters_quantum_agent.ipynb
In the package qrllearner, we also provide a classical learner class called ClassicalLearner with the same API. Run the above examples by simply replacing GroverQlearner with ClassicalLearner. However, the hyperparameters for the classical learner are different from those for the Grover learner. Fine-tuning is needed to get better performance.
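For comparison with the Grover agent, a classical tabular agent can be sketched in a few lines. This is the textbook Q-learning update with epsilon-greedy action selection, not the ClassicalLearner implementation; the function names are hypothetical, and the assumption that `alpha`, `gamma` and `eps` mean learning rate, discount factor and exploration rate is borrowed from common usage rather than confirmed by the package.

```python
import numpy as np

def epsilon_greedy(q_table, state, eps, rng):
    """Pick a random action with probability eps, otherwise the greedy one."""
    if rng.random() < eps:
        return int(rng.integers(q_table.shape[1]))
    # Classical search: scan the row for the action with the largest Q-value.
    return int(np.argmax(q_table[state]))

def q_update(q_table, state, action, reward, next_state, alpha, gamma):
    """Apply one standard Q-learning update in place and return the table."""
    td_target = reward + gamma * np.max(q_table[next_state])
    q_table[state, action] += alpha * (td_target - q_table[state, action])
    return q_table

rng = np.random.default_rng(0)
q = np.zeros((16, 4))  # 16 states and 4 actions, as in 4x4 FrozenLake
q = q_update(q, state=14, action=2, reward=1.0, next_state=15,
             alpha=0.1, gamma=0.99)
best = epsilon_greedy(q, state=14, eps=0.0, rng=rng)
print(q[14], best)
```

The `np.argmax` scan is the classical search that the Grover agent replaces with a quantum search over the encoded actions.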
[1]: Ganger, M. and Hu, W. International Journal of Intelligence Science, 9, 1-22 (2019)
[2]: QRL repository