
Graph Convolutional Reinforcement Learning

Based on DGN, I reproduced the results for the Routing scenario presented in the paper Graph Convolutional Reinforcement Learning.

The algorithm for the assignment is implemented in Python using TensorFlow.

DGN

DGN is graph convolutional reinforcement learning: the multi-agent environment is modeled as a graph, each agent is a node, and the encoding of an agent's local observation is the node's feature. Convolution is applied to the graph of agents. By employing multi-head attention as the convolution kernel, graph convolution can extract relation representations between nodes and convolve the features of neighboring nodes, much like a neuron in a convolutional neural network (CNN). Latent features extracted from gradually enlarged receptive fields are exploited to learn cooperative policies. Moreover, the relation representation is temporally regularized to help agents develop consistent cooperative policies.

In DGN, all agents share the weights of the modules. The main reason is that agents use relation kernels to extract their relations from the encodings of their observations. If the encoders were different (i.e., agents encoded their observations in different ways), the relation kernels could hardly learn to extract relations, since the graph of agents is highly dynamic.
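
To make the convolution concrete, here is a minimal single-head NumPy sketch of attention over each agent's neighborhood. The dimensions, initialization, and single-head simplification are illustrative assumptions, not the repository's code; DGN concatenates several such heads.

import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention_conv(h, adj, Wq, Wk, Wv):
    # One attention head of a graph-convolution layer.
    #   h:   (N, d)  node features (encoded local observations)
    #   adj: (N, N)  binary adjacency; 1 where j is in i's neighborhood
    #                (including i itself), 0 elsewhere
    #   Wq, Wk, Wv: (d, d_k) projection matrices
    q, k, v = h @ Wq, h @ Wk, h @ Wv
    scores = (q @ k.T) / np.sqrt(k.shape[-1])   # scaled dot-product
    scores = np.where(adj > 0, scores, -1e9)    # mask out non-neighbors
    attn = softmax(scores, axis=-1)             # relation weights per node
    return attn @ v                             # convolve neighbor features

# Tiny example: 4 agents in a chain, 8-dim features.
rng = np.random.RandomState(0)
N, d, dk = 4, 8, 8
h = rng.randn(N, d)
adj = np.eye(N) + np.eye(N, k=1) + np.eye(N, k=-1)   # self + left/right neighbor
Wq, Wk, Wv = [rng.randn(d, dk) * 0.1 for _ in range(3)]
print(attention_conv(h, adj, Wq, Wk, Wv).shape)      # (4, 8)

Stacking two such layers lets each agent aggregate information from nodes two hops away, which is how the receptive field grows.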

Requirement

  • Ubuntu 16.04
  • Anaconda 3
  • Python 3.6
  • Cuda 9.2
  • Cudnn 7.6.3

Setting RL Environment

First, we have to install MAgent, a research platform for multi-agent reinforcement learning. MAgent aims to support reinforcement learning research that scales from hundreds to millions of agents.

  • Installing MAgent on Linux:
git clone https://github.com/geek-ai/MAgent.git
cd MAgent
sudo apt-get install cmake libboost-system-dev libjsoncpp-dev libwebsocketpp-dev
bash build.sh
export PYTHONPATH=$(pwd)/python:$PYTHONPATH
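
To verify that MAgent is importable (a quick sanity check I added; not part of the original build steps):
python -c "import magent; print(magent.__file__)"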

Installing Dependencies

Before implementing DGN, we need to create a new conda environment (optional) and install the dependencies using pip.

  • Creating new conda environment
conda create -n <Your Environment Name> python=3.6
conda activate <Your Environment Name>
  • Cloning DGN codes and installing dependencies
git clone https://github.com/suno8386/DGN-reproduced.git
cd DGN-reproduced
pip install -r requirements.txt

The authors of DGN used TensorFlow 2.1.0 and Keras 2.3.1, but some of the functions the code relies on are deprecated in those versions. I therefore used TensorFlow 1.14.0 and Keras 2.0.8 and modified the code accordingly.
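
If requirements.txt does not already pin these versions, they can also be installed directly (use tensorflow-gpu==1.14.0 instead for the GPU build):
pip install tensorflow==1.14.0 keras==2.0.8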

Implementing DGN

Now we will reproduce the results of DGN in the Routing scenario. The results are saved to a log file (e.g., /home/DGN-reproduced/log_router_gqn.txt).

  • Reproducing DGN without regularization
python routers.py
  • Reproducing DGN with regularization
python routers_regularization.py
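
Training can take a while; you can follow the log as it is written (the filename follows the example above):
tail -f log_router_gqn.txt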

Code Outline

The main code consists of four parts: 1) building the graph, 2) building the model, 3) playing, and 4) training.

  • class Router: defines a ‘Router’ object that is randomly connected to a fixed number of other routers and can serve as the destination a data packet (agent) tries to reach.

  • class Edge: defines an ‘Edge’ object representing a random connection between two routers.

  • class Data: defines a ‘Data’ object (agent). There are N data packets of random size, and each packet is randomly assigned a source and a destination router. If the total size of the packets on a link exceeds its bandwidth, they cannot go through the link simultaneously. Packets aim to reach their destinations quickly while avoiding congestion.

  • def observation: defines a method that builds a data packet's observation. At each time step, the observation of a packet consists of its own attributes (i.e., current location, destination, and data size), the attributes of the cables connected to its current location (i.e., load and length), and the neighboring data packets (on the connected cables or routers).

  • def set_action: defines a method that sets a data packet's action. The action space of a packet is the choice of the next hop. Going through a cable takes a number of time steps that is a linear function of the cable length. Once a data packet arrives at its destination, it leaves the system and another packet enters with random initialization.

  • def Adjacency: defines a method that builds the adjacency matrix of the graph (a hedged sketch follows this list).

  • def MLP: defines a method that models a multi-layer perceptron.

  • def MultiHeadsAttModel: defines a method that models multi-head attention.

  • def Q_Net: defines a method that models a Q-network.
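
As an illustration of the graph construction, here is a hedged sketch of what def Adjacency might compute. The neighborhood lists below are made up for the example; in the Routing scenario, a packet's neighbors are the packets on connected cables or routers.

import numpy as np

def adjacency(neighbor_ids, n_agents):
    # Binary adjacency matrix used to mask attention: entry (i, j) is 1
    # when agent j is in agent i's neighborhood. Self-connections are
    # always included so each agent also attends to itself.
    adj = np.eye(n_agents)
    for i, neighbors in enumerate(neighbor_ids):
        for j in neighbors:
            adj[i, j] = 1
    return adj

# Example: 4 packets; packet 0 observes packets 1 and 2, and so on.
print(adjacency([[1, 2], [0], [0, 3], [2]], 4))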

Reference

@inproceedings{jiang2020graph,
    title={Graph Convolutional Reinforcement Learning},
    author={Jiang, Jiechuan and Dun, Chen and Huang, Tiejun and Lu, Zongqing},
    booktitle={ICLR},
    year={2020}
}
