
# DOL-Project3

Comparing Non-stationary Multi-armed Bandits in Single-Agent and Multi-Agent Scenarios: Distributed Optimization and Learning (DOL) course project.

## Overview

In this project, we implemented bandit learning algorithms for single-agent and multi-agent scenarios. To this end, we used a non-stationary environment in which the reward distributions of some arms change abruptly.
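The repository's exact environment is not shown here; the following is a minimal sketch of what such a non-stationary bandit could look like, assuming NumPy and with the class name `NonStationaryBandit` and the `change_every` schedule chosen for illustration only.

```python
import numpy as np

class NonStationaryBandit:
    """Hypothetical k-armed bandit whose arm means are redrawn abruptly
    every `change_every` steps (one way to model disruptive changes)."""

    def __init__(self, k=10, change_every=500, seed=0):
        self.k = k
        self.change_every = change_every
        self.rng = np.random.default_rng(seed)
        self.t = 0
        self.means = self.rng.normal(0.0, 1.0, size=k)

    def pull(self, arm):
        # Abrupt change: redraw all arm means at fixed intervals.
        if self.t > 0 and self.t % self.change_every == 0:
            self.means = self.rng.normal(0.0, 1.0, size=self.k)
        self.t += 1
        return self.rng.normal(self.means[arm], 1.0)
```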

In the first part, we considered single-agent multi-armed bandits with 2 and 10 arms. In both cases, we designed environments with different difficulty levels, i.e., how easily the arms' rewards can be distinguished. We used epsilon-greedy, Upper Confidence Bound (UCB), policy gradient, Thompson sampling, and actor-critic algorithms in this part.
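As one example of these algorithms, here is a minimal epsilon-greedy sketch against the environment sketched above; the function name and hyperparameter values are illustrative, not taken from the repository. A constant step size `alpha` is used because, unlike sample averages, it keeps tracking the optimum after an abrupt change.

```python
import numpy as np

def epsilon_greedy(env, steps=5000, epsilon=0.1, alpha=0.1, seed=0):
    """Illustrative epsilon-greedy agent with constant step size,
    run against the hypothetical NonStationaryBandit above."""
    rng = np.random.default_rng(seed)
    q = np.zeros(env.k)              # action-value estimates
    rewards = np.empty(steps)
    for t in range(steps):
        if rng.random() < epsilon:
            arm = rng.integers(env.k)    # explore uniformly
        else:
            arm = int(np.argmax(q))      # exploit current estimates
        r = env.pull(arm)
        q[arm] += alpha * (r - q[arm])   # exponential recency weighting
        rewards[t] = r
    return q, rewards
```

For instance, `q, rewards = epsilon_greedy(NonStationaryBandit(k=10))` runs one experiment; averaging `rewards` over many seeds gives the learning curves typically compared across algorithms.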

In the second part, we considered multi-agent multi-armed bandit scenarios. As in the first part, we considered different numbers of arms with different reward probability distributions. Here we used Joint Action Learners (JAL), Frequency Maximum Q-value (FMQ), distributed Q-learning, and a multi-agent actor-critic algorithm.
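To give a flavor of the multi-agent setting, below is a minimal sketch of stateless distributed Q-learning for a two-agent cooperative matrix game; the function name and payoff handling are assumptions for illustration, and the optimistic update is only sound for deterministic rewards, a known limitation of this algorithm.

```python
import numpy as np

def distributed_q(payoff, episodes=2000, epsilon=0.1, seed=0):
    """Illustrative stateless distributed Q-learning on a cooperative
    matrix game payoff[a0, a1]. Each agent observes only its own action
    and the shared reward, and keeps an optimistic value that is only
    ever increased (deterministic rewards assumed)."""
    rng = np.random.default_rng(seed)
    n0, n1 = payoff.shape
    q = [np.full(n0, -np.inf), np.full(n1, -np.inf)]
    for _ in range(episodes):
        acts = []
        for qi in q:
            if rng.random() < epsilon or np.all(np.isneginf(qi)):
                acts.append(int(rng.integers(len(qi))))  # explore
            else:
                acts.append(int(np.argmax(qi)))          # exploit own values
        r = payoff[acts[0], acts[1]]
        for i in (0, 1):
            q[i][acts[i]] = max(q[i][acts[i]], r)        # optimistic update
    return q
```

A typical test payoff is the climbing game, e.g. `distributed_q(np.array([[11, -30, 0], [-30, 7, 6], [0, 0, 5]]))`, where the optimistic update lets both agents converge on the high-payoff joint action despite the penalty states around it.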
