Comparing Non-stationary Multi-armed Bandits in Single-Agent and Multi-Agent Scenarios - Distributed Optimization and Learning (DOL) Course Project
In this project, we implemented bandit learning algorithms for single-agent and multi-agent scenarios. To this end, we used a non-stationary environment in which some of the arms' reward distributions change abruptly over time.
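As an illustration, a minimal sketch of such a non-stationary environment could look like the following. The class name `NonStationaryBandit`, the fixed change schedule, and the Gaussian reward model are assumptions made for this example, not the exact implementation used in the project:

```python
import numpy as np

class NonStationaryBandit:
    """k-armed bandit whose reward means change abruptly at fixed intervals."""

    def __init__(self, n_arms=10, change_every=1000, reward_std=1.0, seed=None):
        self.n_arms = n_arms
        self.change_every = change_every   # steps between disruptive changes
        self.reward_std = reward_std
        self.rng = np.random.default_rng(seed)
        self.t = 0
        self._reset_means()

    def _reset_means(self):
        # Draw a fresh set of reward means, producing a disruptive change.
        self.means = self.rng.normal(0.0, 1.0, size=self.n_arms)

    def pull(self, arm):
        self.t += 1
        if self.t % self.change_every == 0:
            self._reset_means()
        return self.rng.normal(self.means[arm], self.reward_std)
```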
In the first part, we considered single-agent multi-armed bandits with 2 and 10 arms. In both cases, we designed environments with different difficulty levels, i.e., different degrees of discriminability between the arms' rewards. For this part, we used the Epsilon-greedy, Upper Confidence Bound (UCB), Policy Gradient, Thompson Sampling, and Actor-Critic algorithms.
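As a concrete example of one of these methods, here is a minimal Epsilon-greedy sketch using a constant step size, a common choice for tracking non-stationary rewards. It assumes an environment object exposing `n_arms` and a `pull(arm)` method like the sketch above; the hyperparameter values are illustrative only:

```python
import numpy as np

def epsilon_greedy(env, n_steps=10000, epsilon=0.1, alpha=0.1, seed=None):
    """Epsilon-greedy with a constant step size, so estimates keep tracking
    the arms' means after abrupt changes instead of averaging all history."""
    rng = np.random.default_rng(seed)
    q = np.zeros(env.n_arms)                    # running action-value estimates
    rewards = np.zeros(n_steps)
    for t in range(n_steps):
        if rng.random() < epsilon:
            arm = int(rng.integers(env.n_arms)) # explore a random arm
        else:
            arm = int(np.argmax(q))             # exploit the current best estimate
        r = env.pull(arm)
        q[arm] += alpha * (r - q[arm])          # constant-step-size update
        rewards[t] = r
    return q, rewards

# Example usage with the environment sketched above:
# q, rewards = epsilon_greedy(NonStationaryBandit(n_arms=10, seed=0), seed=0)
```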
In the second part, we considered multi-agent multi-armed bandit scenarios. Similar to the first part, we considered different numbers of arms with different reward probability distributions. Here, we used Joint Action Learners (JAL), Frequency Maximum Q-value (FMQ), Distributed Q-learning, and a Multi-agent Actor-Critic algorithm.
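To illustrate one of these methods, below is a minimal sketch of Distributed Q-learning for a stateless cooperative game: independent learners that only revise their value estimates upward, so low rewards caused by teammates' exploration are ignored. The function signature and the example payoff matrix (the classic climbing game from the cooperative-games literature) are assumptions for illustration, not the project's exact setup:

```python
import numpy as np

def distributed_q_learning(payoff, n_agents, n_actions, n_steps=5000,
                           epsilon=0.1, seed=None):
    """Optimistic independent learners for a stateless cooperative game.

    Each agent keeps value estimates over its own actions only; `payoff(actions)`
    is assumed to return one shared reward for the chosen joint action.
    """
    rng = np.random.default_rng(seed)
    # Start low so the first observed reward for each own-action is recorded.
    q = np.full((n_agents, n_actions), -np.inf)
    for _ in range(n_steps):
        actions = [
            int(rng.integers(n_actions)) if rng.random() < epsilon
            else int(np.argmax(q[i]))
            for i in range(n_agents)
        ]
        r = payoff(actions)                               # shared team reward
        for i in range(n_agents):
            q[i, actions[i]] = max(q[i, actions[i]], r)   # optimistic update only
    return q

# Example: a 2-agent, 3-action coordination game with a deterministic payoff matrix.
climbing_game = np.array([[ 11, -30,   0],
                          [-30,   7,   6],
                          [  0,   0,   5]])
q = distributed_q_learning(lambda a: climbing_game[a[0], a[1]],
                           n_agents=2, n_actions=3)
print(q)
```

Note that this purely optimistic rule suits deterministic, stationary rewards; handling the non-stationary case would require additional machinery, which is beyond this sketch.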