partner-aware-ucb

This code simulates Partner-Aware UCB along with other multi-armed bandit (MAB) algorithms on decentralized cooperative MAB problems.

Companion code to AAAI 2022 paper:

Erdem Bıyık, Anusha Lalitha, Rajarshi Saha, Andrea Goldsmith, Dorsa Sadigh. "Partner-Aware Algorithms in Decentralized Cooperative Bandit Teams". Thirty-Sixth AAAI Conference on Artificial Intelligence, Feb. 2022.

Also presented at "Artificial Intelligence for Human-Robot Interaction (AI-HRI) at AAAI Fall Symposium Series" (Nov. 2021).

Dependencies

You need to have the following libraries with Python3:

Usage

You can run the following command to run the game once with the specified seed for randomness and the specified horizon.

  python main.py --seed 0 --horizon 2000

Setting up the Multi-Agent MAB environment

In the main.py, you can play with the agents' reward observabilities (line 12), number of actions (line 13). You can also experiment with more than 2 agents by having larger arrays.

By default, the reward matrix of the environment is random where each entry comes from a uniform distribution between 0 and 1. To experiment with a fixed reward matrix, you can enable line 21 of main.py to specify a matrix. Dimension i corresponds to the actions of agent i.

Setting up the Agents

agents.py includes a number of agent types we implemented. You can plug them in the main.py (line 24 and 25) to see how they perform together.

Currently, the following agents are available:

BernoulliGameHumanAgent: A human operator plays the game using the command-line
BernoulliGameOptimalAgent: Knows the true reward matrix and always takes the optimal action
BernoulliGameCentralizedThompsonSamplingAgent: A central controller that implements Thompson sampling (all agents have to be the same kind, because it is centralized)
BernoulliGameCentralizedUCBAgent: A central controller that implements UCB (all agents have to be the same kind, because it is centralized)
BernoulliGameNaiveThompsonSamplingAgent: Naive extension of Thompson sampling to the decentralized multi-agent case
BernoulliGameNaiveUCBAgent: Naive extension of UCB to the decentralized multi-agent case
BernoulliGamePartnerAwareUCBAgent: Our partner-aware UCB algorithm
BernoulliGamePartnerAwareThompsonAgent: Our partner-aware approach implemented with Thompson sampling instead of UCB

Name		Name	Last commit message	Last commit date
Latest commit History 6 Commits
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
agents.py		agents.py
bernoulli_game.py		bernoulli_game.py
main.py		main.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

partner-aware-ucb

Dependencies

Usage

Setting up the Multi-Agent MAB environment

Setting up the Agents

About

Releases

Packages

Languages

License

Stanford-ILIAD/partner-aware-ucb

Folders and files

Latest commit

History

Repository files navigation

partner-aware-ucb

Dependencies

Usage

Setting up the Multi-Agent MAB environment

Setting up the Agents

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages