Multi-Agent Reinforcement Learning papers

This is a collection of Multi-Agent Reinforcement Learning (MARL) papers. Each category is a potential starting point for your research. Some papers are listed more than once because they belong to multiple categories.

For MARL papers with code and MARL resources, please refer to MARL Papers with Code and MARL Resources Collection.

I will continually update this repository, and I welcome suggestions (missing important papers, missing categories, invalid links, etc.). This is only a first draft so far, and I'll add more resources in the next few months.

This repository is not for commercial purposes.

My email: chenhao915@mails.ucas.ac.cn

Overview

Reviews

Recent Reviews (Since 2019)

Other Reviews (Before 2019)

If multi-agent learning is the answer, what is the question?
Multiagent learning is not the answer. It is the question
Is multiagent deep reinforcement learning the answer or the question? A brief survey (Note: A Survey and Critique of Multiagent Deep Reinforcement Learning is an updated version of this paper by the same authors.)
Evolutionary Dynamics of Multi-Agent Learning: A Survey
(Worth reading although they're not recent reviews.)

Environments

Environment	Paper	Code	Accepted at	Year
StarCraft	The StarCraft Multi-Agent Challenge	https://github.com/oxwhirl/smac	NIPS	2019
StarCraft	SMACv2: A New Benchmark for Cooperative Multi-Agent Reinforcement Learning	https://github.com/oxwhirl/smacv2		2022
StarCraft	Benchmarking Multi-Agent Deep Reinforcement Learning Algorithms in Cooperative Tasks	https://github.com/uoe-agents/epymarl	NIPS	2021
Football	Google Research Football: A Novel Reinforcement Learning Environment	https://github.com/google-research/football	AAAI	2020
PettingZoo	PettingZoo: Gym for Multi-Agent Reinforcement Learning	https://github.com/Farama-Foundation/PettingZoo	NIPS	2021
Melting Pot	Scalable Evaluation of Multi-Agent Reinforcement Learning with Melting Pot	https://github.com/deepmind/meltingpot	ICML	2021
MuJoCo	MuJoCo: A physics engine for model-based control	https://github.com/deepmind/mujoco	IROS	2012
MALib	MALib: A Parallel Framework for Population-based Multi-agent Reinforcement Learning	https://github.com/sjtu-marl/malib		2021
MAgent	MAgent: A many-agent reinforcement learning platform for artificial collective intelligence	https://github.com/Farama-Foundation/MAgent	AAAI	2018
Neural MMO	Neural MMO: A Massively Multiagent Game Environment for Training and Evaluating Intelligent Agents	https://github.com/openai/neural-mmo		2019
MPE	Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments	https://github.com/openai/multiagent-particle-envs	NIPS	2017
Pommerman	Pommerman: A multi-agent playground	https://github.com/MultiAgentLearning/playground		2018
HFO	Half Field Offense: An Environment for Multiagent Learning and Ad Hoc Teamwork	https://github.com/LARG/HFO	AAMAS Workshop	2016

Dealing With Credit Assignment Issue

Value Decomposition

Paper	Code	Accepted at	Year
VDN: Value-Decomposition Networks For Cooperative Multi-Agent Learning	https://github.com/oxwhirl/pymarl	AAMAS	2017
QMIX: Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning	https://github.com/oxwhirl/pymarl	ICML	2018
QTRAN: Learning to Factorize with Transformation for Cooperative Multi-Agent Reinforcement Learning	https://github.com/oxwhirl/pymarl	ICML	2019
NDQ: Learning Nearly Decomposable Value Functions Via Communication Minimization	https://github.com/TonghanWang/NDQ	ICLR	2020
CollaQ: Multi-Agent Collaboration via Reward Attribution Decomposition	https://github.com/facebookresearch/CollaQ		2020
SQDDPG: Shapley Q-Value: A Local Reward Approach to Solve Global Reward Games	https://github.com/hsvgbkhgbv/SQDDPG	AAAI	2020
QPD: Q-value Path Decomposition for Deep Multiagent Reinforcement Learning		ICML	2020
Weighted QMIX: Expanding Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning	https://github.com/oxwhirl/wqmix	NIPS	2020
QTRAN++: Improved Value Transformation for Cooperative Multi-Agent Reinforcement Learning			2020
QPLEX: Duplex Dueling Multi-Agent Q-Learning	https://github.com/wjh720/QPLEX	ICLR	2021

Other Methods

Paper	Code	Accepted at	Year
COMA: Counterfactual Multi-Agent Policy Gradients	https://github.com/oxwhirl/pymarl	AAAI	2018
LICA: Learning Implicit Credit Assignment for Cooperative Multi-Agent Reinforcement Learning	https://github.com/mzho7212/LICA	NIPS	2020

Policy Gradient

Paper	Code	Accepted at	Year
MADDPG: Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments	https://github.com/openai/maddpg	NIPS	2017
COMA: Counterfactual Multi-Agent Policy Gradients	https://github.com/oxwhirl/pymarl	AAAI	2018
IPPO: Is Independent Learning All You Need in the StarCraft Multi-Agent Challenge?			2020
MAPPO: The Surprising Effectiveness of MAPPO in Cooperative, Multi-Agent Games	https://github.com/marlbenchmark/on-policy		2021
MAAC: Actor-Attention-Critic for Multi-Agent Reinforcement Learning	https://github.com/shariqiqbal2810/MAAC	ICML	2019
DOP: Off-Policy Multi-Agent Decomposed Policy Gradients	https://github.com/TonghanWang/DOP	ICLR	2021
M3DDPG: Robust Multi-Agent Reinforcement Learning via Minimax Deep Deterministic Policy Gradient		AAAI	2019

Communication

Communication Without Bandwidth Constraint

Paper	Code	Accepted at	Year
CommNet: Learning Multiagent Communication with Backpropagation	https://github.com/facebookarchive/CommNet	NIPS	2016
BiCNet: Multiagent Bidirectionally-Coordinated Nets: Emergence of Human-level Coordination in Learning to Play StarCraft Combat Games	https://github.com/Coac/CommNet-BiCnet		2017
VAIN: Attentional Multi-agent Predictive Modeling		NIPS	2017
IC3Net: Learning when to Communicate at Scale in Multiagent Cooperative and Competitive Tasks	https://github.com/IC3Net/IC3Net		2018
VBC: Efficient Communication in Multi-Agent Reinforcement Learning via Variance Based Control		NIPS	2019
Graph Convolutional Reinforcement Learning for Multi-Agent Cooperation			2018
NDQ: Learning Nearly Decomposable Value Functions Via Communication Minimization	https://github.com/TonghanWang/NDQ	ICLR	2020
RIAL/DIAL: Learning to Communicate with Deep Multi-Agent Reinforcement Learning	https://github.com/iassael/learning-to-communicate	NIPS	2016
ATOC: Learning Attentional Communication for Multi-Agent Cooperation		NIPS	2018
Fully decentralized multi-agent reinforcement learning with networked agents	https://github.com/cts198859/deeprl_network	ICML	2018
TarMAC: Targeted Multi-Agent Communication		ICML	2019

Communication Under Limited Bandwidth

Paper	Accepted at	Year
SchedNet: Learning to Schedule Communication in Multi-Agent Reinforcement Learning		2019
Learning Multi-agent Communication under Limited-bandwidth Restriction for Internet Packet Routing		2019
Gated-ACML: Learning Agent Communication under Limited Bandwidth by Message Pruning	AAAI	2020
Learning Efficient Multi-agent Communication: An Information Bottleneck Approach	ICML	2020
Coordinating Multi-Agent Reinforcement Learning with Limited Communication	AAMAS	2013

Emergent

Paper	Code	Accepted at	Year
Multiagent Cooperation and Competition with Deep Reinforcement Learning		PLoS ONE	2017
Multi-agent Reinforcement Learning in Sequential Social Dilemmas			2017
Emergent preeminence of selfishness: an anomalous Parrondo perspective		Nonlinear Dynamics	2019
Emergent Coordination Through Competition			2019
Biases for Emergent Communication in Multi-agent Reinforcement Learning		NIPS	2019
Towards Graph Representation Learning in Emergent Communication			2020
Emergent Tool Use From Multi-Agent Autocurricula	https://github.com/openai/multi-agent-emergence-environments	ICLR	2020
On Emergent Communication in Competitive Multi-Agent Teams		AAMAS	2020
QED: Quasi-Equivalence Discovery for Zero-Shot Emergent Communication			2021
Incorporating Pragmatic Reasoning Communication into Emergent Language		NIPS	2020

Opponent Modeling

Paper	Code	Accepted at	Year
Bayesian Opponent Exploitation in Imperfect-Information Games		IEEE Conference on Computational Intelligence and Games	2018
LOLA: Learning with Opponent-Learning Awareness		AAMAS	2018
Variational Autoencoders for Opponent Modeling in Multi-Agent Systems			2020
Stable Opponent Shaping in Differentiable Games			2018
Opponent Modeling in Deep Reinforcement Learning	https://github.com/hhexiy/opponent	ICML	2016
Game Theory-Based Opponent Modeling in Large Imperfect-Information Games		AAMAS	2011
Agent Modelling under Partial Observability for Deep Reinforcement Learning		NIPS	2021

Game Theoretic

Paper	Accepted at	Year
α-Rank: Multi-Agent Evaluation by Evolution	Scientific Reports	2019
α^α-Rank: Practically Scaling α-Rank through Stochastic Optimisation	AAMAS	2020
A Game Theoretic Framework for Model Based Reinforcement Learning	ICML	2020
Fictitious Self-Play in Extensive-Form Games	ICML	2015
Combining Deep Reinforcement Learning and Search for Imperfect-Information Games	NIPS	2020
Real World Games Look Like Spinning Tops	NIPS	2020
PSRO: A Unified Game-Theoretic Approach to Multiagent Reinforcement Learning	NIPS	2017
Pipeline PSRO: A Scalable Approach for Finding Approximate Nash Equilibria in Large Games	NIPS	2020
A Game-Theoretic Model and Best-Response Learning Method for Ad Hoc Coordination in Multiagent Systems	AAMAS	2013
Neural Replicator Dynamics: Multiagent Learning via Hedging Policy Gradients	AAMAS	2020

Hierarchical

Paper	Accepted at	Year
Hierarchical multi-agent reinforcement learning	AAMAS	2006
Hierarchical Cooperative Multi-Agent Reinforcement Learning with Skill Discovery	AAMAS	2020
Hierarchical Critics Assignment for Multi-agent Reinforcement Learning		2019
Hierarchical Reinforcement Learning for Multi-agent MOBA Game		2019
Hierarchical Deep Multiagent Reinforcement Learning with Temporal Abstraction		2018
HAMA: Multi-Agent Actor-Critic with Hierarchical Graph Attention Network	AAAI	2020

Ad Hoc Teamwork

Paper	Code	Accepted at	Year
CollaQ: Multi-Agent Collaboration via Reward Attribution Decomposition	https://github.com/facebookresearch/CollaQ		2020
A Game-Theoretic Model and Best-Response Learning Method for Ad Hoc Coordination in Multiagent Systems		AAMAS	2013
Half Field Offense: An Environment for Multiagent Learning and Ad Hoc Teamwork	https://github.com/LARG/HFO	AAMAS Workshop	2016
Towards Open Ad Hoc Teamwork Using Graph-based Policy Learning	https://github.com/uoe-agents/GPL	ICML	2021
A Survey of Ad Hoc Teamwork: Definitions, Methods, and Open Problems			2022
Learning with generated teammates to achieve type-free ad-hoc teamwork		IJCAI	2021
Online ad hoc teamwork under partial observability		ICLR	2022

League Training

Paper	Code	Accepted at	Year
AlphaStar: Grandmaster level in StarCraft II using multi-agent reinforcement learning		Nature	2019

Curriculum Learning

Paper	Code	Accepted at	Year
Diverse Auto-Curriculum is Critical for Successful Real-World Multiagent Learning Systems		AAMAS	2021
From Few to More: Large-Scale Dynamic Multiagent Curriculum Learning	https://github.com/starry-sky6688/MARL-Algorithms	AAAI	2020
EPC: Evolutionary Population Curriculum for Scaling Multi-Agent Reinforcement Learning	https://github.com/qian18long/epciclr2020	ICLR	2020
Emergent Tool Use From Multi-Agent Autocurricula	https://github.com/openai/multi-agent-emergence-environments	ICLR	2020
Learning to Teach in Cooperative Multiagent Reinforcement Learning		AAAI	2019
StarCraft Micromanagement with Reinforcement Learning and Curriculum Transfer Learning		IEEE Transactions on Emerging Topics in Computational Intelligence	2018
Cooperative Multi-agent Control using deep reinforcement learning	https://github.com/sisl/MADRL	AAMAS	2017
Variational Automatic Curriculum Learning for Sparse-Reward Cooperative Multi-Agent Problems		NIPS	2021

Mean Field

Paper	Accepted at	Year
Mean Field Multi-Agent Reinforcement Learning	ICML	2018
Efficient Ridesharing Order Dispatching with Mean Field Multi-Agent Reinforcement Learning	The World Wide Web Conference	2019
Bayesian Multi-type Mean Field Multi-agent Imitation Learning	NIPS	2020

Transfer Learning

Paper	Code	Accepted at	Year
A Survey on Transfer Learning for Multiagent Reinforcement Learning Systems		Journal of Artificial Intelligence Research	2019
Parallel Knowledge Transfer in Multi-Agent Reinforcement Learning			2020

Meta Learning

Paper	Code	Accepted at	Year
A Policy Gradient Algorithm for Learning to Learn in Multiagent Reinforcement Learning		ICML	2021
Continuous Adaptation via Meta-Learning in Nonstationary and Competitive Environments			2017

Fairness

Paper	Accepted at	Year
FEN: Learning Fairness in Multi-Agent Systems	NIPS	2019
Fairness in Multiagent Resource Allocation with Dynamic and Partial Observations	AAMAS	2018
Fairness in Multi-agent Reinforcement Learning for Stock Trading		2019

Exploration

Dense Reward Exploration

Paper	Code	Accepted at	Year
MAVEN: Multi-Agent Variational Exploration	https://github.com/starry-sky6688/MARL-Algorithms	NIPS	2019
Social Influence as Intrinsic Motivation for Multi-Agent Deep Reinforcement Learning		ICML	2019
Episodic Multi-agent Reinforcement Learning with Curiosity-driven Exploration		NIPS	2021
Celebrating Diversity in Shared Multi-Agent Reinforcement Learning	https://github.com/lich14/CDS	NIPS	2021

Sparse Reward Exploration

Paper	Code	Accepted at	Year
EITI/EDTI: Influence-Based Multi-Agent Exploration	https://github.com/TonghanWang/EITI-EDTI	ICLR	2020
Cooperative Exploration for Multi-Agent Deep Reinforcement Learning		ICML	2021
Centralized Model and Exploration Policy for Multi-Agent			2021
REMAX: Relational Representation for Multi-Agent Exploration		AAMAS	2022

Uncategorized

Paper	Code	Accepted at	Year
CM3: Cooperative Multi-goal Multi-stage Multi-agent Reinforcement Learning		ICLR	2020
Coordinated Exploration via Intrinsic Rewards for Multi-Agent Reinforcement Learning			2019
Exploration by Maximizing Renyi Entropy for Reward-Free RL Framework		AAAI	2021
Exploration-Exploitation in Multi-Agent Learning: Catastrophe Theory Meets Game Theory		AAAI	2021
LIIR: Learning Individual Intrinsic Reward in Multi-Agent Reinforcement Learning	https://github.com/yalidu/liir	NIPS	2019

Graph Neural Network

Paper	Code	Accepted at	Year
Multi-Agent Game Abstraction via Graph Attention Neural Network	https://github.com/starry-sky6688/MARL-Algorithms	AAAI	2020
Graph Convolutional Reinforcement Learning for Multi-Agent Cooperation		ICLR	2020
Multi-Agent Reinforcement Learning with Graph Clustering			2020
Learning to Coordinate with Coordination Graphs in Repeated Single-Stage Multi-Agent Decision Problems		ICML	2018

Model-based

Paper	Code	Accepted at	Year
Model-based Multi-Agent Reinforcement Learning with Cooperative Prioritized Sweeping			2020

NAS

Paper	Code	Accepted at	Year
MANAS: Multi-Agent Neural Architecture Search			2019

Safe Multi-Agent Reinforcement Learning

Paper	Code	Accepted at	Year
MAMPS: Safe Multi-Agent Reinforcement Learning via Model Predictive Shielding			2019
Safer Deep RL with Shallow MCTS: A Case Study in Pommerman			2019

From Single-Agent to Multi-Agent

Paper	Code	Accepted at	Year
IQL: Multi-Agent Reinforcement Learning: Independent vs. Cooperative Agents	https://github.com/oxwhirl/pymarl	ICML	1993
IPPO: Is Independent Learning All You Need in the StarCraft Multi-Agent Challenge?			2020
MAPPO: The Surprising Effectiveness of MAPPO in Cooperative, Multi-Agent Games	https://github.com/marlbenchmark/on-policy		2021
MADDPG: Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments	https://github.com/openai/maddpg	NIPS	2017

Discrete-Continuous Hybrid Action Space / Parameterized Action Space

Paper	Accepted at	Year
Deep Reinforcement Learning in Parameterized Action Space		2015
DMAPQN: Deep Multi-Agent Reinforcement Learning with Discrete-Continuous Hybrid Action Spaces	IJCAI	2019
H-PPO: Hybrid actor-critic reinforcement learning in parameterized action space	IJCAI	2019
P-DQN: Parametrized Deep Q-Networks Learning: Reinforcement Learning with Discrete-Continuous Hybrid Action Space		2018

Role

Paper	Code	Accepted at	Year
ROMA: Multi-Agent Reinforcement Learning with Emergent Roles	https://github.com/TonghanWang/ROMA	ICML	2020
RODE: Learning Roles to Decompose Multi-Agent Tasks	https://github.com/TonghanWang/RODE	ICLR	2021
Scaling Multi-Agent Reinforcement Learning with Selective Parameter Sharing	https://github.com/uoe-agents/seps	ICML	2021

Diversity

Paper	Code	Accepted at	Year
Diverse Auto-Curriculum is Critical for Successful Real-World Multiagent Learning Systems		AAMAS	2021
Q-DPP: Multi-Agent Determinantal Q-Learning	https://github.com/QDPP-GitHub/QDPP	ICML	2020
Diversity is All You Need: Learning Skills without a Reward Function			2018
Modelling Behavioural Diversity for Learning in Open-Ended Games		ICML	2021
Diverse Agents for Ad-Hoc Cooperation in Hanabi		CoG	2019
Generating Behavior-Diverse Game AIs with Evolutionary Multi-Objective Deep Reinforcement Learning		IJCAI	2020
Quantifying environment and population diversity in multi-agent reinforcement learning			2021

Sparse Reward

Paper	Code	Accepted at	Year
Variational Automatic Curriculum Learning for Sparse-Reward Cooperative Multi-Agent Problems		NIPS	2021
Individual Reward Assisted Multi-Agent Reinforcement Learning	https://github.com/MDrW/ICML2022-IRAT	ICML	2022

Large Scale

Paper	Code	Accepted at	Year
From Few to More: Large-Scale Dynamic Multiagent Curriculum Learning	https://github.com/starry-sky6688/MARL-Algorithms	AAAI	2020
PooL: Pheromone-inspired Communication Framework for Large Scale Multi-Agent Reinforcement Learning			2022
Factorized Q-learning for large-scale multi-agent systems		ICDAI	2019
EPC: Evolutionary Population Curriculum for Scaling Multi-Agent Reinforcement Learning	https://github.com/qian18long/epciclr2020	ICLR	2020
Mean Field Multi-Agent Reinforcement Learning		ICML	2018
A Study of AI Population Dynamics with Million-agent Reinforcement Learning		AAMAS	2018

DTDE

Paper	Accepted at	Year
Networked Multi-Agent Reinforcement Learning in Continuous Spaces	IEEE Conference on Decision and Control	2018
Value Propagation for Decentralized Networked Deep Multi-agent Reinforcement Learning	NIPS	2019
Fully Decentralized Multi-Agent Reinforcement Learning with Networked Agents	ICML	2018

Decision Transformer

Paper	Code	Year
Offline Pre-trained Multi-Agent Decision Transformer: One Big Sequence Model Conquers All StarCraftII Tasks		2021
Multi-Agent Reinforcement Learning is a Sequence Modeling Problem	https://github.com/PKU-MARL/Multi-Agent-Transformer	2022
Towards Flexible Inference in Sequential Decision Problems via Bidirectional Transformers		2022

Offline MARL

Paper	Code	Accepted at	Year
Offline Pre-trained Multi-Agent Decision Transformer: One Big Sequence Model Conquers All StarCraftII Tasks			2021
Believe what you see: Implicit constraint approach for offline multi-agent reinforcement learning		NIPS	2021

Adversarial

Paper	Accepted at	Year
Certifiably Robust Policy Learning against Adversarial Communication in Multi-agent Systems		2022
Distributed Multi-Agent Deep Reinforcement Learning for Robust Coordination against Noise		2022
On the Robustness of Cooperative Multi-Agent Reinforcement Learning	IEEE Security and Privacy Workshops	2020
Towards Comprehensive Testing on the Robustness of Cooperative Multi-agent Reinforcement Learning	CVPR Workshop	2022
Robust Multi-Agent Reinforcement Learning via Minimax Deep Deterministic Policy Gradient	AAAI	2019
Multi-agent Deep Reinforcement Learning with Extremely Noisy Observations	NIPS Deep Reinforcement Learning Workshop	2018
Policy Regularization via Noisy Advantage Values for Cooperative Multi-agent Actor-Critic methods		2021

Multi-Agent Path Finding

TODO

To be Categorized

Paper	Code	Accepted at	Year
Mind-aware Multi-agent Management Reinforcement Learning	https://github.com/facebookresearch/M3RL	ICLR	2019
Emergence of grounded compositional language in multi-agent populations	https://github.com/bkgoksel/emergent-language	AAAI	2018
Emergent Complexity via Multi-Agent Competition	https://github.com/openai/multiagent-competition	ICLR	2018
TLeague: A Framework for Competitive Self-Play based Distributed Multi-Agent Reinforcement Learning	https://github.com/tencent-ailab/TLeague		2020
UPDeT: Universal Multi-agent Reinforcement Learning via Policy Decoupling with Transformers	https://github.com/hhhusiyi-monash/UPDeT	ICLR	2021
SIDE: State Inference for Partially Observable Cooperative Multi-Agent Reinforcement Learning	https://github.com/deligentfool/SIDE	AAMAS	2022
UNMAS: Multiagent Reinforcement Learning for Unshaped Cooperative Scenarios	https://github.com/James0618/unmas	TNNLS	2021
Context-Aware Sparse Deep Coordination Graphs	https://github.com/TonghanWang/CASEC-MACO-benchmark	ICLR	2022

TODO

Multi-Agent Path Finding
Generalization in MARL

Citation

If you find this repository useful, please cite our repo:

@misc{chen2021multi,
  author={Chen, Hao},
  title={Multi-Agent Reinforcement Learning Papers},
  year={2021},
  publisher = {GitHub},
  journal = {GitHub Repository},
  howpublished = {\url{https://github.com/TimeBreaker/Multi-Agent-Reinforcement-Learning-papers}}
}
