
Welfare and Fairness in Multi-objective Reinforcement Learning


This repository contains our implementation of the experiments in the paper "Welfare and Fairness in Multi-objective Reinforcement Learning," in which we explore optimizing nonlinear functions over the accumulated rewards of multiple objectives. Our work was published at the International Conference on Autonomous Agents and Multiagent Systems (AAMAS) in 2023.

Requirements 🛠️

To install the necessary packages, run

pip install -r requirements.txt

Code Structure 📚

The code has two major parts:

  • Simulated environments: Fair_Taxi_MDP_Penalty_V2.py and resource_gathering.py
  • Algorithm implementations: linear_scalarization.py, mixture_policy.py, nsw_ql.py, other_welfare.py, and ql.py (a conceptual sketch of the idea behind nsw_ql.py follows below)
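
To give a flavor of what nsw_ql.py optimizes: Welfare Q-learning keeps a vector-valued Q-table (one component per objective) and selects actions greedily with respect to the Nash Social Welfare (NSW) of the rewards accumulated so far plus the estimated vector return. The sketch below is a minimal illustration of that idea, not the repository's exact implementation; the array shapes and the eps smoothing are assumptions.

```python
import numpy as np

def nsw(r, eps=1e-8):
    """Nash Social Welfare as a sum of logs (monotone in the geometric mean).
    eps keeps the log finite while an objective's accumulated reward is zero."""
    return np.sum(np.log(r + eps))

def greedy_action(Q, state, acc_reward):
    """Pick the action maximizing NSW of (accumulated reward + vector Q-value).
    Q is assumed to have shape (n_states, n_actions, n_objectives)."""
    values = [nsw(acc_reward + Q[state, a]) for a in range(Q.shape[1])]
    return int(np.argmax(values))
```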

Experiments

  • To reproduce our results for Welfare Q-learning with NSW and the mixture-policy and linear-scalarization baselines, run nsw_ql.py, mixture_policy.py, and linear_scalarization.py, respectively, with the same hyper-parameters as reported in the paper.
  • To reproduce the results in the supplementary material for other welfare functions (egalitarian, p-welfare), select the welfare function of your choice inside the file and run other_welfare.py with the same hyper-parameters.

See My Agent

I personally find it very satisfying to watch trained agents perform. This repository therefore includes evaluate.py, which renders a trained agent's behavior using PyGame. To use it, load the Q-table saved during training and pass it to the eval_nsw() function. An example of the rendering is shown below.
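
A minimal usage sketch, assuming the Q-table was saved with NumPy; the file name and the argument list of eval_nsw() below are illustrative, so check evaluate.py for the actual interface:

```python
import numpy as np
from evaluate import eval_nsw

# Hypothetical file name: wherever your training run saved the Q-table.
q_table = np.load("nsw_ql_taxi_qtable.npy")

# Renders the trained agent in the taxi environment with PyGame.
eval_nsw(q_table)
```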

Visualization Example

Our agent (the moving circle) is a taxi driver trying to deliver passengers among multiple locations (red and blue squares mark destinations and origins, respectively). When the circle turns green, a passenger is in the taxi, and the number indicates the passenger's origin. When the circle turns orange, the taxi is empty and available to pick up a new passenger. The agent's goal is to provide delivery service fairly across the locations, which in this case means maximizing expected Nash Social Welfare.
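
For reference, Nash Social Welfare is the geometric mean of the per-location accumulated rewards, so it rewards balanced service: an unbalanced reward vector scores lower than a balanced one with the same total. A small, self-contained illustration (not the repository's code):

```python
import numpy as np

def nash_social_welfare(rewards):
    """Geometric mean of a nonnegative accumulated-reward vector."""
    return float(np.prod(rewards) ** (1.0 / len(rewards)))

print(nash_social_welfare(np.array([5.0, 5.0])))  # 5.0 (balanced)
print(nash_social_welfare(np.array([9.0, 1.0])))  # 3.0 (same sum, less fair)
```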

Citation

@inproceedings{fan2023welfare,
  title={Welfare and Fairness in Multi-objective Reinforcement Learning},
  author={Fan, Ziming and Peng, Nianli and Tian, Muhang and Fain, Brandon},
  booktitle={Proceedings of the 2023 International Conference on Autonomous Agents and Multiagent Systems},
  pages={1991--1999},
  year={2023}
}
