QUICK REVIEW

[Paper Review] Inequity aversion improves cooperation in intertemporal social dilemmas

Edward Hughes, Joel Z. Leibo | arXiv (Cornell University) | Mar 23, 2018

Experimental Behavioral Economics Studies | 76 citations

TL;DR

The authors extend inequity-averse preferences to multi-agent reinforcement learning in Markov games and show that advantageous inequity aversion promotes cooperation in intertemporal social dilemmas, while disadvantageous inequity aversion helps via punishment in certain settings.

ABSTRACT

Groups of humans are often able to find ways to cooperate with one another in complex, temporally extended social dilemmas. Models based on behavioral economics are only able to explain this phenomenon for unrealistic stateless matrix games. Recently, multi-agent reinforcement learning has been applied to generalize social dilemma problems to temporally and spatially extended Markov games. However, this has not yet generated an agent that learns to cooperate in social dilemmas as humans do. A key insight is that many, but not all, human individuals have inequity averse social preferences. This promotes a particular resolution of the matrix game social dilemma wherein inequity-averse individuals are personally pro-social and punish defectors. Here we extend this idea to Markov games and show that it promotes cooperation in several types of sequential social dilemma, via a profitable interaction with policy learnability. In particular, we find that inequity aversion improves temporal credit assignment for the important class of intertemporal social dilemmas. These results help explain how large-scale cooperation may emerge and persist.

Motivation & Objective

Motivate the study of cooperation in temporally extended social dilemmas beyond static matrix games.
Generalize inequity-averse preferences to sequential Markov games in a multi-agent RL setting.
Examine how inequity aversion affects learning and policy formation to promote cooperation.
Explore how inequity aversion impacts temporal credit assignment and the emergence of cooperative behavior.

Proposed method

Model the setting as a partially observable Markov game in which multiple agents learn independently from their own observations and rewards.
Use asynchronous advantage actor-critic (A3C) with neural networks to learn policies for each agent.
Implement inequity aversion in sequential settings as an intrinsic reward computed from temporally smoothed per-player rewards.
Extend the Fehr–Schmidt inequity aversion model to Markov games with parameters for disadvantageous and advantageous inequity aversion.
Validate that the two gridworld games (Cleanup and Harvest) are social dilemmas using empirical Schelling diagrams.
Examine three additional games (Dictate apples, Give apples, Take apples) to illustrate inequity-averse behaviors in simple 2-player settings.
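
The reward shaping described in the list above can be illustrated with a minimal Python sketch. It applies the standard Fehr–Schmidt utility to temporally smoothed per-player rewards instead of raw payoffs. The function names, the single `decay` parameter, and the exponential form of the smoothing are assumptions made for illustration; this summary does not give the paper's exact update rule or hyperparameters.

```python
from typing import List, Sequence


def smooth_rewards(prev: Sequence[float], rewards: Sequence[float],
                   decay: float) -> List[float]:
    """Update each player's running reward trace.

    Assumed form: an exponentially decayed sum, e_t = decay * e_{t-1} + r_t.
    `decay` is a hypothetical parameter in [0, 1].
    """
    return [decay * e + r for e, r in zip(prev, rewards)]


def inequity_averse_reward(i: int, rewards: Sequence[float],
                           smoothed: Sequence[float],
                           alpha: float, beta: float) -> float:
    """Fehr-Schmidt style subjective reward for player i.

    alpha weights disadvantageous inequity (others are ahead of i);
    beta weights advantageous inequity (i is ahead of others).
    Requires at least two players.
    """
    n = len(smoothed)
    others = [smoothed[j] for j in range(n) if j != i]
    # Penalty for every other player whose smoothed reward exceeds player i's.
    disadvantage = sum(max(e - smoothed[i], 0.0) for e in others)
    # Penalty for every other player whose smoothed reward is below player i's.
    advantage = sum(max(smoothed[i] - e, 0.0) for e in others)
    return (rewards[i]
            - alpha / (n - 1) * disadvantage
            - beta / (n - 1) * advantage)


if __name__ == "__main__":
    # Three players; only player 0 collects a reward on this step.
    step_rewards = [1.0, 0.0, 0.0]
    traces = smooth_rewards([0.0, 0.0, 0.0], step_rewards, decay=0.9)
    for player in range(3):
        print(player, inequity_averse_reward(player, step_rewards, traces,
                                             alpha=5.0, beta=0.05))
```

In this sketch each agent would feed the returned value to its own learner in place of the raw environment reward. A large `alpha` makes an agent sensitive to falling behind, while a nonzero `beta` makes it uncomfortable about getting ahead; the values used in the example are placeholders, not settings from the paper.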

Experimental results

Research questions

RQ1: Can inequity-averse preferences be extended from stateless matrix games to sequential, multi-agent Markov games?
RQ2: Do advantageous and disadvantageous inequity aversion promote cooperation in intertemporal social dilemmas, and under what conditions?
RQ3: How does inequity aversion influence temporal credit assignment and learning dynamics in multi-agent RL?
RQ4: Are specific environments (public goods vs. commons) differentially affected by inequity-averse incentives?

Key findings

Advantageous inequity aversion improves collective outcomes and cooperation in the Cleanup public goods game, and also helps in Harvest, by improving temporal credit assignment.
Disadvantageous inequity aversion supports cooperation in the Harvest commons game through punishment and timing of incentives, even with a single agent exhibiting this trait.
Baseline A3C agents fail to achieve social benefits, whereas inequity-averse agents show improved social metrics such as cooperation and sustainability in certain settings.
Delaying intrinsic rewards for inequity aversion reduces its effectiveness, highlighting the role of timely intrinsic feedback in learning cooperative policies.
The effects are task-conditional: advantageous inequity aversion is especially effective for public goods dilemmas, while disadvantageous inequity aversion is stronger for commons dilemmas.

This review was created by AI and reviewed by human editors.