QUICK REVIEW

[Paper Review] A Survey of Learning in Multiagent Environments: Dealing with Non-Stationarity

Pablo Hernández-Leal, Michael Kaisers | arXiv (Cornell University) | 2017. 07. 28.

Advanced Bandit Algorithms Research | References: 156 | Citations: 198

One-Line Summary

This survey reviews how learning in multiagent environments copes with non-stationarity and presents a five-category framework for classifying the approaches.

ABSTRACT

The key challenge in multiagent learning is learning a best response to the behaviour of other agents, which may be non-stationary: if the other agents adapt their strategy as well, the learning target moves. Disparate streams of research have approached non-stationarity from several angles, which make a variety of implicit assumptions that make it hard to keep an overview of the state of the art and to validate the innovation and significance of new works. This survey presents a coherent overview of work that addresses opponent-induced non-stationarity with tools from game theory, reinforcement learning and multi-armed bandits. Further, we reflect on the principle approaches how algorithms model and cope with this non-stationarity, arriving at a new framework and five categories (in increasing order of sophistication): ignore, forget, respond to target models, learn models, and theory of mind. A wide range of state-of-the-art algorithms is classified into a taxonomy, using these categories and key characteristics of the environment (e.g., observability) and adaptation behaviour of the opponents (e.g., smooth, abrupt). To clarify even further we present illustrative variations of one domain, contrasting the strengths and limitations of each category. Finally, we discuss in which environments the different approaches yield most merit, and point to promising avenues of future research.

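Research Motivation and Goals

Synthesize how opponent-induced non-stationarity is handled across multi-armed bandits, reinforcement learning, and game theory.
Present a coherent framework for handling non-stationarity in multiagent learning.
Classify state-of-the-art algorithms by characteristics of the environment and of the opponents' adaptation behaviour.
Discuss the strengths, limitations, and future research directions of multiagent learning under non-stationarity.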

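Proposed Method

Review the formal models of multi-armed bandits, reinforcement learning, and game theory to frame non-stationarity.
Propose a new framework of five categories for handling non-stationarity (in increasing order of sophistication): ignore, forget, respond to target models, learn models, and theory of mind.
Illustrate each category with variations of one example domain, highlighting its strengths and limitations.
Provide a taxonomy of algorithms organized by category and by environment and adaptation characteristics.
Discuss future research topics and open problems.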
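
The gap between the first two categories can be made concrete with a small sketch. The code below is an illustration only, not an algorithm taken from the survey: it assumes that an "ignore"-style learner can be approximated by a plain sample average over all past payoffs, and a "forget"-style learner by a constant step-size (recency-weighted) average. The payoff stream, the switch point at step 100, and the step size of 0.1 are all hypothetical values chosen for the example.

```python
def sample_average(rewards):
    """'Ignore'-style estimate (assumed mapping): all past rewards weigh equally."""
    estimate = 0.0
    for n, reward in enumerate(rewards, start=1):
        # Incremental form of the arithmetic mean.
        estimate += (reward - estimate) / n
    return estimate


def recency_weighted(rewards, step_size=0.1):
    """'Forget'-style estimate (assumed mapping): old rewards decay geometrically."""
    estimate = 0.0
    for reward in rewards:
        # A constant step size gives recent rewards more weight than old ones.
        estimate += step_size * (reward - estimate)
    return estimate


# Hypothetical payoff of one action: the opponent switches strategy abruptly
# at step 100, so the payoff drops from 1.0 to 0.0.
rewards = [1.0] * 100 + [0.0] * 100

ignore_estimate = sample_average(rewards)
forget_estimate = recency_weighted(rewards)

print(f"ignore-style estimate: {ignore_estimate:.4f}")
print(f"forget-style estimate: {forget_estimate:.6f}")
```

Under these assumptions, the sample average stays at the midpoint between the old and new payoffs long after the switch, while the recency-weighted estimate has moved almost entirely to the new payoff. The more sophisticated categories (respond to target models, learn models, theory of mind) reason about the opponent explicitly and are not covered by this sketch.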

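Experimental Results

Research Questions

RQ1: How does non-stationarity arise in multiagent learning across domains (bandits, RL, game theory)?
RQ2: Which framework best captures the levels of sophistication in handling non-stationarity?
RQ3: Which algorithms fall into which category under different observability and opponent-adaptation assumptions?
RQ4: What are the main open problems and future research directions in multiagent learning under non-stationarity?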

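Key Findings

Proposes a five-category framework for coping with non-stationarity: ignore, forget, respond to target models, learn models, and theory of mind.
Provides a taxonomy that classifies state-of-the-art algorithms by category and by characteristics of the environment and the opponents' adaptation.
Presents illustrative variations of one domain to contrast the strengths and limitations of each category.
Analyzes in which environments the different approaches yield the most merit and points to directions for future research.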


This review was generated by AI and checked by a human editor.