QUICK REVIEW

[Paper Review] MinAtar: An Atari-Inspired Testbed for Thorough and Reproducible Reinforcement Learning Experiments

Kenny Young, Tian Tian | arXiv (Cornell University) | 2019-03-07

Reinforcement Learning in Robotics | References: 16 | Citations: 42

One-Line Summary

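MinAtar provides five simplified, Atari-inspired environments played on a 10x10 grid with interpretable object channels, enabling reproducible, behaviour-focused RL experiments with reduced representational complexity.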

ABSTRACT

The Arcade Learning Environment (ALE) is a popular platform for evaluating reinforcement learning agents. Much of the appeal comes from the fact that Atari games demonstrate aspects of competency we expect from an intelligent agent and are not biased toward any particular solution approach. The challenge of the ALE includes (1) the representation learning problem of extracting pertinent information from raw pixels, and (2) the behavioural learning problem of leveraging complex, delayed associations between actions and rewards. Often, the research questions we are interested in pertain more to the latter, but the representation learning problem adds significant computational expense. We introduce MinAtar, short for miniature Atari, a new set of environments that capture the general mechanics of specific Atari games while simplifying the representational complexity to focus more on the behavioural challenges. MinAtar consists of analogues of five Atari games: Seaquest, Breakout, Asterix, Freeway and Space Invaders. Each MinAtar environment provides the agent with a 10x10xn binary state representation. Each game plays out on a 10x10 grid with n channels corresponding to game-specific objects, such as ball, paddle and brick in the game Breakout. To investigate the behavioural challenges posed by MinAtar, we evaluated a smaller version of the DQN architecture as well as online actor-critic with eligibility traces. With the representation learning problem simplified, we can perform experiments with significantly less computational expense. In our experiments, we use the saved compute time to perform step-size parameter sweeps and more runs than is typical for the ALE. Experiments like this improve reproducibility, and allow us to draw more confident conclusions. We hope that MinAtar can allow researchers to thoroughly investigate behavioural challenges similar to those inherent in the ALE.
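To make the 10x10xn binary state representation concrete, the following pure-Python sketch encodes a Breakout-like frame as a 10x10 grid holding one boolean per channel. It is an illustration only: the channel names (including the hypothetical `trail` channel) and their order are assumptions rather than MinAtar's actual layout, and the helpers `encode` and `active_channels` are hypothetical.

```python
# Hypothetical Breakout-like channel layout; names and order are assumptions.
CHANNELS = ["paddle", "ball", "trail", "brick"]
GRID = 10


def empty_state(n_channels):
    """Return a GRID x GRID x n_channels binary state with every entry False."""
    return [[[False] * n_channels for _ in range(GRID)] for _ in range(GRID)]


def encode(objects):
    """objects maps a channel name to the list of (row, col) cells that object type occupies."""
    state = empty_state(len(CHANNELS))
    for name, cells in objects.items():
        c = CHANNELS.index(name)
        for row, col in cells:
            state[row][col][c] = True
    return state


def active_channels(state, row, col):
    """List the object types present in a single grid cell."""
    return [CHANNELS[c] for c, on in enumerate(state[row][col]) if on]


state = encode({
    "paddle": [(9, 4)],
    "ball": [(5, 2)],
    "trail": [(4, 1)],
    "brick": [(r, c) for r in (1, 2, 3) for c in range(GRID)],
})
print(len(state), len(state[0]), len(state[0][0]))  # 10 10 4
print(active_channels(state, 9, 4))  # ['paddle']
print(sum(cell.count(True) for row in state for cell in row))  # 33
```

Because each channel has a fixed meaning, a question such as "is the ball at (5, 2)?" can be read directly from the input instead of being learned from pixels, which is the representational simplification the abstract describes.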

Motivation and Goals

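Provide a smaller, reproducible testbed that captures the core behavioural challenges of Atari games.
Reduce the complexity of representation learning while preserving the core game mechanics.
Enable broad, statistically robust experiments through faster training and more random seeds.
Demonstrate how various RL methods perform on behaviour-focused tasks within MinAtar.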

Proposed Method

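Provides five MinAtar environments, analogues of Seaquest, Breakout, Asterix, Freeway, and Space Invaders, each played on a 10x10 grid with n semantically meaningful channels.
Uses a reduced action space of six actions: movement in the four cardinal directions, fire, and no-op.
Provides simplified rewards and semantically meaningful input channels, sidestepping pixel-based representation learning.
Adds stochasticity through sticky actions and randomized spawn locations.
Evaluates a DQN variant with experience replay and an online actor-critic with eligibility traces (AC(λ)).
Uses small networks (DQN: one convolutional layer with 16 3x3 filters and a 128-unit fully connected layer) and 5 million training frames, making CPU training and parameter sweeps practical.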
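Sticky actions are one of the listed sources of stochasticity. The following is a minimal sketch, assuming the common formulation in which, with probability `p`, the environment ignores the agent's new action and repeats the previously executed one. The class name `StickyActions`, the default `p=0.1`, and the action ordering are assumptions for illustration, not details taken from the review.

```python
import random

# Six-action set described in the review; this ordering is an assumption.
ACTIONS = ["no-op", "left", "up", "right", "down", "fire"]


class StickyActions:
    """With probability p, repeat the previously executed action instead of the agent's choice."""

    def __init__(self, p=0.1, seed=None):  # p=0.1 is an assumed default, not stated in this review
        self.p = p
        self.rng = random.Random(seed)
        self.last_action = 0  # assume each episode starts from no-op

    def reset(self):
        self.last_action = 0

    def select(self, agent_action):
        """Return the action the environment actually executes this step."""
        if self.rng.random() < self.p:
            executed = self.last_action  # sticky: the agent's choice is ignored
        else:
            executed = agent_action
        self.last_action = executed
        return executed


env_actions = StickyActions(p=0.1, seed=0)
trajectory = [ACTIONS[env_actions.select(5)] for _ in range(20)]  # agent always picks "fire"
print(trajectory)
```

Since the agent cannot know whether its action will stick, it cannot rely on memorizing an exact open-loop action sequence, which is the usual motivation for this kind of stochasticity.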
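Online actor-critic with eligibility traces can be sketched in its simplest form using linear function approximation and a softmax policy. This is a swapped-in simplification, not the paper's agent: the review describes neural networks trained with RMSProp, whereas this sketch uses plain step-size updates and accumulating traces, and it omits the γ^t factor on the actor update that some textbook versions include. The class, method, and parameter names are hypothetical.

```python
import math
import random


class LinearActorCriticLambda:
    """Online actor-critic with accumulating eligibility traces (linear critic, softmax actor).

    Per step:  delta   = r + gamma * v(x') - v(x)
               z_w     = gamma * lam * z_w + grad v(x)
               z_theta = gamma * lam * z_theta + grad log pi(a | x)
               w      += alpha_w * delta * z_w
               theta  += alpha_theta * delta * z_theta
    """

    def __init__(self, n_features, n_actions, alpha_w=0.1, alpha_theta=0.01,
                 gamma=0.99, lam=0.8, seed=0):
        self.nf, self.na = n_features, n_actions
        self.w = [0.0] * n_features                                  # critic weights
        self.theta = [[0.0] * n_features for _ in range(n_actions)]  # one preference row per action
        self.alpha_w, self.alpha_theta = alpha_w, alpha_theta
        self.gamma, self.lam = gamma, lam
        self.rng = random.Random(seed)
        self.reset_traces()

    def reset_traces(self):
        self.z_w = [0.0] * self.nf
        self.z_theta = [[0.0] * self.nf for _ in range(self.na)]

    def value(self, x):
        return sum(wi * xi for wi, xi in zip(self.w, x))

    def policy(self, x):
        prefs = [sum(t * xi for t, xi in zip(row, x)) for row in self.theta]
        m = max(prefs)  # subtract the max for numerical stability
        exps = [math.exp(p - m) for p in prefs]
        total = sum(exps)
        return [e / total for e in exps]

    def act(self, x):
        return self.rng.choices(range(self.na), weights=self.policy(x))[0]

    def update(self, x, a, r, x_next, done):
        v_next = 0.0 if done else self.value(x_next)
        delta = r + self.gamma * v_next - self.value(x)  # TD error
        pi = self.policy(x)
        decay = self.gamma * self.lam
        # Critic trace: the gradient of a linear value function is the feature vector itself.
        self.z_w = [decay * z + xi for z, xi in zip(self.z_w, x)]
        # Actor trace: d log softmax / d theta[b] = (1[a == b] - pi[b]) * x.
        for b in range(self.na):
            coef = (1.0 if b == a else 0.0) - pi[b]
            self.z_theta[b] = [decay * z + coef * xi for z, xi in zip(self.z_theta[b], x)]
        self.w = [wi + self.alpha_w * delta * z for wi, z in zip(self.w, self.z_w)]
        for b in range(self.na):
            self.theta[b] = [t + self.alpha_theta * delta * z
                             for t, z in zip(self.theta[b], self.z_theta[b])]
        if done:
            self.reset_traces()  # traces do not carry across episode boundaries
        return delta


# Toy usage: one state, two actions, action 1 pays 1.0, every step ends the episode.
agent = LinearActorCriticLambda(n_features=1, n_actions=2, alpha_theta=0.1, seed=0)
x = [1.0]
for _ in range(2000):
    a = agent.act(x)
    r = 1.0 if a == 1 else 0.0
    agent.update(x, a, r, x, done=True)
print(agent.policy(x))  # probability mass should shift toward action 1
```

Setting `lam=0` recovers a one-step actor-critic; a larger `lam` lets a single TD error update the value estimates and action preferences of many recently visited states, which is the credit-assignment effect RQ2 asks about.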
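The size of the small DQN network follows from a little arithmetic. This sketch counts parameters for one 3x3 convolution with 16 filters, a 128-unit fully connected layer, and a linear output head, assuming stride 1, no padding, 4 input channels (a Breakout-like state), and the six-action output listed above. Those specifics, and the helper names, are assumptions rather than figures from the review.

```python
def conv_out(size, kernel=3, stride=1, padding=0):
    """Spatial output size of a 2-D convolution along one axis."""
    return (size + 2 * padding - kernel) // stride + 1


def dqn_param_count(in_channels, num_actions, grid=10, filters=16, kernel=3, hidden=128):
    """Parameter count of conv(16 x 3x3) -> FC(128) -> linear head, under the stated assumptions."""
    conv = kernel * kernel * in_channels * filters + filters  # weights + biases
    side = conv_out(grid, kernel)                             # 10 -> 8 with stride 1, no padding
    flat = side * side * filters                              # flattened conv features
    fc = flat * hidden + hidden
    out = hidden * num_actions + num_actions
    return {"conv": conv, "flat_features": flat, "fc": fc, "out": out,
            "total": conv + fc + out}


print(dqn_param_count(in_channels=4, num_actions=6))
# {'conv': 592, 'flat_features': 1024, 'fc': 131200, 'out': 774, 'total': 132566}
```

Under these assumptions the network has about 133k parameters, and the 1024-to-128 fully connected layer accounts for roughly 99% of them, which is small enough to make CPU training and step-size sweeps affordable.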

Experimental Results

Research Questions

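RQ1: How do different RL algorithms (DQN with and without experience replay, and AC(λ)) perform on simplified Atari-inspired tasks that emphasize behaviour rather than representation learning?
RQ2: How do the step-size hyperparameter and eligibility traces affect learning stability and performance in MinAtar environments?
RQ3: Do MinAtar environments exhibit qualitative behavioural differences and curriculum-like dynamics similar to those of Atari games, while allowing more extensive experimentation?
RQ4: Can MinAtar serve as an efficient proxy for studying topics such as exploration, credit assignment, and policy stability at reduced computational cost?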

Key Findings

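DQN improves faster than AC(λ) early in training, but AC(λ) can overtake DQN in the long run on several environments.
Experience replay gives DQN a clear advantage in every game; without replay, DQN performs poorly.
Online AC(λ) with RMSProp and SiLU/dSiLU activations provides stability and competitive performance on some tasks.
MinAtar makes 30 random seeds per agent-environment pair feasible, yielding tighter confidence intervals and thorough hyperparameter sweeps.
Observed qualitative behaviours include a path-clearing strategy in Breakout and a tendency in Seaquest to surface for air, showing meaningful behavioural dynamics without the full complexity of Atari.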
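The benefit of running 30 seeds per agent-environment pair shows up in the width of a confidence interval on the mean return. A minimal sketch, assuming a two-sided 95% Student-t interval over per-seed final returns; the review does not say which interval or error-bar convention the paper uses, and the returns generated below are synthetic, not results from the paper. The critical values are hard-coded because the Python standard library has no t quantile function.

```python
import math
import random
import statistics

# Two-sided 95% Student-t critical values, keyed by degrees of freedom (n - 1).
T_CRIT_95 = {4: 2.776, 9: 2.262, 29: 2.045}


def mean_ci95(values):
    """Mean and 95% t-interval for the mean of independent per-seed returns."""
    n = len(values)
    mean = statistics.mean(values)
    sem = statistics.stdev(values) / math.sqrt(n)  # standard error of the mean
    half = T_CRIT_95[n - 1] * sem
    return mean, mean - half, mean + half


# Synthetic per-seed returns for illustration only; NOT results from the paper.
rng = random.Random(0)
synthetic = [rng.gauss(50.0, 10.0) for _ in range(30)]
for n in (5, 30):
    m, lo, hi = mean_ci95(synthetic[:n])
    print(f"n={n:2d} mean={m:6.2f} 95% CI=({lo:6.2f}, {hi:6.2f}) width={hi - lo:5.2f}")
```

Holding the sample standard deviation fixed, moving from 5 to 30 seeds shrinks the half-width by a factor of about (2.776/√5)/(2.045/√30) ≈ 3.3.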
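Experience replay, which the review credits with DQN's clear advantage, stores past transitions and trains on random minibatches drawn from them rather than only on the latest step. A minimal ring-buffer sketch with uniform sampling; the class name `ReplayBuffer`, the capacity, and the transition tuple layout are assumptions for illustration rather than details from the paper.

```python
import random


class ReplayBuffer:
    """Fixed-capacity transition store; once full, the oldest entry is overwritten."""

    def __init__(self, capacity, seed=None):
        self.capacity = capacity
        self.storage = []
        self.next_idx = 0  # slot that the next transition will occupy
        self.rng = random.Random(seed)

    def add(self, transition):
        if len(self.storage) < self.capacity:
            self.storage.append(transition)
        else:
            self.storage[self.next_idx] = transition  # overwrite the oldest entry
        self.next_idx = (self.next_idx + 1) % self.capacity

    def sample(self, batch_size):
        # Uniform sampling without replacement decorrelates consecutive time steps in a minibatch.
        return self.rng.sample(self.storage, batch_size)

    def __len__(self):
        return len(self.storage)


buf = ReplayBuffer(capacity=3, seed=0)
for t in range(5):
    buf.add((t, 0, 0.0, t + 1, False))  # (state, action, reward, next_state, done)
print(sorted(tr[0] for tr in buf.storage))  # [2, 3, 4]: the two oldest were overwritten
```

Reusing each transition across many minibatches also improves sample efficiency, one plausible reason DQN without replay lags so far behind in these experiments.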


This review was generated by AI and checked by a human editor.