QUICK REVIEW

[Paper Review] Explain Your Move: Understanding Agent Actions Using Specific and Relevant Feature Attribution

Nikaash Puri, Sukriti Verma | arXiv (Cornell University) | December 23, 2019

Explainable Artificial Intelligence (XAI) | References: 27 | Citations: 32

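One-line Summary

SARFA is a perturbation-based saliency method that highlights features that are both specific to the chosen action and relevant to it, giving more interpretable explanations of RL agents' actions across Chess, Go, and Atari. It combines the perturbation's impact on the chosen action (specificity) with how little the perturbation affects the other actions (relevance) using a harmonic mean.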

ABSTRACT

As deep reinforcement learning (RL) is applied to more tasks, there is a need to visualize and understand the behavior of learned agents. Saliency maps explain agent behavior by highlighting the features of the input state that are most relevant for the agent in taking an action. Existing perturbation-based approaches to compute saliency often highlight regions of the input that are not relevant to the action taken by the agent. Our proposed approach, SARFA (Specific and Relevant Feature Attribution), generates more focused saliency maps by balancing two aspects (specificity and relevance) that capture different desiderata of saliency. The first captures the impact of perturbation on the relative expected reward of the action to be explained. The second downweighs irrelevant features that alter the relative expected rewards of actions other than the action to be explained. We compare SARFA with existing approaches on agents trained to play board games (Chess and Go) and Atari games (Breakout, Pong and Space Invaders). We show through illustrative examples (Chess, Atari, Go), human studies (Chess), and automated evaluation methods (Chess) that SARFA generates saliency maps that are more interpretable for humans than existing approaches. For the code release and demo videos, see https://nikaashpuri.github.io/sarfa-saliency/.

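Motivation and Goals

Produce interpretable explanations of deep reinforcement learning agents applied to board and arcade games.
Develop a saliency method that focuses on the features relevant to the specific action the agent chose.
Address the limitation of earlier perturbation-based saliency methods, which highlight irrelevant features or features whose effect is spread uniformly across all actions.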
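Why uniform effects across actions mislead can be seen with a toy example. In the sketch below, the Q-values are made up and `raw_q_saliency` is a hypothetical naive baseline, not the exact formulation of any method cited in the paper. A perturbation that shifts every Q-value by the same amount changes the chosen action's raw Q-value, yet leaves its softmax-normalized relative return unchanged, because softmax is invariant to adding a constant to all of its inputs.

```python
import math
from typing import List, Sequence


def softmax(q: Sequence[float]) -> List[float]:
    """Softmax over Q-values, shifted by the max for numerical stability."""
    m = max(q)
    exps = [math.exp(v - m) for v in q]
    total = sum(exps)
    return [e / total for e in exps]


def raw_q_saliency(q_orig: Sequence[float], q_pert: Sequence[float], action: int) -> float:
    """Hypothetical naive baseline: absolute change in the chosen action's Q-value."""
    return abs(q_orig[action] - q_pert[action])


def relative_return_change(q_orig: Sequence[float], q_pert: Sequence[float], action: int) -> float:
    """Drop in the chosen action's softmax probability (its relative expected return)."""
    return softmax(q_orig)[action] - softmax(q_pert)[action]


# Made-up Q-values for three actions; action 0 is the action being explained.
q_orig = [2.0, 1.0, 0.5]
# A perturbation that raises every action's Q-value by the same amount.
q_uniform = [v + 1.5 for v in q_orig]

print(raw_q_saliency(q_orig, q_uniform, 0))          # 1.5: looks "salient"
print(relative_return_change(q_orig, q_uniform, 0))  # 0.0: preference among actions unchanged
```

The naive score flags this perturbation as important even though it tells us nothing about why action 0 was preferred over the alternatives.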

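Proposed Method

Define a saliency score S[f] for each state feature f, based on how perturbing f changes the agent's Q-values.
Compute the relative expected return P(s, â) = exp(Q(s, â)) / Σ_a exp(Q(s, a)) with a softmax over the Q-values, and the perturbation's impact on the chosen action â, Δp = P(s, â) − P(s′, â), where s′ is the perturbed state.
Compute the relevance term from the KL divergence D_KL between the renormalized relative returns of the actions other than â, before and after the perturbation.
Combine Δp and the relevance K = 1/(1 + D_KL) with a harmonic mean: S[f] = 2KΔp / (K + Δp).
This makes the saliency emphasize features that specifically affect the chosen action and down-weight features that also change the other actions.
Evaluate SARFA on Chess (Stockfish), Go (MiniGo), and Atari (Breakout, Pong, Space Invaders), assuming only black-box access to Q(s, a).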
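The steps above can be put together in a small Python sketch. It assumes a black-box `q_fn` that maps a feature vector to a list of Q-values, and it perturbs one feature at a time by replacing it with a `baseline` value (a stand-in for, e.g., removing a chess piece). Two details are assumptions not stated in this review: the KL direction D_KL(P_rem(s′) ‖ P_rem(s)), and returning zero saliency when Δp ≤ 0 so that the harmonic mean stays well-defined. `toy_q`, `sarfa_score`, and `sarfa_saliency` are hypothetical names, not the authors' released code.

```python
import math
from typing import Callable, List, Sequence


def softmax(values: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q) for strictly positive discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


def sarfa_score(q_orig: Sequence[float], q_pert: Sequence[float], action: int) -> float:
    """Saliency of one perturbed feature, given Q-values before and after perturbation."""
    # Specificity: drop in the chosen action's relative expected return.
    delta_p = softmax(q_orig)[action] - softmax(q_pert)[action]
    if delta_p <= 0:
        # Assumption: a perturbation that does not hurt the chosen action scores zero.
        return 0.0
    # Relevance: renormalized relative returns over the *other* actions only.
    rem_orig = softmax([q for i, q in enumerate(q_orig) if i != action])
    rem_pert = softmax([q for i, q in enumerate(q_pert) if i != action])
    k = 1.0 / (1.0 + kl_divergence(rem_pert, rem_orig))  # KL direction is an assumption
    # Harmonic mean of specificity (delta_p) and relevance (k).
    return 2.0 * k * delta_p / (k + delta_p)


def sarfa_saliency(
    q_fn: Callable[[List[float]], List[float]],
    state: List[float],
    action: int,
    baseline: float = 0.0,
) -> List[float]:
    """Perturb each feature in turn (replace it with `baseline`) and score it."""
    q_orig = q_fn(state)
    scores = []
    for f in range(len(state)):
        perturbed = list(state)
        perturbed[f] = baseline
        scores.append(sarfa_score(q_orig, q_fn(perturbed), action))
    return scores


def toy_q(x: List[float]) -> List[float]:
    """Hypothetical black-box Q-function over 4 features and 3 actions."""
    return [
        1.0 + 1.0 * x[0] + 1.0 * x[3] + 0.5 * x[2],  # action 0: the action to explain
        1.5 * x[1] + 0.5 * x[2],                    # action 1
        0.5 * x[2] + 1.0 * x[3],                    # action 2
    ]


scores = sarfa_saliency(toy_q, [1.0, 1.0, 1.0, 1.0], action=0)
print([round(s, 3) for s in scores])  # feature 0 highest; features 1 and 2 score 0
```

In this toy setup, feature 0 only supports the explained action and gets the highest score. Feature 1 only supports a competing action and feature 2 shifts all Q-values equally, so both score zero. Feature 3 lowers the explained action but also reshuffles the other actions, so its relevance term K falls below 1 and pulls its score down.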

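Experimental Results

Research Questions

RQ1: Does SARFA produce more action-focused, human-interpretable saliency maps than existing perturbation-based methods?
RQ2: How do specificity and relevance improve human understanding in the Chess, Go, and Atari domains?
RQ3: Does SARFA reveal meaningful tactical motifs in chess and improve human puzzle-solving performance?
RQ4: Is SARFA robust to perturbations and applicable to black-box RL agents?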

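Key Findings

SARFA produces more focused saliency maps that highlight the pieces or regions actually relevant to the chosen move or action.
In human studies on chess puzzles, participants shown SARFA saliency solved puzzles more accurately and faster than with the baseline (e.g., 72.41% solving accuracy and 67.02 s solving time).
ROC analysis on a chess dataset shows that SARFA identifies the pieces humans consider relevant better than the approaches of Greydanus et al. and Iyer et al.
Participants who solved chess puzzles with SARFA guidance were roughly 25% more accurate and 31% faster in some settings.
SARFA's saliency maps give intuitive explanations of tactical motifs such as pins, mate-in-two, and overloading.
In robustness tests, AUC stayed at about 0.92, indicating that SARFA saliency remains stable under perturbations that do not change the target action.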
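The ROC/AUC figures above measure how well saliency scores rank the pieces humans consider relevant above those they do not. The sketch below shows one standard way to compute such an AUC for a single position, assuming binary per-piece relevance labels. The paper's exact protocol (how labels were gathered, how positions are pooled) is not described in this review, and the scores and labels here are invented for illustration.

```python
from typing import Sequence


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """ROC AUC in its Mann-Whitney form: the fraction of (relevant, irrelevant)
    pairs in which the relevant item gets the higher score; ties count as 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one relevant and one irrelevant item")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical saliency scores for six pieces on one board, and hypothetical
# human labels (1 = relevant to the best move, 0 = not relevant).
saliency = [0.9, 0.7, 0.1, 0.4, 0.0, 0.5]
relevant = [1, 1, 0, 1, 0, 0]
print(roc_auc(saliency, relevant))  # 8 of 9 pairs ranked correctly -> 0.888...
```

An AUC of 1.0 means every relevant piece outranks every irrelevant one; 0.5 is chance level.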


This review was generated by AI and checked by a human editor.