QUICK REVIEW

[Paper Review] Human vs Machine Attention in Neural Networks: A Comparative Study

Qiuxia Lai, Wenguan Wang | arXiv (Cornell University) | June 20, 2019

Visual Attention and Saliency Detection | Citations: 6

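One-line summary

This study systematically compares human visual attention with the artificial attention of deep neural networks across three computer vision tasks, using real human gaze data and a variety of architectures. Aligning artificial attention with human attention improves performance on attention-driven tasks, while for higher-level vision tasks the performance effect is case-by-case and the main benefit is better explainability.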

ABSTRACT

Human visual system can selectively attend to parts of a scene for quick perception, a biological mechanism known as Human attention. Inspired by this, recent deep learning models encode attention mechanisms to focus on the most task-relevant parts of the input signal for further processing, which is called Machine/Neural/Artificial attention. Understanding the relation between human and machine attention is important for interpreting and designing neural networks. Many works claim that the attention mechanism offers an extra dimension of interpretability by explaining where the neural networks look. However, recent studies demonstrate that artificial attention maps do not always coincide with common intuition. In view of these conflicting evidence, here we make a systematic study on using artificial attention and human attention in neural network design. With three example computer vision tasks, diverse representative backbones, and famous architectures, corresponding real human gaze data, and systematically conducted large-scale quantitative studies, we quantify the consistency between artificial attention and human visual attention and offer novel insights into existing artificial attention mechanisms by giving preliminary answers to several key questions related to human and artificial attention mechanisms. Overall results demonstrate that human attention can benchmark the meaningful `ground-truth' in attention-driven tasks, where the more the artificial attention is close to human attention, the better the performance; for higher-level vision tasks, it is case-by-case. It would be advisable for attention-driven tasks to explicitly force a better alignment between artificial and human attention to boost the performance; such alignment would also improve the network explainability for higher-level computer vision tasks.

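Motivation and objectives

Investigate how consistent human visual attention is with the artificial attention mechanisms in deep neural networks.
Evaluate whether artificial attention maps genuinely reflect task-relevant features in the way human intuition expects.
Determine whether aligning artificial attention with human gaze improves model performance and interpretability.
Provide an empirical reference for what counts as meaningful attention in attention-driven computer vision tasks.
Offer insights for designing attention mechanisms in neural networks that are both more interpretable and more effective.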

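Proposed method

The study uses real human gaze data, collected while people performed visual recognition tasks, as a proxy for human attention.
It evaluates state-of-the-art neural network architectures with a variety of backbones on three computer vision tasks: image classification, object detection, and image captioning.
Artificial attention maps are extracted from pretrained models and compared with the human gaze data using quantitative agreement metrics such as correlation and intersection-over-union.
Large-scale, systematic experiments assess the relationship between varying degrees of attention alignment and final task performance.
Statistical analysis checks whether better alignment with human attention is associated with higher model accuracy and robustness.
Beyond accuracy, the interpretability of the attention maps is also assessed, for example through human evaluation and consistency with saliency maps.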
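
The comparison step described above can be illustrated with a small sketch. The review names correlation and intersection-over-union as agreement metrics but gives no formulas, so the versions below are assumptions: Pearson correlation over flattened maps, and IoU over maps binarized at a fixed threshold. The function names (`pearson_cc`, `binary_iou`), the 0.5 threshold, and the toy 2x2 maps are all hypothetical and are not taken from the paper. Only the Python standard library is used, so maps are plain nested lists.

```python
import math


def flatten(grid):
    """Flatten a 2D list of numbers into a 1D list of floats."""
    return [float(value) for row in grid for value in row]


def pearson_cc(machine_map, human_map):
    """Pearson correlation between two attention maps of equal size."""
    m, h = flatten(machine_map), flatten(human_map)
    if len(m) != len(h):
        raise ValueError("maps must contain the same number of cells")
    mean_m = sum(m) / len(m)
    mean_h = sum(h) / len(h)
    covariance = sum((a - mean_m) * (b - mean_h) for a, b in zip(m, h))
    var_m = sum((a - mean_m) ** 2 for a in m)
    var_h = sum((b - mean_h) ** 2 for b in h)
    denominator = math.sqrt(var_m * var_h)
    # A constant map has no variance; report zero correlation in that case.
    return covariance / denominator if denominator > 0 else 0.0


def binary_iou(machine_map, human_map, threshold=0.5):
    """IoU of the regions where each map is at least `threshold`.

    Assumes both maps are already scaled to the [0, 1] range.
    """
    m = [value >= threshold for value in flatten(machine_map)]
    h = [value >= threshold for value in flatten(human_map)]
    if len(m) != len(h):
        raise ValueError("maps must contain the same number of cells")
    intersection = sum(a and b for a, b in zip(m, h))
    union = sum(a or b for a, b in zip(m, h))
    return intersection / union if union else 0.0


# Toy example: a hypothetical model attention map and human gaze map.
machine_attention = [[0.9, 0.1], [0.8, 0.0]]
human_gaze = [[1.0, 0.0], [0.2, 0.0]]
print("CC :", pearson_cc(machine_attention, human_gaze))
print("IoU:", binary_iou(machine_attention, human_gaze))
```

In this toy case the model attends to two cells while the human gaze concentrates on one, so the thresholded regions overlap only partially. Real attention maps would first need to be resized to the resolution of the gaze map and normalized.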

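Experimental results

Research questions

RQ1: How consistent is artificial attention with human visual attention across different computer vision tasks?
RQ2: To what extent does aligning artificial attention with human gaze improve model performance?
RQ3: Is the degree of alignment between artificial and human attention related to model interpretability?
RQ4: Does the effectiveness of attention alignment differ from task to task?
RQ5: Can human gaze data serve as a reliable reference for evaluating artificial attention mechanisms?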

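Key findings

Artificial attention maps often deviate from human gaze patterns, which challenges the assumption that attention mechanisms are inherently interpretable.
In attention-driven tasks, closer agreement between artificial and human attention correlates strongly with better model performance.
In higher-level vision tasks, the contribution of attention alignment to performance varies case by case and does not always hold.
Explicitly enforcing alignment between artificial and human attention during training can boost performance on attention-driven tasks and improve explainability on higher-level tasks.
Human attention is a valid reference for meaningful attention in attention-driven tasks.
The study provides empirical evidence that attention mechanisms are more effective the more closely they mirror human-like focusing behavior.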
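
One way to picture "explicitly forcing" alignment during training is an auxiliary penalty added to the task loss. The review does not say which penalty the authors used, so the sketch below is only an assumption: it uses the KL divergence from the human gaze distribution to the model's attention distribution, weighted by a hypothetical coefficient `weight`. The names `to_distribution`, `kl_divergence`, and `total_loss` are illustrative and do not come from the paper.

```python
import math


def to_distribution(scores):
    """Normalize non-negative scores so that they sum to 1."""
    total = sum(scores)
    if total <= 0:
        # No attention mass at all: fall back to a uniform distribution.
        return [1.0 / len(scores)] * len(scores)
    return [score / total for score in scores]


def kl_divergence(human_scores, machine_scores, eps=1e-8):
    """KL(human || machine) over flattened attention maps.

    `eps` keeps the logarithm finite where the model puts zero attention.
    """
    if len(human_scores) != len(machine_scores):
        raise ValueError("maps must contain the same number of cells")
    p = to_distribution(human_scores)
    q = to_distribution(machine_scores)
    return sum(
        p_i * math.log((p_i + eps) / (q_i + eps))
        for p_i, q_i in zip(p, q)
        if p_i > 0
    )


def total_loss(task_loss, human_scores, machine_scores, weight=0.1):
    """Task loss plus a weighted attention-alignment penalty (assumed form)."""
    return task_loss + weight * kl_divergence(human_scores, machine_scores)


# Toy example: the penalty vanishes when the model matches the human map.
human = [0.7, 0.2, 0.1, 0.0]
aligned = [0.7, 0.2, 0.1, 0.0]
misaligned = [0.0, 0.1, 0.2, 0.7]
print(total_loss(1.0, human, aligned))
print(total_loss(1.0, human, misaligned))
```

In an actual training setup the penalty would be computed on differentiable tensors inside a deep learning framework. The plain-Python version here only shows the shape of the objective: the second call yields a larger loss than the first because the model's attention is concentrated where humans did not look.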

This review was generated by AI and reviewed by a human editor.