QUICK REVIEW

[Paper Review] Soft + Hardwired Attention: An LSTM Framework for Human Trajectory Prediction and Abnormal Event Detection

Tharindu Fernando, Simon Denman | arXiv (Cornell University) | 2017-02-18

Video Surveillance and Tracking Methods | References: 11 | Citations: 19

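One-line summary

This paper proposes a new LSTM-based framework that combines soft attention with hard-wired attention to predict human trajectories and detect abnormal events in surveillance video. By integrating learnable soft attention with hand-designed spatial attention weights, the model improves trajectory prediction accuracy in crowded, complex environments, enables end-to-end abnormality detection without hand-crafted features, and outperforms the state of the art on two public datasets.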

ABSTRACT

As humans we possess an intuitive ability for navigation which we master through years of practice; however existing approaches to model this trait for diverse tasks including monitoring pedestrian flow and detecting abnormal events have been limited by using a variety of hand-crafted features. Recent research in the area of deep-learning has demonstrated the power of learning features directly from the data; and related research in recurrent neural networks has shown exemplary results in sequence-to-sequence problems such as neural machine translation and neural image caption generation. Motivated by these approaches, we propose a novel method to predict the future motion of a pedestrian given a short history of their, and their neighbours, past behaviour. The novelty of the proposed method is the combined attention model which utilises both "soft attention" as well as "hard-wired" attention in order to map the trajectory information from the local neighbourhood to the future positions of the pedestrian of interest. We illustrate how a simple approximation of attention weights (i.e hard-wired) can be merged together with soft attention weights in order to make our model applicable for challenging real world scenarios with hundreds of neighbours. The navigational capability of the proposed method is tested on two challenging publicly available surveillance databases where our model outperforms the current-state-of-the-art methods. Additionally, we illustrate how the proposed architecture can be directly applied for the task of abnormal event detection without handcrafting the features.

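Motivation and objectives

Tackles accurate prediction of pedestrian trajectories in complex, crowded environments with high pedestrian density.
Improves trajectory prediction by combining learnable soft attention with spatially structured hard-wired attention to model the influence of neighbouring pedestrians.
Enables end-to-end abnormal event detection from LSTM hidden states, with no hand-crafted features.
Demonstrates the model's robustness and generalisation on real-world surveillance datasets covering varied crowd dynamics.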

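Proposed method

Uses an encoder-decoder LSTM architecture to model pedestrian trajectories as sequences over time.
Applies soft attention, computed by a learned attention function, to the trajectory of the pedestrian of interest.
Models the influence of neighbouring pedestrians with hard-wired attention weights based on spatial distance and relative position.
Merges the soft-attention and hard-wired-attention context vectors into a single representation for predicting the future trajectory.
Uses the hidden states of the LSTM encoder and decoder for clustering-based abnormal event detection with DBSCAN.
Trains the model end to end on observed trajectories to predict future paths and detect deviations from normal behaviour.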
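
The way the two context vectors are formed and merged can be illustrated with a minimal sketch. It is not the authors' implementation: dot-product scoring stands in for the learned attention function, inverse Euclidean distance is assumed as one simple hard-wired weight, concatenation is assumed as the merge step, and all function names and toy values are hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def soft_context(encoder_states, decoder_state):
    """Soft attention over the encoder states of the pedestrian of interest."""
    # Assumption: a dot product replaces the learned attention function.
    scores = [sum(a * b for a, b in zip(h, decoder_state)) for h in encoder_states]
    weights = softmax(scores)
    dim = len(encoder_states[0])
    return [sum(w * h[d] for w, h in zip(weights, encoder_states)) for d in range(dim)]

def hardwired_context(neighbour_states, neighbour_positions, own_position):
    """Fixed, non-learned weights: closer neighbours contribute more."""
    dim = len(neighbour_states[0])
    context = [0.0] * dim
    for h, pos in zip(neighbour_states, neighbour_positions):
        # Assumption: weight = 1 / distance, guarded against division by zero.
        w = 1.0 / max(math.dist(pos, own_position), 1e-6)
        for d in range(dim):
            context[d] += w * h[d]
    return context

def merged_context(soft, hard):
    # Assumption: plain concatenation; a learned merge layer is equally plausible.
    return soft + hard

# Toy values (hypothetical): two encoder steps, two neighbours.
encoder_states = [[1.0, 0.0], [0.0, 1.0]]
decoder_state = [1.0, 0.0]
neighbour_states = [[1.0, 1.0], [2.0, 0.0]]
neighbour_positions = [(1.0, 0.0), (0.0, 2.0)]

context = merged_context(
    soft_context(encoder_states, decoder_state),
    hardwired_context(neighbour_states, neighbour_positions, (0.0, 0.0)),
)
```

Because the hard-wired weights need no learned parameters per neighbour, the cost of adding a neighbour is a single distance computation, which is what makes this part cheap for scenes with hundreds of neighbours.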

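Experimental results

Research questions

RQ1: Can a hybrid attention mechanism that fuses soft and hard-wired attention improve trajectory prediction in crowded, complex scenes?
RQ2: How well does the proposed model generalise to real-world surveillance data with high pedestrian density and dynamic interactions?
RQ3: How effectively can LSTM hidden states be used to detect abnormal behaviour without hand-crafted features?
RQ4: How does the model compare with existing state-of-the-art methods in trajectory prediction accuracy and abnormality detection performance?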

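Key results

The model achieved the best performance on two public surveillance datasets, surpassing existing state-of-the-art methods in trajectory prediction accuracy.
The hybrid attention mechanism drove much of the improvement in scenes with hundreds of neighbours, indicating that the approach can scale to real crowded environments.
The model detected 47 of the 55 ground-truth abnormal events (85.5% recall), whereas a simple baseline detected only 29 (52.7% recall).
False alarms were caused mainly by rare, unusual behaviours, such as suddenly turning around to buy a ticket, which suggests sensitivity to low-frequency patterns.
Abnormal events involving sudden changes of direction, circular motion, or unusual speed were detected successfully even when the predicted and observed paths were close.
Clustering the LSTM hidden states enabled abnormal event detection without hand-crafted features, demonstrating strong generalisation.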
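
The clustering step can be illustrated with a small, dependency-free sketch of DBSCAN applied to hidden-state vectors. This is a toy under stated assumptions, not the authors' pipeline: the 2-D vectors stand in for LSTM hidden states, `eps` and `min_samples` are arbitrary, and treating DBSCAN noise points as abnormal candidates is an assumption made here for illustration.

```python
import math

NOISE = -1

def dbscan(points, eps, min_samples):
    """Minimal DBSCAN; returns one cluster label per point, -1 for noise."""
    labels = [None] * len(points)

    def region(i):
        # Indices of all points within eps of point i (including i itself).
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbours = region(i)
        if len(neighbours) < min_samples:
            labels[i] = NOISE          # may later be relabelled as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in neighbours if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster    # border point: joins cluster, not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbours = region(j)
            if len(j_neighbours) >= min_samples:
                queue.extend(j_neighbours)  # core point: keep expanding
    return labels

# Hypothetical hidden states: two dense groups of "normal" behaviour, one outlier.
hidden_states = [
    [0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1],
    [5.0, 5.0], [5.1, 5.0], [5.0, 5.1],
    [10.0, -3.0],
]
labels = dbscan(hidden_states, eps=0.5, min_samples=3)

# Assumption: trajectories whose hidden state falls outside every cluster are flagged.
abnormal = [i for i, label in enumerate(labels) if label == NOISE]
```

Working on hidden states rather than raw coordinates is what lets a trajectory be flagged even when its predicted and observed paths are close, since the clustering sees the learned representation rather than the prediction error.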


This review was generated by AI and reviewed by a human editor.