QUICK REVIEW

[Paper Review] Explaining Anomalies Detected by Autoencoders Using SHAP

Liat Antwarg, Miller, Ronnie Mindlin|arXiv (Cornell University)|2019. 03. 06.

Anomaly Detection Techniques and Applications | References: 36 | Citations: 83

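One-Line Summary

This paper proposes a model-agnostic, Kernel SHAP-based method for explaining anomalies detected by autoencoders, links the reconstruction error to the most influential features, and validates the approach with a user study on real-world data and with experiments on synthetic data.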

ABSTRACT

Anomaly detection algorithms are often thought to be limited because they don't facilitate the process of validating results performed by domain experts. In contrast, deep learning algorithms for anomaly detection, such as autoencoders, point out the outliers, saving experts the time-consuming task of examining normal cases in order to find anomalies. Most outlier detection algorithms output a score for each instance in the database. The top-k most intense outliers are returned to the user for further inspection; however the manual validation of results becomes challenging without additional clues. An explanation of why an instance is anomalous enables the experts to focus their investigation on most important anomalies and may increase their trust in the algorithm. Recently, a game theory-based framework known as SHapley Additive exPlanations (SHAP) has been shown to be effective in explaining various supervised learning models. In this research, we extend SHAP to explain anomalies detected by an autoencoder, an unsupervised model. The proposed method extracts and visually depicts both the features that most contributed to the anomaly and those that offset it. A preliminary experimental study using real world data demonstrates the usefulness of the proposed method in assisting the domain experts to understand the anomaly and filtering out the uninteresting anomalies, aiming at minimizing the false positive rate of detected anomalies.

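Research Motivation and Objectives

Motivate the need for instance-level explanations to increase expert trust in autoencoder-based anomaly detection.
Develop a black-box explanation method that works without knowledge of the autoencoder's internal architecture.
Link high reconstruction error to the features that contribute most to the anomaly score.
Provide visual and tabular explanations that distinguish contributing features from offsetting features.
Evaluate the explanations through a user study, synthetic ground-truth experiments, robustness tests, and anomaly-score manipulation.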

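Proposed Method

Compute the reconstruction error L(X, X') as the sum of the squared errors of the individual features.
To focus the explanation, identify topMfeatures, the features with the highest per-feature reconstruction error.
Use Kernel SHAP to compute, for each top feature i, the SHAP values associated with predicting its reconstructed value X'_i.
Using the sign of the SHAP values and a comparison between X and X', split the SHAP values into contributing features (those that push the prediction away from the actual value) and offsetting features (those that push it toward the actual value).
Present the explanation as a color-coded table that shows, for each top feature, its contributing (red) and offsetting (blue) features, with the magnitude of the SHAP value indicating importance.
Optionally compare against an alternative SHAP-based method that explains the total reconstruction error through an additional layer, and check that the top features are consistent.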
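The steps above can be illustrated with a minimal, self-contained Python sketch. It is not the authors' implementation: the `reconstruct` function is a hypothetical stand-in for a trained autoencoder, the `baseline` vector is an assumed "normal" reference row, and exact Shapley values are computed by enumerating coalitions instead of using the Kernel SHAP approximation (which is only feasible here because the toy example has three features). With a real model, one would more likely rely on something like `shap.KernelExplainer` from the `shap` package.

```python
from itertools import combinations
from math import factorial


def reconstruct(x):
    """Hypothetical black-box autoencoder: keeps a and b, rebuilds c as a + b."""
    a, b, _ = x
    return [a, b, a + b]


def per_feature_errors(x, x_rec):
    """Squared error of each feature; their sum is L(X, X')."""
    return [(xi - ri) ** 2 for xi, ri in zip(x, x_rec)]


def top_m_features(errors, m):
    """Indices of the m features with the highest reconstruction error."""
    return sorted(range(len(errors)), key=lambda i: errors[i], reverse=True)[:m]


def shapley_values(f, x, baseline):
    """Exact Shapley values of f at x; absent features take baseline values.
    (Assumption: this replaces Kernel SHAP's sampling-based estimate.)"""
    n = len(x)

    def value(subset):
        z = [x[j] if j in subset else baseline[j] for j in range(n)]
        return f(z)

    phi = [0.0] * n
    for j in range(n):
        others = [k for k in range(n) if k != j]
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for s in combinations(others, size):
                phi[j] += weight * (value(set(s) | {j}) - value(set(s)))
    return phi


def explain(x, baseline, m=1):
    """Return the total error and, per top feature, (contributing, offsetting)."""
    x_rec = reconstruct(x)
    errors = per_feature_errors(x, x_rec)
    explanation = {}
    for i in top_m_features(errors, m):
        phi = shapley_values(lambda z: reconstruct(z)[i], x, baseline)
        # Direction that moves the prediction X'_i away from the actual X_i:
        # negative SHAP values if X_i > X'_i, positive ones otherwise.
        away = -1 if x[i] > x_rec[i] else 1
        contributing = {j: p for j, p in enumerate(phi) if p * away > 0}
        offsetting = {j: p for j, p in enumerate(phi) if p * away < 0}
        explanation[i] = (contributing, offsetting)
    return sum(errors), explanation


total_error, explanation = explain([5.0, 0.0, 2.0], baseline=[1.0, 1.0, 2.0], m=1)
print(total_error, explanation)
```

In this toy run, feature 2 has the highest reconstruction error; feature 0 comes out as contributing (it pushes the reconstructed value upward, away from the actual value) and feature 1 as offsetting (it pulls the reconstruction back toward the actual value).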

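Experimental Results

Research Questions

RQ1: Can Kernel SHAP provide reliable, model-agnostic explanations for anomalies detected by an autoencoder?
RQ2: Which features, and which interactions among them, best explain high reconstruction error in the autoencoder's output?
RQ3: In this context, do SHAP-based explanations reflect the true contributing factors more accurately than other methods such as LIME?
RQ4: Do the explanations improve domain experts' understanding of anomalies in real-world data and the efficiency of their inspection?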

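Key Findings

The proposed SHAP-based explanations reveal both the contributing and the offsetting features for anomalies detected by the autoencoder.
Domain experts reported that the visual explanations helped them focus on the most important explaining features during inspection.
In synthetic ground-truth tests, the SHAP-based explanations correctly identified the exact features responsible for the anomaly.
In the evaluated settings, SHAP-based explanations were more robust than LIME.
In the experiments, modifying the values of the explaining features reduced the anomaly score, indicating that the explanations pointed at the right features.
Across real-world datasets, the method provided better interpretability without requiring knowledge of the autoencoder's internal structure.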
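The anomaly-score manipulation check mentioned in the findings can be sketched roughly as follows. This is an illustrative assumption about how such a check might look, not the paper's exact procedure: `reconstruct` is a hypothetical stand-in for a trained autoencoder, `baseline` is an assumed "normal" reference row, and feature 0 is assumed to have been flagged as a contributing feature by some explanation method.

```python
def reconstruct(x):
    """Hypothetical black-box autoencoder: keeps a and b, rebuilds c as a + b."""
    a, b, _ = x
    return [a, b, a + b]


def anomaly_score(x):
    """Total reconstruction error: sum of per-feature squared errors."""
    return sum((xi - ri) ** 2 for xi, ri in zip(x, reconstruct(x)))


def score_after_reset(x, features, baseline):
    """Reset the given features to their baseline values and re-score."""
    patched = [baseline[j] if j in features else v for j, v in enumerate(x)]
    return anomaly_score(patched)


x = [5.0, 0.0, 2.0]          # anomalous row: c is far from a + b
baseline = [1.0, 1.0, 2.0]   # assumed normal reference row

before = anomaly_score(x)
after_explaining = score_after_reset(x, {0}, baseline)  # assumed contributing feature
after_other = score_after_reset(x, {1}, baseline)       # a feature not flagged as contributing

print(before, after_explaining, after_other)
```

If an explanation is faithful, resetting the features it flags as contributing should lower the anomaly score more than resetting other features; in this toy setup, resetting feature 1 actually raises the score, which is what one would expect from an offsetting feature.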


This review was generated by AI and reviewed by a human editor.