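[Paper Review] Gradient based Feature Attribution in Explainable AI: A Technical Review
A comprehensive survey of gradient-based feature attribution for neural networks. It proposes a four-group taxonomy and covers algorithmic progress, evaluation approaches, and key challenges in detail.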
The surge in black-box AI models has prompted the need to explain the internal mechanism and justify their reliability, especially in high-stakes applications, such as healthcare and autonomous driving. Due to the lack of a rigorous definition of explainable AI (XAI), a plethora of research related to explainability, interpretability, and transparency has been developed to explain and analyze the model from various perspectives. Consequently, with an exhaustive list of papers, it becomes challenging to have a comprehensive overview of XAI research from all aspects. Considering the popularity of neural networks in AI research, we narrow our focus to a specific area of XAI research: gradient based explanations, which can be directly adopted for neural network models. In this review, we systematically explore gradient based explanation methods to date and introduce a novel taxonomy to categorize them into four distinct classes. Then, we present the essence of technique details in chronological order and underscore the evolution of algorithms. Next, we introduce both human and quantitative evaluations to measure algorithm performance. More importantly, we demonstrate the general challenges in XAI and specific challenges in gradient based explanations. We hope that this survey can help researchers understand state-of-the-art progress and their corresponding disadvantages, which could spark their interest in addressing these issues in future work.
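Research Motivation and Goals
- Survey gradient-based explainability methods, with a focus on neural networks.
- Provide a taxonomy for classifying gradient-based feature attribution approaches.
- Summarize the technical details of these methods and how they evolved over time.
- Review evaluation strategies and identify both general XAI challenges and gradient-specific challenges.
Proposed Method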
- Classify gradient-based attributions into four groups: vanilla gradients, integrated gradients, bias-gradient based explanations, and denoising post-processing.
- Describe algorithmic details and evolution of methods within each group.
- Explain baseline and path integral approaches used in gradient-based explanations.
- Discuss evaluation methodologies, including human and objective metrics, for attribution methods.
- Highlight axiomatic properties (e.g., sensitivity, completeness, implementation invariance) guiding method design.
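The path-integral idea and the completeness axiom listed above can be made concrete with a small sketch. The toy model, its weights, the all-zero baseline, and the step count are assumptions chosen for illustration; they do not come from the survey. The code approximates integrated gradients along a straight line from the baseline to the input with a midpoint Riemann sum, then checks that the attributions sum to roughly f(x) - f(baseline).

```python
import math

# Hypothetical weights for a toy one-layer "network" (not from the survey).
WEIGHTS = [2.0, -1.0, 0.5]

def model(x):
    """Sigmoid of a weighted sum of the input features."""
    z = sum(w * xi for w, xi in zip(WEIGHTS, x))
    return 1.0 / (1.0 + math.exp(-z))

def grad(x):
    """Analytic gradient of model() w.r.t. each feature (the vanilla gradient)."""
    y = model(x)
    return [y * (1.0 - y) * w for w in WEIGHTS]

def integrated_gradients(x, baseline, steps=200):
    """Midpoint Riemann-sum approximation of the straight-line path integral."""
    total = [0.0] * len(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        total = [t + g for t, g in zip(total, grad(point))]
    # Scale the averaged gradient by the input-minus-baseline difference.
    return [(xi - b) * t / steps for xi, b, t in zip(x, baseline, total)]

x = [1.0, 2.0, -1.0]
baseline = [0.0, 0.0, 0.0]  # assumed baseline; the choice is itself a design decision
attributions = integrated_gradients(x, baseline)

# Completeness: attributions should sum to model(x) - model(baseline).
completeness_gap = abs(sum(attributions) - (model(x) - model(baseline)))
print(attributions, completeness_gap)
```

In this sketch the gap shrinks as `steps` grows, which is one practical way to sanity-check an implementation against the completeness axiom.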

Experimental Results
Research Questions
- RQ1: What are the main gradient-based approaches to feature attribution for neural networks?
- RQ2: How can gradient-based methods be categorized, and what are the core techniques within each category?
- RQ3: How are gradient-based explanations evaluated, and what are the standard metrics?
- RQ4: What general and gradient-specific challenges limit gradient-based XAI, and where can future work improve?
- RQ5: How do integrated gradients and its variants address issues like saturation and baseline choice?
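The saturation issue raised in the last question can be illustrated with a one-dimensional toy function. The function f(x) = 1 - ReLU(1 - x), the input x = 2, and the baseline 0 are assumptions picked for illustration rather than an example taken from this summary. In the flat region the vanilla gradient is zero even though the output changed, while integrating gradients along the path from the baseline recovers the full output difference.

```python
def relu(v):
    return max(0.0, v)

def f(x):
    """Saturating toy function: rises linearly until x = 1, then stays flat at 1."""
    return 1.0 - relu(1.0 - x)

def df(x):
    """Derivative of f: 1 before the saturation point, 0 after it."""
    return 1.0 if x < 1.0 else 0.0

def integrated_gradient_1d(x, baseline, steps=1000):
    """Midpoint Riemann sum of df along the straight line baseline -> x."""
    total = 0.0
    for k in range(steps):
        alpha = (k + 0.5) / steps
        total += df(baseline + alpha * (x - baseline))
    return (x - baseline) * total / steps

x, baseline = 2.0, 0.0  # assumed input and baseline
vanilla = df(x)                            # gradient evaluated only at the input
ig = integrated_gradient_1d(x, baseline)   # gradient accumulated along the path
print(vanilla, ig, f(x) - f(baseline))
```

The result depends on the baseline: moving it into the saturated region (for example, baseline 1.5) would make the path integral zero as well, which is why baseline choice is treated as an open issue.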
Key Results
- Introduces a novel four-group taxonomy for gradient-based feature attribution: vanilla gradients, integrated gradients, bias gradients, and post-processing denoising.
- Provides a detailed chronological overview of technique details and evolution within each group.
- Outlines widely used evaluation metrics including human and objective measures for comparing explanations.
- Identifies general XAI challenges and gradient-specific issues to guide future research.
- Synthesizes connections between gradient-based methods and axiomatic properties guiding attribution quality.
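This summary does not name the objective metrics the survey covers. As one illustrative possibility, the sketch below shows a deletion-style ablation test: features are removed in order of attributed importance and the model output is tracked, so a faithful ranking should make the output drop quickly. The linear scorer, the replacement value 0.0, and both attribution vectors are hypothetical.

```python
def model(x):
    """Hypothetical linear scorer standing in for a trained network."""
    weights = [3.0, 1.0, 0.2, 0.0]
    return sum(w * xi for w, xi in zip(weights, x))

def deletion_curve(x, attributions, replacement=0.0):
    """Model output after removing features one by one, most important first."""
    order = sorted(range(len(x)), key=lambda i: abs(attributions[i]), reverse=True)
    current = list(x)
    scores = [model(current)]
    for i in order:
        current[i] = replacement  # "delete" the feature by resetting it
        scores.append(model(current))
    return scores

def mean(values):
    return sum(values) / len(values)

x = [1.0, 1.0, 1.0, 1.0]
faithful = [3.0, 1.0, 0.2, 0.0]   # gradient * input for the linear scorer
reversed_ = [0.0, 0.2, 1.0, 3.0]  # deliberately wrong ranking for comparison

faithful_curve = deletion_curve(x, faithful)
reversed_curve = deletion_curve(x, reversed_)

# A lower average score along the curve indicates a more faithful ranking.
print(mean(faithful_curve), mean(reversed_curve))
```

Scores like this depend on the replacement value, and ablated inputs may fall outside the data distribution the model was trained on, so the numbers are best read as relative comparisons between attribution methods.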

This review was generated by AI and reviewed by a human editor.