QUICK REVIEW

[Paper Review] shapiq: Shapley Interactions for Machine Learning

Maximilian Muschalik, Hubert Baniecki | arXiv (Cornell University) | October 2, 2024

Software Engineering Research | Citations: 5

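One-Line Summary

This paper introduces shapiq, an open-source Python library that unifies and benchmarks methods for computing Shapley values (SVs) and Shapley interactions (SIs) of any order for machine learning. It provides an application-agnostic interface and a benchmarking suite spanning multiple domains.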

ABSTRACT

Originally rooted in game theory, the Shapley Value (SV) has recently become an important tool in machine learning research. Perhaps most notably, it is used for feature attribution and data valuation in explainable artificial intelligence. Shapley Interactions (SIs) naturally extend the SV and address its limitations by assigning joint contributions to groups of entities, which enhance understanding of black box machine learning models. Due to the exponential complexity of computing SVs and SIs, various methods have been proposed that exploit structural assumptions or yield probabilistic estimates given limited resources. In this work, we introduce shapiq, an open-source Python package that unifies state-of-the-art algorithms to efficiently compute SVs and any-order SIs in an application-agnostic framework. Moreover, it includes a benchmarking suite containing 11 machine learning applications of SIs with pre-computed games and ground-truth values to systematically assess computational performance across domains. For practitioners, shapiq is able to explain and visualize any-order feature interactions in predictions of models, including vision transformers, language models, as well as XGBoost and LightGBM with TreeSHAP-IQ. With shapiq, we extend shap beyond feature attributions and consolidate the application of SVs and SIs in machine learning that facilitates future research. The source code and documentation are available at https://github.com/mmschlk/shapiq.
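
For reference, the exponential cost mentioned in the abstract can be read off the standard definition of the Shapley value. The notation here (player set $N$ with $n = |N|$, value function $v$) is an assumption chosen for illustration and is not taken from the review itself.

```latex
\phi_i(v) = \sum_{S \subseteq N \setminus \{i\}}
  \frac{|S|!\,(n - |S| - 1)!}{n!}
  \bigl( v(S \cup \{i\}) - v(S) \bigr)
```

The sum ranges over all $2^{n-1}$ coalitions that exclude player $i$, so exact evaluation grows exponentially in $n$.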

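Research Motivation and Goals

Provide an application-agnostic framework for computing SVs and any-order SIs in ML.
Unify state-of-the-art SI approximation algorithms behind a single interface.
Provide an explanation API for explaining and visualizing interactions in model predictions.
Provide a benchmarking suite with pre-computed ground-truth SI values across domains to guide researchers.
Facilitate research on and visualization of feature interactions beyond standard feature attribution.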

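Proposed Method

Implements an approximator interface for SI algorithms across multiple interaction indices and orders.
Includes ExactComputer for exact computation of 18 interaction indices and the Möbius interaction (MI) representation.
Provides a CoalitionSampler interface that improves approximation through sampling techniques such as the border and pairing tricks.
Provides a collection of 11 benchmark games spanning real-world ML domains, with pre-computed SI ground truth for evaluation.
Exposes an Explainer API for generating and visualizing any-order feature interactions for predictions.
Integrates TreeSHAP-IQ for efficient explanation of tree-based models and support for common ML libraries.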
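
To make the idea of exact computation on a small game concrete, here is a minimal brute-force sketch in plain Python. It does not use shapiq and is not the library's ExactComputer; the toy game, the function names, and the three-player setup are all hypothetical and exist only for illustration. It computes Shapley values and Möbius interactions by enumerating every coalition, which is only feasible for very small games.

```python
from itertools import combinations
from math import factorial


def powerset(players):
    """Yield every subset of `players` as a tuple, smallest subsets first."""
    for size in range(len(players) + 1):
        yield from combinations(players, size)


def toy_game(coalition):
    """Hypothetical value function: additive effects plus one pairwise synergy."""
    members = set(coalition)
    value = 1.0 * (0 in members) + 2.0 * (1 in members) + 0.5 * (2 in members)
    if {0, 1} <= members:
        value += 3.0  # players 0 and 1 are worth more together
    return value


def exact_shapley_values(game, n):
    """Weighted average of each player's marginal contribution over all coalitions."""
    players = tuple(range(n))
    values = []
    for i in players:
        others = tuple(p for p in players if p != i)
        phi = 0.0
        for subset in powerset(others):
            size = len(subset)
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            phi += weight * (game(subset + (i,)) - game(subset))
        values.append(phi)
    return values


def moebius_interactions(game, n):
    """Moebius transform: m(S) = sum over T subset of S of (-1)^(|S|-|T|) * v(T)."""
    result = {}
    for subset in powerset(tuple(range(n))):
        total = 0.0
        for inner in powerset(subset):
            sign = (-1) ** (len(subset) - len(inner))
            total += sign * game(inner)
        result[subset] = total
    return result


shapley = exact_shapley_values(toy_game, 3)
moebius = moebius_interactions(toy_game, 3)
print("Shapley values:", shapley)
print("Pairwise Moebius interaction of players 0 and 1:", moebius[(0, 1)])
```

In this toy game the Shapley values split the synergy between players 0 and 1 evenly, while the Möbius interaction assigns it to the pair as a joint contribution, which is the kind of information that single-player attributions cannot show.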

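Experimental Results

Research Questions

RQ1: How can Shapley Interactions be computed efficiently for any-order interactions in ML models?
RQ2: Can a single API support multiple SI indices and approximation methods across diverse model types and data domains?
RQ3: How closely do SI approximators approach ground-truth SI values under practical resource constraints?
RQ4: How can researchers benchmark and visualize higher-order interactions on real datasets and models?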

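Key Results

shapiq provides an open-source library that unifies state-of-the-art SI algorithms with exact computation for small games.
It covers 18 interaction indices and game-theoretic concepts, including Möbius Interactions as the highest-order representation.
The package provides a benchmarking suite of 11 benchmark games across several real-world ML domains, with pre-computed SI ground truth covering 2,042 configurations.
It enables explanation and visualization of any-order feature interactions for vision transformers, language models, XGBoost, and LightGBM (via TreeSHAP-IQ).
shapiq extends Shapley-based explanation beyond feature attribution, supporting research on game theory in ML and on applications of SIs.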
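
The benchmarking idea, comparing an approximator against pre-computed ground truth under a limited budget, can be sketched without the library. The example below is an illustrative assumption, not shapiq's benchmark code: it uses a hypothetical six-player game, a generic permutation-sampling estimator of Shapley values (a standard Monte Carlo technique, not necessarily one of the approximators the package ships), and mean squared error as the metric.

```python
import random
from itertools import combinations
from math import factorial

# Hypothetical 6-player game: additive weights plus two pairwise synergies.
WEIGHTS = [1.0, 2.0, 0.5, -1.0, 1.5, 0.0]
SYNERGIES = {(0, 1): 3.0, (2, 4): -2.0}
N_PLAYERS = len(WEIGHTS)


def game(coalition):
    """Value of a coalition: sum of member weights plus any active synergy."""
    members = set(coalition)
    value = sum(WEIGHTS[p] for p in members)
    for (a, b), bonus in SYNERGIES.items():
        if a in members and b in members:
            value += bonus
    return value


def exact_shapley_values(n):
    """Ground truth by enumerating all coalitions (feasible only for small n)."""
    values = []
    for i in range(n):
        others = [p for p in range(n) if p != i]
        phi = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                phi += weight * (game(subset + (i,)) - game(subset))
        values.append(phi)
    return values


def permutation_sampling(n, n_permutations, seed=0):
    """Estimate Shapley values by averaging marginal contributions
    along randomly drawn player orderings."""
    rng = random.Random(seed)
    totals = [0.0] * n
    for _ in range(n_permutations):
        order = list(range(n))
        rng.shuffle(order)
        coalition = []
        previous = game(coalition)
        for player in order:
            coalition.append(player)
            current = game(coalition)
            totals[player] += current - previous
            previous = current
    return [total / n_permutations for total in totals]


def mean_squared_error(estimate, truth):
    """Average squared difference between estimated and exact values."""
    return sum((e - t) ** 2 for e, t in zip(estimate, truth)) / len(truth)


ground_truth = exact_shapley_values(N_PLAYERS)
errors = {}
for budget in (10, 100, 1000):
    estimate = permutation_sampling(N_PLAYERS, budget, seed=0)
    errors[budget] = mean_squared_error(estimate, ground_truth)
    print(f"{budget:>5} permutations: MSE = {errors[budget]:.5f}")
```

The error typically shrinks as the number of sampled permutations grows, though any single run can fluctuate. Storing ground-truth values once, as the benchmarking suite does, avoids repeating the exponential enumeration for every comparison.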


This review was generated by AI and reviewed by a human editor.