QUICK REVIEW

[Paper Review] Reproducibility in Machine Learning-based Research: Overview, Barriers and Drivers

Harald Semmelrock, Tony Ross-Hellauer | arXiv (Cornell University) | 2024-06-20

Artificial Intelligence in Healthcare and Education | Citations: 10

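One-line summary

A critical review that identifies barriers to ML reproducibility across description, code, data, and experiment; discusses technology-driven, procedural, and awareness-related drivers; and presents a Drivers-Barriers-Matrix to guide improvement.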

ABSTRACT

Many research fields are currently reckoning with issues of poor levels of reproducibility. Some label it a "crisis", and research employing or building Machine Learning (ML) models is no exception. Issues including lack of transparency, data or code, poor adherence to standards, and the sensitivity of ML training conditions mean that many papers are not even reproducible in principle. Where they are, though, reproducibility experiments have found worryingly low degrees of similarity with original results. Despite previous appeals from ML researchers on this topic and various initiatives from conference reproducibility tracks to the ACM's new Emerging Interest Group on Reproducibility and Replicability, we contend that the general community continues to take this issue too lightly. Poor reproducibility threatens trust in and integrity of research results. Therefore, in this article, we lay out a new perspective on the key barriers and drivers (both procedural and technical) to increased reproducibility at various levels (methods, code, data, and experiments). We then map the drivers to the barriers to give concrete advice for strategies for researchers to mitigate reproducibility issues in their own work, to lay out key areas where further research is needed in specific areas, and to further ignite discussion on the threat presented by these urgent issues.

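Research motivation and goals

Clarify and unify the definition of reproducibility in ML research across the description, code, data, and experiment types.
Identify and categorize barriers to ML reproducibility in computer science (CS) and biomedical contexts.
Identify and categorize drivers that can improve ML reproducibility, and map them to the barriers.
Propose a visual Drivers-Barriers-Matrix to support decisions about adopting reproducibility solutions.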

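Proposed method

Review and synthesize the literature on ML reproducibility and its known barriers and drivers.
Categorize barriers under four reproducibility types (R1 Description, R2 Code, R3 Data, R4 Experiment).
Categorize drivers as technology-driven, procedural, or awareness/education.
Map drivers to barriers to assess where improvement is possible.
Introduce the Drivers-Barriers-Matrix to visualize and communicate these relationships.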

Figure 1: Types of reproducibility. Adapted from Gundersen [26].

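Experimental results

Research questions

RQ1: What are the main barriers to reproducibility in ML-driven research across description, code, data, and experiment?
RQ2: Which drivers support ML reproducibility, and how do they map to the identified barriers?
RQ3: How can the Drivers-Barriers-Matrix help researchers and institutions decide on reproducibility interventions?
RQ4: How do ML-specific challenges (e.g., nondeterminism, data leakage, AutoML) interact with general reproducibility problems?
RQ5: Which guidelines, tools, and practices are promising for improving ML reproducibility in CS and biomedicine?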

Key findings

Nine barriers to ML reproducibility, categorized under four reproducibility types (Description, Code, Data, Experiment).
Limited access to code and data, plus documentation gaps, are major impediments to reproducibility.
Inherent nondeterminism, environmental differences, and resource constraints significantly affect experiment reproducibility.
Privacy-preserving tech, hosting services, virtualization, and tooling/platforms can act as reproducibility enablers but have trade-offs.
Standardized datasets, evaluation methods, and guidelines/checklists (e.g., model cards, data cards) support reproducibility.
Awareness, education, and practical workflows are essential for sustained improvements in ML reproducibility.
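
The nondeterminism finding above can be illustrated with a minimal sketch. It is not taken from the paper: `run_experiment` is a hypothetical toy stand-in for a training run, and it uses only Python's standard `random` module. It shows one common mitigation, a fixed and reported seed, under the assumption that randomness is the only source of variation. Real ML stacks typically have further sources (hardware, library versions, parallel execution) that a seed alone does not control.

```python
import random


def run_experiment(seed):
    """Toy stand-in for a training run: seeded shuffle, split, and noisy score.

    Hypothetical helper for illustration only; not the paper's method.
    """
    rng = random.Random(seed)      # local generator, avoids hidden global state
    data = list(range(100))
    rng.shuffle(data)              # the split depends on the seed
    train, test = data[:80], data[80:]
    # A made-up "metric" with seeded noise, standing in for a model score.
    score = sum(test) / len(test) + rng.gauss(0, 1)
    return train, test, score


# Same seed: the split and the score repeat exactly.
a = run_experiment(seed=0)
b = run_experiment(seed=0)

# Different seed: the run is allowed to differ, which is why the seed
# belongs in the reported experiment description.
c = run_experiment(seed=1)
```

Passing a local `random.Random` instance rather than calling `random.seed` globally keeps the run independent of whatever other code has done to the shared generator.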

Figure 2: Drivers-Barriers-Matrix. We map the 9 drivers to the 9 barriers identified in this paper. The colored boxes show that a specific driver is applicable to a specific barrier.
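
As a rough sketch of how such a matrix could be held as data, the snippet below stores driver-barrier pairs in a set and derives a boolean grid from them. The driver and barrier names paraphrase the key findings listed in this review; the specific pairs are placeholder assumptions for illustration and are NOT the cells of the paper's Figure 2. The names `APPLICABLE`, `drivers_for`, and `as_matrix` are hypothetical.

```python
# Illustrative subset only. The pairs below are assumptions made for this
# sketch; consult the paper's Figure 2 for the actual 9 x 9 mapping.
APPLICABLE = {
    ("virtualization", "environmental differences"),
    ("hosting services", "limited access to code"),
    ("hosting services", "limited access to data"),
    ("guidelines/checklists", "documentation gaps"),
}


def drivers_for(barrier, pairs=APPLICABLE):
    """Return the drivers marked applicable to a barrier, sorted by name."""
    return sorted(d for d, b in pairs if b == barrier)


def as_matrix(pairs=APPLICABLE):
    """Build (drivers, barriers, grid), where grid[i][j] is True when
    drivers[i] is marked applicable to barriers[j]."""
    drivers = sorted({d for d, _ in pairs})
    barriers = sorted({b for _, b in pairs})
    grid = [[(d, b) in pairs for b in barriers] for d in drivers]
    return drivers, barriers, grid


drivers, barriers, grid = as_matrix()
```

Storing the pairs as a set keeps the mapping easy to extend: adding a driver-barrier link is one new tuple, and the grid is rebuilt on demand.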

This review was generated by AI and reviewed by a human editor.