QUICK REVIEW

[Paper Review] Bridging the Reproducibility Divide: Open Source Software's Role in Standardizing Healthcare AI

John Wu, Zhenbang Wu|arXiv (Cornell University)|2026. 03. 02.

Artificial Intelligence in Healthcare and Education|Citations: 0

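One-Line Summary

This paper analyzes reproducibility in AI for healthcare, exposes the field's heavy reliance on private data and its limited code sharing, and proposes open-source practices and benchmarks to improve trust, safety, and impact.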

ABSTRACT

Our analysis of recent AI4H publications reveals that, despite a trend toward utilizing open datasets and sharing modeling code, 74% of AI4H papers still rely on private datasets or do not share their code. This is especially concerning in healthcare applications, where trust is essential. Furthermore, inconsistent and poorly documented data preprocessing pipelines result in variable model performance reports, even for identical tasks and datasets, making it challenging to evaluate the true effectiveness of AI models. Despite the challenges posed by the reproducibility crisis, addressing these issues through open practices offers substantial benefits. For instance, while the reproducibility mandate adds extra effort to research and publication, it significantly enhances the impact of the work. Our analysis shows that papers that used both public datasets and shared code received, on average, 110% more citations than those that do neither--more than doubling the citation count. Given the clear benefits of enhancing reproducibility, it is imperative for the AI4H community to take concrete steps to overcome existing barriers. The community should promote open science practices, establish standardized guidelines for data preprocessing, and develop robust benchmarks. Tackling these challenges through open-source development can improve reproducibility, which is essential for ensuring that AI models are safe, effective, and beneficial for patient care. This approach will help build more trustworthy AI systems that can be integrated into healthcare settings, ultimately contributing to better patient outcomes and advancing the field of medicine.

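Research Motivation and Objectives

Assess the state of reproducibility in AI for healthcare (AI4H) as of 2024.
Quantify reliance on private datasets and the absence of code sharing in AI4H papers.
Evaluate the relationship between reproducibility practices and scholarly impact (citation counts).
Propose concrete open-source and benchmarking strategies to improve reproducibility and transparency in AI4H.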

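Proposed Method

Collect a large corpus of AI4H papers from CHIL, ML4H, MLHC, and PubMed (2018–2024).
Develop automated detectors for public dataset use, code sharing, and topic classification, using keywords, PubMed data, and a medically fine-tuned language model.
Validate the automated detection through manual review of a random sample (30 papers) and report accuracy metrics.
Analyze trends by journal/conference, topic, and affiliation, and correlate reproducibility signals with citation counts.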
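The detection and validation steps above can be illustrated with a minimal sketch. This is not the authors' pipeline: the regular expressions, the short list of dataset names, and the function names `shares_code`, `uses_public_dataset`, and `accuracy` are all assumptions made for illustration, and the sketch covers only the keyword-based part (not the fine-tuned language model).

```python
import re

# Hypothetical keyword patterns; the paper's actual detection rules
# are not specified in this review.
CODE_PATTERNS = [
    r"github\.com/\S+",
    r"gitlab\.com/\S+",
    r"code (?:is|are) (?:publicly )?available",
]

# Illustrative (assumed) list of public dataset names to search for.
PUBLIC_DATASETS = ["MIMIC-III", "MIMIC-IV", "eICU", "PhysioNet"]


def shares_code(text: str) -> bool:
    """Return True if the text contains any code-sharing keyword pattern."""
    return any(re.search(p, text, flags=re.IGNORECASE) for p in CODE_PATTERNS)


def uses_public_dataset(text: str) -> bool:
    """Return True if the text mentions any dataset from the assumed list."""
    lowered = text.lower()
    return any(name.lower() in lowered for name in PUBLIC_DATASETS)


def accuracy(predicted: list, manual: list) -> float:
    """Fraction of detector outputs that agree with manual review labels."""
    if not manual or len(predicted) != len(manual):
        raise ValueError("label lists must be non-empty and of equal length")
    matches = sum(p == m for p, m in zip(predicted, manual))
    return matches / len(manual)
```

In a validation pass like the one described, `predicted` would hold the detector outputs for the 30 sampled papers and `manual` the reviewers' labels for the same papers.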

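Experimental Results

Research Questions

RQ1: What is the current state of technical reproducibility in AI4H papers (private data, code sharing, standardization of data preprocessing)?
RQ2: Are reproducibility practices (using public data and sharing code) correlated with higher citation impact?
RQ3: What barriers hinder reproducibility in AI4H, and which open-source practices can mitigate them?
RQ4: How do standardization efforts (e.g., OMOP-CDM, MEDS) relate to reproducibility in AI4H?
RQ5: Which concrete open-source tools, benchmarks, and policies would promote reproducibility in AI4H?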

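Key Results

74% of AI4H papers rely on private datasets or do not share their code.
Papers that use both public datasets and shared code receive, on average, 110% more citations than papers that do neither.
Private datasets account for roughly 65–75% of dataset usage between 2018 and 2024; AI4H conferences use public datasets more often than PubMed papers (roughly 60–70% vs. 25%).
Code sharing is more common at conference venues than in PubMed papers; fewer than 20% of PubMed articles share code.
Papers that mention public datasets and share code tend to receive more citations later, and code sharing is positively correlated with citation counts regardless of topic or affiliation.
Standardization of data preprocessing is limited, and adoption of OMOP-CDM and MEDS remains incomplete.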
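The "110% more citations" figure is a relative increase, so it corresponds to a citation ratio of about 2.1 rather than 1.1, which is why the abstract describes it as more than doubling. A small sketch of the arithmetic follows; the function names and the example citation counts are hypothetical and are not taken from the paper.

```python
def percent_more(group_mean: float, baseline_mean: float) -> float:
    """Relative increase of group_mean over baseline_mean, in percent."""
    if baseline_mean <= 0:
        raise ValueError("baseline_mean must be positive")
    return (group_mean - baseline_mean) / baseline_mean * 100.0


def ratio_from_percent_more(pct: float) -> float:
    """Convert an 'X% more' statement into a multiplicative ratio."""
    return 1.0 + pct / 100.0


# Hypothetical means, chosen only to illustrate the conversion:
# a baseline of 10 citations and an open-practice group at 21 citations.
uplift = percent_more(21.0, 10.0)
ratio = ratio_from_percent_more(110.0)
```

With these assumed numbers, `uplift` comes out at approximately 110 and `ratio` at approximately 2.1.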

This review was generated by AI and reviewed by a human editor.