QUICK REVIEW

[Paper Review] Reproducibility of Machine Learning: Terminology, Recommendations and Open Issues

Riccardo Albertoni, Sara Colantonio|arXiv (Cornell University)|Feb 24, 2023

Explainable Artificial Intelligence (XAI) | Citations: 9

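One-Line Summary

A comprehensive survey of reproducibility in AI/ML. It reviews the terminology, systematizes existing guidelines, and proposes specialized recommendations for promoting trustworthy ML, including for the biomedical and physical AI fields.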

ABSTRACT

Reproducibility is one of the core dimensions that concur to deliver Trustworthy Artificial Intelligence. Broadly speaking, reproducibility can be defined as the possibility to reproduce the same or a similar experiment or method, thereby obtaining the same or similar results as the original scientists. It is an essential ingredient of the scientific method and crucial for gaining trust in relevant claims. A reproducibility crisis has been recently acknowledged by scientists and this seems to affect even more Artificial Intelligence and Machine Learning, due to the complexity of the models at the core of their recent successes. Notwithstanding the recent debate on Artificial Intelligence reproducibility, its practical implementation is still insufficient, also because many technical issues are overlooked. In this survey, we critically review the current literature on the topic and highlight the open issues. Our contribution is three-fold. We propose a concise terminological review of the terms coming into play. We collect and systematize existing recommendations for achieving reproducibility, putting forth the means to comply with them. We identify key elements often overlooked in modern Machine Learning and provide novel recommendations for them. We further specialize these for two critical application domains, namely the biomedical and physical artificial intelligence fields.

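Research Motivation and Objectives

Clarify and unify the terminology for reproducibility, repeatability, replicability, and related concepts in ML/AI.
Systematically collect and organize existing recommendations for achieving reproducibility in AI/ML, from general to domain-specific contexts.
Identify gaps and missing elements in current guidelines, and propose specialized recommendations for deep learning and ML in biomedical and physical AI contexts.
Produce tables summarizing practical, actionable guidelines that help researchers and practitioners implement reproducible ML experiments.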

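Proposed Method

Systematically survey reproducibility-related terminology and guidelines in the AI/ML literature and related fields.
Build a terminological diagram that maps the terms onto dimensions such as workflow components, teams, and the reasons for conducting a reproducibility study.
Consolidate the general recommendations into three summary tables and discuss their extension to deep learning and ML in the biomedical and physical AI fields.
Critically analyze the gaps and needs arising from the specifics of ML reproducibility, with specializations for two critical application domains.
Propose novel recommendations and practical considerations that improve the transparency of documentation, data, code, and experiments.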
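
As an illustration of what documenting data, code, and experiments can look like in practice, the following is a minimal sketch of a run manifest. It is not taken from the paper: the function names (`fingerprint_bytes`, `build_manifest`), the manifest fields, and the toy dataset are all assumptions chosen for this example.

```python
import hashlib
import json
import platform
import sys


def fingerprint_bytes(data: bytes) -> str:
    """Return the SHA-256 hex digest that identifies an exact dataset version."""
    return hashlib.sha256(data).hexdigest()


def build_manifest(dataset: bytes, hyperparams: dict, seed: int) -> dict:
    """Collect facts a reader would need to attempt a rerun (hypothetical schema)."""
    return {
        "dataset_sha256": fingerprint_bytes(dataset),
        "hyperparams": dict(sorted(hyperparams.items())),
        "seed": seed,
        "python_version": sys.version.split()[0],
        "platform": platform.platform(),
    }


# Toy in-memory dataset standing in for a real file (hypothetical content).
toy_dataset = b"x,y\n1,2\n3,4\n"
manifest = build_manifest(toy_dataset, {"lr": 0.01, "epochs": 5}, seed=42)

# Serialize with sorted keys so two manifests can be compared as text.
manifest_json = json.dumps(manifest, indent=2, sort_keys=True)
print(manifest_json)
```

Storing such a file next to the results lets a reader check that they are using the same data version and settings before comparing numbers. A real project would likely also record library versions and the code revision.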

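Experimental Results

Research Questions

RQ1: What distinguishes the reproducibility-related terms used in ML/AI, and how can they be harmonized?
RQ2: What are the current guidelines and best practices for achieving reproducibility in AI/ML, and where do they fall short for modern ML, including deep learning?
RQ3: What gaps appear in reproducibility recommendations when they are applied to the biomedical and physical AI fields, and how can they be addressed?
RQ4: How can the recommendations be translated into concrete documentation and workflow practices for researchers and reviewers?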

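Key Findings

Terms proliferate (reproducibility, replicability, repeatability, robustness, generalizability, and others), and their definitions differ across the literature.
Some guidelines distinguish degrees of reproducibility according to the availability of data, code, and experiments (e.g., R1–R4, bronze/silver/gold), but inconsistencies among them remain.
Conference and journal guidelines vary in depth: some require full environment reproduction (containers, VMs), while others emphasize only the sharing of data and code.
Tools and practices such as datasheets, model cards, and factsheets have been proposed to document motivation, data provenance, model usage, and performance.
Reproducibility guidance lacks specificity for ML-specific complexities such as deep networks, hyperparameter tuning, and hardware dependence.
The paper provides structured recommendations and new summary tables for tailoring reproducibility guidance to biomedical and physical AI applications.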
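
The point about stochastic training and hyperparameter sensitivity can be made concrete with a small sketch of seed handling. It is an illustration only, not a procedure from the paper: `run_experiment` is a hypothetical stand-in for a training run, and the score it returns is synthetic.

```python
import random
import statistics


def run_experiment(seed: int, n_samples: int = 1000) -> float:
    """Toy stochastic 'training run' returning a noisy accuracy-like score."""
    rng = random.Random(seed)  # local generator, so no hidden global state
    noise = [rng.gauss(0.0, 0.02) for _ in range(n_samples)]
    return 0.9 + statistics.fmean(noise)


seeds = [0, 1, 2, 3, 4]
scores = [run_experiment(s) for s in seeds]

# Rerunning with the same seed on the same software stack gives the same score.
assert run_experiment(0) == scores[0]

# Report the spread across seeds instead of a single, possibly lucky, run.
mean_score = statistics.fmean(scores)
std_score = statistics.stdev(scores)
print(f"score over {len(seeds)} seeds: {mean_score:.4f} +/- {std_score:.4f}")
```

Fixing the seed makes this toy run repeatable, and reporting the mean and standard deviation across seeds shows how much of a result is due to chance. With real deep learning frameworks, identical seeds may still give different results across hardware or library versions, which is the hardware dependence noted in the findings.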

This review was generated by AI and checked by a human editor.