QUICK REVIEW

[Paper Review] What's documented in AI? Systematic Analysis of 32K AI Model Cards

Weixin Liang, Nazneen Fatema Rajani|arXiv (Cornell University)|Feb 7, 2024

Artificial Intelligence in Healthcare | Cited by 5

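One-line summary

This paper analyzes 32,111 AI model cards on Hugging Face and evaluates documentation quality, section coverage, and the effect that adding detailed model cards has on model usage. It finds that informativeness is uneven across sections and that downloads increase modestly after the intervention.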

ABSTRACT

The rapid proliferation of AI models has underscored the importance of thorough documentation, as it enables users to understand, trust, and effectively utilize these models in various applications. Although developers are encouraged to produce model cards, it's not clear how much information or what information these cards contain. In this study, we conduct a comprehensive analysis of 32,111 AI model documentations on Hugging Face, a leading platform for distributing and deploying AI models. Our investigation sheds light on the prevailing model card documentation practices. Most of the AI models with substantial downloads provide model cards, though the cards have uneven informativeness. We find that sections addressing environmental impact, limitations, and evaluation exhibit the lowest filled-out rates, while the training section is the most consistently filled-out. We analyze the content of each section to characterize practitioners' priorities. Interestingly, there are substantial discussions of data, sometimes with equal or even greater emphasis than the model itself. To evaluate the impact of model cards, we conducted an intervention study by adding detailed model cards to 42 popular models which had no or sparse model cards previously. We find that adding model cards is moderately correlated with an increase in weekly download rates. Our study opens up a new perspective for analyzing community norms and practices for model documentation through large-scale data science and linguistics analysis.

Research motivation and objectives

Assess how extensively AI model cards on Hugging Face are filled out across sections.
Identify which sections (e.g., Training, Environmental Impact, Limitations, Evaluation) are commonly documented and which are neglected.
Characterize practitioners' priorities via content analysis of model card sections.
Evaluate whether providing detailed model cards influences model usage (downloads).
Discuss implications for standards, transparency, and data-centric documentation in AI.

Proposed method

Collected 74,970 AI model repositories on Hugging Face as of Oct 1, 2022; analyzed 32,111 models with model cards (Markdown README.md) uploaded by 6,392 accounts.
Parsed model cards to detect section presence using keyword-based pipelines (e.g., CO2 variants for Environmental Impact).
Performed content analysis of four key sections (Limitations, Uses, Evaluation, Training) using sentence-level topic modeling.
Compared top 100 vs. top 1,000 vs. population cards to examine length and completion rates.
Conducted a model card intervention study: added detailed model cards to 42 popular models with sparse/no cards and used a difference-in-differences approach to assess download changes.
Computed statistical significance and uncertainty estimates (e.g., p-values, confidence intervals) for the intervention results.
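
The keyword-based section detection described above can be sketched roughly as follows. This is a minimal illustration, not the paper's pipeline: the keyword map `SECTION_KEYWORDS`, the helper names `detect_sections` and `fill_rates`, and the choice to match only Markdown headings are all assumptions made for the example. It checks whether a section heading is present, not whether the section has meaningful content.

```python
import re

# Hypothetical keyword map (section name -> heading keywords).
# Illustrative only; this is NOT the paper's actual keyword list.
SECTION_KEYWORDS = {
    "Training": ["train"],
    "Evaluation": ["eval"],
    "Limitations": ["limitation", "bias", "risk"],
    "Environmental Impact": ["environmental impact", "co2", "carbon"],
}

# Matches ATX-style Markdown headings such as "## Training data".
HEADING_RE = re.compile(r"^#{1,6}\s+(.*)$", re.MULTILINE)


def detect_sections(markdown_text):
    """Return the section names whose keywords appear in any heading."""
    headings = [h.strip().lower() for h in HEADING_RE.findall(markdown_text)]
    found = set()
    for section, keywords in SECTION_KEYWORDS.items():
        if any(kw in heading for heading in headings for kw in keywords):
            found.add(section)
    return found


def fill_rates(cards):
    """Fraction of cards in which each section heading is detected."""
    counts = {section: 0 for section in SECTION_KEYWORDS}
    for text in cards:
        for section in detect_sections(text):
            counts[section] += 1
    n = len(cards)
    return {s: (c / n if n else 0.0) for s, c in counts.items()}


# Made-up example card, for illustration only.
example_card = "# My model\n\n## Training data\ntext\n\n## CO2 Emissions\n0.1 g\n"
example_sections = detect_sections(example_card)
example_rates = fill_rates([example_card, "no headings here"])
```

Substring matching on headings is deliberately loose (for instance, "train" also matches "Training procedure"); a real pipeline would likely need a curated keyword list and a check that the section body is not empty.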

Experimental results

Research questions

RQ1: What fraction of Hugging Face AI models have model cards and how much traffic do those models account for?
RQ2: Which sections of model cards are most and least filled out, and how does this vary over time and by card tier (top models)?
RQ3: What themes dominate the content of key sections (Limitations, Uses, Evaluation, Training)?
RQ4: Does adding detailed model cards to previously sparse models affect their weekly download rates?
RQ5: What are the broader implications for documentation practices and data-centric AI research?

Key findings

44.2% of Hugging Face models have model cards, but these models account for 90.5% of total download traffic.
The Environmental Impact (2.0%), Evaluation (15.4%), and Limitations (17.4%) sections have the lowest completion rates, while Training (74.3%) is the most frequently filled out.
Top 100 model cards tend to be longer and have higher completion rates for several sections (e.g., Environmental Impact 9.0%, Limitations 39.0%, Evaluation 47.0%, Citation 67.0%).
Approximately 84.8% of Environmental Impact sections are automatically generated by AI tools (e.g., AutoNLP/AutoTrain).
In the Model Card Intervention Study, Batch 2 showed a significant 29.0% increase in average weekly downloads for treated models (95% CI [10.6%, 47.5%], p=0.01); Batch 1 showed smaller, inconclusive effects likely due to Thanksgiving timing.
Overall, the study suggests a moderate positive correlation between richer model cards and model usage, though results vary by batch and external factors.
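
To make the difference-in-differences idea behind the intervention result concrete, here is a minimal 2x2 sketch on log weekly downloads. It is an assumption-laden illustration, not the authors' specification: the function name `did_relative_change`, the use of log downloads, and all numbers in the example are hypothetical, and it produces only a point estimate (no confidence interval or p-value).

```python
import math
from statistics import mean


def did_relative_change(treated_pre, treated_post, control_pre, control_post):
    """2x2 difference-in-differences on log weekly downloads.

    Each argument is a list of per-model average weekly downloads (> 0).
    Returns the estimated relative change for treated models, net of the
    change observed in control models over the same period.
    """
    def mean_log(xs):
        return mean(math.log(x) for x in xs)

    treated_delta = mean_log(treated_post) - mean_log(treated_pre)
    control_delta = mean_log(control_post) - mean_log(control_pre)
    return math.exp(treated_delta - control_delta) - 1.0


# Made-up numbers: treated models grow by 30%, controls stay flat.
example_effect = did_relative_change(
    treated_pre=[100, 200],
    treated_post=[130, 260],
    control_pre=[100, 400],
    control_post=[100, 400],
)
```

Subtracting the control group's change is what separates the effect of adding a model card from platform-wide shifts, such as the holiday dip suggested for Batch 1.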

This review was generated by AI and checked by a human editor.