QUICK REVIEW

[Paper Review] Open Datasets in Learning Analytics: Trends, Challenges, and Best PRACTICE

Valdemar Švábenský, Brendan Flanagan | arXiv (Cornell University) | Feb 19, 2026

Online Learning and Analytics | Citations: 0

One-Line Summary

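The paper systematically surveys open learning analytics datasets published alongside papers at LAK, EDM, and AIED and identifies 172 unique datasets. It provides practical guidelines and an annotated data inventory to promote open data practices.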

ABSTRACT

Open datasets play a crucial role in three research domains that intersect data science and education: learning analytics, educational data mining, and artificial intelligence in education. Researchers in these domains apply computational methods to analyze data from educational contexts, aiming to better understand and improve teaching and learning. Providing open datasets alongside research papers supports reproducibility, collaboration, and trust in research findings. It also provides individual benefits for authors, such as greater visibility, credibility, and citation potential. Despite these advantages, the availability of open datasets and the associated practices within the learning analytics research communities, especially at their flagship conference venues, remain unclear. We surveyed available datasets published alongside research papers in learning analytics. We manually examined 1,125 papers from three flagship conferences (LAK, EDM, and AIED) over the past five years. We discovered, categorized, and analyzed 172 datasets used in 204 publications. Our study presents the most comprehensive collection and analysis of open educational datasets to date, along with the most detailed categorization. Of the 172 datasets identified, 143 were not captured in any prior survey of open data in learning analytics. We provide insights into the datasets' context, analytical methods, use, and other properties. Based on this survey, we summarize the current gaps in the field. Furthermore, we list practical recommendations, advice, and 8-item guidelines under the acronym PRACTICE with a checklist to help researchers publish their data. Lastly, we share our original dataset: an annotated inventory detailing the discovered datasets and the corresponding publications. We hope these findings will support further adoption of open data practices in learning analytics communities and beyond.

Motivation and Objectives

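Assess the availability and characteristics of open datasets used in state-of-the-art learning analytics (LA) research published between 2020 and 2024.
Identify data gaps and challenges that hinder open data practices in LA research.
Develop practical guidance that helps LA researchers publish and reuse open datasets.
Provide an annotated inventory of the discovered datasets and related publications to support reproducibility and reuse.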

Method

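Conduct a practical systematic survey of LA datasets following a PRISMA-like flow.
Examine all papers (n = 1,125) published in 2020–2024 at three flagship LA venues (LAK, EDM, AIED).
Screen papers for eligibility, including only those that use a dataset and provide access to it.
Distinguish candidate papers from candidate datasets, and selected papers from selected datasets.
Extract dataset information into a structured inventory, with cross-validation by multiple authors.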
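To make the distinction between candidate and selected papers and datasets concrete, here is a minimal Python sketch of a two-stage screening pass. It is not the authors' code: the `Paper` record, its fields, and the rule that a paper is selected when at least one of its datasets is openly accessible are assumptions for illustration, and the toy records are invented.

```python
from dataclasses import dataclass, field


@dataclass
class Paper:
    """Hypothetical record for one screened paper (field names are illustrative)."""
    paper_id: str
    venue: str
    year: int
    # Maps each dataset the paper uses to whether that dataset is openly accessible.
    datasets: dict[str, bool] = field(default_factory=dict)


def screen(papers: list[Paper]) -> dict[str, int]:
    """Two-stage eligibility screening in the spirit of a PRISMA-like flow."""
    # Stage 1: candidate papers are those that use at least one dataset.
    candidate_papers = [p for p in papers if p.datasets]
    candidate_datasets = {d for p in candidate_papers for d in p.datasets}

    # Stage 2 (assumed rule): a paper is selected if at least one of its
    # datasets is openly accessible.
    selected_papers = [p for p in candidate_papers if any(p.datasets.values())]
    # Keep only accessible datasets; the set counts a reused dataset once.
    selected_datasets = {
        d for p in selected_papers for d, is_open in p.datasets.items() if is_open
    }

    return {
        "screened": len(papers),
        "candidate_papers": len(candidate_papers),
        "candidate_datasets": len(candidate_datasets),
        "selected_papers": len(selected_papers),
        "selected_datasets": len(selected_datasets),
    }


# Invented toy records, not data from the survey.
papers = [
    Paper("p1", "LAK", 2021, {"ds-a": True}),
    Paper("p2", "EDM", 2022, {"ds-a": True, "ds-b": False}),
    Paper("p3", "AIED", 2023, {"ds-c": False}),
    Paper("p4", "LAK", 2024),
]
print(screen(papers))
# {'screened': 4, 'candidate_papers': 3, 'candidate_datasets': 3, 'selected_papers': 2, 'selected_datasets': 1}
```

Counting datasets as a set lets several publications share one dataset, which is consistent with the survey reporting fewer unique datasets (172) than publications (204).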

Figure 1. PRISMA flow diagram. Generated using the tool by Haddaway et al. (2022).

Results

Research Questions

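RQ1: What open datasets are available in state-of-the-art LA research, and what are their characteristics?
RQ2: Which contexts or domains are underrepresented in LA datasets (research gaps)?
RQ3: Which best-practice guidelines can help LA researchers adopt open data practices?
RQ4: How can researchers use and reuse the released dataset inventory and related materials?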

Key Findings

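Identified 172 unique open datasets used in 204 publications across LAK, EDM, and AIED (2020–2024).
143 of the 172 datasets were not included in any prior survey, making this the most comprehensive survey of LA datasets to date.
Provided insights into the datasets' context, analytical methods, use, and other properties.
Proposed practical recommendations and the 8-item PRACTICE guidelines to support publishing and sharing data.
Shared the annotated dataset inventory and accompanying materials (analysis code, structured citations) to support reproducibility.
Discussed reproducibility and open science considerations, including barriers such as privacy, de-identification, and access control.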
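The headline counts above can be related with a few lines of arithmetic. This Python snippet only recombines numbers stated in the review and abstract; the last ratio assumes that all 204 publications are among the 1,125 screened papers, which the summary does not state explicitly.

```python
# Counts reported in the review and abstract.
screened_papers = 1125
publications = 204       # publications using the identified open datasets
unique_datasets = 172
new_datasets = 143       # not captured by any prior survey

previously_surveyed = unique_datasets - new_datasets
share_new = new_datasets / unique_datasets
# Assumption: all 204 publications are among the 1,125 screened papers.
share_of_screened = publications / screened_papers

print(f"Previously surveyed datasets: {previously_surveyed}")        # 29
print(f"Share of datasets new to this survey: {share_new:.1%}")      # 83.1%
print(f"Publications / screened papers: {share_of_screened:.1%}")    # 18.1%
```

Under that assumption, fewer than one in five screened papers came with an accessible dataset, while more than four in five of the datasets found were new relative to earlier surveys.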

Figure 2. Distributions of dataset frequency across educational topics and levels of students.
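Distributions like those in Figure 2 amount to frequency counts over annotated columns of the inventory. The Python sketch below assumes a CSV export with hypothetical columns `dataset_id`, `topic`, and `student_level`, and treats `;` as a separator for datasets that span several categories; the actual inventory's schema and values may differ, and the rows here are invented.

```python
import csv
import io
from collections import Counter

# Hypothetical excerpt of an annotated inventory; real column names may differ.
INVENTORY_CSV = """dataset_id,topic,student_level
ds-a,Mathematics,K-12
ds-b,Programming,University
ds-c,Programming;Mathematics,University
ds-d,Language learning,K-12
"""


def tally(rows: list[dict[str, str]], column: str, sep: str = ";") -> Counter:
    """Count datasets per category of `column`, splitting multi-valued cells."""
    counts: Counter = Counter()
    for row in rows:
        for value in row[column].split(sep):
            value = value.strip()
            if value:  # skip empty annotations
                counts[value] += 1
    return counts


rows = list(csv.DictReader(io.StringIO(INVENTORY_CSV)))
topic_counts = tally(rows, "topic")
level_counts = tally(rows, "student_level")

for label, counts in (("Topic", topic_counts), ("Student level", level_counts)):
    print(label)
    for category, n in counts.most_common():
        print(f"  {category}: {n}")
```

Splitting multi-valued cells means a dataset covering two topics contributes to both categories, so topic counts can sum to more than the number of datasets.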


This review was generated by AI and checked by a human editor.