QUICK REVIEW

[Paper Review] New Metrics for Learning Evaluation in Digital Education Platforms

Gabriel Leitão, Juan G. Colonna|arXiv (Cornell University)|Jun 25, 2020

Online Learning and Analytics | Cited by 5

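One-Sentence Summary

This paper proposes a set of new learning-evaluation metrics for digital education platforms that go beyond traditional right/wrong scoring by incorporating response behavior, confidence level, time on task, and cognitive load. Applied to a high school assessment, the metrics, especially Assurance Degree and the composite Questionnaire Comprehension Level, identified students with weak understanding and low confidence, enabling targeted intervention.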

ABSTRACT

Technology applied in education can provide great benefits and overcome challenges by facilitating access to learning objects anywhere and anytime. However, technology alone is not enough, since it requires suitable planning and learning methodologies. Using technology can be problematic, especially in determining whether learning has occurred or not. Furthermore, if learning has not occurred, technology can make it difficult to determine how to mitigate this lack of learning. This paper presents a set of new metrics for measuring students' acquired understanding of content in technology-based education platforms. Some metrics were taken from the literature "as is", some were modified slightly, while others were added. The hypothesis is that we should not only focus on traditional scoring, because it only counts the number of hits/errors and does not consider any other aspect of learning. We applied all metrics to an assessment conducted in a high school class, in which we show specific cases, along with metrics, where very useful information can be obtained by combining several metrics. We conclude that the proposed metrics are promising for measuring students' acquired understanding of content, as well as for teachers to measure students' weaknesses.

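Motivation and Objectives

Address the limitation of traditional assessment methods, which rely only on correct/incorrect answers and cannot capture students' depth of understanding, confidence level, or cognitive effort.
Develop and implement new metrics that reflect deeper learning dimensions in technology-enhanced learning environments, such as doubt, response time, and assurance.
Enable teachers to identify at-risk students and prioritize remedial content using data-driven insight into student response patterns.
Improve formative and summative assessment by integrating behavioral and cognitive metrics into learning analytics, supporting better feedback and intervention strategies.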

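Proposed Method

Proposes and implements five individual metrics: Weighted Score (WS), Question Doubt (QD), Assurance Degree (AD), Student Response Time (SRT), and Level of Disorder (D), each computed from student interaction data.
Develops two composite metrics: Question Comprehension Level (QCL), based on question difficulty, response time, and correctness; and Questionnaire Comprehension Level (QuCL), which combines QCL with AD.
Introduces a Priority (P) metric that ranks topics by student performance and confidence level to determine review priority.
Applies the metrics to a real-world assessment of 40 multiple-choice questions taken by 33 tenth-grade students at a Brazilian high school.
Performs a cluster analysis on AD and QuCL that divides students into four performance quadrants, with the third quadrant representing weak understanding and low confidence.
Uses the platform's ability to record data in real time, such as answer changes, time spent per question, and expected versus actual response time, to compute each metric.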
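
The raw signals named in the last item (answer changes, time per question, expected versus actual response time) can be pictured as a per-question event log. The following is a minimal sketch, not the paper's implementation: the `AnswerEvent` record, its field names, and the `summarize_question` helper are hypothetical, and the paper's actual formulas for QD and SRT are not reproduced here. The sketch only derives the inputs such metrics would need.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AnswerEvent:
    """One answer selection by a student (hypothetical log record)."""
    question_id: str
    choice: str
    timestamp_s: float  # seconds since the student opened the question


def summarize_question(events, expected_time_s):
    """Derive raw signals for one question from its selection events.

    Returns the final choice, the number of switches to a different
    choice, the time of the last selection, and the ratio of actual
    to expected response time.
    """
    if not events:
        raise ValueError("at least one answer event is required")
    if expected_time_s <= 0:
        raise ValueError("expected_time_s must be positive")
    ordered = sorted(events, key=lambda e: e.timestamp_s)
    # Count only real changes: re-selecting the same choice is not a change.
    changes = sum(
        1 for prev, cur in zip(ordered, ordered[1:]) if cur.choice != prev.choice
    )
    time_on_task = ordered[-1].timestamp_s
    return {
        "final_choice": ordered[-1].choice,
        "answer_changes": changes,
        "time_on_task_s": time_on_task,
        "time_ratio": time_on_task / expected_time_s,
    }


# Made-up events for illustration only; not data from the study.
events = [
    AnswerEvent("q1", "B", 12.0),
    AnswerEvent("q1", "C", 30.0),
    AnswerEvent("q1", "C", 41.0),
    AnswerEvent("q1", "A", 90.0),
]
summary = summarize_question(events, expected_time_s=60.0)
print(summary)
```

Keeping the log as immutable events, rather than storing only the final answer, is what makes signals such as answer changes recoverable after the fact.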

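Experimental Results

Research Questions

RQ1: Do learning-evaluation metrics that combine response behavior and confidence level identify students' understanding better than traditional scoring methods?
RQ2: How do metrics such as Question Doubt, Assurance Degree, and response time reveal hidden learning difficulties that right/wrong answers cannot capture?
RQ3: How effective are composite metrics such as Questionnaire Comprehension Level and Priority at identifying students who need intervention and at guiding topic prioritization?
RQ4: Can clustering students by Assurance Degree and Questionnaire Comprehension Level reveal meaningful learning profiles that guide targeted teaching strategies?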

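Key Findings

The Weighted Score (WS) metric identified students who answered incorrectly but showed partial understanding, giving a finer-grained picture of learning than traditional scoring.
Students in the third quadrant of the AD and QuCL clustering (weak understanding and low confidence) performed poorly on all metrics, indicating an urgent need for targeted support.
The Question Doubt (QD) metric showed that frequent answer changes were associated with low confidence and higher cognitive load, especially on difficult questions.
Student Response Time (SRT) analysis showed that long response times were closely associated with weak understanding and high confusion, most notably on high-difficulty questions.
The Priority (P) metric ranked topics by student performance and confidence level, letting teachers focus on the areas most in need of review.
Used together with AD, the composite metric QuCL provided a robust way to identify at-risk students: 15% of students fell into the low-understanding/low-confidence category and needed immediate attention.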
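
The quadrant grouping behind these findings can be sketched as a simple threshold split. This is an illustrative assumption, not the authors' clustering procedure: the function names `quadrant` and `at_risk`, the placement of AD on the horizontal axis and QuCL on the vertical axis, the 0-to-1 scale, and the 0.5 cut-offs are all hypothetical. The only detail taken from this review is that the third quadrant holds students who are low on both measures.

```python
def quadrant(ad, qucl, ad_cut=0.5, qucl_cut=0.5):
    """Return the Cartesian quadrant (1-4) of a student's (AD, QuCL) point.

    Assumed axes: AD horizontal, QuCL vertical. Values at or above a
    cut-off count as "high". Quadrant 3 means low AD and low QuCL.
    """
    high_ad = ad >= ad_cut
    high_qucl = qucl >= qucl_cut
    if high_ad and high_qucl:
        return 1
    if not high_ad and high_qucl:
        return 2
    if not high_ad and not high_qucl:
        return 3
    return 4


def at_risk(students, **cuts):
    """Sorted names of students in quadrant 3.

    `students` is a hypothetical mapping: name -> (AD, QuCL).
    """
    return sorted(
        name
        for name, (ad, qucl) in students.items()
        if quadrant(ad, qucl, **cuts) == 3
    )


# Made-up scores for illustration only; not data from the study.
students = {
    "s01": (0.82, 0.74),
    "s02": (0.31, 0.66),
    "s03": (0.28, 0.22),
    "s04": (0.71, 0.35),
}
flagged = at_risk(students)
print(flagged)
```

In practice the cut-offs would have to come from the data or from the paper's own definitions; fixed midpoints are used here only to keep the example short.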


This review was generated by AI and reviewed by a human editor.