QUICK REVIEW

[Paper Review] CrowdGrader: Crowdsourcing the Evaluation of Homework Assignments

Luca de Alfaro, Michael Shavlovsky | arXiv (Cornell University) | Aug 24, 2013

Parental Involvement in Education | References: 30 | Cited by: 21

One-Sentence Summary

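CrowdGrader is a crowdsourcing platform that lets students collaboratively grade homework assignments using a reputation-based algorithm. It combines peer grading with incentives tied to grading accuracy, achieving grading quality comparable to that of teaching assistants (TAs) while offering richer feedback and educational benefits through exposure to multiple solutions.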

ABSTRACT

Crowdsourcing offers a practical method for ranking and scoring large amounts of items. To investigate the algorithms and incentives that can be used in crowdsourcing quality evaluations, we built CrowdGrader, a tool that lets students submit and collaboratively grade solutions to homework assignments. We present the algorithms and techniques used in CrowdGrader, and we describe our results and experience in using the tool for several computer-science assignments. CrowdGrader combines the student-provided grades into a consensus grade for each submission using a novel crowdsourcing algorithm that relies on a reputation system. The algorithm iteratively refines inter-dependent estimates of the consensus grades, and of the grading accuracy of each student. On synthetic data, the algorithm performs better than alternatives not based on reputation. On our preliminary experimental data, the performance seems dependent on the nature of review errors, with errors that can be ascribed to the reviewer being more tractable than those arising from random external events. To provide an incentive for reviewers, the grade each student receives in an assignment is a combination of the consensus grade received by their submissions, and of a reviewing grade capturing their reviewing effort and accuracy. This incentive worked well in practice.

Research Motivation and Objectives

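Study algorithms and incentive mechanisms for crowdsourcing the evaluation of student homework.
Determine whether numerical grades combined with rankings are more effective for peer evaluation than rankings alone.
Develop a reputation-based consensus algorithm that improves grading accuracy by weighting each grader by reliability.
Design incentives that encourage high-quality peer reviews and increase student participation.
Evaluate the educational and practical benefits of peer grading in a real classroom setting.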

Proposed Method

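The system uses vancouver, a novel iterative algorithm based on the expectation-maximization principle, to jointly estimate the consensus grades and the reputation of each individual grader.
Grading accuracy is modeled as a reputation score that is updated iteratively according to how closely a grader agrees with the consensus grades.
The final grade is computed as a weighted combination of the consensus grade and the student's own reviewing performance.
Review credit is assigned with a metric that is not scale-invariant: $ \hat{r}_{j} = 1 - \sqrt{\frac{\min(\tilde{v}_{j}, v_{G})}{v_{G}}} $, where $ v_G $ is the reference error level.
Instructors can interpolate final grades from the crowdsourced grades, which allows manual adjustment for class performance.
Students are required to both assign numerical grades and rank the submissions, which improves grading precision.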
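
The interplay between consensus grades and grader reputation can be illustrated with a short Python sketch. This is not the paper's vancouver algorithm; it is a simplified, assumed stand-in that alternates between an inverse-variance weighted mean per submission and a per-grader error estimate. The function name `reputation_consensus`, the variance floor `min_var`, and the iteration count are all hypothetical choices made for illustration.

```python
from collections import defaultdict


def reputation_consensus(grades, iterations=20, min_var=0.1):
    """Simplified reputation-weighted consensus (illustrative only).

    grades: iterable of (grader, submission, grade) triples.
    Returns (consensus, variance): consensus grade per submission and
    estimated error variance per grader (lower variance = higher reputation).
    """
    by_item = defaultdict(list)  # submission -> [(grader, grade)]
    by_user = defaultdict(list)  # grader -> [(submission, grade)]
    for user, item, grade in grades:
        by_item[item].append((user, grade))
        by_user[user].append((item, grade))

    # Assumption: every grader starts with the same trust.
    variance = {user: 1.0 for user in by_user}
    consensus = {}
    for _ in range(iterations):
        # Step 1: consensus = inverse-variance weighted mean of the grades
        # each submission received.
        for item, votes in by_item.items():
            weights = [1.0 / variance[user] for user, _ in votes]
            weighted_sum = sum(w * g for w, (_, g) in zip(weights, votes))
            consensus[item] = weighted_sum / sum(weights)
        # Step 2: a grader's variance = mean squared distance between the
        # grades they gave and the current consensus. The floor min_var is
        # an assumed safeguard so that no single grader dominates.
        for user, votes in by_user.items():
            mse = sum((g - consensus[item]) ** 2 for item, g in votes) / len(votes)
            variance[user] = max(min_var, mse)
    return consensus, variance


if __name__ == "__main__":
    # Hypothetical data: graders a, b, c are accurate; d is erratic.
    demo = [
        ("a", "s1", 8.0), ("a", "s2", 5.0), ("a", "s3", 2.0),
        ("b", "s1", 8.5), ("b", "s2", 4.5), ("b", "s3", 2.0),
        ("c", "s1", 7.5), ("c", "s2", 5.0), ("c", "s3", 2.5),
        ("d", "s1", 3.0), ("d", "s2", 9.0), ("d", "s3", 7.0),
    ]
    consensus, variance = reputation_consensus(demo)
    print({k: round(v, 2) for k, v in consensus.items()})
    print({k: round(v, 2) for k, v in variance.items()})
```

In this toy setup the erratic grader ends up with the largest estimated variance, so their grades carry little weight in the consensus. A plain average would instead be pulled toward the outlying grades.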

Experimental Results

Research Questions

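RQ1: Are numerical grades combined with rankings more effective for peer evaluation than rankings alone?
RQ2: Can a reputation-based algorithm outperform simple averaging or the median?
RQ3: How do different types of grading errors (systematic versus random) affect the performance of a reputation-based algorithm?
RQ4: Can incentives based on grading accuracy improve the quality and fairness of peer reviews?
RQ5: Does peer grading with feedback lead to better educational outcomes than traditional TA grading?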

Key Findings

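On synthetic data, the vancouver algorithm outperforms averaging and the median, especially when grading errors are systematic rather than random.
In real-world use, vancouver grading of programming assignments is comparable to TA grading, and having multiple reviewers provides more comprehensive feedback.
When grading errors are random (for example, caused by environment mismatches), vancouver performs slightly worse than simple averaging.
Students valued the feedback and the exposure to multiple solutions more than the grading process itself; many reported learning from reviewing their peers' work.
The incentive mechanism, which ties the final grade to both submission quality and reviewing accuracy, successfully motivated students to take part in peer review.
Instructors can interpolate final grades from the crowdsourced grades, allowing flexible adjustment of the grade curve while preserving fairness.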
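
The incentive described above can be made concrete with a small Python sketch of the review-credit formula $ \hat{r}_{j} = 1 - \sqrt{\min(\tilde{v}_{j}, v_{G}) / v_{G}} $ and of a weighted final grade. Several things here are assumptions rather than details taken from the paper: $ \tilde{v}_{j} $ is treated as an estimate of grader j's error, and the names `review_grade` and `final_grade`, the 0 to 10 grade scale, and the 25% review weight are hypothetical.

```python
import math


def review_grade(est_error, ref_error):
    """Review credit in [0, 1]: 1 - sqrt(min(v_j, v_G) / v_G).

    est_error: assumed estimate of the grader's error (v_j).
    ref_error: reference error level (v_G), must be positive.
    """
    if ref_error <= 0:
        raise ValueError("ref_error must be positive")
    return 1.0 - math.sqrt(min(est_error, ref_error) / ref_error)


def final_grade(consensus, review, max_grade=10.0, review_weight=0.25):
    """Weighted mix of the consensus grade and the review credit.

    review_weight and max_grade are hypothetical values for illustration.
    """
    if not 0.0 <= review_weight <= 1.0:
        raise ValueError("review_weight must lie in [0, 1]")
    return (1.0 - review_weight) * consensus + review_weight * review * max_grade


if __name__ == "__main__":
    r = review_grade(est_error=1.0, ref_error=4.0)
    print(r, final_grade(consensus=8.0, review=r))
```

Because of the `min`, any grader whose estimated error reaches the reference level receives zero review credit, while a grader with no measured error receives full credit.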

This review was generated by AI and reviewed by a human editor.