[Paper Digest] Learning about social learning in MOOCs: From statistical analysis to generative model

Christopher G. Brinton, Mung Chiang | arXiv (Cornell University) | Dec 8, 2013

Online Learning and Analytics | References: 18 | Cited by: 28

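One-Sentence Summary

This paper proposes a generative model for MOOC discussion forums to address declining participation and information overload. Combining statistical analysis with a unified generative model, the study develops a relevance-ranking algorithm that recommends fewer irrelevant threads than tf-idf and HITS; in human evaluation it cut irrelevant recommendations by more than half relative to HITS (111 versus 262 irrelevant threads).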

ABSTRACT

We study user behavior in the courses offered by a major Massive Online Open Course (MOOC) provider during the summer of 2013. Since social learning is a key element of scalable education in MOOCs and is done via online discussion forums, our main focus is in understanding forum activities. Two salient features of MOOC forum activities drive our research: 1. High decline rate: for all courses studied, the volume of discussions in the forum declines continuously throughout the duration of the course. 2. High-volume, noisy discussions: at least 30% of the courses produce new discussion threads at rates that are infeasible for students or teaching staff to read through. Furthermore, a substantial portion of the discussions are not directly course-related. We investigate factors that correlate with the decline of activity in the online discussion forums and find effective strategies to classify threads and rank their relevance. Specifically, we use linear regression models to analyze the time series of the count data for the forum activities and make a number of observations, e.g., the teaching staff's active participation in the discussion increases the discussion volume but does not slow down the decline rate. We then propose a unified generative model for the discussion threads, which allows us both to choose efficient thread classifiers and design an effective algorithm for ranking thread relevance. Our ranking algorithm is further compared against two baseline algorithms, using human evaluation from Amazon Mechanical Turk. The authors on this paper are listed in alphabetical order. For media and press coverage, please refer to us collectively, as "researchers from the EDGE Lab at Princeton University, together with collaborators at Boston University and Microsoft Corporation."

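Motivation and Objectives

Understand what drives the decline in participation in MOOC discussion forums, in particular the high decline rate observed in large-scale courses.
Address the information overload caused by high-volume, noisy forum discussions that are often unrelated to the course.
Develop a generative model that captures the dynamics of how discussion threads are created and spread, in order to improve classification and ranking.
Design a relevance-ranking algorithm that identifies course-related threads more effectively than baselines such as tf-idf and HITS.
Validate the proposed ranking algorithm through human evaluation on Amazon Mechanical Turk.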

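Proposed Method

Statistically analyze forum posting volume using linear regression on time-series count data, to identify the factors that affect participation volume and decline rate.
Develop a unified generative model of discussion threads that captures both temporal dynamics and content features.
Build a thread classification system on top of the generative model to filter out noise and content unrelated to the course.
Design a relevance-ranking algorithm that uses authority scores derived from the generative model to prioritize course-related threads.
Compare the proposed algorithm with the tf-idf and HITS baselines through human evaluation on Amazon Mechanical Turk, assessing the relevance of the top-ranked threads.
Use a 10-day windowing scheme, with a warm-up period followed by a query period, to simulate real-time thread ranking in a dynamic MOOC setting.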
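
As a rough illustration of the regression step in the list above, the sketch below fits a straight line to log-transformed daily post counts, so that the slope acts as a per-day decline rate. The log-linear form, the function name `fit_decline_rate`, and the synthetic counts are all assumptions made for illustration; this digest does not state the exact regression specification the authors used.

```python
import math

import numpy as np


def fit_decline_rate(daily_posts):
    """Fit log(1 + count) = intercept + slope * day by least squares.

    Hypothetical helper: the log-linear form is an assumption, not the
    model specified in the paper.
    """
    counts = np.asarray(daily_posts, dtype=float)
    days = np.arange(len(counts))
    # log1p keeps days with zero posts finite.
    slope, intercept = np.polyfit(days, np.log1p(counts), deg=1)
    return slope, intercept


# Synthetic data (not from the paper): 200 posts on day 0,
# decaying by roughly 5% per day over 40 days.
synthetic_counts = [round(200.0 * math.exp(-0.05 * d)) for d in range(40)]

slope, intercept = fit_decline_rate(synthetic_counts)
print(f"estimated decline rate per day: {slope:.4f}")
```

A negative slope indicates declining activity. Under this assumed form, a factor such as teaching-staff participation could raise the intercept (more discussion overall) while leaving the slope unchanged, which is the pattern the abstract reports.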

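Experimental Results

Research Questions

RQ1: How quickly does participation in MOOC discussion forums decay over time, and which behavioral factors influence that decay?
RQ2: Given the overall decline in activity, which factors are associated with sustained discussion volume?
RQ3: Can the process that generates MOOC discussion threads be modeled in a way that supports effective classification and relevance ranking?
RQ4: How does the proposed ranking algorithm compare with tf-idf and HITS at identifying relevant, course-related discussions?
RQ5: To what extent does the proposed model reduce recommendations of irrelevant threads compared with the baselines?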

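Main Findings

Teaching staff participation increases discussion volume but does not slow the rate at which forum activity declines.
Compared with the HITS baseline, the proposed ranking algorithm recommended about 58% fewer irrelevant threads: 111 irrelevant threads in the difference set, versus 262 for HITS.
Compared with tf-idf, the proposed algorithm recommended 64 irrelevant threads in the difference set versus 104 for tf-idf, a reduction of about 38%.
The generative model captures the key dynamics of thread creation and enables effective filtering and ranking of discussion content.
Human evaluation on Amazon Mechanical Turk confirmed that the proposed algorithm consistently outperforms tf-idf and HITS at identifying relevant, course-related discussions.
The results indicate that, in MOOC forums with large amounts of noise and off-topic content, content-based relevance ranking is more effective than popularity-based ranking such as HITS.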
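
The relative reductions can be checked directly from the irrelevant-thread counts quoted in the findings above. The snippet below is a small arithmetic check; the function name `relative_reduction` is a hypothetical helper, and the only inputs are the four counts reported in this digest.

```python
def relative_reduction(baseline_irrelevant, proposed_irrelevant):
    """Fraction by which the proposed method lowers the baseline count."""
    return (baseline_irrelevant - proposed_irrelevant) / baseline_irrelevant


# Counts of irrelevant threads in the difference sets, as quoted above.
hits_reduction = relative_reduction(262, 111)
tfidf_reduction = relative_reduction(104, 64)

print(f"vs HITS:   {hits_reduction:.1%}")   # 57.6%
print(f"vs tf-idf: {tfidf_reduction:.1%}")  # 38.5%
```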

This digest was generated by AI and reviewed by human editors.