QUICK REVIEW

[Paper Review] Getting too personal(ized): The importance of feature choice in online adaptive algorithms

Zhaobin Li, Luna Yee|arXiv (Cornell University)|Sep 6, 2023

Advanced Bandit Algorithms Research | References: 22 | Cited by: 7

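One-Sentence Summary

The paper examines whether including student characteristics in contextual multi-armed bandit (MAB) personalization helps or hinders adaptive online education systems. Inclusion pays off only when the characteristics actually determine which version is best; when they are irrelevant, it risks introducing bias.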

ABSTRACT

Digital educational technologies offer the potential to customize students' experiences and learn what works for which students, enhancing the technology as more students interact with it. We consider whether and when attempting to discover how to personalize has a cost, such as if the adaptation to personal information can delay the adoption of policies that benefit all students. We explore these issues in the context of using multi-armed bandit (MAB) algorithms to learn a policy for what version of an educational technology to present to each student, varying the relation between student characteristics and outcomes and also whether the algorithm is aware of these characteristics. Through simulations, we demonstrate that the inclusion of student characteristics for personalization can be beneficial when those characteristics are needed to learn the optimal action. In other scenarios, this inclusion decreases performance of the bandit algorithm. Moreover, including unneeded student characteristics can systematically disadvantage students with less common values for these characteristics. Our simulations do however suggest that real-time personalization will be helpful in particular real-world scenarios, and we illustrate this through case studies using existing experimental results in ASSISTments. Overall, our simulations show that adaptive personalization in educational technologies can be a double-edged sword: real-time adaptation improves student experiences in some contexts, but the slower adaptation and potentially discriminatory results mean that a more personalized model is not always beneficial.

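Motivation and Objectives

Evaluate how personalization via contextual MABs affects student outcomes in online educational technologies.
Assess when including student characteristics improves or degrades performance under different outcome models.
Investigate the risks of bias and inequity that arise when characteristics are unevenly distributed.
Connect the simulation results with real-world data and discuss the practical implications for educational design.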

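Proposed Method

Use contextual Thompson sampling with regularized Bayesian logistic regression to model the probability of reward given the features.
Simulate three outcome-generating models: baseline, universal optimal action, and personalized optimal action.
Increase the number of contextual variables from 1 to as many as 10 to assess learning and regret.
Examine the effect of the learning horizon (50, 250, and 1,000 students) across 1,500 trials (note: the original paper states 1,000 trials).
Analyze performance with ANCOVA, reporting effect sizes and confidence intervals.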
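The method above can be sketched as a small simulation. This is a minimal illustration under stated assumptions, not the authors' implementation: it fits one logistic model per arm, approximates each posterior with a Laplace (Gaussian) approximation, and uses a standard normal prior as the regularizer. The names `laplace_posterior`, `sample_weights`, `run_bandit`, `universal`, and `personalized`, the two-arm setup, and the 0.6/0.4 reward probabilities are all hypothetical choices made for the example.

```python
import numpy as np

def sigmoid(z):
    # Clip to avoid overflow in exp for extreme logits.
    return 1.0 / (1.0 + np.exp(-np.clip(z, -30.0, 30.0)))

def laplace_posterior(X, y, prior_var=1.0, n_iter=25):
    """Laplace approximation for logistic-regression weights.
    The N(0, prior_var * I) prior plays the role of L2 regularization.
    Returns the MAP estimate and the posterior precision matrix."""
    d = X.shape[1]
    prior_prec = np.eye(d) / prior_var
    w = np.zeros(d)
    for _ in range(n_iter):  # Newton steps toward the MAP estimate
        p = sigmoid(X @ w)
        grad = X.T @ (p - y) + prior_prec @ w
        hess = (X.T * (p * (1.0 - p))) @ X + prior_prec
        w = w - np.linalg.solve(hess, grad)
    p = sigmoid(X @ w)
    precision = (X.T * (p * (1.0 - p))) @ X + prior_prec
    return w, precision

def sample_weights(mean, precision, rng):
    """Draw from N(mean, precision^-1) via a Cholesky factor of the precision."""
    L = np.linalg.cholesky(precision)  # precision = L @ L.T
    z = rng.standard_normal(mean.shape[0])
    return mean + np.linalg.solve(L.T, z)

def universal(context, arm, hi=0.6, lo=0.4):
    # Assumed outcome model: arm 1 is best for every student.
    return hi if arm == 1 else lo

def personalized(context, arm, hi=0.6, lo=0.4):
    # Assumed outcome model: the best arm matches the first context variable.
    return hi if arm == int(context[0]) else lo

def run_bandit(true_prob, n_students, n_context, use_context, rng, n_arms=2):
    """Run one trial and return the proportion of optimal actions.
    true_prob(context, arm) gives the probability of a reward of 1."""
    d = 1 + (n_context if use_context else 0)  # intercept plus context
    X = [[] for _ in range(n_arms)]
    y = [[] for _ in range(n_arms)]
    post = [(np.zeros(d), np.eye(d)) for _ in range(n_arms)]
    n_optimal = 0
    for _ in range(n_students):
        context = rng.integers(0, 2, size=n_context).astype(float)
        seen = context if use_context else np.empty(0)
        x = np.concatenate(([1.0], seen))
        # Thompson sampling: one posterior draw per arm, then act greedily.
        draws = [x @ sample_weights(m, P, rng) for m, P in post]
        arm = int(np.argmax(draws))
        probs = [true_prob(context, a) for a in range(n_arms)]
        reward = float(rng.random() < probs[arm])
        n_optimal += int(probs[arm] == max(probs))
        X[arm].append(x)
        y[arm].append(reward)
        post[arm] = laplace_posterior(np.array(X[arm]), np.array(y[arm]))
    return n_optimal / n_students
```

Calling `run_bandit(personalized, 250, 1, True, np.random.default_rng(0))` runs one 250-student trial with a contextual bandit; passing `use_context=False` gives the non-contextual comparison, and swapping in `universal` gives a scenario where the context carries no useful information. Refitting the chosen arm after every student keeps the sketch short; a batched update would be a reasonable alternative for longer horizons.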

Figure 1: Swarm plots for the proportion of optimal actions for the two bandit types. Each point represents results from one trial with 250 students. For the universal optimal action, all scenarios show similar results; hence only scenario (1) is shown. The decreased performance of the contextual ba

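Experimental Results

Research Questions

RQ1: Under what conditions does including student characteristics in a contextual MAB improve or impair learning outcomes?
RQ2: How does the number of contextual features affect exploration, learning speed, and fairness across student subgroups?
RQ3: When does personalization yield the greatest benefit relative to its potential harms or discriminatory effects?
RQ4: How do real-world feature distributions affect the benefits or drawbacks of personalization?
RQ5: What guidance do the ASSISTments case studies offer for implementing adaptive personalization in practice?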

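Key Findings

A contextual MAB outperforms the non-contextual alternative only when the optimal action actually depends on student characteristics (the personalized optimal action model).
Including unneeded features generally lowers performance and raises the cost of exploration, especially when there are many contextual variables.
In the baseline and universal optimal action scenarios, contextual personalization can underperform the non-contextual approach in the early stages.
When a minority group is small, contextual personalization can disproportionately harm it, because uncertainty about rare feature values is higher.
Even with limited features, over longer horizons personalization can markedly increase the proportion of optimal actions for minority groups under the personalized optimal action model.
Case studies using ASSISTments data show the potential real-world benefits of personalization and underline that decisions about which features to include should be context- and data-driven.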
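The minority-group finding above rests on a simple sample-size effect, which a short calculation can make concrete. The sketch below swaps in a Beta-Bernoulli model for the paper's Bayesian logistic regression, purely to keep the arithmetic transparent. The 10% minority share, the 250-student horizon, the 0.6 success rate, and the name `beta_posterior_sd` are all assumptions chosen for illustration.

```python
def beta_posterior_sd(successes, failures):
    """Posterior standard deviation of a success rate under a Beta(1, 1) prior."""
    a = 1 + successes
    b = 1 + failures
    variance = (a * b) / ((a + b) ** 2 * (a + b + 1))
    return variance ** 0.5

# Assumed setup: 250 students, 10% of whom share a rare feature value,
# and the same observed success rate of 0.6 in both groups.
n_students = 250
n_minority = n_students // 10          # 25 students
n_majority = n_students - n_minority   # 225 students

sd_minority = beta_posterior_sd(15, 10)   # 15 of 25 rewarded
sd_majority = beta_posterior_sd(135, 90)  # 135 of 225 rewarded

# With nine times fewer observations, the minority estimate is roughly
# three times less certain, so the bandit keeps exploring for that group longer.
ratio = sd_minority / sd_majority
```

Because posterior spread shrinks roughly with the square root of the number of observations, a group that is nine times smaller carries about three times the uncertainty at any point in the trial.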

Figure 2: Average reward per student across 1–10 contextual variables for the two bandit types in the baseline model. In this model, the maximum possible expected reward is $0.6$, and the expected reward for uniform random assignment is $0.5$. Error bars represent 1 standard error.


This review was generated by AI and reviewed by a human editor.