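[Paper Walkthrough] Robust and Computationally Efficient Linear Contextual Bandits under Adversarial Corruption and Heavy-Tailed Noise
Summary: The paper proposes CR-Hvt-UCB, an efficient online algorithm for linear contextual bandits based on online mirror descent. It is robust both to adversarial corruption and to heavy-tailed noise with bounded (1+ε)-th moments, needs only O(1) computation per round, and achieves sublinear regret.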
We study linear contextual bandits under adversarial corruption and heavy-tailed noise with finite $(1+ε)$-th moments for some $ε\in (0,1]$. Existing work that addresses both adversarial corruption and heavy-tailed noise relies on a finite variance (i.e., finite second-moment) assumption and suffers from computational inefficiency. We propose a computationally efficient algorithm based on online mirror descent that achieves robustness to both adversarial corruption and heavy-tailed noise. While the existing algorithm incurs $\mathcal{O}(t\log T)$ computational cost, our algorithm reduces this to $\mathcal{O}(1)$ per round. We establish an additive regret bound consisting of a term depending on the $(1+ε)$-moment bound of the noise and a term depending on the total amount of corruption. In particular, when $ε= 1$, our result recovers existing guarantees under finite-variance assumptions. When no corruption is present, it matches the best-known rates for linear contextual bandits with heavy-tailed noise. Moreover, the algorithm requires no prior knowledge of the noise moment bound or the total amount of corruption and still guarantees sublinear regret.
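Motivation and Goals
- Motivate the need for robust learning in linear contextual bandits under adversarial corruption and heavy-tailed noise.
- Develop a computationally efficient algorithm that stays robust under noise with bounded (1+ε)-th moments.
- Provide regret guarantees that adapt to an unknown corruption level and unknown moment bounds.
- Extend prior finite-variance results to the bounded (1+ε)-moment setting.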
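Proposed Method
- Introduce CR-Hvt-UCB, whose parameter update is based on online mirror descent (OMD).
- Use a Huber-based loss with an adaptive scale σ_t and threshold τ_t to control both corruption and heavy tails.
- Define V_t through a data-driven update that weights each observation by 1/σ_t^2, limiting the influence of corrupted rounds.
- Take a single OMD step per round, which admits a closed-form two-step representation and keeps the update efficient.
- Select arms UCB-style, using the confidence radius β_t derived in the analysis.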
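The per-round loop described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's algorithm: the class name `HuberOMDBanditSketch` is hypothetical, σ_t, τ_t and β_t are passed in by the caller because their exact formulas are not given here, and a single linearized mirror-descent step with the V_t-norm stands in for the paper's closed-form two-step update. Keeping V_t^{-1} up to date with the Sherman-Morrison formula makes each round cost O(d²) regardless of t, which illustrates why the per-round cost need not grow with the number of rounds.

```python
import numpy as np


class HuberOMDBanditSketch:
    """Hypothetical CR-Hvt-UCB-style learner; a simplified sketch, not the paper's exact update.

    Stores V_t^{-1} (inverse of the 1/sigma_t^2-weighted Gram matrix) and the estimate theta_t,
    so each round costs O(d^2) time regardless of the round index t.
    """

    def __init__(self, d: int, lam: float = 1.0):
        self.theta = np.zeros(d)        # current parameter estimate theta_t
        self.V_inv = np.eye(d) / lam    # inverse of V_0 = lam * I

    def select_arm(self, arms: np.ndarray, beta: float) -> int:
        # UCB index <x, theta> + beta * ||x||_{V^{-1}} for every row x of `arms`.
        means = arms @ self.theta
        widths = np.sqrt(np.einsum("ij,jk,ik->i", arms, self.V_inv, arms))
        return int(np.argmax(means + beta * widths))

    def update(self, x: np.ndarray, y: float, sigma: float, tau: float) -> None:
        # (1) V_t = V_{t-1} + x x^T / sigma^2, applied to V^{-1} via Sherman-Morrison.
        u = x / sigma
        Vu = self.V_inv @ u
        self.V_inv -= np.outer(Vu, Vu) / (1.0 + u @ Vu)
        # (2) Huber derivative of the scaled residual: clip(z, -tau, tau), so a single
        #     corrupted or heavy-tailed reward moves the estimate by a bounded amount.
        z = (y - x @ self.theta) / sigma
        psi = np.clip(z, -tau, tau)
        # (3) One mirror-descent step on the linearized Huber loss with Bregman term
        #     0.5 * ||theta - theta_t||_{V_t}^2: theta <- theta + V_t^{-1} x psi / sigma.
        self.theta = self.theta + (self.V_inv @ x) * (psi / sigma)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d, n_arms, T = 3, 5, 2000
    theta_star = np.array([0.6, -0.3, 0.2])          # hypothetical true parameter
    learner = HuberOMDBanditSketch(d)
    for t in range(1, T + 1):
        arms = rng.normal(size=(n_arms, d)) / np.sqrt(d)
        k = learner.select_arm(arms, beta=1.0)          # hypothetical constant radius beta_t
        noise = rng.standard_t(df=1.5)                  # infinite variance; (1+eps)-moment finite for eps < 0.5
        corruption = 50.0 if t % 100 == 0 else 0.0      # hypothetical adversary on 1% of rounds
        y = arms[k] @ theta_star + noise + corruption
        learner.update(arms[k], y, sigma=1.0, tau=2.0)  # hypothetical fixed sigma_t, tau_t
    print("estimation error:", np.linalg.norm(learner.theta - theta_star))
```

The demo draws Student-t noise with 1.5 degrees of freedom, whose moments exist only below order 1.5, so the variance is infinite. Clipping the scaled residual at τ caps how far any single corrupted round can move θ, while the 1/σ_t² weighting controls how much that round adds to V_t.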

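Results
Research Questions
- RQ1: Can a linear contextual bandit algorithm be robust to adversarial corruption and heavy-tailed noise simultaneously under a bounded (1+ε)-moment assumption?
- RQ2: Can O(1) per-round computational cost be achieved under both challenges while keeping regret sublinear?
- RQ3: How do an unknown corruption level C and unknown moment bounds ν_t affect the regret guarantees?
- RQ4: How does the proposed method relate to, and generalize, existing finite-variance or single-challenge methods?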
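Key Findings
| Paper | Corruption-robust | Heavy-tail-robust | Per-round cost | Regret |
|---|---|---|---|---|
| Abbasi-Yadkori et al. (2011) | No | No | $O(1)$ | $\tilde{O}(d\sqrt{T})$ |
| Zhang et al. (2025) | No | No | $O(1)$ | $\tilde{O}(d\sqrt{T})$ |
| He et al. (2022) | Yes | No | $O(1)$ | $\tilde{O}(d\sqrt{T} + dC)$ |
| Wang et al. (2025) | No | Yes | $O(1)$ | $\tilde{O}\big(dT^{\frac{1-\epsilon}{2(1+\epsilon)}}\sqrt{\sum_t \nu_t^2} + dT^{\frac{1-\epsilon}{2(1+\epsilon)}}\big)$ |
| Yu et al. (2025) | Yes | Only for $\epsilon = 1$ | $O(t\log T)$ | $\tilde{O}\big(d\sqrt{\sum_t \nu_t^2} + d(1 \vee C)\big)$ |
| Our work | Yes | Yes | $O(1)$ | $\tilde{O}\big(dT^{\frac{1-\epsilon}{2(1+\epsilon)}}\sqrt{\sum_t \nu_t^2} + dT^{\frac{1-\epsilon}{2(1+\epsilon)}}(1 \vee C)\big)$ |

Here $a \vee b = \max\{a, b\}$, and $\tilde{O}$ hides logarithmic factors.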
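To see how the rows of the table relate, the sketch below evaluates the regret expressions with constants and logarithmic factors dropped, reading 1∨C as max(1, C). Both choices are assumptions of this illustration, and the function names are hypothetical. It checks numerically that the "Our work" expression coincides with the Wang et al. (2025) expression when C = 0 and with the Yu et al. (2025) expression when ε = 1.

```python
import math
from typing import Sequence


def common_factor(T: int, eps: float) -> float:
    """T^{(1-eps)/(2(1+eps))}; equals 1 at eps = 1 and grows with T for eps < 1."""
    return T ** ((1 - eps) / (2 * (1 + eps)))


def noise_scale(nus: Sequence[float]) -> float:
    """sqrt(sum_t nu_t^2) over the horizon T = len(nus)."""
    return math.sqrt(sum(nu * nu for nu in nus))


def ours(d: int, nus: Sequence[float], C: float, eps: float) -> float:
    # "Our work" row: d f sqrt(sum nu_t^2) + d f (1 v C), reading 1 v C as max(1, C).
    f = common_factor(len(nus), eps)
    return d * f * noise_scale(nus) + d * f * max(1.0, C)


def wang_2025(d: int, nus: Sequence[float], eps: float) -> float:
    # Heavy-tailed, corruption-free row: d f sqrt(sum nu_t^2) + d f.
    f = common_factor(len(nus), eps)
    return d * f * noise_scale(nus) + d * f


def yu_2025(d: int, nus: Sequence[float], C: float) -> float:
    # Corruption-robust, finite-variance row: d sqrt(sum nu_t^2) + d (1 v C).
    return d * noise_scale(nus) + d * max(1.0, C)


if __name__ == "__main__":
    nus = [1.0] * 10_000  # hypothetical constant moment bounds over T = 10,000 rounds
    d = 5
    print(math.isclose(ours(d, nus, C=0.0, eps=0.5), wang_2025(d, nus, eps=0.5)))   # C = 0 case
    print(math.isclose(ours(d, nus, C=100.0, eps=1.0), yu_2025(d, nus, C=100.0)))  # eps = 1 case
```

Both checks follow directly from the formulas: with C = 0 the corruption factor max(1, C) is 1, and with ε = 1 the common factor T^0 is 1.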
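- Introduces CR-Hvt-UCB, which is robust to adversarial corruption and to heavy-tailed noise with bounded (1+ε)-th moments.
- Requires only O(1) computation per round, improving on the prior method that needs O(t log T) per update.
- The regret bound scales with the square root of ∑ν_t^2 and linearly with the total corruption C; it recovers the finite-variance results when ε = 1 and matches the corruption-free heavy-tailed rate when C = 0.
- Even when C and/or ν_t are unknown, regret guarantees still follow by plugging upper bounds on them into σ_t (see the corresponding corollaries).
- When the corruption level grows as C = O(√T), the bound matches the uncorrupted rate up to constant factors, consistent with the best-known heavy-tailed results.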
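The C = O(√T) statement can be checked directly from the regret bound. The derivation below adds one assumption made only for this illustration: the moment bounds are uniformly bounded, ν_t ≤ ν for a constant ν. Using $\frac{1-\epsilon}{2(1+\epsilon)} + \frac{1}{2} = \frac{1}{1+\epsilon}$, both terms are of the same order:

```latex
\begin{aligned}
d\,T^{\frac{1-\epsilon}{2(1+\epsilon)}}\sqrt{\textstyle\sum_{t=1}^{T}\nu_t^2}
  &\le \nu\, d\,T^{\frac{1-\epsilon}{2(1+\epsilon)}+\frac{1}{2}} = \nu\, d\,T^{\frac{1}{1+\epsilon}}, \\
d\,T^{\frac{1-\epsilon}{2(1+\epsilon)}}\,(1 \vee C)
  &= O\!\left(d\,T^{\frac{1-\epsilon}{2(1+\epsilon)}+\frac{1}{2}}\right) = O\!\left(d\,T^{\frac{1}{1+\epsilon}}\right)
  \quad \text{when } C = O(\sqrt{T}).
\end{aligned}
```

For ε = 1 this is $\tilde{O}(d\sqrt{T})$. If C grows faster than √T, the corruption term dominates the bound.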

This walkthrough was generated by AI and reviewed by human editors.