QUICK REVIEW

[Paper Digest] Robust and Computationally Efficient Linear Contextual Bandits under Adversarial Corruption and Heavy-Tailed Noise

Naoto Tani, Futoshi Futami|arXiv (Cornell University)|Mar 16, 2026
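Advanced Bandit Algorithms Research | Cited by 0
One-Sentence Summary

In brief: the paper presents CR-Hvt-UCB, an efficient online algorithm based on online mirror descent for linear contextual bandits. It stays robust under adversarial corruption and heavy-tailed noise with bounded (1+ε)-th moments, each update costs O(1) per round, and the regret bound is sublinear.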

ABSTRACT

We study linear contextual bandits under adversarial corruption and heavy-tailed noise with finite $(1+ε)$-th moments for some $ε\in (0,1]$. Existing work that addresses both adversarial corruption and heavy-tailed noise relies on a finite variance (i.e., finite second-moment) assumption and suffers from computational inefficiency. We propose a computationally efficient algorithm based on online mirror descent that achieves robustness to both adversarial corruption and heavy-tailed noise. While the existing algorithm incurs $\mathcal{O}(t\log T)$ computational cost, our algorithm reduces this to $\mathcal{O}(1)$ per round. We establish an additive regret bound consisting of a term depending on the $(1+ε)$-moment bound of the noise and a term depending on the total amount of corruption. In particular, when $ε= 1$, our result recovers existing guarantees under finite-variance assumptions. When no corruption is present, it matches the best-known rates for linear contextual bandits with heavy-tailed noise. Moreover, the algorithm requires no prior knowledge of the noise moment bound or the total amount of corruption and still guarantees sublinear regret.

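Motivation and Objectives

  • Motivate the need for robust learning in linear contextual bandits under adversarial corruption and heavy-tailed noise.
  • Develop a computationally efficient algorithm that remains robust under noise with bounded (1+ε)-th moments.
  • Provide regret guarantees that adapt to unknown corruption levels and moment bounds.
  • Extend earlier finite-variance results to the bounded (1+ε)-th moment setting.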

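Proposed Method

  • Introduces the CR-Hvt-UCB update, based on online mirror descent (OMD).
  • Uses a Huber-based loss with an adaptive scale σ_t and threshold τ_t to control corruption and heavy tails.
  • Defines V_t through a data-driven update that weights each observation by 1/σ_t^2, limiting the influence of corruption.
  • Takes one OMD step per round and gives a closed-form two-step representation for efficiency.
  • Selects arms in UCB style, using the confidence radius β_t obtained from the analysis.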
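
The ingredients listed above can be sketched roughly as follows. This is a minimal illustration and not the paper's CR-Hvt-UCB: the class name `HuberOMDBandit`, the fixed `beta`, the `radius` parameter, and the Euclidean rescaling in step 2 are all assumptions made here, and the paper's actual schedules for σ_t, τ_t, and β_t are not reproduced (the caller passes `sigma` and `tau` directly). The sketch only shows how a clipped Huber gradient, the 1/σ_t^2 weighting of V_t, a single gradient step per round, and a UCB bonus fit together at a per-round cost that does not grow with t.

```python
import numpy as np


def huber_grad(z, tau):
    """Derivative of the Huber loss: identity on [-tau, tau], clipped outside."""
    return float(np.clip(z, -tau, tau))


class HuberOMDBandit:
    """Hypothetical Huber-loss OMD learner with UCB arm selection (illustrative only)."""

    def __init__(self, dim, lam=1.0, beta=1.0, radius=1.0):
        self.theta = np.zeros(dim)
        self.V_inv = np.eye(dim) / lam  # inverse of V_0 = lam * I
        self.beta = beta                # confidence radius, held fixed here (assumption)
        self.radius = radius            # assumed bound on the norm of the true parameter

    def select(self, arms):
        """Return the index of the arm with the largest optimistic reward estimate."""
        # bonus_i = sqrt(x_i^T V^{-1} x_i) for each row x_i of `arms`
        bonus = np.sqrt(np.einsum("ij,jk,ik->i", arms, self.V_inv, arms))
        return int(np.argmax(arms @ self.theta + self.beta * bonus))

    def update(self, x, y, sigma, tau):
        """One round: reweight by 1/sigma^2, then take one Huber-gradient step."""
        # Sherman-Morrison update of V^{-1} for V <- V + x x^T / sigma^2
        u = x / sigma
        Vu = self.V_inv @ u
        self.V_inv -= np.outer(Vu, Vu) / (1.0 + u @ Vu)
        # Step 1: unconstrained step along the clipped (Huber) gradient
        z = (y - x @ self.theta) / sigma
        self.theta = self.theta + (self.V_inv @ u) * huber_grad(z, tau)
        # Step 2: pull back into the ball; Euclidean rescaling is a simplification
        norm = np.linalg.norm(self.theta)
        if norm > self.radius:
            self.theta *= self.radius / norm
```

With `tau` set to infinity, `sigma` set to 1, and a large `radius`, the update reduces to recursive least squares, which is a convenient sanity check. A finite `tau` caps how far any single observation, corrupted or heavy-tailed, can move the estimate in one round.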
Figure (a): Regret ($\epsilon=1$)

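Experimental Results

Research Questions

  • RQ1: Can linear contextual bandits be simultaneously robust to adversarial corruption and heavy-tailed noise under a bounded (1+ε)-th moment assumption?
  • RQ2: Under both challenges, can O(1) per-round computational cost be achieved while keeping regret sublinear?
  • RQ3: How do an unknown corruption level C and unknown moment bounds ν_t affect the regret guarantee?
  • RQ4: How does the proposed method relate to, and generalize, existing finite-variance or single-challenge methods?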

Key Findings

| Paper | C-Robust | HT-Robust | Efficiency | Regret |
|---|---|---|---|---|
| Abbasi-Yadkori et al. (2011) | | | O(1) | Õ(d√T) |
| Zhang et al. (2025) | | | O(1) | Õ(d√T) |
| He et al. (2022) | | | O(1) | Õ(d√T + dC) |
| Wang et al. (2025) | | | O(1) | Õ(dT^{(1-ε)/(2(1+ε))}√(∑ν_t^2) + dT^{(1-ε)/(2(1+ε))}) |
| Yu et al. (2025) | | ε=1 only | O(t log T) | Õ(d√(∑ν_t^2) + d·1∨C) |
| Our work | | | O(1) | Õ(dT^{(1-ε)/(2(1+ε))}√(∑ν_t^2) + dT^{(1-ε)/(2(1+ε))}·1∨C) |

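  • Introduces CR-Hvt-UCB, which is robust to both adversarial corruption and heavy-tailed noise under bounded (1+ε)-th moments.
  • Per-round computation is O(1), improving on the earlier method that requires O(t log T) updates.
  • The regret bound scales with the square root of ∑ν_t^2 plus a term linear in the total corruption C; it reduces to the finite-variance result when ε=1 and matches the uncorrupted heavy-tailed rate when C=0.
  • Even when C and/or ν_t are unknown, a regret guarantee follows by substituting upper bounds into σ_t (see the corresponding corollaries).
  • When the corruption grows as C = O(√T), the bound matches the uncorrupted rate up to constant factors and agrees with the best known heavy-tailed results.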
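
The special cases in these findings can be checked directly from the exponent in the "Our work" regret rate. The short calculation below takes that rate at face value; the uniform bound $\nu_t \le \nu$ used in the second case is an assumption made here for illustration only.

$$
\varepsilon = 1:\qquad \frac{1-\varepsilon}{2(1+\varepsilon)} = 0
\;\Longrightarrow\;
T^{\frac{1-\varepsilon}{2(1+\varepsilon)}} = 1,
$$

so the polynomial factor in $T$ disappears from both terms and the rate has the same form as the one listed for Yu et al. (2025). For general $\varepsilon$, if $\nu_t \le \nu$ then $\sqrt{\sum_{t} \nu_t^2} \le \nu\sqrt{T}$, and

$$
T^{\frac{1-\varepsilon}{2(1+\varepsilon)}} \cdot \sqrt{T}
= T^{\frac{(1-\varepsilon) + (1+\varepsilon)}{2(1+\varepsilon)}}
= T^{\frac{1}{1+\varepsilon}}.
$$

Under that assumption, a corruption level $C = O(\sqrt{T})$ makes the corruption term the same order in $T$ as the noise term, namely $d\,T^{1/(1+\varepsilon)}$ up to logarithmic factors and the constant $\nu$.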
Figure (b): Runtime ($\epsilon=1$)


This digest was generated by AI and reviewed by human editors.