QUICK REVIEW

[Paper Digest] Robust and Computationally Efficient Linear Contextual Bandits under Adversarial Corruption and Heavy-Tailed Noise

Naoto Tani, Futoshi Futami|arXiv (Cornell University)|Mar 16, 2026

Advanced Bandit Algorithms Research|Cited by 0

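One-Sentence Summary

The paper proposes CR-Hvt-UCB, an efficient online algorithm based on online mirror descent for linear contextual bandits. It is robust to both adversarial corruption and heavy-tailed noise with a bounded (1+ε)-th moment, costs O(1) per round to update, and achieves sublinear regret.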

ABSTRACT

We study linear contextual bandits under adversarial corruption and heavy-tailed noise with finite $(1+ε)$-th moments for some $ε\in (0,1]$. Existing work that addresses both adversarial corruption and heavy-tailed noise relies on a finite variance (i.e., finite second-moment) assumption and suffers from computational inefficiency. We propose a computationally efficient algorithm based on online mirror descent that achieves robustness to both adversarial corruption and heavy-tailed noise. While the existing algorithm incurs $\mathcal{O}(t\log T)$ computational cost, our algorithm reduces this to $\mathcal{O}(1)$ per round. We establish an additive regret bound consisting of a term depending on the $(1+ε)$-moment bound of the noise and a term depending on the total amount of corruption. In particular, when $ε= 1$, our result recovers existing guarantees under finite-variance assumptions. When no corruption is present, it matches the best-known rates for linear contextual bandits with heavy-tailed noise. Moreover, the algorithm requires no prior knowledge of the noise moment bound or the total amount of corruption and still guarantees sublinear regret.

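Motivation and Objectives

Establish the need for robust learning in linear contextual bandits under adversarial corruption and heavy-tailed noise.
Develop a computationally efficient algorithm that stays robust when the noise has only a bounded (1+ε)-th moment.
Provide regret guarantees that adapt to unknown corruption and moment bounds.
Extend earlier finite-variance results to the bounded (1+ε)-th-moment setting.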

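Proposed Method

Introduces the CR-Hvt-UCB update, which is based on online mirror descent (OMD).
Uses a Huber-based loss with an adaptive scale σ_t and threshold τ_t to control the effect of corruption and heavy tails.
Defines V_t through a data-driven update that weights each observation by 1/σ_t^2, limiting the influence of corruption.
Performs one OMD step per round and gives a closed-form two-step representation for efficiency.
Selects arms in a UCB-like manner, using the confidence radius β_t obtained from the analysis.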
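
The ingredients listed above (a Huber-type loss with scale σ_t and threshold τ_t, a design matrix V_t weighted by 1/σ_t^2, one preconditioned gradient step per round, and UCB-style arm selection) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the paper's exact CR-Hvt-UCB update: the class name `HuberOMDSketch`, the initialization V_0 = λI, the constant choices of `sigma`, `tau`, and `beta`, and the omission of any projection step are all hypothetical simplifications.

```python
import numpy as np


def huber_grad(z, tau):
    """Derivative of the Huber loss: equal to z on [-tau, tau], clipped outside."""
    return float(np.clip(z, -tau, tau))


class HuberOMDSketch:
    """Hypothetical illustration only; not the paper's exact CR-Hvt-UCB update."""

    def __init__(self, d, lam=1.0):
        self.theta = np.zeros(d)        # current parameter estimate
        self.V_inv = np.eye(d) / lam    # inverse of V_0 = lam * I (assumed initialization)

    def select(self, arms, beta):
        """UCB-style choice: estimated reward plus beta times the V^{-1}-norm of each arm."""
        means = arms @ self.theta
        widths = np.sqrt(np.einsum("kd,de,ke->k", arms, self.V_inv, arms))
        return int(np.argmax(means + beta * widths))

    def update(self, x, y, sigma, tau):
        """One per-round step whose cost does not depend on the round index t."""
        u = x / sigma
        # V_t = V_{t-1} + x x^T / sigma^2, inverted with the Sherman-Morrison formula.
        Vu = self.V_inv @ u
        self.V_inv -= np.outer(Vu, Vu) / (1.0 + u @ Vu)
        # Gradient of the Huber loss of the scaled residual, then a V_t-preconditioned step.
        z = (y - x @ self.theta) / sigma
        grad = -huber_grad(z, tau) * u
        self.theta = self.theta - self.V_inv @ grad


# Usage: a short simulated run with heavy-tailed rewards.
rng = np.random.default_rng(0)
theta_star = np.array([0.5, -0.3, 0.2])
learner = HuberOMDSketch(d=3)
for _ in range(200):
    arms = rng.normal(size=(5, 3))
    x = arms[learner.select(arms, beta=1.0)]
    y = x @ theta_star + rng.standard_t(df=2)   # Student-t noise with infinite variance
    learner.update(x, y, sigma=1.0, tau=1.0)
```

With `tau` set to infinity the clipping never activates and the step reduces to recursive weighted ridge regression. With a finite `tau`, a single corrupted or extreme reward can move the estimate by at most `tau` times the V^{-1}-weighted norm of the scaled context, which is the intuition behind combining the Huber loss with the 1/σ_t^2 weighting.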

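Results

Research Questions

RQ1: Can linear contextual bandits be made robust to adversarial corruption and heavy-tailed noise simultaneously under a bounded (1+ε)-th-moment assumption?
RQ2: Under both challenges, can O(1) per-round computational cost be achieved while keeping regret sublinear?
RQ3: How do an unknown corruption level C and unknown moment bounds ν_t affect the regret guarantee?
RQ4: How does the proposed method relate to, and generalize, existing finite-variance or single-challenge methods?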

Key Findings

Paper	Corruption-robust	Heavy-tail-robust	Per-round cost	Regret
Abbasi-Yadkori et al. (2011)	No	No	O(1)	~O(d√T)
Zhang et al. (2025)	No	No	O(1)	~O(d√T)
He et al. (2022)	Yes	No	O(1)	~O(d√T + dC)
Wang et al. (2025)	No	Yes	O(1)	~O(dT^{(1-ε)/(2(1+ε))}√(∑ν_t^2) + dT^{(1-ε)/(2(1+ε))})
Yu et al. (2025)	Yes	ε=1 only	O(t log T)	~O(d√(∑ν_t^2) + d·1∨C)
Our work	Yes	Yes	O(1)	~O(dT^{(1-ε)/(2(1+ε))}√(∑ν_t^2) + dT^{(1-ε)/(2(1+ε))}·1∨C)

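Introduces CR-Hvt-UCB, which is robust to adversarial corruption and heavy-tailed noise under a bounded (1+ε)-th moment.
Per-round computation is O(1), improving on the earlier method that incurs O(t log T) cost.
The regret bound depends on the square root of ∑ν_t^2 and linearly on the total corruption C; it recovers the finite-variance result when ε=1 and matches the uncorrupted heavy-tailed rate when C=0.
Even when C and/or ν_t are unknown, a regret guarantee still holds once upper bounds are substituted into σ_t (stated as a corollary).
When the corruption grows as C = O(√T), the bound matches the uncorrupted rate up to constant factors and agrees with the best known heavy-tailed results.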
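
The two special cases can be checked directly from the exponent in the regret bound for this work. The following short derivation assumes that ∨ in the table denotes the maximum, so that $1 \vee C = \max\{1, C\}$; that reading is an interpretation, not something the table states. At $ε = 1$ the exponent vanishes:

$$\left.\frac{1-ε}{2(1+ε)}\right|_{ε=1} = \frac{0}{4} = 0 \quad\Longrightarrow\quad T^{(1-ε)/(2(1+ε))} = 1,$$

so the bound becomes $\tilde{O}\big(d\sqrt{\textstyle\sum_t ν_t^2} + d \cdot 1 \vee C\big)$, the form listed for Yu et al. (2025). At $C = 0$ we have $1 \vee C = 1$, and the bound becomes $\tilde{O}\big(dT^{(1-ε)/(2(1+ε))}\sqrt{\textstyle\sum_t ν_t^2} + dT^{(1-ε)/(2(1+ε))}\big)$, the form listed for Wang et al. (2025).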

This digest was generated by AI and reviewed by human editors.