QUICK REVIEW

[Paper Review] Semantics-Aware Denoising: A PLM-Guided Sample Reweighting Strategy for Robust Recommendation

Xikai Yang, Yang Wang|arXiv (Cornell University)|Feb 17, 2026

Recommender Systems and Techniques · Cited by 0

ONE-SENTENCE SUMMARY

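SAID uses the semantic consistency between user interests and item content to downweight noisy implicit feedback, improving AUC and robustness to noise without changing the backbone model.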

ABSTRACT

Implicit feedback, such as user clicks, serves as the primary data source for modern recommender systems. However, click interactions inherently contain substantial noise, including accidental clicks, clickbait-induced interactions, and exploratory browsing behaviors that do not reflect genuine user preferences. Training recommendation models with such noisy positive samples leads to degraded prediction accuracy and unreliable recommendations. In this paper, we propose SAID (Semantics-Aware Implicit Denoising), a simple yet effective framework that leverages semantic consistency between user interests and item content to identify and downweight potentially noisy interactions. Our approach constructs textual user interest profiles from historical behaviors and computes semantic similarity with target item descriptions using pre-trained language model (PLM) based text encoders. The similarity scores are then transformed into sample weights that modulate the training loss, effectively reducing the impact of semantically inconsistent clicks. Unlike existing denoising methods that require complex auxiliary networks or multi-stage training procedures, SAID only modifies the loss function while keeping the backbone recommendation model unchanged. Extensive experiments on two real-world datasets demonstrate that SAID consistently improves recommendation performance, achieving up to 2.2% relative improvement in AUC over strong baselines, with particularly notable robustness under high noise conditions.
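The abstract describes the weighting pipeline only in prose. One way to write it down is the formalization below. It is a hedged sketch, not the paper's own equations, and it rests on these assumptions: P_u is user u's textual interest profile, T_i is the description of item i, E(·) is a PLM-based text encoder, ŷ_ui is the backbone's predicted click probability, the training loss is binary cross-entropy over the sample set D (consistent with AUC evaluation, but not confirmed by this summary), the similarity-to-weight mapping is a sigmoid with a hypothetical temperature τ, and only clicked samples are reweighted.

```latex
\begin{aligned}
s_{ui} &= \cos\big(E(P_u),\, E(T_i)\big)
        = \frac{E(P_u)^\top E(T_i)}{\lVert E(P_u)\rVert \,\lVert E(T_i)\rVert},\\[4pt]
w_{ui} &= \begin{cases}
            \sigma\!\left(s_{ui}/\tau\right) & \text{if } y_{ui} = 1,\\
            1 & \text{if } y_{ui} = 0,
          \end{cases}\\[4pt]
\mathcal{L}_{\text{SAID}} &= -\frac{1}{|\mathcal{D}|}\sum_{(u,i)\in\mathcal{D}}
   w_{ui}\Big[\, y_{ui}\log \hat{y}_{ui} + (1 - y_{ui})\log\big(1 - \hat{y}_{ui}\big) \Big].
\end{aligned}
```

Setting every w_ui = 1 recovers the ordinary unweighted loss, which is why the approach can leave the backbone model untouched: only the per-sample loss terms are rescaled.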

RESEARCH MOTIVATION AND GOALS

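Address the challenge of noisy implicit feedback (clicks) in recommendation by reducing the influence of semantically inconsistent interactions, making recommendations more robust.
Downweight noisy samples using the semantic alignment between user interests (derived from interaction history) and item content.
Provide a simple, loss-level denoising method that leaves the backbone model architecture unchanged.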

PROPOSED METHOD

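Construct textual user interest profiles from historical user behaviors.
Use PLM-based text encoders to compute the semantic similarity between each user profile and the target item description.
Transform the similarity scores into sample weights that modulate the training loss.
Apply the weights within the existing training objective, without modifying the recommendation backbone.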
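The four method steps above can be sketched end to end in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the PLM encoder is replaced by a bag-of-words stand-in so the example runs without downloading a model, the similarity-to-weight mapping is a sigmoid with hypothetical parameters tau and shift, and build_profile, encode, similarity_to_weight, sample_weights and weighted_bce are all hypothetical names. Only clicked (positive) samples are reweighted, following the abstract's focus on noisy positive samples.

```python
import math
from collections import Counter


def build_profile(history_texts, max_items=5):
    """Hypothetical profile builder: join the texts of the most recent items."""
    return "; ".join(history_texts[-max_items:])


def encode(text):
    """Stand-in for a PLM text encoder (assumption): bag-of-words counts.
    A real pipeline would return a dense embedding from a pre-trained model."""
    return Counter(text.lower().replace(";", " ").split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def similarity_to_weight(s, tau=0.1, shift=0.3):
    """Hypothetical monotone mapping: a sigmoid centred at `shift`.
    Low similarity gives a weight near 0, high similarity a weight near 1."""
    return 1.0 / (1.0 + math.exp(-(s - shift) / tau))


def sample_weights(profile, item_texts, labels, tau=0.1, shift=0.3):
    """Weight clicked samples by profile-item similarity; negatives keep weight 1."""
    profile_vec = encode(profile)
    weights = []
    for text, y in zip(item_texts, labels):
        if y == 1:
            s = cosine(profile_vec, encode(text))
            weights.append(similarity_to_weight(s, tau, shift))
        else:
            weights.append(1.0)
    return weights


def weighted_bce(preds, labels, weights, eps=1e-7):
    """Per-sample binary cross-entropy scaled by the sample weights, then averaged."""
    total = 0.0
    for p, y, w in zip(preds, labels, weights):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += -w * (y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(preds)


if __name__ == "__main__":
    history = ["wireless noise cancelling headphones",
               "bluetooth portable speaker", "wireless earbuds"]
    profile = build_profile(history)
    items = ["wireless bluetooth headphones", "kitchen knife set", "garden hose"]
    labels = [1, 1, 0]  # the second click is semantically off-profile
    w = sample_weights(profile, items, labels)
    preds = [0.8, 0.1, 0.2]  # hypothetical backbone outputs
    print([round(x, 3) for x in w])
    print(weighted_bce(preds, labels, w), weighted_bce(preds, labels, [1.0] * 3))
```

In a real setup, encode would be a pre-trained sentence encoder, profile and item embeddings would typically be computed once before training, and the weights would multiply the backbone's per-sample loss inside its usual training loop. The off-profile click ("kitchen knife set") receives a much smaller weight than the on-profile one, so its large loss contributes little to the gradient.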

EXPERIMENTAL RESULTS

Research Questions

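RQ1: Can semantic consistency between user interests and item content identify noisy implicit feedback?
RQ2: Does PLM-guided sample reweighting improve recommendation performance under noisy conditions without changing the model architecture?
RQ3: How does SAID perform relative to strong baselines under high noise conditions?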

Key Findings

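SAID consistently improves recommendation performance over strong baselines.
It achieves up to 2.2% relative improvement in AUC.
It shows notable robustness under high noise conditions.
It requires no auxiliary networks or multi-stage training; only the loss function is modified.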
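The headline gain is reported as a relative AUC improvement, which is easy to confuse with absolute AUC points. The sketch below computes AUC with the standard pairwise-ranking (Mann-Whitney) formulation and the relative gain over a baseline. The numbers are purely illustrative and not taken from the paper: a hypothetical baseline AUC of 0.700 improved by 2.2% relative gives 0.7154, an absolute change of only 0.0154. The function names are hypothetical.

```python
def auc(scores, labels):
    """ROC AUC as the share of (positive, negative) pairs ranked correctly.
    Ties count as half a correct pair (Mann-Whitney formulation)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def relative_improvement(new, base):
    """Relative gain of `new` over `base`; 0.022 means +2.2%."""
    return (new - base) / base


if __name__ == "__main__":
    # Hypothetical illustration, not results from the paper.
    base_auc = 0.700
    improved_auc = base_auc * 1.022  # a 2.2% relative gain
    print(round(improved_auc, 4), round(improved_auc - base_auc, 4))
```

The pairwise loop is quadratic and meant only for clarity; on real evaluation sets a rank-based or library implementation is the usual choice.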


This review was generated by AI and checked by human editors.