QUICK REVIEW

[Paper Review] Simplify and Robustify Negative Sampling for Implicit Collaborative Filtering

Jingtao Ding, Yuhan Quan|arXiv (Cornell University)|Sep 7, 2020

Video Surveillance and Tracking Methods | References: 46 | Cited by: 59

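One-Sentence Summary

This paper introduces SRNS, a memory-based, variance-aware negative sampling method for implicit collaborative filtering that is robust to false negatives and improves sampling efficiency. On two synthetic and three real-world datasets, SRNS outperforms the baselines.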

ABSTRACT

Negative sampling approaches are prevalent in implicit collaborative filtering for obtaining negative labels from massive unlabeled data. As two major concerns in negative sampling, efficiency and effectiveness are still not fully achieved by recent works that use complicate structures and overlook risk of false negative instances. In this paper, we first provide a novel understanding of negative instances by empirically observing that only a few instances are potentially important for model learning, and false negatives tend to have stable predictions over many training iterations. Above findings motivate us to simplify the model by sampling from designed memory that only stores a few important candidates and, more importantly, tackle the untouched false negative problem by favouring high-variance samples stored in memory, which achieves efficient sampling of true negatives with high-quality. Empirical results on two synthetic datasets and three real-world datasets demonstrate both robustness and superiorities of our negative sampling method.

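Research Motivation and Objectives

Advance robust and efficient negative sampling for implicit CF when labels are incomplete and false negatives are present.
Show that a small memory of high-potential negative candidates is sufficient for effective learning.
Propose a two-step sampling scheme that combines score-based memory updates with variance-based selection.
Demonstrate the robustness and superior performance of SRNS on synthetic and real-world datasets.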

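Proposed Method

Maintains a per-user memory M_u that stores S1 high-potential negative candidates.
Updates M_u by merging it with uniformly sampled candidates and redrawing S1 new hard negatives through a softmax over scores with temperature τ.
Introduces a variance-based sampling criterion that selects from M_u the candidate with the highest predicted score plus a scaled variance term (alpha_t * std), so that hard candidates with unstable predictions are favoured.
Applies a warm-start schedule to alpha_t so that variance-based sampling is emphasized progressively as training proceeds.
Relies on bootstrapping (memorization) during training to identify likely false negatives and adjust sampling accordingly.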
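
The two sampling steps listed above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the `scores_fn` callable, the number of uniform candidates `s2`, the step-shaped warm start, and sampling with replacement are all hypothetical choices made for the example, and filtering out observed positives from the uniform draw is omitted for brevity.

```python
import numpy as np

def update_memory(memory, scores_fn, n_items, s2, tau, rng):
    """Refresh one user's memory M_u (hypothetical helper).

    Merge the current memory with s2 uniformly drawn items, then redraw
    len(memory) candidates with probability softmax(score / tau).
    """
    s1 = len(memory)
    extra = rng.integers(0, n_items, size=s2)   # uniform candidates (assumed: may include positives)
    pool = np.concatenate([memory, extra])
    logits = scores_fn(pool) / tau
    probs = np.exp(logits - logits.max())       # numerically stable softmax
    probs /= probs.sum()
    # Assumption: sampling with replacement; the source does not specify this.
    return rng.choice(pool, size=s1, replace=True, p=probs)

def alpha_at(epoch, warm_epochs, alpha):
    """Warm start (assumed step form): no variance term early, fixed weight later."""
    return 0.0 if epoch < warm_epochs else alpha

def pick_negative(memory, score_history, alpha_t):
    """Select the memory item maximizing latest score + alpha_t * std.

    score_history has shape (n_tracked_epochs, len(memory)); the std is
    taken over the tracked epochs for each candidate.
    """
    latest = score_history[-1]
    std = score_history.std(axis=0)
    return memory[int(np.argmax(latest + alpha_t * std))]

# Toy usage: item 7 has a lower latest score than item 3 but a fluctuating one.
memory = np.array([3, 7, 11])
history = np.array([[0.9, 0.2, 0.5],
                    [0.9, 0.8, 0.5]])
hard_only = pick_negative(memory, history, alpha_at(0, 5, 1.0))       # variance ignored
variance_aware = pick_negative(memory, history, alpha_at(5, 5, 1.0))  # variance rewarded
```

In the toy usage, the stable high-scoring item is chosen while the variance term is switched off, and the fluctuating item is chosen once it is switched on, which mirrors the summary's point that stable predictions hint at false negatives.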

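Experimental Results

Research Questions

RQ1: Can a memory-based negative sampler efficiently capture the dynamic distribution of true negatives?
RQ2: How can the quality of negative samples be measured reliably so as to mitigate false negatives?
RQ3: Does a variance-aware sampling strategy outperform conventional hard-negative or uniform sampling in robustness and performance?
RQ4: Is there a beneficial training schedule (warm start) for introducing variance into sampling?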

Key Findings

Dataset	Method	NDCG@1	NDCG@3	Recall@3
Movielens-1m	ENMF	0.1846	0.3021	0.3882
Movielens-1m	Uniform	0.1744	0.2846	0.3663
Movielens-1m	NNCF	0.0829	0.1478	0.1971
Movielens-1m	AOBPR	0.1802	0.2905	0.3728
Movielens-1m	IRGAN	0.1755	0.2877	0.3708
Movielens-1m	RNS-AS	0.1823	0.2932	0.3754
Movielens-1m	AdvIR	0.1790	0.2941	0.3792
Movielens-1m	SRNS	0.1933	0.3070	0.3912
Pinterest	ENMF	0.2594	0.4144	0.5284
Pinterest	Uniform	0.2586	0.4136	0.5276
Pinterest	NNCF	0.2292	0.3699	0.4735
Pinterest	AOBPR	0.2596	0.4165	0.5319
Pinterest	IRGAN	0.2587	0.4143	0.5282
Pinterest	RNS-AS	0.2690	0.4233	0.5359
Pinterest	AdvIR	0.2689	0.4235	0.5363
Pinterest	SRNS	0.2891	0.4391	0.5486
Ecommerce	ENMF	0.1317	0.2095	0.2670
Ecommerce	Uniform	0.1265	0.2057	0.2640
Ecommerce	NNCF	0.0833	0.1420	0.1855
Ecommerce	AOBPR	0.1293	0.2108	0.2710
Ecommerce	IRGAN	0.1275	0.2065	0.2648
Ecommerce	RNS-AS	0.1335	0.2131	0.2714
Ecommerce	AdvIR	0.1357	0.2141	0.2719
Ecommerce	SRNS	0.1471	0.2256	0.2833

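SRNS consistently improves NDCG@1 and NDCG@3 over the baselines across the three datasets reported above.
Variance-based sampling is robust to false negatives and, under noisy supervision in particular, outperforms strategies that consider only hardness.
SRNS converges faster and is more stable than GAN-based negative sampling methods.
The gains hold for both GMF and MLP scoring functions, which indicates that SRNS generalizes across scoring models.
On real-world datasets, SRNS achieves up to an 8.40% relative improvement in NDCG@1 over the second-best method (Ecommerce: 0.1471 vs. 0.1357 for AdvIR).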
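
The relative-improvement figure can be checked directly against the table. The short script below copies the NDCG@1 values (the table's first metric column) and computes the gain of SRNS over the strongest baseline per dataset; the dictionary layout and the helper name `relative_gain` are choices made for this example only.

```python
# NDCG@1 values copied from the results table.
ndcg1 = {
    "Movielens-1m": {"ENMF": 0.1846, "Uniform": 0.1744, "NNCF": 0.0829, "AOBPR": 0.1802,
                     "IRGAN": 0.1755, "RNS-AS": 0.1823, "AdvIR": 0.1790, "SRNS": 0.1933},
    "Pinterest":    {"ENMF": 0.2594, "Uniform": 0.2586, "NNCF": 0.2292, "AOBPR": 0.2596,
                     "IRGAN": 0.2587, "RNS-AS": 0.2690, "AdvIR": 0.2689, "SRNS": 0.2891},
    "Ecommerce":    {"ENMF": 0.1317, "Uniform": 0.1265, "NNCF": 0.0833, "AOBPR": 0.1293,
                     "IRGAN": 0.1275, "RNS-AS": 0.1335, "AdvIR": 0.1357, "SRNS": 0.1471},
}

def relative_gain(scores, method="SRNS"):
    """Percentage gain of `method` over the best of the remaining methods."""
    best_baseline = max(v for name, v in scores.items() if name != method)
    return (scores[method] - best_baseline) / best_baseline * 100

gains = {dataset: relative_gain(scores) for dataset, scores in ndcg1.items()}
for dataset, gain in gains.items():
    print(f"{dataset}: {gain:.2f}%")
```

The largest gain comes from the Ecommerce dataset, where the strongest baseline is AdvIR; the gains on the other two datasets are smaller.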

This review was generated by AI and reviewed by a human editor.