QUICK REVIEW

[Paper Review] On Offline Evaluation of Recommender Systems.

Yitong Ji, Aixin Sun | arXiv (Cornell University) | 2020-10-21

Recommender Systems and Techniques | Citations: 2

One-Line Summary

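This paper shows that ignoring the global timeline in offline recommender system evaluation causes data leakage and yields unrealistic performance estimates. Using BPR and NeuMF on the MovieLens dataset, it shows that access to future data can raise or lower measured accuracy, which makes models non-comparable, and that more historical training data does not always improve performance.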

ABSTRACT

In academic research, recommender models are often evaluated offline on benchmark datasets. The offline dataset is first split to train and test instances. All training instances are then modeled in a user-item interaction matrix, and supervised learning models are trained. Many such offline evaluations ignore the global timeline in the data, which leads to leakage: a model learns from future data to predict a current value, making the evaluation unrealistic. In this paper, we evaluate the impact of leakage using two widely adopted baseline models, BPR and NeuMF, on MovieLens dataset. We show that accessing to different amount of future data may improve or deteriorate a model's recommendation accuracy. That is, ignoring the global timeline in offline evaluation makes the performance among recommendation models not comparable. Our experiments also show that more historical data in training set does not necessarily lead to better recommendation accuracy. We share our understanding of these observations and highlight the importance of preserving the global timeline. We also call for a revisit of recommender system offline evaluation.

Motivation and Objectives

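To investigate the effect of ignoring the global timeline in offline recommender system evaluation.
To assess how data leakage from future interactions affects model performance metrics.
To challenge the assumption that more historical training data always improves recommendation accuracy.
To promote timeline-preserving evaluation protocols in offline benchmarking.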

Proposed Method

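Split the MovieLens dataset into training and test sets while preserving the global timeline.
Train BPR and NeuMF on the time-ordered data to simulate realistic sequences of user-item interactions.
Measure performance under varying amounts of exposure to future data to assess the impact of leakage.
Compare accuracy across different temporal splits to analyze how future data affects predictions.
Analyze the relationship between training set size and recommendation accuracy while controlling for time order.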
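
The timeline-preserving split described above can be sketched as follows. This is a minimal illustration and not the authors' code: the `(user_id, item_id, timestamp)` tuple layout, the function name `global_time_split`, the toy data, and the cutoff value are all assumptions made for the example.

```python
from typing import List, Tuple

# (user_id, item_id, timestamp); this layout is an assumption for the sketch.
Interaction = Tuple[int, int, int]


def global_time_split(
    interactions: List[Interaction], cutoff: int
) -> Tuple[List[Interaction], List[Interaction]]:
    """Split on a single global cutoff.

    Everything strictly before the cutoff goes to train, everything at or
    after it goes to test, so no training row is later than any test row.
    """
    train = [row for row in interactions if row[2] < cutoff]
    test = [row for row in interactions if row[2] >= cutoff]
    return train, test


# Toy interactions, invented for illustration only.
toy = [
    (1, 10, 100),
    (1, 11, 400),
    (2, 10, 200),
    (2, 12, 300),
    (3, 13, 500),
]

train, test = global_time_split(toy, cutoff=350)
```

With a single cutoff, a user whose interactions all fall after it (user 3 in the toy data) appears only in the test set. How such users are handled is a design choice that this sketch leaves open.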

Experimental Results

Research Questions

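RQ1: How does ignoring the global timeline in offline evaluation affect the performance of BPR and NeuMF?
RQ2: How much does recommendation accuracy improve or deteriorate in the offline setting when models are exposed to future data?
RQ3: Does including more historical data in the training set consistently improve performance?
RQ4: Can timeline-insensitive evaluation lead to misleading comparisons between recommender models?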

Key Findings

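Ignoring the global timeline in offline evaluation causes data leakage: models learn from future interactions, which yields unrealistic performance estimates.
Exposure to different amounts of future data can improve or deteriorate model accuracy, depending on the data split and the model architecture.
Including more historical data in the training set does not necessarily improve recommendation accuracy, which challenges a common assumption in offline evaluation.
When the global timeline is not preserved during evaluation, performance differences between models are no longer comparable.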
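
To make the leakage finding concrete, the sketch below counts how much of a training set lies in the "future" of each test instance. It uses a per-user leave-last-one-out split only as one example of a protocol that ignores the global timeline; this review does not state which splits the paper examined. The helper names `leave_last_one_out` and `leaked_fraction`, the tuple layout, and the toy data are hypothetical.

```python
from collections import defaultdict
from typing import List, Tuple

# (user_id, item_id, timestamp); this layout is an assumption for the sketch.
Interaction = Tuple[int, int, int]


def leave_last_one_out(
    interactions: List[Interaction],
) -> Tuple[List[Interaction], List[Interaction]]:
    """Per-user split: each user's latest interaction is the test instance."""
    by_user = defaultdict(list)
    for row in interactions:
        by_user[row[0]].append(row)
    train, test = [], []
    for rows in by_user.values():
        ordered = sorted(rows, key=lambda r: r[2])
        train.extend(ordered[:-1])
        test.append(ordered[-1])
    return train, test


def leaked_fraction(train: List[Interaction], test: List[Interaction]) -> float:
    """Mean share of training rows that are later than a test instance."""
    if not train or not test:
        return 0.0
    shares = []
    for _, _, t_test in test:
        later = sum(1 for _, _, t in train if t > t_test)
        shares.append(later / len(train))
    return sum(shares) / len(shares)


# Toy interactions, invented for illustration only.
toy = [
    (1, 10, 100), (1, 11, 200),
    (2, 10, 300), (2, 12, 400),
    (3, 13, 150), (3, 14, 500),
]

train, test = leave_last_one_out(toy)
fraction = leaked_fraction(train, test)
```

In the toy data, user 1's test instance at timestamp 200 is predicted by a model that has already seen user 2's interaction at timestamp 300, so the fraction is nonzero. A split on one global cutoff would drive this quantity to zero by construction.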


This review was generated by AI and checked by a human editor.