QUICK REVIEW

[Paper Review] Deep reinforcement learning for search, recommendation, and online advertising: a survey

Xiangyu Zhao, Long Xia | arXiv (Cornell University) | Dec 18, 2018

Advanced Bandit Algorithms Research | 44 citations

TL;DR

A comprehensive survey of how deep reinforcement learning is applied to search, recommendation, and online advertising, detailing foundations, representative algorithms, applications, and future directions.

ABSTRACT

Search, recommendation, and online advertising are the three most important information-providing mechanisms on the web. These information seeking techniques, satisfying users' information needs by suggesting users personalized objects (information or services) at the appropriate time and place, play a crucial role in mitigating the information overload problem. With recent great advances in deep reinforcement learning (DRL), there have been increasing interests in developing DRL based information seeking techniques. These DRL based techniques have two key advantages -- (1) they are able to continuously update information seeking strategies according to users' real-time feedback, and (2) they can maximize the expected cumulative long-term reward from users where reward has different definitions according to information seeking applications such as click-through rate, revenue, user satisfaction and engagement. In this paper, we give an overview of deep reinforcement learning for search, recommendation, and online advertising from methodologies to applications, review representative algorithms, and discuss some appealing research directions.

Motivation & Objective

Motivate the use of reinforcement learning to address dynamic user preferences in information seeking tasks.
Summarize methodological foundations of RL in search, recommendation, and advertising, including MABs, MDPs, POMDPs, and policy learning.
Review representative DRL algorithms and systems in three domains: search, recommendation, and online advertising.
Discuss challenges and future directions to advance RL-based information seeking systems.
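
As a concrete illustration of the simplest formulation listed above, the multi-armed bandit (MAB), here is a minimal sketch of an epsilon-greedy policy that chooses which item to show a user. It is not taken from the survey: the function name `epsilon_greedy_bandit`, the click probabilities, and the parameter values are all hypothetical and chosen only for illustration.

```python
import random


def epsilon_greedy_bandit(click_probs, steps=5000, epsilon=0.1, seed=0):
    """Simulate an epsilon-greedy bandit over items.

    click_probs holds hypothetical per-item click probabilities that the
    agent does not see; it only observes click / no-click feedback.
    """
    rng = random.Random(seed)
    n_items = len(click_probs)
    counts = [0] * n_items    # how often each item was shown
    values = [0.0] * n_items  # running mean of observed clicks per item

    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: show a uniformly random item.
            item = rng.randrange(n_items)
        else:
            # Exploit: show the item with the best estimated click rate.
            item = max(range(n_items), key=lambda i: values[i])

        # Simulated user feedback (1.0 = click, 0.0 = no click).
        reward = 1.0 if rng.random() < click_probs[item] else 0.0

        # Incremental update of the running mean for the shown item.
        counts[item] += 1
        values[item] += (reward - values[item]) / counts[item]

    return counts, values


counts, values = epsilon_greedy_bandit([0.02, 0.05, 0.10])
print(counts, [round(v, 3) for v in values])
```

A bandit of this kind has no notion of state, which is one reason the MDP and POMDP formulations are used when a user's preferences change over the course of a session.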

Proposed method

Present technical foundations of RL and problem formulations (MAB, MDP, POMDP, multi-agent settings).
Differentiate model-based vs. model-free RL and value-based vs. policy-based approaches (Q-learning, DQN, Actor-Critic).
Survey RL applications in search tasks: query understanding, ranking, whole-page optimization, session search.
Survey RL applications in recommendation: exploitation/exploration, temporal dynamics, long-term engagement, page-wise recommendation.
Survey RL applications in online advertising: guaranteed delivery and real-time bidding, including model-based and model-free approaches.
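
To make the value-based, model-free family mentioned above more tangible, the following is a minimal sketch of tabular Q-learning on a toy chain MDP. It is an illustration under stated assumptions rather than any method from the survey: the environment (four states in a row, reward 1 on reaching the last state), the function name `q_learning_chain`, and all hyperparameters are hypothetical. DQN replaces the table with a neural network but keeps the same update target.

```python
import random


def q_learning_chain(n_states=4, episodes=500, alpha=0.5, gamma=0.9, seed=0):
    """Tabular Q-learning on a toy chain: action 0 moves left, action 1 moves right.

    Reaching the last state ends the episode with reward 1; every other
    transition gives reward 0.
    """
    rng = random.Random(seed)
    goal = n_states - 1
    q = [[0.0, 0.0] for _ in range(n_states)]  # q[state][action]

    for _ in range(episodes):
        state = 0
        for _ in range(200):  # step cap per episode
            # Q-learning is off-policy, so a uniform random behaviour
            # policy is enough to learn the greedy action values.
            action = rng.randrange(2)
            next_state = max(0, state - 1) if action == 0 else state + 1
            reward = 1.0 if next_state == goal else 0.0

            # Bootstrapped target; the terminal state has no future value.
            if next_state == goal:
                target = reward
            else:
                target = reward + gamma * max(q[next_state])

            q[state][action] += alpha * (target - q[state][action])
            state = next_state
            if state == goal:
                break

    return q


q = q_learning_chain()
greedy_policy = [max((0, 1), key=lambda a: q[s][a]) for s in range(3)]
print(greedy_policy)
```

The learned greedy policy moves right in every non-terminal state, because the discounted value of reaching the goal sooner is always higher than that of stepping away from it.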

Experimental results

Research questions

RQ1: How can RL frameworks model dynamic user behavior in search, recommendation, and advertising tasks?
RQ2: What are the key RL algorithms and architectures that have been effective in each information-seeking domain?
RQ3: What open challenges exist for applying DRL to search, recommendation, and online advertising, and what directions show promise for future work?
RQ4: How can cross-domain collaboration among search, recommendation, and advertising be leveraged through RL?

Key findings

The survey outlines the core RL foundations relevant to information seeking, including MABs, MDPs, POMDPs, and multi-agent formulations.
It summarizes representative DRL methods used for ranking, diversity, session modeling, and page presentation in search and recommendation.
It reviews DRL-based strategies for online advertising, including bidding, budgeting, and cooperative/multi-agent approaches.
The paper highlights the importance of long-term rewards and user engagement as optimization objectives beyond immediate clicks or impressions.
It discusses future directions such as cross-scenario collaborative RL, richer reward designs, expanded user–agent interactions, and offline/online evaluation tools.
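
The point about long-term rewards can be illustrated with the standard discounted return, which sums each reward weighted by a power of the discount factor gamma. The sketch below is not from the survey: the two reward sequences are hypothetical, standing in for a policy that wins one immediate click and a policy that earns nothing at first but keeps the user engaged afterwards.

```python
def discounted_return(rewards, gamma=0.9):
    """Return sum_t gamma**t * rewards[t], computed backwards in one pass."""
    total = 0.0
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


# Hypothetical per-step rewards (1.0 = click or engagement event).
immediate_click = [1.0, 0.0, 0.0, 0.0]
sustained_engagement = [0.0, 1.0, 1.0, 1.0]

print(discounted_return(immediate_click))
print(discounted_return(sustained_engagement))
```

Under these assumed numbers the second sequence has the higher return even though it earns nothing in the first step, which is the behavior a purely click-greedy objective would miss.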

This review was created by AI and reviewed by human editors.