QUICK REVIEW

[Paper Review] Deep Reinforcement Learning in System Optimization.

Ameer Haj-Ali, Nesreen K. Ahmed | arXiv (Cornell University) | Aug 4, 2019
Reinforcement Learning in Robotics | 3 citations
TL;DR

This paper evaluates the application of deep reinforcement learning (DRL) in system optimization, proposing a framework to assess its efficacy through metrics like efficiency, robustness, and problem formulation. It identifies when DRL is beneficial, compares it to alternatives like random search and greedy algorithms, and outlines challenges and future directions for integrating DRL into system optimization.

ABSTRACT

Many real-world systems problems require reasoning about the long-term consequences of actions taken to configure and manage the system. These problems, with delayed and often sequentially aggregated reward, are often inherently reinforcement learning problems and present the opportunity to leverage the recent substantial advances in deep reinforcement learning. However, in some cases, it is not clear why deep reinforcement learning is a good fit for the problem. Sometimes, it does not perform better than the state-of-the-art solutions. And in other cases, random search or greedy algorithms could outperform deep reinforcement learning. In this paper, we review, discuss, and evaluate the recent trends of using deep reinforcement learning in system optimization. We propose a set of essential metrics to guide future works in evaluating the efficacy of using deep reinforcement learning in system optimization. Our evaluation includes challenges, the types of problems, their formulation in the deep reinforcement learning setting, embedding, the model used, efficiency, and robustness. We conclude with a discussion on open challenges and potential directions for pushing further the integration of reinforcement learning in system optimization.

Motivation & Objective

  • To assess when and why deep reinforcement learning is a suitable approach for system optimization problems.
  • To identify cases where DRL underperforms compared to simpler baselines like random search or greedy algorithms.
  • To propose a standardized set of evaluation metrics—efficiency, robustness, formulation, and embedding—for assessing DRL in system optimization.
  • To analyze the challenges in formulating system optimization problems as reinforcement learning tasks.
  • To guide future research by identifying open challenges and promising directions for DRL integration in system optimization.

Proposed method

  • Systematically reviews recent trends in applying deep reinforcement learning to system optimization problems.
  • Proposes a structured evaluation framework based on problem formulation, embedding techniques, model architecture, and performance metrics.
  • Evaluates DRL against alternative methods such as random search and greedy algorithms across multiple system optimization scenarios.
  • Analyzes the role of delayed and aggregated rewards in shaping DRL applicability and performance.
  • Emphasizes the importance of robustness and efficiency in real-world deployment of DRL-based system optimization.
  • Uses empirical evaluation across diverse system optimization problems to compare DRL with state-of-the-art non-DRL solutions.
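The comparison against random search and greedy baselines mentioned above can be illustrated with a small sketch. Everything here is an assumption made for illustration and does not come from the paper: the `toy_cost` function, the knob counts, and the helper names `random_search` and `greedy_search` are hypothetical. The idea is only that a DRL agent's result should be reported next to cheap baselines run on the same cost function.

```python
import random
from typing import Callable, List, Sequence, Tuple

CostFn = Callable[[Sequence[int]], float]


def toy_cost(config: Sequence[int]) -> float:
    """Hypothetical system cost (lower is better); stands in for a real measurement."""
    target = [3, 1, 4, 1, 5]
    return float(sum((c - t) ** 2 for c, t in zip(config, target)))


def random_search(cost: CostFn, n_knobs: int, n_values: int,
                  budget: int, rng: random.Random) -> Tuple[List[int], float]:
    """Sample `budget` random configurations and keep the cheapest one."""
    best_cfg: List[int] = []
    best_cost = float("inf")
    for _ in range(budget):
        cfg = [rng.randrange(n_values) for _ in range(n_knobs)]
        c = cost(cfg)
        if c < best_cost:
            best_cfg, best_cost = cfg, c
    return best_cfg, best_cost


def greedy_search(cost: CostFn, n_knobs: int, n_values: int) -> Tuple[List[int], float]:
    """Set one knob at a time to its locally best value, holding the others fixed."""
    cfg = [0] * n_knobs
    for i in range(n_knobs):
        cfg[i] = min(range(n_values),
                     key=lambda v: cost(cfg[:i] + [v] + cfg[i + 1:]))
    return cfg, cost(cfg)


if __name__ == "__main__":
    rng = random.Random(0)
    print("random search:", random_search(toy_cost, 5, 6, 50, rng))
    print("greedy search:", greedy_search(toy_cost, 5, 6))
```

On this toy cost the knobs are independent, so the greedy pass reaches the optimum directly. That is the kind of case where a learned policy would have little to add, while problems with interacting knobs or delayed effects are where such baselines may fall short.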

Experimental results

Research questions

  • RQ1: In which system optimization problems does deep reinforcement learning outperform traditional methods like greedy algorithms or random search?
  • RQ2: What are the key factors that determine whether DRL is a suitable choice for a given system optimization problem?
  • RQ3: How can the performance of DRL in system optimization be systematically evaluated and compared to non-DRL baselines?
  • RQ4: What are the critical challenges in formulating system optimization tasks as reinforcement learning problems?
  • RQ5: What metrics are most effective for assessing the robustness and efficiency of DRL-based system optimization solutions?

Key findings

  • Deep reinforcement learning does not consistently outperform simpler baselines such as random search or greedy algorithms in system optimization tasks.
  • The performance of DRL is highly dependent on proper problem formulation, embedding, and model design, which significantly affect outcomes.
  • In some cases, the complexity of DRL training outweighs its benefits, especially when rewards are sparse or delayed.
  • Robustness and training efficiency are critical but often under-evaluated aspects of DRL in system optimization.
  • The proposed evaluation metrics provide a structured way to assess DRL applicability and guide future research.
  • There remains a significant gap in understanding when DRL is truly advantageous, highlighting the need for better benchmarking and evaluation standards.
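One way to make the robustness and efficiency findings concrete is to summarize repeated runs of an optimizer over several random seeds. The sketch below is an assumption for illustration, not the paper's metric definitions: spread of final cost across seeds is used as a rough robustness proxy, and the number of evaluations needed to reach a cost threshold as a rough efficiency proxy. The names `evals_to_threshold` and `summarize_runs` are hypothetical.

```python
import statistics
from typing import Dict, List, Optional, Sequence


def evals_to_threshold(curve: Sequence[float], threshold: float) -> Optional[int]:
    """Return the 1-based evaluation count at which cost first drops to the threshold."""
    for i, cost in enumerate(curve, start=1):
        if cost <= threshold:
            return i
    return None  # this run never reached the threshold


def summarize_runs(curves: List[Sequence[float]], threshold: float) -> Dict[str, Optional[float]]:
    """Summarize best-so-far cost curves, one curve per random seed."""
    finals = [curve[-1] for curve in curves]
    hits = [evals_to_threshold(curve, threshold) for curve in curves]
    reached = [h for h in hits if h is not None]
    return {
        # Robustness proxies: how good and how variable the final result is.
        "mean_final": statistics.mean(finals),
        "std_final": statistics.stdev(finals) if len(finals) > 1 else 0.0,
        "success_rate": len(reached) / len(curves),
        # Efficiency proxy: evaluations needed, over the runs that succeeded.
        "mean_evals_to_threshold": statistics.mean(reached) if reached else None,
    }


if __name__ == "__main__":
    # Made-up curves for three seeds, purely to show the call.
    example = [[5.0, 3.0, 1.0], [4.0, 4.0, 2.0], [6.0, 5.0, 5.0]]
    print(summarize_runs(example, threshold=2.0))
```

Reporting the same summary for a DRL agent and for each baseline, under an equal evaluation budget, keeps the comparison from resting on a single favorable run.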


This review was created by AI and reviewed by human editors.