QUICK REVIEW

[Paper Review] Testing of Deep Reinforcement Learning Agents with Surrogate Models

Matteo Biagiola, Paolo Tonella | arXiv (Cornell University) | May 22, 2023

Reinforcement Learning in Robotics | 63 references | 7 citations

TL;DR

The paper introduces Indago, a search-based testing approach for DRL agents. Indago trains a surrogate model on the interactions recorded during DRL training, then uses its failure predictions to guide the search over environment configurations. It exposes more failures, with greater diversity, than state-of-the-art sampling.

ABSTRACT

Deep Reinforcement Learning (DRL) has received a lot of attention from the research community in recent years. As the technology moves away from game playing to practical contexts, such as autonomous vehicles and robotics, it is crucial to evaluate the quality of DRL agents. In this paper, we propose a search-based approach to test such agents. Our approach, implemented in a tool called Indago, trains a classifier on failure and non-failure environment (i.e., pass) configurations resulting from the DRL training process. The classifier is used at testing time as a surrogate model for the DRL agent execution in the environment, predicting the extent to which a given environment configuration induces a failure of the DRL agent under test. The failure prediction acts as a fitness function, guiding the generation towards failure environment configurations, while saving computation time by deferring the execution of the DRL agent in the environment to those configurations that are more likely to expose failures. Experimental results show that our search-based approach finds 50% more failures of the DRL agent than state-of-the-art techniques. Moreover, such failures are, on average, 78% more diverse; similarly, the behaviors of the DRL agent induced by failure configurations are 74% more diverse.

Motivation & Objective

Motivate robust testing of DRL agents deployed in real-world contexts beyond game playing.
Leverage training-time interaction data to build a surrogate model that stands in for executing the DRL agent in the environment.
Develop a search-based method to generate challenging environment configurations that induce DRL agent failures.

Proposed method

Train a surrogate classifier (or regressor) on DRL training interaction data (environment configuration, failure label).
Use the surrogate as a fitness function to guide search-based generation of new environment configurations.
Apply Hill Climbing or Genetic Algorithm to maximize predicted failures under environmental mutations while maintaining validity constraints.
Optionally seed the search from known failing configurations observed during training.
Execute the DRL agent only on the most promising configurations to save computation.
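
The steps above can be illustrated with a minimal Python sketch of surrogate-guided hill climbing. This is not Indago's implementation: the configuration encoding (a vector of values in [0, 1]), the `toy_surrogate` function standing in for a trained classifier, the mutation operator, and the 0.5 selection threshold are all hypothetical choices made for illustration.

```python
import random
from typing import Callable, List, Tuple

# Hypothetical encoding: an environment configuration is a vector in [0, 1].
Config = List[float]


def mutate(config: Config, rng: random.Random, step: float = 0.1) -> Config:
    """Perturb one randomly chosen parameter, clamping to [0, 1] to keep it valid."""
    child = list(config)
    i = rng.randrange(len(child))
    child[i] = min(1.0, max(0.0, child[i] + rng.uniform(-step, step)))
    return child


def hill_climb(
    surrogate: Callable[[Config], float],
    seed: Config,
    rng: random.Random,
    iterations: int = 200,
) -> Tuple[Config, float]:
    """Maximize the surrogate's predicted failure score, starting from a seed."""
    best, best_fit = list(seed), surrogate(seed)
    for _ in range(iterations):
        candidate = mutate(best, rng)
        fit = surrogate(candidate)
        if fit > best_fit:  # accept only strict improvements
            best, best_fit = candidate, fit
    return best, best_fit


def toy_surrogate(config: Config) -> float:
    """Stand-in for a trained classifier (assumption): score grows as parameters approach 1."""
    return sum(config) / len(config)


def select_for_execution(
    candidates: List[Config],
    surrogate: Callable[[Config], float],
    threshold: float = 0.5,
) -> List[Config]:
    """Keep only the configurations worth running the real DRL agent on."""
    return [c for c in candidates if surrogate(c) >= threshold]


rng = random.Random(0)
# Seeds could instead be failing configurations observed during training.
seeds = [[rng.random() for _ in range(3)] for _ in range(5)]
results = [hill_climb(toy_surrogate, s, rng) for s in seeds]
promising = select_for_execution([cfg for cfg, _ in results], toy_surrogate)
```

In a real setup, `toy_surrogate` would be replaced by the probability output of the classifier trained on the DRL training data, and each configuration in `promising` would then be executed in the actual environment to confirm whether it triggers a failure.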

Experimental results

Research questions

RQ1: Can surrogate-model-guided search expose more DRL failures than state-of-the-art sampling?
RQ2: Do failure configurations found by surrogate-guided search yield greater diversity in environmental factors and DRL behaviors?
RQ3: How well does a classifier vs. a regressor perform as the surrogate model for guiding failure search?
RQ4: What is the impact of seeding the search with known failing configurations on effectiveness?

Key findings

Indago finds about 50% more DRL failures than state-of-the-art sampling.
The failure configurations discovered by Indago are, on average, about 78% more diverse than those from sampling in terms of environment setups.
The behaviors of the DRL agent induced by Indago-generated failures are about 74% more diverse.
The approach saves computation by only executing the DRL agent on high-predicted-failure configurations.
Experimental setup includes three complex case studies: parking, walking humanoid, and self-driving car tasks.

This review was created by AI and reviewed by human editors.