QUICK REVIEW

[Paper Review] Contrastive Explanations for Reinforcement Learning in terms of Expected Consequences

Jasper van der Waa, Jurriaan van Diggelen | arXiv (Cornell University) | Jul 23, 2018

Explainable Artificial Intelligence (XAI) | 15 references | 61 citations

TL;DR

The paper proposes a method for RL agents to explain their behavior via expected consequences, answering contrastive queries that compare the learned policy with a user-specified foil. It translates states and actions into user-friendly concepts, and a pilot user study indicates that users prefer policy-level explanations over single-action ones.

ABSTRACT

Machine learning models are becoming increasingly proficient at complex tasks. However, even for experts in the field, it can be difficult to understand what a model has learned. This hampers trust and acceptance, and it obstructs the possibility of correcting the model. There is therefore a need for transparency of machine learning models. The development of transparent classification models has received much attention, but there are few developments for achieving transparent Reinforcement Learning (RL) models. In this study we propose a method that enables an RL agent to explain its behavior in terms of the expected consequences of state transitions and outcomes. First, we define a translation of states and actions to a description that is easier for human users to understand. Second, we develop a procedure that enables the agent to obtain the consequences of a single action, as well as of its entire policy. The method calculates contrasts between the consequences of a policy derived from a user query and those of the agent's learned policy. Third, we construct a format for generating explanations. A pilot survey study was conducted to explore users' preferences for different explanation properties. Results indicate that human users tend to favor explanations about policy rather than about single actions.

Motivation & Objective

Motivate the need for transparent RL explanations and address the gap in XAI for RL.
Propose a method to explain RL behavior via expected state transitions and outcomes.
Translate low-level RL features into user-friendly concepts for explanations.
Enable contrastive explanations by comparing the learned policy with a foil policy.
Evaluate user preferences for explanation types through a pilot study.

Proposed method

Define an interpretable MDP by translating states into concepts C via a function k, and actions into outcomes O via a function t.
Simulate consequences of the learned policy πt and a foil policy πf using a transition model T to obtain policy-level explanations.
Construct a foil policy πf by combining a value function QI, learned from the user's question, with the agent's learned Qt to form Qf, from which πf is derived.
Train QI through simulation, using rewards that favor the queried actions and are scaled by a distance-based weight w(s_i, s_t) between a simulated state s_i and the current state s_t.
Translate trajectories γ(s_t, π) into paths Path(s_t, π) using k and t to present concise explanations.
Generate contrastive explanations by comparing Path(s_t, πt) and Path(s_t, πf) via relative complement and symmetric difference.
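
The foil-policy steps above (learning QI from the query, forming Qf, weighting by w(s_i, s_t)) can be sketched on a toy problem. This is a minimal sketch under our own assumptions, not the authors' code: the 7-cell corridor MDP, the Gaussian form of w, the additive combination Qf = Qt + QI, and every function name are hypothetical. To keep the example deterministic, QI is obtained by value iteration over the known transition model T instead of the simulation-based training the paper describes.

```python
import math

# Hypothetical toy MDP: a corridor of cells 0..6; cell 6 is a terminal goal.
N_STATES = 7
GOAL = 6
ACTIONS = ("left", "right")


def T(s, a):
    """Deterministic transition model: move one cell, clipped to the corridor."""
    step = -1 if a == "left" else 1
    return min(max(s + step, 0), N_STATES - 1)


def value_iteration(reward, gamma=0.9, iters=300):
    """Tabular Q-values for reward(s, a, s_next); Q stays 0 in the terminal goal."""
    Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(iters):
        Q = {
            (s, a): 0.0 if s == GOAL
            else reward(s, a, T(s, a)) + gamma * max(Q[(T(s, a), b)] for b in ACTIONS)
            for s in range(N_STATES)
            for a in ACTIONS
        }
    return Q


def greedy(Q):
    """Greedy policy over the non-terminal states."""
    return {
        s: max(ACTIONS, key=lambda a: Q[(s, a)])
        for s in range(N_STATES)
        if s != GOAL
    }


def w(s_i, s_t, sigma=1.0):
    """Assumed Gaussian weight: the query reward counts most near the current state s_t."""
    return math.exp(-((s_i - s_t) ** 2) / (2 * sigma ** 2))


def foil_policy(Q_t, s_t, foil_action, gamma=0.9):
    """Learn Q_I from a query-driven reward, then act greedily on Q_f = Q_t + Q_I."""
    def r_I(s, a, s_next):
        # Reward the queried (foil) action, scaled by the distance to s_t.
        return w(s, s_t) if a == foil_action else 0.0

    Q_I = value_iteration(r_I, gamma)
    Q_f = {key: Q_t[key] + Q_I[key] for key in Q_t}  # additive combination is assumed
    return greedy(Q_f)


# Task reward: +1 for entering the goal; Q_t stands in for the agent's learned values.
Q_t = value_iteration(lambda s, a, s_next: 1.0 if s_next == GOAL else 0.0)
pi_t = greedy(Q_t)

# User query in the current state s_t = 3: "Why go right rather than left?"
pi_f = foil_policy(Q_t, s_t=3, foil_action="left")
print(pi_t[3], pi_f[3])  # right left
```

Here the learned policy moves right everywhere, while the foil policy takes the queried action at s_t because the weighted query reward outweighs the gap between Qt("right") and Qt("left") there.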
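
The translation and contrast steps (simulating γ(s_t, π), mapping it through k and t into Path(s_t, π), then taking relative complements and the symmetric difference) can likewise be sketched. Again this is a hypothetical illustration: the corridor, the concept function k, the outcome function t, the horizon, the fixed stand-in policies, and the choice to collapse each path into an unordered set are our assumptions; the paper's actual concepts and outcomes depend on its task.

```python
# Hypothetical toy corridor: cells 0..6, terminal goal at 6, "danger zone" at cells 0-1.
N_STATES, GOAL = 7, 6


def T(s, a):
    """Deterministic transition model: move one cell, clipped to the corridor."""
    step = -1 if a == "left" else 1
    return min(max(s + step, 0), N_STATES - 1)


def trajectory(s_t, policy, horizon=6):
    """γ(s_t, π): simulate `policy` from s_t with T for at most `horizon` steps."""
    traj, s = [], s_t
    for _ in range(horizon):
        if s == GOAL:
            break
        a = policy(s)
        s_next = T(s, a)
        traj.append((s, a, s_next))
        s = s_next
    return traj


def k(s):
    """Hypothetical translation of a state into human-level concepts."""
    concepts = set()
    if s <= 1:
        concepts.add("in danger zone")
    if 2 <= s <= 4:
        concepts.add("in open corridor")
    if s >= 5:
        concepts.add("close to goal")
    return concepts


def t(s, a, s_next):
    """Hypothetical translation of a transition into outcomes."""
    outcomes = set()
    if s_next == GOAL:
        outcomes.add("goal reached")
    if s_next == s:
        outcomes.add("bumped into wall")
    return outcomes


def path(s_t, policy):
    """Path(s_t, π), simplified to the set of concepts and outcomes along γ(s_t, π)."""
    items = set()
    for s, a, s_next in trajectory(s_t, policy):
        # Translate the state each step leads to, not the state it starts from.
        items |= k(s_next) | t(s, a, s_next)
    return items


def contrast(path_t, path_f):
    """Relative complements of each path and their symmetric difference."""
    return path_t - path_f, path_f - path_t, path_t ^ path_f


def pi_t(s):
    """Stand-in for the learned policy: always move right."""
    return "right"


def pi_f(s):
    """Stand-in for the foil policy built from the user's query: always move left."""
    return "left"


only_t, only_f, differing = contrast(path(3, pi_t), path(3, pi_f))
print(f"Going right leads to {sorted(only_t)}; going left would lead to {sorted(only_f)}.")
```

Items shared by both paths (here "in open corridor") drop out of the contrast, which keeps the explanation focused on what actually differs between the two policies.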

Experimental results

Research questions

RQ1: How can RL policies be explained in terms of their expected consequences rather than raw actions or rewards?
RQ2: Can a contrastive explanation framework, comparing the learned policy with a user-specified foil, improve human understanding of RL behavior?
RQ3: What translation of states and actions into human-friendly concepts best supports explanation quality?
RQ4: Do users prefer policy-level explanations over single-action explanations?

Key findings

The method enables explanations based on simulated consequences of policies rather than raw state-action data.
Users in the pilot study preferred explanations about policies (strategies) rather than single actions.
A contrastive explanation framework can be generated by constructing a foil policy that follows user queries while staying anchored to the learned policy.
The user study involved 82 participants and examined preferences for explanation properties such as length, information level, and focus on actions vs. policies.
Explanations that provide ample information and address strategy/policy were favored.
The approach demonstrates feasibility for translating RL explanations into human-interpretable concepts.


This review was created by AI and reviewed by human editors.