QUICK REVIEW

[Paper Review] Abductive Commonsense Reasoning

Chandra Bhagavatula, Ronan Le Bras|arXiv (Cornell University)|Aug 15, 2019
Topic Modeling | 46 references | 34 citations
TL;DR

This paper introduces the ART dataset for abductive commonsense reasoning and defines two tasks: Abductive Natural Language Inference (αNLI) and Abductive Natural Language Generation (αNLG). It evaluates strong baselines, shows a large gap to human performance, and analyzes model limitations and transfer learning potential.

ABSTRACT

Abductive reasoning is inference to the most plausible explanation. For example, if Jenny finds her house in a mess when she returns from work, and remembers that she left a window open, she can hypothesize that a thief broke into her house and caused the mess, as the most plausible explanation. While abduction has long been considered to be at the core of how people interpret and read between the lines in natural language (Hobbs et al., 1988), there has been relatively little research in support of abductive natural language inference and generation. We present the first study that investigates the viability of language-based abductive reasoning. We introduce a challenge dataset, ART, that consists of over 20k commonsense narrative contexts and 200k explanations. Based on this dataset, we conceptualize two new tasks -- (i) Abductive NLI: a multiple-choice question answering task for choosing the more likely explanation, and (ii) Abductive NLG: a conditional generation task for explaining given observations in natural language. On Abductive NLI, the best model achieves 68.9% accuracy, well below human performance of 91.4%. On Abductive NLG, the current best language generators struggle even more, as they lack reasoning capabilities that are trivial for humans. Our analysis leads to new insights into the types of reasoning that deep pre-trained language models fail to perform--despite their strong performance on the related but more narrowly defined task of entailment NLI--pointing to interesting avenues for future research.

Motivation & Objective

  • Motivate abductive reasoning as a core aspect of human commonsense interpretation.
  • Create a large-scale dataset (ART) of narrative contexts with plausible explanations.
  • Define two new tasks: abductive natural language inference (αNLI) and abductive natural language generation (αNLG).
  • Provide strong baselines using state-of-the-art NLI models and language generators to establish a benchmark.

Proposed method

  • Define αNLI as a two-way multiple-choice task: given observations O1 and O2, select the more plausible of two candidate hypotheses.
  • Propose a probabilistic framework whose variants (e.g., linear chain, fully connected) make different independence assumptions about how O1, O2, and H relate.
  • Model αNLG as conditional generation of h+ given O1 and O2, with optional background knowledge from COMeT / ATOMIC.
  • Construct ART by pairing ROCStories narratives with crowdsourced plausible and implausible hypotheses, then apply adversarial filtering to minimize annotation artifacts.
  • Evaluate baselines using BERT-based classifiers for αNLI and GPT-2-based generators for αNLG, and compare them against human performance.
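
The αNLI setup described above can be illustrated with a minimal sketch. The names here (AlphaNLIExample, predict, accuracy, word_overlap_score) are hypothetical and do not come from the paper or its released code, and the word-overlap scorer is a toy stand-in for the BERT-based classifiers the paper evaluates. The sketch only shows the shape of the task: score each candidate hypothesis against both observations, pick the higher-scoring one, and report accuracy.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class AlphaNLIExample:
    """One alpha-NLI instance (hypothetical container, not the ART file format)."""
    o1: str      # observation before the unobserved event
    o2: str      # observation after the unobserved event
    h1: str      # first candidate explanation
    h2: str      # second candidate explanation
    label: int   # 1 or 2: index of the more plausible hypothesis


# A scorer maps (o1, hypothesis, o2) to a plausibility score; higher is better.
Scorer = Callable[[str, str, str], float]


def predict(example: AlphaNLIExample, score: Scorer) -> int:
    """Return 1 or 2 for the higher-scoring hypothesis (ties go to 1)."""
    s1 = score(example.o1, example.h1, example.o2)
    s2 = score(example.o1, example.h2, example.o2)
    return 1 if s1 >= s2 else 2


def accuracy(examples: Sequence[AlphaNLIExample], score: Scorer) -> float:
    """Fraction of examples where the predicted hypothesis matches the label."""
    if not examples:
        raise ValueError("need at least one example")
    correct = sum(predict(ex, score) == ex.label for ex in examples)
    return correct / len(examples)


def word_overlap_score(o1: str, h: str, o2: str) -> float:
    """Toy placeholder scorer: number of distinct words the hypothesis shares
    with the two observations. This is an assumption for illustration only,
    not a baseline from the paper."""
    context = set((o1 + " " + o2).lower().split())
    return float(len(context & set(h.lower().split())))


if __name__ == "__main__":
    # Example adapted from the abstract's scenario; the second hypothesis
    # is invented here as a distractor.
    example = AlphaNLIExample(
        o1="Jenny left a window open before work",
        o2="Jenny found her house in a mess",
        h1="A thief broke into her house",
        h2="The weather stayed sunny all day",
        label=1,
    )
    print(predict(example, word_overlap_score))
    print(accuracy([example], word_overlap_score))
```

Any function with the same (o1, hypothesis, o2) signature can replace the toy scorer, so a fine-tuned classifier could in principle be plugged in without changing the evaluation loop.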

Experimental results

Research questions

  • RQ1: Can language models perform abductive reasoning over narrative observations better than chance or simple entailment baselines?
  • RQ2: What are the limitations of current pre-trained language models in abductive reasoning across different commonsense categories?
  • RQ3: Does incorporating structured commonsense knowledge (e.g., COMeT / ATOMIC) improve abductive generation and inference?
  • RQ4: Can training on ART improve performance on other commonsense tasks through transfer learning?

Key findings

  • The best αNLI baseline (BERT-based, fully connected) achieves 68.9% accuracy, far below human accuracy of 91.4%.
  • Humans outperform models across all evaluated categories; simple entailment baselines perform near chance on ART.
  • αNLG is significantly harder: the best generators reach about 45%, versus 96% for humans, on held-out hypotheses.
  • Adversarial filtering and the choice of model variant (fully connected vs. linear chain) both affect performance; the fully connected variant often performs better.
  • Pre-training on ART yields transfer-learning gains on smaller target datasets (e.g., WinoGrande, WSC, DPR, HellaSwag), especially when target data is limited.

This review was created by AI and reviewed by human editors.