QUICK REVIEW

[Paper Review] DisSent: Sentence Representation Learning from Explicit Discourse Relations

Allen Nie, Erin Bennett | arXiv (Cornell University) | Oct 12, 2017
Topic Modeling · 33 references · 59 citations
TL;DR

DisSent learns sentence embeddings by predicting the explicit discourse marker that connects a pair of sentences. The training pairs are curated automatically from BookCorpus with dependency parsing and are used both to train a BiLSTM encoder and to fine-tune BERT. The resulting models transfer well to other tasks and reach state-of-the-art results on PDTB implicit relation prediction.

ABSTRACT

Learning effective representations of sentences is one of the core missions of natural language understanding. Existing models either train on a vast amount of text, or require costly, manually curated sentence relation datasets. We show that with dependency parsing and rule-based rubrics, we can curate a high quality sentence relation task by leveraging explicit discourse relations. We show that our curated dataset provides an excellent signal for learning vector representations of sentence meaning, representing relations that can only be determined when the meanings of two sentences are combined. We demonstrate that the automatically curated corpus allows a bidirectional LSTM sentence encoder to yield high quality sentence embeddings and can serve as a supervised fine-tuning dataset for larger models such as BERT. Our fixed sentence embeddings achieve high performance on a variety of transfer tasks, including SentEval, and we achieve state-of-the-art results on Penn Discourse Treebank's implicit relation prediction task.

Motivation & Objective

  • Use explicit discourse relations as a structured semantic signal for learning general-purpose sentence representations.
  • Automatically curate a large, high-quality dataset of sentence pairs linked by explicit discourse markers via dependency parsing.
  • Train a sentence encoder to produce embeddings that support discourse marker prediction, encouraging meaning-aware representations.
  • Fine-tune larger models (e.g., BERT) on the DisSent task to improve downstream discourse classification tasks.
  • Evaluate embeddings on SentEval and PDTB tasks to compare with state-of-the-art supervised and unsupervised approaches.
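
The curation idea in the list above can be illustrated with a minimal sketch. It does not call a parser: the dependency parse is written by hand, and it assumes Universal Dependencies-style labels in which the second clause attaches to the first as `advcl` and the discourse marker attaches to the second clause as `mark`. The `Token` type, the `MARKERS` set, and the single pattern are hypothetical simplifications; this review does not spell out the patterns the authors actually used.

```python
from typing import NamedTuple, Optional, Tuple

class Token(NamedTuple):
    idx: int      # 1-based position in the sentence
    text: str
    head: int     # index of the head token, 0 for the root
    deprel: str   # dependency label (assumed UD-style)

# Illustrative marker list, not the paper's full inventory.
MARKERS = {"because", "if", "when", "although"}

def subtree(tokens, root_idx, skip=frozenset()):
    """Indices of root_idx and its descendants, never entering `skip`."""
    out = {root_idx}
    frontier = [root_idx]
    while frontier:
        cur = frontier.pop()
        for t in tokens:
            if t.head == cur and t.idx not in skip and t.idx not in out:
                out.add(t.idx)
                frontier.append(t.idx)
    return out

def extract_pair(tokens) -> Optional[Tuple[str, str, str]]:
    """Return (S1, marker, S2) for the first 'S1 <marker> S2' match."""
    by_idx = {t.idx: t for t in tokens}
    for t in tokens:
        if t.deprel != "mark" or t.text.lower() not in MARKERS:
            continue
        s2_head = by_idx[t.head]
        # Assumed pattern: S2's head is an adverbial clause of S1's head.
        if s2_head.deprel != "advcl" or s2_head.head == 0:
            continue
        s2_ids = subtree(tokens, s2_head.idx, skip={t.idx})
        s1_ids = subtree(tokens, s2_head.head, skip={s2_head.idx})
        s1 = " ".join(by_idx[i].text for i in sorted(s1_ids))
        s2 = " ".join(by_idx[i].text for i in sorted(s2_ids))
        return s1, t.text.lower(), s2
    return None

# Hand-written parse of "She stayed home because it rained".
sentence = [
    Token(1, "She", 2, "nsubj"),
    Token(2, "stayed", 0, "root"),
    Token(3, "home", 2, "advmod"),
    Token(4, "because", 6, "mark"),
    Token(5, "it", 6, "nsubj"),
    Token(6, "rained", 2, "advcl"),
]
pair = extract_pair(sentence)
```

Here `pair` holds the two clauses with the marker removed, so the marker can serve as the prediction target. A real pipeline would need one pattern per marker and per clause order (for example, a marker that opens the sentence).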

Proposed method

  • Adapt a BiLSTM sentence encoder with temporal max-pooling to produce fixed-size sentence vectors.
  • Compute pairwise interactions between sentence embeddings using subtraction, multiplication, and averaging, concatenated with the sentence embeddings.
  • Project the combined features through a fully connected layer to predict the discourse marker via softmax.
  • Automatically extract sentence pairs connected by explicit discourse markers using a dependency-parser–based pipeline with predefined dependency patterns.
  • Fine-tune BERT-base on the DisSent task using the [CLS] representation of each sentence pair, then evaluate on downstream tasks.
  • Explore multiple discourse marker subsets (ALL, Books 5, Books 8) to assess generalization and data scale effects.
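
The pooling, feature combination, and classification steps listed above can be sketched in plain Python. This is a toy illustration under stated assumptions, not the authors' implementation: the hidden states stand in for BiLSTM outputs, the weights are random rather than trained, and the names `max_pool_over_time`, `pair_features`, `predict_marker`, and the `MARKERS` list are hypothetical.

```python
import math
import random

def max_pool_over_time(hidden_states):
    """Element-wise max across time steps: T vectors -> one vector."""
    return [max(column) for column in zip(*hidden_states)]

def pair_features(s1, s2):
    """Concatenate [s1, s2, average, difference, element-wise product]."""
    avg = [(a + b) / 2 for a, b in zip(s1, s2)]
    sub = [a - b for a, b in zip(s1, s2)]
    mul = [a * b for a, b in zip(s1, s2)]
    return s1 + s2 + avg + sub + mul

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def predict_marker(features, weights, bias):
    """One fully connected layer followed by softmax over markers."""
    logits = [
        sum(w * f for w, f in zip(row, features)) + b
        for row, b in zip(weights, bias)
    ]
    return softmax(logits)

# Toy stand-ins for per-token BiLSTM hidden states (assumption).
h1 = [[0.1, -0.4, 0.3], [0.5, 0.2, -0.1]]
h2 = [[-0.2, 0.6, 0.0], [0.3, 0.1, 0.4], [0.0, -0.3, 0.2]]

s1 = max_pool_over_time(h1)
s2 = max_pool_over_time(h2)
features = pair_features(s1, s2)

# Illustrative label set; untrained random weights (assumption).
MARKERS = ["and", "but", "because", "if", "when"]
rng = random.Random(0)
weights = [[rng.uniform(-0.1, 0.1) for _ in features] for _ in MARKERS]
bias = [0.0] * len(MARKERS)

probs = predict_marker(features, weights, bias)
```

With sentence vectors of dimension d, the combined feature vector has dimension 5d. The difference and product terms give the classifier direct access to how the two sentences relate, which a plain concatenation of `s1` and `s2` would leave for the linear layer to discover.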

Experimental results

Research questions

  • RQ1: Can automated discourse marker prediction provide a strong supervisory signal for learning transferable sentence embeddings?
  • RQ2: How do DisSent embeddings compare to existing supervised and unsupervised sentence representations on standard evaluation benchmarks?
  • RQ3: Does fine-tuning large pretrained models (like BERT) on DisSent data yield improved performance on discourse-related classification tasks?
  • RQ4: What is the impact of using different sets of discourse markers on representation quality and generalization?
  • RQ5: Is explicit discourse relation supervision competitive with, or complementary to, implicit-relation supervision and other training signals for learning sentence meaning?

Key findings

  • Used as fixed sentence embeddings, DisSent representations perform well on the SentEval transfer tasks.
  • Fine-tuning BERT on DisSent yields state-of-the-art results on PDTB implicit relation prediction compared to other fine-tuning strategies.
  • DisSent-trained models outperform InferSent and SkipThought on several generalization tasks, notably on TREC (question-type classification) and implicit relation tasks.
  • Training on DisSent has advantages in data collection and training speed over some prior supervised approaches, while remaining competitive in generalization performance.
  • Discourse marker prediction is a useful training signal: it captures meaning that emerges only when two sentences are combined, which enables effective downstream classification without large-scale hand annotation.

This review was created by AI and reviewed by human editors.