[Paper Review] Contextual LSTM (CLSTM) models for Large scale NLP tasks
CLSTM extends LSTM with topic-context features to improve word prediction, next-sentence selection, and sentence-topic prediction on Wikipedia and Google News data, showing significant relative gains over strong LSTM baselines.
Documents exhibit sequential structure at multiple levels of abstraction (e.g., sentences, paragraphs, sections). These abstractions constitute a natural hierarchy for representing the context in which to infer the meaning of words and larger fragments of text. In this paper, we present CLSTM (Contextual LSTM), an extension of the recurrent neural network LSTM (Long-Short Term Memory) model, where we incorporate contextual features (e.g., topics) into the model. We evaluate CLSTM on three specific NLP tasks: word prediction, next sentence selection, and sentence topic prediction. Results from experiments run on two corpora, English documents in Wikipedia and a subset of articles from a recent snapshot of English Google News, indicate that using both words and topics as features improves performance of the CLSTM models over baseline LSTM models for these tasks. For example on the next sentence selection task, we get relative accuracy improvements of 21% for the Wikipedia dataset and 18% for the Google News dataset. This clearly demonstrates the significant benefit of using context appropriately in natural language (NL) tasks. This has implications for a wide variety of NL applications like question answering, sentence completion, paraphrase generation, and next utterance prediction in dialog systems.
Motivation & Objective
- Model long-range contextual information in documents using topic-based signals to improve language modeling.
- Develop a CLSTM architecture that injects topic embeddings into LSTM gates.
- Evaluate CLSTM on word prediction, next sentence selection, and sentence topic prediction across large-scale corpora (Wikipedia and Google News).
- Analyze the impact of hierarchical (sentence and paragraph) topics and unsupervised thought vectors on performance.
Proposed method
- Modify the LSTM cell equations to incorporate a topic vector T into the input, forget, and output gates and the cell update (concatenating word embeddings with topic embeddings).
- Use a hierarchical topic model (HTM) to supply topic distributions over text segments (PrevSent, SentSeg, ParaSeg).
- Train models on large-scale corpora (Wikipedia with a 129K-word vocabulary, Google News with a 100K-word vocabulary) and compare to word-only LSTM baselines.
- Assess word prediction perplexity, next-sentence scoring accuracy, and sentence-topic perplexity under various feature combinations.
- Experiment with unsupervised thought vectors (PrevSentThought) as alternatives to supervised topics.
- Report results on both the Wikipedia and Google News datasets, using 1024 hidden units as the reference configuration.
- Provide analysis of error types and discuss potential extensions like hierarchical LSTMs (HLSTM).
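The gate modification in the first bullet above can be illustrated with a single recurrent step. This is a minimal sketch under stated assumptions, not the authors' implementation: it assumes the topic is already available as an embedding vector, folds the topic into every gate by concatenating it with the word embedding and the previous hidden state, and omits any peephole connections. The names `init_params` and `clstm_step`, the dimensions, and the random initialization are hypothetical. Plain Python lists are used only to keep the sketch dependency-free; a real model would use an array library.

```python
import math
import random


def sigmoid(z):
    """Logistic function used for the three gates."""
    return 1.0 / (1.0 + math.exp(-z))


def affine(W, b, z):
    """Compute W z + b, where W is a list of rows and z, b are lists."""
    return [sum(w * v for w, v in zip(row, z)) + b_k for row, b_k in zip(W, b)]


def init_params(word_dim, topic_dim, hidden_dim, seed=0):
    """Create small random weights for the i, f, o gates and the candidate g.

    Each weight matrix acts on [word embedding; topic embedding; previous h],
    so the topic contributes its own block of columns to every gate.
    """
    rng = random.Random(seed)
    in_dim = word_dim + topic_dim + hidden_dim
    params = {}
    for name in ("i", "f", "g", "o"):
        params["W_" + name] = [
            [rng.gauss(0.0, 0.1) for _ in range(in_dim)] for _ in range(hidden_dim)
        ]
        params["b_" + name] = [0.0] * hidden_dim
    return params


def clstm_step(x_t, topic, h_prev, c_prev, params):
    """One step of a topic-conditioned LSTM cell (illustrative only)."""
    # Concatenate word embedding, topic embedding, and previous hidden state.
    z = list(x_t) + list(topic) + list(h_prev)
    i = [sigmoid(a) for a in affine(params["W_i"], params["b_i"], z)]    # input gate
    f = [sigmoid(a) for a in affine(params["W_f"], params["b_f"], z)]    # forget gate
    g = [math.tanh(a) for a in affine(params["W_g"], params["b_g"], z)]  # candidate cell
    o = [sigmoid(a) for a in affine(params["W_o"], params["b_o"], z)]    # output gate
    # Standard LSTM state update; the topic only enters through the gates above.
    c = [f_k * c_k + i_k * g_k for f_k, c_k, i_k, g_k in zip(f, c_prev, i, g)]
    h = [o_k * math.tanh(c_k) for o_k, c_k in zip(o, c)]
    return h, c
```

Calling `clstm_step` with the same word embedding but two different topic vectors yields different hidden states, which is the mechanism by which topic context can change the next-word distribution. Setting `topic_dim` to 0 and passing an empty topic list reduces the step to a word-only LSTM of the same form, analogous to the baseline.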
Experimental results
Research questions
- RQ1: Does incorporating topic-context via CLSTM improve word prediction perplexity over strong LSTM baselines?
- RQ2: Can CLSTM improve next sentence selection accuracy compared to LSTM when given sentence-level and paragraph-level topic signals?
- RQ3: Is sentence topic prediction more accurate when using word+topic features rather than words or topics alone?
- RQ4: How do different topic signal variants (PrevSentTopic, SentSegTopic, ParaSegTopic) affect performance?
- RQ5: Do unsupervised thought vectors provide a viable alternative to supervised topic signals in CLSTM?
Key findings
- CLSTM with Word + SentSegTopic + ParaSegTopic achieves best perplexity in word prediction across Wikipedia and Google News.
- Word prediction perplexity improves when adding sentence- and paragraph-level topics, with diminishing returns beyond 1024 hidden units.
- Next sentence scoring: LSTM accuracy 52% vs CLSTM 63% (Wikipedia test dataset), a 21% relative improvement.
- Sentence topic prediction: CLSTM with Word+SentTopic outperforms the baseline SentTopic by over 12% in perplexity.
- CLSTM with thought vectors (PrevSentThought) improves perplexity versus word-only models, but supervised topic signals can yield larger gains.
- On Google News, CLSTM shows ~18% improvement in next sentence selection accuracy over LSTM, and ~9% improvement in topic prediction tasks.
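The relative improvement quoted for next sentence selection follows directly from the two absolute accuracies. As a quick check, assuming a helper named `relative_improvement` (an illustrative name, not from the paper):

```python
def relative_improvement(baseline, improved):
    """Relative gain of `improved` over `baseline` for a higher-is-better metric."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (improved - baseline) / baseline


# Wikipedia next sentence selection: LSTM 52% vs CLSTM 63% accuracy.
wiki_gain = relative_improvement(52, 63)
print(f"{wiki_gain:.1%}")  # prints 21.2%
```

The 11-point absolute gain divided by the 52% baseline gives roughly 21%, matching the reported figure. For perplexity, where lower is better, the corresponding quantity is the relative reduction, (baseline - improved) / baseline.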
This review was created by AI and reviewed by human editors.