QUICK REVIEW

[Paper Review] Efficient Character-level Document Classification by Combining Convolution and Recurrent Layers

Yijun Xiao, Kyunghyun Cho | arXiv (Cornell University) | Feb 1, 2016

Topic Modeling | 15 references | 171 citations

TL;DR

The paper introduces a hybrid ConvRec model that stacks a few convolutional layers over character embeddings and adds a bidirectional recurrent layer to efficiently capture long-range dependencies, achieving competitive accuracy with far fewer parameters than purely convolutional models.

ABSTRACT

Document classification tasks were primarily tackled at word level. Recent research that works with character-level inputs shows several benefits over word-level approaches such as natural incorporation of morphemes and better handling of rare words. We propose a neural network architecture that utilizes both convolution and recurrent layers to efficiently encode character inputs. We validate the proposed model on eight large scale document classification tasks and compare with character-level convolution-only models. It achieves comparable performances with much less parameters.

Motivation & Objective

Motivate character-level document classification as a way to incorporate morphemes naturally and to handle rare and out-of-vocabulary words.
Propose a hybrid architecture that reduces parameter count while capturing long-range dependencies.
Demonstrate that the ConvRec model matches or exceeds convolution-only performance on large-scale datasets.
Analyze how model depth, training size, and number of classes affect performance.

Proposed method

Represent documents as sequences of characters via one-hot inputs embedded into dense vectors.
Apply a few convolutional layers to learn local, translation-invariant features, with pooling to shorten the sequence passed to the recurrent layer.
Use a single bidirectional recurrent layer (LSTM) on top of the convolutional features to capture long-range dependencies.
Concatenate the final hidden states of the forward and reverse recurrent passes and feed the result into a softmax classifier.
Train with regularized cross-entropy using AdaDelta, with dropout after the last conv layer and after the recurrent layer.
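
The pipeline above can be made concrete with a little bookkeeping. The following is a minimal sketch, not the authors' implementation: it traces how the sequence length shrinks through the convolution and pooling stages and counts the trainable parameters of each component. It assumes stride-1 convolutions without padding, non-overlapping pooling, and a standard LSTM with four gates. All sizes in the example call (input length, alphabet size, embedding width, filter counts, kernel sizes, hidden size) are illustrative values chosen for this sketch, and the function names are hypothetical.

```python
def conv_out_len(length, kernel, pool):
    """Length after a stride-1, unpadded 1-D convolution and non-overlapping pooling."""
    return (length - kernel + 1) // pool


def conv_params(in_channels, out_channels, kernel):
    """Weights plus one bias per output channel for a 1-D convolution."""
    return in_channels * out_channels * kernel + out_channels


def lstm_params(in_dim, hidden):
    """One LSTM direction: 4 gates, each with input weights, recurrent weights, and a bias."""
    return 4 * (in_dim * hidden + hidden * hidden + hidden)


def convrec_summary(seq_len, vocab, emb, conv_layers, hidden, n_classes):
    """Return (sequence length seen by the recurrent layer, total parameter count).

    conv_layers is a list of (out_channels, kernel, pool) tuples.
    """
    params = vocab * emb                  # character embedding table
    length, channels = seq_len, emb
    for out_channels, kernel, pool in conv_layers:
        params += conv_params(channels, out_channels, kernel)
        length = conv_out_len(length, kernel, pool)
        channels = out_channels
    params += 2 * lstm_params(channels, hidden)      # forward and reverse directions
    params += 2 * hidden * n_classes + n_classes     # softmax layer on concatenated states
    return length, params


# Illustrative configuration (assumed values, not taken from the review):
# 1014 input characters, 96-symbol alphabet, 8-dimensional embeddings,
# two conv layers of 128 filters (kernel sizes 5 and 3, pooling size 2),
# 128 hidden units per direction, 14 output classes.
length, params = convrec_summary(
    seq_len=1014, vocab=96, emb=8,
    conv_layers=[(128, 5, 2), (128, 3, 2)],
    hidden=128, n_classes=14,
)
print(f"recurrent layer sees {length} steps; model has {params:,} parameters")
```

Under these assumptions, two pooling stages cut the sequence to roughly a quarter of its original length before the recurrent layer runs, which is where the efficiency of the hybrid design comes from: the LSTM handles long-range dependencies over a short sequence instead of requiring additional convolutional depth.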

Experimental results

Research questions

RQ1: Can a convolutional-recurrent hybrid architecture achieve comparable accuracy to a deeper convolutional network while using substantially fewer parameters?
RQ2: How does the ConvRec model perform across diverse large-scale text classification tasks with varying numbers of classes and training sizes?
RQ3: What are the effects of the number of convolutional layers and the convolutional filter size on performance?
RQ4: Does the ConvRec approach maintain advantages as the number of classes grows or data size decreases?

Key findings

On eight large-scale datasets, ConvRec achieved comparable or better error rates than the best character-level convolutional model with data augmentation, while using far fewer parameters.
ConvRec often outperformed the convolution-only model as the number of classes increased (e.g., DBPedia with 14 classes).
The model tends to perform better with moderate convolution depth (two to three layers) and benefits from the recurrent layer to capture long-range dependencies.
Larger convolutional filter sizes improved accuracy on some datasets, but with diminishing returns relative to the added parameters.
Stacks of two to three convolutional layers followed by a bidirectional LSTM offer an effective balance of accuracy and efficiency.

This review was created by AI and reviewed by human editors.