QUICK REVIEW

[Paper Review] Character-level Convolutional Networks for Text Classification

Xiang Zhang, Junbo Zhao|arXiv (Cornell University)|Sep 4, 2015
Topic Modeling|30 references|3,267 citations
TL;DR

This paper empirically evaluates character-level ConvNets for text classification, showing competitive or state-of-the-art results on large-scale datasets without relying on word-level representations.

ABSTRACT

This article offers an empirical exploration on the use of character-level convolutional networks (ConvNets) for text classification. We constructed several large-scale datasets to show that character-level convolutional networks could achieve state-of-the-art or competitive results. Comparisons are offered against traditional models such as bag of words, n-grams and their TFIDF variants, and deep learning models such as word-based ConvNets and recurrent neural networks.

Motivation & Objective

  • Motivate classifying text directly from raw characters rather than from word-level representations.
  • Demonstrate that deep character-level ConvNets can achieve competitive or state-of-the-art results on large-scale datasets.
  • Compare character-level ConvNets with traditional models and word-based deep learning approaches across diverse tasks.
  • Investigate the impact of dataset size, alphabet choice, and data augmentation on model performance.

Proposed method

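  • Use two 9-layer character-level ConvNets (a large and a small variant, each with 6 convolutional and 3 fully-connected layers) operating on inputs encoded over a 70-character alphabet.
  • Apply 1-D temporal convolutions (kernel sizes 7 and 3) with max-pooling, followed by fully-connected layers with dropout.
  • Train with SGD with momentum (minibatch size 128, momentum 0.9), halving an initial step size of 0.01 every 3 epochs, 10 times; the models are implemented in Torch 7.
  • Quantize the input as a fixed-length sequence of one-hot character vectors (1014 characters), encoded in reverse order so that the most recently read characters come first.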
  • Augment data via thesaurus-based synonym replacement to improve generalization.
  • Compare against Bag-of-Words/TFIDF, Bag-of-N-grams, Bag-of-means, LSTM, and word-based ConvNets with and without pretrained embeddings.
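
The one-hot quantization step described above can be sketched as follows. This is a minimal illustration, not the authors' Torch7 code: the reduced `ALPHABET`, the function name `quantize`, and the `lowercase` flag are assumptions made for the example, and the handling of out-of-alphabet characters (all-zero rows) and of text longer than the fixed length (earliest characters dropped) is one plausible reading of "prioritizing recent characters".

```python
import string

# Assumption: a reduced illustrative alphabet, not the paper's exact 70-character set.
ALPHABET = string.ascii_lowercase + string.digits + ".,;!?'"
CHAR_TO_INDEX = {ch: i for i, ch in enumerate(ALPHABET)}


def quantize(text, max_length, lowercase=True):
    """Return a max_length x len(ALPHABET) one-hot matrix as nested lists.

    The text is read backward, so the most recent character lands in row 0.
    Characters outside the alphabet (including spaces) and padding rows stay
    all-zero; when the text is longer than max_length, only its last
    max_length characters are encoded.
    """
    if lowercase:
        text = text.lower()
    matrix = [[0] * len(ALPHABET) for _ in range(max_length)]
    for row, ch in enumerate(reversed(text)):
        if row >= max_length:
            break
        col = CHAR_TO_INDEX.get(ch)
        if col is not None:
            matrix[row][col] = 1
    return matrix
```

For example, `quantize("Hi!", 5)` produces rows for `!`, `i`, and `h` in that order, followed by two all-zero padding rows. Setting `lowercase=False` with this alphabet leaves uppercase letters unencoded, which is a simplification; a case-sensitive variant would need an alphabet that includes them.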

Experimental results

Research questions

  • RQ1: Can character-level ConvNets achieve competitive performance for text classification without word-level tokens?
  • RQ2: How do character-level models compare to traditional and word-level deep learning approaches across large-scale datasets?
  • RQ3: What are the effects of dataset size, alphabet choice, and data augmentation on model performance?
  • RQ4: Are character-level ConvNets more robust on user-generated, less curated text?
  • RQ5: Does distinguishing uppercase vs lowercase letters help or hurt performance on large-scale data?

Key findings

  • Character-level ConvNets can be effective for text classification without relying on words.
  • Larger, less curated, million-scale datasets tend to favor character-level ConvNets over traditional methods.
  • Thesaurus-based data augmentation improves performance for character-level models.
  • Distinguishing uppercase from lowercase letters often hurts performance on large datasets; merging the two cases may act as a regularizer.
  • Word-based deep models may still outperform character-level ConvNets on smaller datasets, but character-level ConvNets surpass them on very large ones.
  • The best results emerge on large-scale datasets where character-level ConvNets outperform several baselines.
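
The thesaurus-based augmentation mentioned in the findings above can be sketched roughly as below. This is a hedged illustration only: `TOY_THESAURUS`, `sample_geometric`, and `augment` are hypothetical names, the thesaurus is a made-up toy dictionary, and the geometric-style sampling with `p = q = 0.5` is an assumption about how the number of replaced words and the chosen synonym rank might be drawn.

```python
import random

# Hypothetical toy thesaurus: synonyms ordered from most to least common.
TOY_THESAURUS = {
    "good": ["fine", "decent", "solid"],
    "movie": ["film", "picture"],
    "bad": ["poor", "awful"],
}


def sample_geometric(rng, p, upper):
    """Sample k in [0, upper] with probability proportional to p**k."""
    weights = [p ** k for k in range(upper + 1)]
    return rng.choices(range(upper + 1), weights=weights, k=1)[0]


def augment(text, thesaurus, rng, p=0.5, q=0.5):
    """Replace a random number of replaceable words with sampled synonyms.

    Words are split on whitespace, so the output has normalized spacing.
    Lower-ranked (more common) synonyms are more likely to be chosen.
    """
    words = text.split()
    replaceable = [i for i, w in enumerate(words) if w.lower() in thesaurus]
    if not replaceable:
        return text
    n_replace = sample_geometric(rng, p, len(replaceable))
    for i in rng.sample(replaceable, n_replace):
        synonyms = thesaurus[words[i].lower()]
        rank = sample_geometric(rng, q, len(synonyms) - 1)
        words[i] = synonyms[rank]
    return " ".join(words)
```

Calling `augment("a good movie", TOY_THESAURUS, random.Random(0))` returns a three-word sentence in which `good` and `movie` are each either kept or swapped for one of their listed synonyms. Passing an explicit `random.Random` instance keeps the augmentation reproducible.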


This review was created by AI and reviewed by human editors.