QUICK REVIEW

[Paper Review] Character-level Convolutional Networks for Text Classification

Xiang Zhang, Junbo Zhao | arXiv (Cornell University) | Sep 4, 2015

Topic Modeling | 30 references | 3,267 citations

TL;DR

This paper empirically evaluates character-level ConvNets for text classification, showing competitive or state-of-the-art results on large-scale datasets without relying on word-level representations.

ABSTRACT

This article offers an empirical exploration on the use of character-level convolutional networks (ConvNets) for text classification. We constructed several large-scale datasets to show that character-level convolutional networks could achieve state-of-the-art or competitive results. Comparisons are offered against traditional models such as bag of words, n-grams and their TFIDF variants, and deep learning models such as word-based ConvNets and recurrent neural networks.

Motivation & Objective

Motivate classifying text directly from character-level input rather than from word-level representations.
Demonstrate that deep character-level ConvNets can achieve competitive or state-of-the-art results on large-scale datasets.
Compare character-level ConvNets with traditional models and word-based deep learning approaches across diverse tasks.
Investigate the impact of dataset size, alphabet choice, and data augmentation on model performance.

Proposed method

Use two 9-layer character-level ConvNets (large and small), each with 6 convolutional and 3 fully-connected layers, operating on inputs encoded over a 70-character alphabet.
Apply 1-D temporal convolutions with kernel sizes 7 and 3 and max-pooling, followed by fully-connected layers with dropout.
Train with SGD with momentum and a step learning-rate schedule (step size halved every 3 epochs), implemented in Torch 7.
Quantize input as a fixed-length sequence of one-hot character vectors, read backward so that the most recent characters come first.
Augment data via thesaurus-based synonym replacement to improve generalization.
Compare against Bag-of-Words/TFIDF, Bag-of-N-grams, Bag-of-means, LSTM, and word-based ConvNets with and without pretrained embeddings.
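The quantization and convolution steps listed above can be sketched in a few lines of plain Python. This is an illustrative sketch, not the authors' Torch 7 code. Several details are assumptions: the ALPHABET string is a stand-in that may differ from the paper's exact 70-character set, the default input length of 1014 and the per-layer kernel and pooling sizes are recalled from the paper's architecture table and should be checked against it, and keeping the trailing characters when a text is too long is one reading of "prioritizing recent characters". The names quantize and conv_output_length are hypothetical.

```python
import string

# Stand-in alphabet (assumption): lowercase letters, digits, ASCII
# punctuation and newline. The paper's exact 70-character set may differ.
ALPHABET = string.ascii_lowercase + string.digits + string.punctuation + "\n"
CHAR_TO_INDEX = {ch: i for i, ch in enumerate(ALPHABET)}


def quantize(text, length=1014):
    """Return a `length` x len(ALPHABET) one-hot matrix as nested lists.

    The text is lowercased and read backward, so the most recent characters
    fill the first rows. Characters outside the alphabet (spaces included)
    and padding rows stay all-zero.
    """
    rows = [[0] * len(ALPHABET) for _ in range(length)]
    recent_first = text.lower()[::-1][:length]  # keep the trailing characters
    for pos, ch in enumerate(recent_first):
        idx = CHAR_TO_INDEX.get(ch)
        if idx is not None:
            rows[pos][idx] = 1
    return rows


# (kernel size, pooling size) for each of the 6 convolutional layers.
# Assumed configuration; a pooling size of 1 means no pooling.
CONV_LAYERS = [(7, 3), (7, 3), (3, 1), (3, 1), (3, 1), (3, 3)]


def conv_output_length(length, layers=CONV_LAYERS):
    """Frame length after the convolutional stack (stride 1, no padding)."""
    for kernel, pool in layers:
        length = length - kernel + 1  # valid temporal convolution
        length = length // pool       # non-overlapping max-pooling
    return length


if __name__ == "__main__":
    matrix = quantize("Great movie!")
    print(len(matrix), len(matrix[0]))
    print(conv_output_length(1014))
```

Under these assumptions, an input of 1014 characters leaves 34 frames after the last pooling layer, which is what the first fully-connected layer would receive (multiplied by the number of feature maps).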

Experimental results

Research questions

RQ1: Can character-level ConvNets achieve competitive performance for text classification without word-level tokens?
RQ2: How do character-level models compare to traditional and word-level deep learning approaches across large-scale datasets?
RQ3: What are the effects of dataset size, alphabet choice, and data augmentation on model performance?
RQ4: Are character-level ConvNets more robust on user-generated, less curated text?
RQ5: Does distinguishing uppercase vs lowercase letters help or hurt performance on large-scale data?

Key findings

Character-level ConvNets can be effective for text classification without relying on words.
Larger, less curated, million-scale datasets tend to favor character-level ConvNets over traditional methods.
Thesaurus-based data augmentation improves performance for character-level models.
Distinguishing uppercase from lowercase letters usually gives worse results on large datasets; merging the two cases may act as a form of regularization.
Word-based deep models may still perform better on smaller datasets, but character-level ConvNets surpass them on very large datasets.
The best results emerge on large-scale datasets where character-level ConvNets outperform several baselines.
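The thesaurus-based augmentation mentioned in the findings can be illustrated with a small sketch. It assumes, as recalled from the paper, that both the number of replaced words and the rank of the chosen synonym follow geometric-style distributions with parameters p and q (0.5 each), and it uses a truncated sampler as a simplification. TOY_THESAURUS, sample_geometric, and augment are hypothetical names, and the tiny dictionary stands in for a real thesaurus.

```python
import random

# Toy thesaurus (hypothetical): synonyms ordered from most to least common.
TOY_THESAURUS = {
    "good": ["fine", "decent", "solid"],
    "movie": ["film", "picture"],
    "quick": ["fast", "rapid"],
}


def sample_geometric(rng, p, upper):
    """Truncated geometric draw: keep incrementing with probability p,
    stopping at `upper`. Small values are the most likely."""
    k = 0
    while k < upper and rng.random() < p:
        k += 1
    return k


def augment(tokens, thesaurus, p=0.5, q=0.5, rng=None):
    """Return a copy of `tokens` with some words swapped for synonyms."""
    if rng is None:
        rng = random.Random()
    # Positions whose word has at least one entry in the thesaurus.
    replaceable = [i for i, tok in enumerate(tokens) if tok in thesaurus]
    n_replace = sample_geometric(rng, p, len(replaceable))
    out = list(tokens)
    for i in rng.sample(replaceable, n_replace):
        synonyms = thesaurus[tokens[i]]
        # Prefer the most common synonyms (low rank).
        rank = sample_geometric(rng, q, len(synonyms) - 1)
        out[i] = synonyms[rank]
    return out


if __name__ == "__main__":
    rng = random.Random(0)
    print(augment("a good quick movie".split(), TOY_THESAURUS, rng=rng))
```

Because small draws are the most likely, most augmented samples change few or no words, and replacements lean toward the most common synonyms.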

This review was created by AI and reviewed by human editors.