[Paper Review] Character-level Convolutional Networks for Text Classification
This paper empirically evaluates character-level ConvNets for text classification, showing competitive or state-of-the-art results on large-scale datasets without relying on word-level representations.
This article offers an empirical exploration on the use of character-level convolutional networks (ConvNets) for text classification. We constructed several large-scale datasets to show that character-level convolutional networks could achieve state-of-the-art or competitive results. Comparisons are offered against traditional models such as bag of words, n-grams and their TFIDF variants, and deep learning models such as word-based ConvNets and recurrent neural networks.
Motivation & Objective
- Explore text classification directly from character-level input rather than word-level representations.
- Demonstrate that deep character-level ConvNets can achieve competitive or state-of-the-art results on large-scale datasets.
- Compare character-level ConvNets with traditional models and word-based deep learning approaches across diverse tasks.
- Investigate the impact of dataset size, alphabet choice, and data augmentation on model performance.
Proposed method
- Use two 9-layer character-level ConvNets, one large and one small, each taking input encoded over a 70-character alphabet.
- Apply 1-D temporal convolutions with multiple kernel sizes and pooling, followed by fully-connected layers and dropout.
- Train with SGD with momentum (0.9), starting from a step size of 0.01 that is halved every 3 epochs; the models are implemented in Torch7.
- Quantize each input into a fixed-length sequence of one-hot character vectors, encoded in reverse order so that the most recent characters come first.
- Augment data via thesaurus-based synonym replacement to improve generalization.
- Compare against Bag-of-Words/TFIDF, Bag-of-N-grams, Bag-of-means, LSTM, and word-based ConvNets with and without pretrained embeddings.
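The quantization step in the list above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' Torch7 code: the `quantize` function, the `ALPHABET` constant, and the default length of 1014 are assumptions made for the example, and the alphabet built from Python's `string` constants is only a stand-in for the paper's 70-symbol list. Characters outside the alphabet, including spaces, are assumed to map to all-zero columns.

```python
import string

# Stand-in alphabet (an assumption): lowercase letters, digits,
# ASCII punctuation, and newline. Not the paper's exact symbol list.
ALPHABET = string.ascii_lowercase + string.digits + string.punctuation + "\n"


def quantize(text, alphabet=ALPHABET, length=1014, lowercase=True):
    """Return a len(alphabet) x length one-hot matrix as nested lists.

    Column 0 holds the last character of the text, column 1 the one
    before it, and so on. Only the final `length` characters are kept.
    """
    if lowercase:
        text = text.lower()
    index = {ch: row for row, ch in enumerate(alphabet)}
    matrix = [[0] * length for _ in alphabet]
    tail = text[-length:] if length > 0 else ""
    # Walk the kept characters backward so recent ones land first.
    for col, ch in enumerate(reversed(tail)):
        row = index.get(ch)
        if row is not None:  # unknown characters stay all-zero
            matrix[row][col] = 1
    return matrix


if __name__ == "__main__":
    m = quantize("Hi!", length=5)
    print(m[ALPHABET.index("!")][0])  # column 0 is the last character
```

Passing `lowercase=False` together with an alphabet that also contains uppercase letters would give the case-sensitive variant that the review discusses under alphabet choice.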
Experimental results
Research questions
- RQ1: Can character-level ConvNets achieve competitive performance for text classification without word-level tokens?
- RQ2: How do character-level models compare to traditional and word-level deep learning approaches across large-scale datasets?
- RQ3: What are the effects of dataset size, alphabet choice, and data augmentation on model performance?
- RQ4: Are character-level ConvNets more robust on user-generated, less curated text?
- RQ5: Does distinguishing uppercase vs lowercase letters help or hurt performance on large-scale data?
Key findings
- Character-level ConvNets can be effective for text classification without relying on words.
- Larger, less curated, million-scale datasets tend to favor character-level ConvNets over traditional methods.
- Thesaurus-based data augmentation improves performance for character-level models.
- Alphabet choice matters: distinguishing uppercase from lowercase letters often hurts performance on large datasets, and ignoring case may act as a form of regularization.
- Word-based deep models may still outperform on smaller datasets, but character-level ConvNets surpass them on very large datasets.
- The best results emerge on large-scale datasets where character-level ConvNets outperform several baselines.
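To make the thesaurus-based augmentation finding concrete, here is a minimal sketch of synonym replacement. The summary above does not describe the sampling scheme, so everything specific here is an assumption for illustration: the `TOY_THESAURUS` dictionary, the `augment` and `geometric` helpers, and the choice to draw both the number of replaced words and the synonym rank from geometric distributions with parameters `p` and `q` (0.5 each by default, and required to be below 1).

```python
import random

# Hypothetical toy thesaurus; synonyms are ordered from most to
# least similar. A real setup would load a full thesaurus instead.
TOY_THESAURUS = {
    "good": ["fine", "nice", "decent"],
    "movie": ["film", "picture"],
}


def geometric(rng, p):
    """Sample k >= 0 with P[k] proportional to p**k (requires p < 1)."""
    k = 0
    while rng.random() < p:
        k += 1
    return k


def augment(text, thesaurus=TOY_THESAURUS, p=0.5, q=0.5, rng=None):
    """Replace a random number of words with sampled synonyms."""
    rng = rng or random.Random()
    words = text.split()
    # Positions of words that have at least one known synonym.
    replaceable = [i for i, w in enumerate(words) if w.lower() in thesaurus]
    # Number of words to replace, capped by what is replaceable.
    r = min(geometric(rng, p), len(replaceable))
    for i in rng.sample(replaceable, r):
        synonyms = thesaurus[words[i].lower()]
        # Synonym rank, capped at the last available synonym.
        s = min(geometric(rng, q), len(synonyms) - 1)
        words[i] = synonyms[s]
    return " ".join(words)


if __name__ == "__main__":
    print(augment("a good movie", rng=random.Random(0)))
```

Small values of `p` leave most samples unchanged, while small values of `q` favor the closest synonyms, so the two parameters trade off variety against label-preserving fidelity.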
This review was created by AI and reviewed by human editors.