QUICK REVIEW

[Paper Review] Twitter Opinion Topic Model: Extracting Product Opinions from Tweets by Leveraging Hashtags and Sentiment Lexicon

Kar Wai Lim, Wray Buntine|arXiv (Cornell University)|Sep 21, 2016

Sentiment Analysis and Opinion Mining|44 references|66 citations

TL;DR

This paper proposes the Twitter Opinion Topic Model (TOTM), an LDA-based topic model that improves aspect-based opinion mining on noisy, informal tweets by directly modeling target-opinion interactions and incorporating a sentiment lexicon as a learnable prior. On 9 million tweets about electronic products, TOTM improves opinion prediction and sentiment classification over baseline models such as ILDA and LDA-DP.

ABSTRACT

Aspect-based opinion mining is widely applied to review data to aggregate or summarize opinions of a product, and the current state-of-the-art is achieved with Latent Dirichlet Allocation (LDA)-based model. Although social media data like tweets are laden with opinions, their "dirty" nature (as natural language) has discouraged researchers from applying LDA-based opinion model for product review mining. Tweets are often informal, unstructured and lacking labeled data such as categories and ratings, making it challenging for product opinion mining. In this paper, we propose an LDA-based opinion model named Twitter Opinion Topic Model (TOTM) for opinion mining and sentiment analysis. TOTM leverages hashtags, mentions, emoticons and strong sentiment words that are present in tweets in its discovery process. It improves opinion prediction by modeling the target-opinion interaction directly, thus discovering target specific opinion words, neglected in existing approaches. Moreover, we propose a new formulation of incorporating sentiment prior information into a topic model, by utilizing an existing public sentiment lexicon. This is novel in that it learns and updates with the data. We conduct experiments on 9 million tweets on electronic products, and demonstrate the improved performance of TOTM in both quantitative evaluations and qualitative analysis. We show that aspect-based opinion analysis on massive volume of tweets provides useful opinions on products.

Motivation & Objective

To address the challenge of mining product opinions from unstructured, noisy tweets lacking explicit ratings or labels.
To improve opinion prediction in tweets by modeling direct interactions between targets (e.g., 'camera', 'phone') and opinion words (e.g., 'love', 'hate').
To incorporate sentiment lexicon information into topic modeling in a data-driven, learnable way, rather than using ad hoc or rule-based methods.
To enable high-level product and brand comparisons by extracting and aggregating opinions across entities using tweet-level sentiment and hashtag-based clustering.
To demonstrate the feasibility and utility of real-time, large-scale aspect-based opinion analysis on Twitter for new product insights.

Proposed method

TOTM extends LDA by modeling target-opinion interactions directly, allowing it to learn that an opinion word such as 'grilled' is positive only for specific targets (e.g., 'sausage').
It leverages hashtags, mentions, emoticons, and strong sentiment words as signals to improve topic clustering and opinion detection in short, informal text.
A novel formulation integrates a public sentiment lexicon into the topic model priors, allowing the model to learn and update sentiment strength dynamically from data.
The model uses tweet aggregation via hashtags and mentions to improve aspect clustering and enable cross-product comparisons.
A new target-opinion extraction procedure is introduced, tailored for the short, noisy format of tweets, enhancing detection accuracy.
Preprocessing includes normalization of misspellings and abbreviations, and spam filtering via URL removal to improve data quality.
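
The aggregation and spam-filtering steps listed above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function name `aggregate_tweets`, the regular expressions, and the `<none>` bucket are hypothetical, and it assumes that "URL removal" means discarding any tweet that contains a link.

```python
import re
from collections import defaultdict

# Assumed patterns; the paper's own tokenization rules are not given here.
URL_RE = re.compile(r"https?://\S+|www\.\S+", re.IGNORECASE)
TAG_RE = re.compile(r"[#@]\w+")


def aggregate_tweets(tweets):
    """Group URL-free tweets into pseudo-documents keyed by hashtag or mention."""
    groups = defaultdict(list)
    for text in tweets:
        if URL_RE.search(text):
            continue  # assumption: a tweet with a link is treated as likely spam
        keys = {k.lower() for k in TAG_RE.findall(text)}
        # Tweets without any hashtag or mention go to a catch-all bucket.
        for key in keys or {"<none>"}:
            groups[key].append(text)
    return dict(groups)


# Toy tweets, invented for illustration only.
tweets = [
    "Loving the camera on my new phone #iphone",
    "Battery dies so fast #iPhone @apple",
    "Win a free phone http://spam.example #iphone",
    "no tags here",
]
docs = aggregate_tweets(tweets)
```

Each pseudo-document in `docs` pools the short tweets that share a hashtag or mention, which gives a topic model more co-occurrence evidence than single tweets would. A tweet with several tags is deliberately copied into every matching group in this sketch.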

Experimental results

Research questions

RQ1: Can an LDA-based model effectively extract aspect-specific opinions from unstructured, informal tweets without explicit ratings?
RQ2: How does direct modeling of target-opinion interactions improve opinion prediction compared to standard LDA or ILDA?
RQ3: Can a sentiment lexicon be effectively and learnably integrated into a topic model to improve sentiment classification on tweets?
RQ4: To what extent can TOTM enable high-level comparisons of opinions across brands (e.g., Canon, Sony, Samsung) using tweet-level sentiment and hashtag clustering?
RQ5: How does TOTM perform in extracting contrasting opinions (positive vs. negative) on specific products like the iPhone?

Key findings

TOTM significantly outperforms ILDA and LDA-DP in opinion prediction, correctly identifying that an opinion word such as 'grilled' is positive only for specific targets such as 'sausage'.
The proposed formulation of incorporating sentiment lexicons as learnable priors improves sentiment classification performance, outperforming ad hoc or rule-based methods.
On a dataset of 9 million tweets about electronic products, TOTM achieved better model fitting and more accurate sentiment analysis than baseline models.
TOTM enables effective brand comparison, as demonstrated by extracting and summarizing opinions on Canon, Sony, and Samsung cameras and phones using hashtag and sentiment-based clustering.
Qualitative analysis confirms that TOTM successfully extracts meaningful, contrasting opinions on products like the iPhone, including both positive and negative sentiments expressed in natural language.
The model demonstrates the feasibility of real-time, large-scale aspect-based opinion mining on Twitter, providing timely insights into new product perceptions.
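
The idea of a lexicon-informed prior that is tuned on data, noted in the findings above, can be illustrated with a small Python sketch. This is not the paper's formulation: the exponential tilt, the strength parameter `b`, the grid search, and all function names are assumptions chosen for illustration, and the counts are toy values.

```python
import math
import numpy as np


def lexicon_prior(scores, b):
    """Dirichlet pseudo-counts per sentiment, tilted by lexicon polarity.

    scores: lexicon polarity per vocabulary word, in [-1, 1].
    b: hypothetical strength; b = 0 ignores the lexicon (flat prior).
    Returns shape (2, V): row 0 is negative, row 1 is positive.
    """
    scores = np.asarray(scores, dtype=float)
    signs = np.array([[-1.0], [1.0]])
    weights = np.exp(b * signs * scores)  # broadcast to shape (2, V)
    # Rescale so every row keeps a total pseudo-count of V.
    return weights / weights.sum(axis=1, keepdims=True) * scores.size


def log_evidence(counts, prior):
    """Dirichlet-multinomial log evidence of the counts, summed over sentiments."""
    lg = np.vectorize(math.lgamma)
    per_row = (lg(prior.sum(axis=1)) - lg((prior + counts).sum(axis=1))
               + (lg(prior + counts) - lg(prior)).sum(axis=1))
    return float(per_row.sum())


def fit_strength(scores, counts, grid):
    """Pick the strength b on a grid that best explains the observed counts."""
    return max(grid, key=lambda b: log_evidence(counts, lexicon_prior(scores, b)))


# Toy vocabulary, lexicon scores, and opinion-word counts (all invented).
vocab = ["love", "hate", "grilled"]
scores = [1.0, -1.0, 0.0]
counts = np.array([[1, 9, 4],   # counts under the negative sentiment
                   [9, 1, 6]])  # counts under the positive sentiment
b_hat = fit_strength(scores, counts, np.linspace(0.0, 3.0, 31))
```

When the data agree with the lexicon, as in the toy counts, the selected `b_hat` is positive, so the lexicon is trusted more than a flat prior. A word the lexicon scores as neutral (here 'grilled') receives the same prior mass under both sentiments, leaving its polarity to be decided by the data.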


This review was created by AI and reviewed by human editors.