QUICK REVIEW

[Paper Review] TimeLMs: Diachronic Language Models from Twitter

Daniel Loureiro, Francesco Barbieri|arXiv (Cornell University)|Feb 8, 2022

Data Stream Mining Techniques | Cited by 26

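One-Sentence Summary

TimeLMs is a set of time-specific language models based on RoBERTa-base and trained on Twitter data, updated and released every three months; the paper shows that older models degrade on future data and that continual updating helps.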

ABSTRACT

Despite its importance, the time variable has been largely neglected in the NLP and language model literature. In this paper, we present TimeLMs, a set of language models specialized on diachronic Twitter data. We show that a continual learning strategy contributes to enhancing Twitter-based language models' capacity to deal with future and out-of-distribution tweets, while making them competitive with standardized and more monolithic benchmarks. We also perform a number of qualitative analyses showing how they cope with trends and peaks in activity involving specific named entities or concept drift.

Motivation and Objectives

Motivate the need for diachronic, time-aware language models in fast-changing social media like Twitter.
Show that continual, quarterly updating improves performance on future/out-of-distribution tweets.
Provide a practical framework and tooling for time-aware evaluation and usage of TimeLMs.

Proposed Method

Train a RoBERTa-base model on 2018-2019 Twitter data to serve as the base model (2019-90M).
Continually train updated models every three months using newly collected Twitter data.
Clean data by removing top-1% most active users, removing duplicates/near-duplicates, and anonymizing mentions (except verified users).
Evaluate models with TweetEval benchmark and pseudo-perplexity on time-sliced test sets.
Provide a Python interface to compute pseudo-perplexity and masked predictions across time-specific models.
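
The cleaning step listed above can be sketched in plain Python. This is an illustrative sketch, not the authors' pipeline: the tweet record layout (`user`, `text`), the helper names, the `@user` placeholder, the order of the steps, and the rule used to detect near-duplicates (lowercasing and stripping punctuation) are all assumptions made for the example.

```python
import re
import string
from collections import Counter

MENTION = re.compile(r"@(\w+)")
_PUNCT = str.maketrans("", "", string.punctuation)


def drop_most_active(tweets, fraction=0.01):
    """Drop every tweet written by the top `fraction` of users, ranked by tweet count."""
    counts = Counter(t["user"] for t in tweets)
    n_drop = int(len(counts) * fraction)
    blocked = {user for user, _ in counts.most_common(n_drop)}
    return [t for t in tweets if t["user"] not in blocked]


def dedup_key(text):
    """Normalize text so trivially different copies collide (assumed rule)."""
    return " ".join(text.lower().translate(_PUNCT).split())


def drop_near_duplicates(tweets):
    """Keep only the first tweet seen for each normalized text."""
    seen, kept = set(), []
    for t in tweets:
        key = dedup_key(t["text"])
        if key not in seen:
            seen.add(key)
            kept.append(t)
    return kept


def anonymize_mentions(text, verified):
    """Replace @mentions with a placeholder unless the handle is in `verified`."""
    def repl(match):
        return match.group(0) if match.group(1).lower() in verified else "@user"
    return MENTION.sub(repl, text)


def clean(tweets, verified, fraction=0.01):
    """Apply the three cleaning steps in sequence (the order is an assumption)."""
    tweets = drop_most_active(tweets, fraction)
    tweets = drop_near_duplicates(tweets)
    return [{**t, "text": anonymize_mentions(t["text"], verified)} for t in tweets]
```

`verified` is expected to be a set of lowercase handles. With fewer than 100 distinct users, the default `fraction=0.01` removes nobody, so small experiments need a larger value.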

Experimental Results

Research Questions

RQ1: Do time-specific language models better handle diachronic shifts in Twitter data compared to static baselines?
RQ2: How does continual quarterly updating influence performance on newer versus older time periods?
RQ3: To what extent does increased data size versus recency drive improvements in time-aware LMs?
RQ4: Can a practical tooling interface enable easy time-aware evaluation and usage of TimeLMs?

Key Findings

Models	Emoji	Emotion	Hate	Irony	Offensive	Sentiment	Stance	ALL
SVM	29.3	64.7	36.7	61.7	52.3	62.9	67.3	53.5
FastText	25.8	65.2	50.6	63.1	73.4	62.9	65.4	58.1
BLSTM	24.7	66.0	52.6	62.8	71.7	58.3	59.4	56.5
RoBERTa-Base	30.8	76.6	44.9	55.2	78.7	72.0	70.9	61.3
TweetEval	31.6	79.8	55.5	62.5	81.6	72.9	72.6	65.2
BERTweet	33.4	79.3	56.4	82.1	79.5	73.4	71.2	67.9
TimeLM-19	33.4	81.0	58.1	48.0	82.4	73.2	70.7	63.8
TimeLM-21	34.0	80.2	55.1	64.5	82.2	73.7	72.9	66.2

Time-aware models are competitive on TweetEval: TimeLM-21 scores 66.2 on ALL, above RoBERTa-Base (61.3) and TweetEval (65.2) but below BERTweet (67.9), and has the best Emoji, Sentiment, and Stance scores in the table.
Pseudo-perplexity results indicate newer models generally outperform older ones on contemporaneous test data, reflecting reduced degradation over time.
Quarterly updates reduce degradation over time, though in some settings older test periods also benefit from the larger cumulative training data.
A control experiment suggests that increasing training data size improves performance, while recency primarily benefits more recent test sets.
Qualitative examples show time-specific models better predict period-relevant masked tokens (e.g., COVID era, Squid Game) than older models.
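
The pseudo-perplexity comparisons above can be made concrete with a small sketch. It assumes the common definition for masked language models: mask each token in turn, score the original token given the rest of the tweet, and exponentiate the negative mean log-probability over all scored tokens. The function names and the `log_prob_fn` callback are hypothetical and are not the TimeLMs interface. In practice the callback would wrap a masked LM; here a uniform toy scorer stands in so the example runs without a model, and normalizing over all tokens in the test set (rather than averaging per tweet) is also an assumption.

```python
import math
from typing import Callable, Sequence

# Hypothetical callback: (tokens, index) -> log P(tokens[index] | all other tokens)
LogProbFn = Callable[[Sequence[str], int], float]


def pseudo_log_likelihood(tokens: Sequence[str], log_prob_fn: LogProbFn) -> float:
    """Sum the log-probability of each token when it alone is masked."""
    return sum(log_prob_fn(tokens, i) for i in range(len(tokens)))


def pseudo_perplexity(corpus: Sequence[Sequence[str]], log_prob_fn: LogProbFn) -> float:
    """exp(-mean masked log-probability), normalized over all tokens in the corpus."""
    total_ll = 0.0
    total_tokens = 0
    for tokens in corpus:
        total_ll += pseudo_log_likelihood(tokens, log_prob_fn)
        total_tokens += len(tokens)
    if total_tokens == 0:
        raise ValueError("corpus contains no tokens")
    return math.exp(-total_ll / total_tokens)


def uniform_scorer(vocab_size: int) -> LogProbFn:
    """Toy stand-in for a model: every token gets probability 1 / vocab_size."""
    def score(tokens: Sequence[str], i: int) -> float:
        return -math.log(vocab_size)
    return score


corpus = [["stay", "safe", "everyone"], ["good", "morning"]]
ppl_uncertain = pseudo_perplexity(corpus, uniform_scorer(1000))
ppl_confident = pseudo_perplexity(corpus, uniform_scorer(10))
```

A scorer that is uniform over V tokens yields a pseudo-perplexity of V, which makes a convenient sanity check. Lower values are better, so scoring the same test quarter with each quarterly model and comparing the results gives the kind of model-by-period comparison summarized above.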

This review was generated by AI and checked by a human editor.