QUICK REVIEW

[Paper Review] TimeLMs: Diachronic Language Models from Twitter

Daniel Loureiro, Francesco Barbieri | arXiv (Cornell University) | Feb 8, 2022
Data Stream Mining Techniques | Cited by 26
One-line summary

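TimeLMs are time-specific language models based on RoBERTa-base and trained on Twitter data, updated and released every three months. The paper shows that older models degrade on future data and that continual updating helps.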

ABSTRACT

Despite its importance, the time variable has been largely neglected in the NLP and language model literature. In this paper, we present TimeLMs, a set of language models specialized on diachronic Twitter data. We show that a continual learning strategy contributes to enhancing Twitter-based language models' capacity to deal with future and out-of-distribution tweets, while making them competitive with standardized and more monolithic benchmarks. We also perform a number of qualitative analyses showing how they cope with trends and peaks in activity involving specific named entities or concept drift.

Motivation and objectives

  • Motivate the need for diachronic, time-aware language models in fast-changing social media like Twitter.
  • Show that continual, quarterly updating improves performance on future/out-of-distribution tweets.
  • Provide a practical framework and tooling for time-aware evaluation and usage of TimeLMs.

Proposed method

  • Build a base model from RoBERTa-base, trained on 2018-2019 Twitter data (2019-90M).
  • Continually train updated models every three months using newly collected Twitter data.
  • Clean data by removing the top 1% most active users, removing duplicates and near-duplicates, and anonymizing mentions (except those of verified users).
  • Evaluate models with TweetEval benchmark and pseudo-perplexity on time-sliced test sets.
  • Provide a Python interface to compute pseudo-perplexity and masked predictions across time-specific models.
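The cleaning steps listed above can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline. The function names, the `(user, text)` input shape, the punctuation-stripping near-duplicate key, and the `@user` placeholder are all assumptions made for the example.

```python
import re
import string
from collections import Counter

MENTION = re.compile(r"@\w+")
PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def normalize(text):
    """Crude near-duplicate key (assumption): lowercase, drop punctuation, collapse spaces."""
    return " ".join(text.lower().translate(PUNCT_TABLE).split())


def clean_tweets(tweets, verified_users, top_fraction=0.01):
    """Filter and anonymize a list of (user, text) pairs.

    verified_users is a set of lowercase handles without the leading "@".
    """
    # 1. Drop the most active users (top_fraction of distinct users by tweet count).
    counts = Counter(user for user, _ in tweets)
    n_drop = int(len(counts) * top_fraction)
    most_active = {user for user, _ in counts.most_common(n_drop)}

    def mask(match):
        # 3. Keep mentions of verified users; replace all others with a placeholder.
        handle = match.group(0)[1:].lower()
        return match.group(0) if handle in verified_users else "@user"

    seen = set()
    cleaned = []
    for user, text in tweets:
        if user in most_active:
            continue
        # 2. Skip exact duplicates and near-duplicates sharing a normalized key.
        key = normalize(text)
        if key in seen:
            continue
        seen.add(key)
        cleaned.append(MENTION.sub(mask, text))
    return cleaned


sample = [
    ("spammer", "buy now"),
    ("spammer", "buy now again"),
    ("spammer", "deal!!"),
    ("ann", "Hello @bob, see @NASA"),
    ("cy", "hello @bob see @nasa!!"),
    ("dee", "Good morning"),
]
# top_fraction is raised here only because the toy sample has four users.
print(clean_tweets(sample, verified_users={"nasa"}, top_fraction=0.25))
```

On a real corpus the default `top_fraction=0.01` would apply. The near-duplicate key here is deliberately simple and could be swapped for any hashing scheme.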

Experimental results

Research questions

  • RQ1: Do time-specific language models better handle diachronic shifts in Twitter data compared to static baselines?
  • RQ2: How does continual quarterly updating influence performance on newer versus older time periods?
  • RQ3: To what extent does increased data size versus recency drive improvements in time-aware LMs?
  • RQ4: Can a practical tooling interface enable easy time-aware evaluation and usage of TimeLMs?

Key findings

Models        Emoji  Emotion  Hate  Irony  Offensive  Sentiment  Stance  ALL
SVM           29.3   64.7     36.7  61.7   52.3       62.9       67.3    53.5
FastText      25.8   65.2     50.6  63.1   73.4       62.9       65.4    58.1
BLSTM         24.7   66.0     52.6  62.8   71.7       58.3       59.4    56.5
RoBERTa-Base  30.8   76.6     44.9  55.2   78.7       72.0       70.9    61.3
TweetEval     31.6   79.8     55.5  62.5   81.6       72.9       72.6    65.2
BERTweet      33.4   79.3     56.4  82.1   79.5       73.4       71.2    67.9
TimeLM-19     33.4   81.0     58.1  48.0   82.4       73.2       70.7    63.8
TimeLM-21     34.0   80.2     55.1  64.5   82.2       73.7       72.9    66.2
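The review does not say how the ALL column is computed. One plausible reading, assumed here, is an unweighted mean over the seven task scores. The sketch below recomputes that mean from the reported per-task values and flags any row that deviates by more than a small tolerance. The helper names and the tolerance are illustrative choices. Small gaps are expected because the per-task scores are already rounded to one decimal.

```python
from statistics import mean

# Seven task scores followed by the reported ALL value, copied from the results table.
ROWS = {
    "SVM":          [29.3, 64.7, 36.7, 61.7, 52.3, 62.9, 67.3, 53.5],
    "FastText":     [25.8, 65.2, 50.6, 63.1, 73.4, 62.9, 65.4, 58.1],
    "BLSTM":        [24.7, 66.0, 52.6, 62.8, 71.7, 58.3, 59.4, 56.5],
    "RoBERTa-Base": [30.8, 76.6, 44.9, 55.2, 78.7, 72.0, 70.9, 61.3],
    "TweetEval":    [31.6, 79.8, 55.5, 62.5, 81.6, 72.9, 72.6, 65.2],
    "BERTweet":     [33.4, 79.3, 56.4, 82.1, 79.5, 73.4, 71.2, 67.9],
    "TimeLM-19":    [33.4, 81.0, 58.1, 48.0, 82.4, 73.2, 70.7, 63.8],
    "TimeLM-21":    [34.0, 80.2, 55.1, 64.5, 82.2, 73.7, 72.9, 66.2],
}


def macro_average(task_scores):
    """Unweighted mean across tasks (assumed definition of ALL)."""
    return mean(task_scores)


def check_all_column(rows, tolerance=0.15):
    """Return {model: (recomputed, reported)} for rows outside the tolerance."""
    mismatches = {}
    for model, (*tasks, reported) in rows.items():
        recomputed = macro_average(tasks)
        if abs(recomputed - reported) > tolerance:
            mismatches[model] = (round(recomputed, 2), reported)
    return mismatches


print(check_all_column(ROWS))
```

An empty result means every reported ALL value is consistent with the unweighted mean under this tolerance. It does not prove that this is how the column was produced.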
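  • Time-aware models are competitive on TweetEval: TimeLM-21 reaches 66.2 on ALL, above RoBERTa-Base (61.3) and the TweetEval baseline (65.2) but below BERTweet (67.9), and it has the highest Emoji (34.0), Sentiment (73.7), and Stance (72.9) scores.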
  • Pseudo-perplexity results indicate newer models generally outperform older ones on contemporaneous test data, reflecting reduced degradation over time.
  • Quarterly updates reduce degradation over time, though older periods benefit from larger cumulative data in some settings.
  • A control experiment suggests that increasing training data size improves performance, while recency primarily benefits more recent test sets.
  • Qualitative examples show time-specific models better predict period-relevant masked tokens (e.g., COVID era, Squid Game) than older models.
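Pseudo-perplexity, the metric behind the time-sliced comparisons above, is commonly defined for masked language models as the exponential of the negative mean log-probability of each token when that token alone is masked. The sketch below implements that definition for a single token sequence with a pluggable scorer. It is not the TimeLMs interface. `unigram_scorer` is a hypothetical stand-in that ignores context, and the probabilities are made-up toy values, not results from the paper.

```python
import math


def pseudo_perplexity(tokens, masked_log_prob):
    """exp(-(1/N) * sum_i log P(token_i | all other tokens)).

    masked_log_prob(tokens, i) must return the log-probability the model
    assigns to tokens[i] when position i is masked.
    """
    if not tokens:
        raise ValueError("need at least one token")
    pll = sum(masked_log_prob(tokens, i) for i in range(len(tokens)))
    return math.exp(-pll / len(tokens))


def unigram_scorer(probs, floor=1e-6):
    """Hypothetical stand-in for a masked LM: context-free token probabilities."""
    def score(tokens, i):
        return math.log(probs.get(tokens[i], floor))
    return score


# Toy values only: the "newer" scorer has seen the token "squid", the "older" one has not.
older = unigram_scorer({"watching": 0.05, "the": 0.1, "game": 0.02})
newer = unigram_scorer({"watching": 0.05, "squid": 0.01, "game": 0.02})

tweet = ["watching", "squid", "game"]
print("older:", pseudo_perplexity(tweet, older))
print("newer:", pseudo_perplexity(tweet, newer))
```

With a real masked LM, the scorer would mask position `i`, run the model, and read off the log-softmax value of the original token. Lower pseudo-perplexity means the model finds the text less surprising.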


This review was generated by AI and checked by a human editor.