QUICK REVIEW

[Paper Review] A Survey on Contextualised Semantic Shift Detection

Stefano Montanelli, Francesco Periti|arXiv (Cornell University)|Apr 4, 2023

Language and cultural evolution | References: 104 | Citations: 16

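One-line summary

This paper surveys approaches to contextualised semantic shift detection (CSSDetection), proposes a classification framework with three dimensions (meaning representation, time-awareness, and learning modality), and analyses evaluation measures, datasets, and open challenges.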

ABSTRACT

Semantic Shift Detection (SSD) is the task of identifying, interpreting, and assessing the possible change over time in the meanings of a target word. Traditionally, SSD has been addressed by linguists and social scientists through manual and time-consuming activities. In the recent years, computational approaches based on Natural Language Processing and word embeddings gained increasing attention to automate SSD as much as possible. In particular, over the past three years, significant advancements have been made almost exclusively based on word contextualised embedding models, which can handle the multiple usages/meanings of the words and better capture the related semantic shifts. In this paper, we survey the approaches based on contextualised embeddings for SSD (i.e., CSSDetection) and we propose a classification framework characterised by meaning representation, time-awareness, and learning modality dimensions. The framework is exploited i) to review the measures for shift assessment, ii) to compare the approaches on performance, and iii) to discuss the current issues in terms of scalability, interpretability, and robustness. Open challenges and future research directions about CSSDetection are finally outlined.

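Motivation and objectives

Define CSSDetection and establish why automating the analysis of meaning change over time matters.
Propose a three-dimensional classification framework for CSSDetection approaches (meaning representation, time-awareness, learning modality).
Survey state-of-the-art CSSDetection methods and how they are evaluated.
Compare approaches on shared tasks and corpora wherever possible.
Identify issues of scalability, interpretability, and robustness, and outline future research directions.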

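Proposed approach

Introduce a formal CSSDetection workflow: embedding, optional aggregation, and shift assessment.
Classify approaches along three dimensions: meaning representation (form-based vs. sense-based), time-awareness (time-oblivious vs. time-aware), and learning modality (supervised vs. unsupervised).
Describe and formalise semantic shift measures (e.g., cosine distance between prototypes, inverted similarity over prototypes, time-diff, average pairwise distance).
Discuss aggregation techniques (clustering vs. averaging) and their effect on shift measurement.
Provide a catalogue of form-based and sense-based CSSDetection methods, with model type, training regime, and shift function.
Summarise shared-task results (e.g., SemEval-20 Task 1, DIACR-Ita-20, RuShiftEval-21, LSCDiscovery-22) and compare reported performance where available.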
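
The workflow and three of the shift measures listed above can be sketched in a few lines. This is a minimal illustration, not the survey's reference implementation. It assumes that the contextualised embeddings of a target word have already been extracted for two time periods (the toy vectors below are invented), that a "prototype" is the element-wise mean of a period's embeddings, and that PRT is the reciprocal of the cosine similarity between prototypes. Time-diff (TD) is left out because this review does not describe how it is computed. All function names are hypothetical.

```python
import math
from itertools import product


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def prototype(embeddings):
    """Aggregation by averaging: element-wise mean of one period's embeddings."""
    n = len(embeddings)
    return [sum(dim) / n for dim in zip(*embeddings)]


def cd(emb_t1, emb_t2):
    """Cosine distance between the two period prototypes."""
    return 1.0 - cosine_similarity(prototype(emb_t1), prototype(emb_t2))


def prt(emb_t1, emb_t2):
    """Inverted similarity over prototypes (assumed here to be 1 / cosine similarity)."""
    return 1.0 / cosine_similarity(prototype(emb_t1), prototype(emb_t2))


def apd(emb_t1, emb_t2):
    """Average pairwise cosine distance; uses raw embeddings, no aggregation step."""
    pairs = list(product(emb_t1, emb_t2))
    return sum(1.0 - cosine_similarity(u, v) for u, v in pairs) / len(pairs)


# Invented toy embeddings of one target word in two time periods.
period_1 = [[1.0, 0.0], [0.9, 0.1]]
period_2 = [[0.2, 0.8], [0.0, 1.0]]

print("CD :", cd(period_1, period_2))
print("PRT:", prt(period_1, period_2))
print("APD:", apd(period_1, period_2))
```

For all three functions, larger values indicate a larger shift between the two periods. CD and PRT compare one averaged vector per period, while APD compares every cross-period pair of usages.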

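Results

Research questions

RQ1: How can CSSDetection approaches be systematically classified and compared?
RQ2: Which meaning representations and time-awareness strategies are used in CSSDetection, and how do they affect detection and interpretability?
RQ3: Which learning paradigms (supervised vs. unsupervised) are adopted in CSSDetection, and is external knowledge exploited or avoided?
RQ4: Which semantic shift measures are used, and how do they perform across tasks and languages?
RQ5: What are the current limitations in scalability, interpretability, and robustness, and what future directions do they suggest?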

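Key findings

Most form-based CSSDetection methods are time-oblivious and unsupervised, and averaging is the usual aggregation strategy.
Sense-based approaches use clustering to capture a word's multiple usages or meanings, which makes shifts between senses interpretable.
Cosine distance between prototypes (CD) is a widely used shift function; alternatives such as inverted similarity over prototypes (PRT), time-diff (TD), and average pairwise distance (APD) are also discussed.
Time-aware approaches typically fine-tune or adapt pre-trained models with time markers or temporal references to capture temporal dynamics.
Shared-task evaluations (e.g., SemEval-20, DIACR-Ita-20, RuShiftEval-21, LSCDiscovery-22) are used to compare CSSDetection methods, but the results are constrained by the characteristics of each task and by the languages covered.
The survey highlights open issues in scalability, interpretability, and robustness, and outlines future research directions for CSSDetection.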
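
To make the sense-based finding above more concrete, the sketch below shows one way a shift could be read off clustered usages. It assumes a clustering step has already assigned each usage of the target word to a sense cluster (the labels below are invented), and it compares the per-period sense distributions with the Jensen-Shannon divergence. That divergence is an illustrative choice made here, not a measure named in this review, and the function names are hypothetical.

```python
import math
from collections import Counter


def sense_distribution(labels, senses):
    """Relative frequency of each sense cluster among one period's usages."""
    counts = Counter(labels)
    total = len(labels)
    return [counts[s] / total for s in senses]


def jensen_shannon_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two discrete distributions."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Terms with zero probability in x contribute nothing.
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


# Invented cluster labels: one label per usage of the target word.
labels_t1 = [0, 0, 0, 1]  # sense 0 dominates in the earlier period
labels_t2 = [0, 1, 1, 1]  # sense 1 dominates in the later period

senses = sorted(set(labels_t1) | set(labels_t2))
p = sense_distribution(labels_t1, senses)
q = sense_distribution(labels_t2, senses)

print("period 1 senses:", p)
print("period 2 senses:", q)
print("shift score    :", jensen_shannon_divergence(p, q))
```

Because the two distributions are indexed by sense, the score can be traced back to which senses gained or lost usage, which is the kind of interpretability the sense-based approaches aim for.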

This review was generated by AI and checked by a human editor.