QUICK REVIEW

[Paper Review] SemEval-2019 Task 6: Identifying and Categorizing Offensive Language in Social Media (OffensEval)

Marcos Zampieri, Shervin Malmasi|arXiv (Cornell University)|Mar 19, 2019

Hate Speech and Cyberbullying Detection | References: 73 | Citations: 37

One-line Summary

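This paper reports on the OffensEval shared task (SemEval-2019 Task 6), which uses the OLID dataset to identify offensive language in English tweets, categorize the type of offense, and identify its target. BERT-based and ensemble approaches achieved the top results.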

ABSTRACT

We present the results and the main findings of SemEval-2019 Task 6 on Identifying and Categorizing Offensive Language in Social Media (OffensEval). The task was based on a new dataset, the Offensive Language Identification Dataset (OLID), which contains over 14,000 English tweets. It featured three sub-tasks. In sub-task A, the goal was to discriminate between offensive and non-offensive posts. In sub-task B, the focus was on the type of offensive content in the post. Finally, in sub-task C, systems had to detect the target of the offensive posts. OffensEval attracted a large number of participants and it was one of the most popular tasks in SemEval-2019. In total, about 800 teams signed up to participate in the task, and 115 of them submitted results, which we present and analyze in this report.

Motivation and Objectives

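Motivate automatic detection of offensive language as a way to reduce the burden of manual moderation.
Introduce OLID, annotated with a hierarchical three-level schema that captures whether a post is offensive, the type of offense, and its target.
Define three sub-tasks and study each separately (A: offensive vs. not offensive; B: offense type; C: offense target).
Provide baselines and competitive results that establish a benchmark for offensive language identification in English tweets.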
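
The three-level schema can be pictured as a small label hierarchy in which each lower level only applies when its parent level licenses it. The sketch below is a minimal illustration, assuming OLID's label names: NOT/OFF at level A, TIN/UNT (targeted insult vs. untargeted) at level B, and IND/GRP/OTH (individual, group, other) at level C. The helper `subtask_labels` is hypothetical and only shows which sub-tasks a given tweet takes part in.

```python
from typing import Optional

# Level A (sub-task A): offensive (OFF) vs. not offensive (NOT)
LEVEL_A = {"OFF", "NOT"}
# Level B (sub-task B, offensive posts only): targeted insult (TIN) vs. untargeted (UNT)
LEVEL_B = {"TIN", "UNT"}
# Level C (sub-task C, targeted posts only): individual (IND), group (GRP), other (OTH)
LEVEL_C = {"IND", "GRP", "OTH"}


def subtask_labels(a: str, b: Optional[str] = None, c: Optional[str] = None) -> dict:
    """Return the gold label for each sub-task a tweet participates in.

    Hypothetical helper: raises ValueError if a lower-level label is given
    without the parent label that licenses it, or if a required one is missing.
    """
    if a not in LEVEL_A:
        raise ValueError(f"unknown level-A label: {a}")
    labels = {"A": a}
    if a == "NOT":
        # Non-offensive posts stop at level A.
        if b is not None or c is not None:
            raise ValueError("non-offensive posts have no type or target")
        return labels
    if b not in LEVEL_B:
        raise ValueError(f"offensive posts need a level-B label, got {b}")
    labels["B"] = b
    if b == "UNT":
        # Untargeted offense stops at level B.
        if c is not None:
            raise ValueError("untargeted posts have no target")
        return labels
    if c not in LEVEL_C:
        raise ValueError(f"targeted insults need a level-C label, got {c}")
    labels["C"] = c
    return labels


print(subtask_labels("NOT"))                # {'A': 'NOT'}
print(subtask_labels("OFF", "UNT"))         # {'A': 'OFF', 'B': 'UNT'}
print(subtask_labels("OFF", "TIN", "GRP"))  # {'A': 'OFF', 'B': 'TIN', 'C': 'GRP'}
```

The nesting explains why sub-tasks B and C are evaluated on progressively smaller subsets of the data: only offensive tweets carry a type, and only targeted insults carry a target.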

Method

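Use the OLID dataset with its three-level hierarchical annotation scheme.
Evaluate all three sub-tasks with macro-averaged F1 as the official metric, chosen because of class imbalance.
Survey a broad range of submitted models, from traditional machine learning (SVM) to deep learning (CNN, RNN, BiLSTM, transformers) and ensembles.
Document participants' use of external datasets and pre-trained embeddings (FastText, GloVe, Twitter embeddings), and of tweet-specific preprocessing for hashtags, tokens, and emoji.
Report results and top systems, highlighting that BERT-based models were prevalent in sub-task A and ensembles in sub-tasks B and C.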
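
To see why macro F1 suits imbalanced classes, the sketch below computes it from scratch: per-class F1 scores are averaged with equal weight, so a system that always predicts the majority class is penalized even when its accuracy looks reasonable. The NOT/OFF labels follow OLID's level-A names, but the toy gold labels and predictions are invented for illustration and are not OLID data. For real use, scikit-learn's `f1_score(y_true, y_pred, average="macro")` computes the same quantity.

```python
from typing import Sequence


def accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Fraction of examples whose predicted label matches the gold label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Unweighted mean of per-class F1 over every class seen in gold or predictions."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    classes = sorted(set(y_true) | set(y_pred))
    f1_scores = []
    for cls in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != cls and p == cls)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p != cls)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        # A class that is never correctly predicted contributes F1 = 0.
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        f1_scores.append(f1)
    return sum(f1_scores) / len(f1_scores)


# Invented, imbalanced toy data: 8 non-offensive tweets, 2 offensive ones.
y_true = ["NOT"] * 8 + ["OFF"] * 2
majority = ["NOT"] * 10                              # always predicts the majority class
better = ["NOT"] * 7 + ["OFF"] + ["OFF", "NOT"]      # catches one offensive tweet

print(f"{accuracy(y_true, majority):.2f} {macro_f1(y_true, majority):.4f}")  # 0.80 0.4444
print(f"{accuracy(y_true, better):.2f} {macro_f1(y_true, better):.4f}")      # 0.80 0.6875
```

Both systems have the same accuracy, but only the second one detects any offensive tweets, and macro F1 is the metric that tells them apart.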

Experimental Results

Research Questions

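RQ1: Can a hierarchical annotation schema effectively capture the presence, type, and target of offense in social media text?
RQ2: Which modeling approaches (e.g., BERT, ensembles) are most effective for each OLID sub-task?
RQ3: How does model performance vary across offensive vs. non-offensive detection, offense type, and offense target?
RQ4: To what extent do external data and preprocessing techniques improve performance in OffensEval?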

Key Findings

About 800 teams registered, and 115 submitted results across the sub-tasks.
The top system in sub-task A (offensive language identification) achieved 82.9% macro F1 (NULI, using BERT-base-uncased).
In sub-task B, ensembles and BERT performed strongly, but the top team (jhan014) reached 75.5% macro F1 with a keyword-based, rule-based approach.
The top system in sub-task C reached 66.0% macro F1 (vradivchev_anikolov, a BERT-based approach).
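Deep learning and ensemble methods dominated, though traditional machine learning approaches also appeared. Pre-trained embeddings and tweet-specific preprocessing were widely used.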
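
Ensembles recur among the strong systems, especially for sub-tasks B and C. Participants' actual ensembles varied and are not described here. The sketch below shows only the simplest variant, hard majority voting over per-model label predictions. The model outputs are invented, and the tie-break rule (the label from the earliest-listed model wins) is a hypothetical choice.

```python
from collections import Counter
from typing import List, Sequence


def majority_vote(predictions: Sequence[Sequence[str]]) -> List[str]:
    """Combine per-model label lists into one list by hard majority vote.

    predictions[m][i] is model m's label for example i. Ties are broken in
    favour of the label predicted by the earliest model (a hypothetical rule).
    """
    if not predictions:
        raise ValueError("need at least one model")
    n = len(predictions[0])
    if any(len(p) != n for p in predictions):
        raise ValueError("all models must label the same examples")
    combined = []
    for i in range(n):
        votes = [p[i] for p in predictions]
        counts = Counter(votes)
        best = max(counts.values())
        # First label, in model order, that reaches the top vote count.
        combined.append(next(v for v in votes if counts[v] == best))
    return combined


# Invented sub-task C outputs (IND/GRP/OTH) from three hypothetical models.
model_outputs = [
    ["IND", "GRP", "OTH", "GRP"],
    ["IND", "IND", "OTH", "OTH"],
    ["GRP", "GRP", "IND", "IND"],
]
print(majority_vote(model_outputs))  # ['IND', 'GRP', 'OTH', 'GRP']
```

Hard voting needs only each model's predicted labels. Soft voting, which averages class probabilities, is a common alternative when the models expose calibrated scores.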


This review was generated by AI and checked by a human editor.