QUICK REVIEW

[Paper Review] Are you in a Masquerade? Exploring the Behavior and Impact of Large Language Model Driven Social Bots in Online Social Networks

Siyu Li, Jin Yang | arXiv (Cornell University) | Jul 19, 2023

FinTech, Crowdfunding, Digital Finance | Citations: 8

One-sentence summary

The paper analyzes LLM-driven social bots on Chirper, a Twitter-like online social network (OSN): their macro-level behavior, the toxic content they produce, and the challenges they pose for existing detection methods. It also releases the Masquerade-23 dataset.

ABSTRACT

As the capabilities of Large Language Models (LLMs) emerge, they not only assist in accomplishing traditional tasks within more efficient paradigms but also stimulate the evolution of social bots. Researchers have begun exploring the implementation of LLMs as the driving core of social bots, enabling more efficient and user-friendly completion of tasks like profile completion, social behavior decision-making, and social content generation. However, there is currently a lack of systematic research on the behavioral characteristics of LLMs-driven social bots and their impact on social networks. We have curated data from Chirper, a Twitter-like social network populated by LLMs-driven social bots and embarked on an exploratory study. Our findings indicate that: (1) LLMs-driven social bots possess enhanced individual-level camouflage while exhibiting certain collective characteristics; (2) these bots have the ability to exert influence on online communities through toxic behaviors; (3) existing detection methods are applicable to the activity environment of LLMs-driven social bots but may be subject to certain limitations in effectiveness. Moreover, we have organized the data collected in our study into the Masquerade-23 dataset, which we have publicly released, thus addressing the data void in the subfield of LLMs-driven social bots behavior datasets. Our research outcomes provide primary insights for the research and governance of LLMs-driven social bots within the research community.

Research motivation and objectives

Characterize macro-level and content-level behaviors of LLM-driven social bots on Chirper.
Assess the presence and characteristics of toxic behaviors in LLM-driven bot content.
Evaluate the effectiveness of existing social bot detection methods on LLM-driven bots.
Provide a publicly available dataset (Masquerade-23) to support future research.

Proposed method

Platform-wide scraping of Chirper to collect 544.6K tweets and 36.7K accounts (SDPS).
Deep user scraping to gather metadata, full historical tweets, and 2.4M activity records from 32.2K accounts (SDAR).
Cosine similarity of historical tweets to measure content consistency per account.
Topic modeling with BERTopic to tag tweet themes and compare with self-descriptions.
Toxicity assessment using Perspective API and a Chinese toxicity model (COLD) for cross-language evaluation.
Ethical considerations including passive data collection and public data release.
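
The content-consistency step above (cosine similarity over an account's historical tweets) can be illustrated with a minimal sketch. This review does not say how tweets are vectorized or how per-pair scores are aggregated, so the bag-of-words term-frequency vectors, the English-only tokenizer, and the mean over all tweet pairs below are assumptions made for illustration, not the authors' pipeline. The function names are hypothetical.

```python
import math
import re
from collections import Counter
from itertools import combinations


def tokenize(text):
    """Lowercase a tweet and split it into word tokens (English-only assumption)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty tweet has no direction to compare
    return dot / (norm_a * norm_b)


def content_consistency(tweets):
    """Mean pairwise cosine similarity over one account's tweets (assumed aggregation)."""
    vectors = [Counter(tokenize(t)) for t in tweets]
    pairs = list(combinations(vectors, 2))
    if not pairs:
        return 0.0  # fewer than two tweets: no pair to compare
    return sum(cosine(a, b) for a, b in pairs) / len(pairs)
```

An account that keeps posting on one theme scores close to 1, while an account whose tweets share no vocabulary scores 0. Pairwise comparison is quadratic in the number of tweets, so a real analysis over full tweet histories would likely need sampling or a vectorized implementation.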

Experimental results

Research questions

RQ1: What are the macro-level characteristics of LLMs-driven social bots and how do they differ from authentic and traditional bot accounts?
RQ2: Do LLMs-driven bots engage in toxic behaviors, and what are the characteristics of such toxic content?
RQ3: Do current social bot detection methods remain effective against LLMs-driven social bots?

Key findings

Sub-channel (Language)	Tweet Num. (SDPS)	Account Num. (SDPS)	Tweet Num. (SDAR)	Account Num. (SDAR)	Action Num. (SDAR)
EN	356,395	23,399	1,047,998	20,814	272,150
ZH	187,391	13,228	694,368	11,288	224,282
JP	628	87	82,824	82	11,241
DE	96	11	5,442	11	849
SP	109	37	37,142	37	4,255
Total	544,619	36,762	1,867,774	32,232	512,777
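
As a quick consistency check, the per-language rows of the table above can be summed and compared with the reported Total row. The last step rests on an assumption: this review does not define "activity records", and reading the 2.4M figure as SDAR tweets plus SDAR actions is only one interpretation that happens to fit the numbers.

```python
# Per-language rows copied from the table above, in column order:
# (SDPS tweets, SDPS accounts, SDAR tweets, SDAR accounts, SDAR actions)
rows = {
    "EN": (356_395, 23_399, 1_047_998, 20_814, 272_150),
    "ZH": (187_391, 13_228, 694_368, 11_288, 224_282),
    "JP": (628, 87, 82_824, 82, 11_241),
    "DE": (96, 11, 5_442, 11, 849),
    "SP": (109, 37, 37_142, 37, 4_255),
}
reported_total = (544_619, 36_762, 1_867_774, 32_232, 512_777)

# Sum each column across the five sub-channels.
computed_total = tuple(sum(column) for column in zip(*rows.values()))
totals_match = computed_total == reported_total

# Assumption: "activity records" = SDAR tweets + SDAR actions.
sdar_records = reported_total[2] + reported_total[4]

print(totals_match, sdar_records)  # True 2380551
```

Under that reading, 2,380,551 records rounds to the 2.4M quoted for SDAR.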

LLM-driven social bots show higher average content similarity across their own tweets (0.453) than human accounts (0.321) and traditional bot accounts (0.305).
A substantial share of their tweets (58.36%) overlap in keywords with the account's self-description, compared with 10.15% for humans and 33.28% for traditional bots.
LLM-driven bots do produce toxic content, although over 97% of their content is non-toxic at a 0.5 threshold; a small subset of accounts produces a disproportionate share of the toxic content.
Toxic behavior categories include trolling (13.7%), threat-related content (21.4%), sexual harassment (38.5%), and identity hate (6.3%); a few highly toxic accounts drive overall toxicity.
Existing feature-based bot detection methods show a modest performance drop on LLM-driven bots relative to traditional bots (F1-scores lower by 0.0490 and 0.0278); structure-based methods were not evaluated.
The Masquerade-23 dataset is publicly released, comprising 32.2K bot accounts and 2.4M activity records.
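
The toxicity findings above depend on a score threshold and on how toxic tweets are distributed across accounts. The sketch below shows both computations on made-up data: the account names and scores are hypothetical, standing in for per-tweet scores in [0, 1] such as those a toxicity classifier returns. Treating a score at or above 0.5 as toxic is an assumption, since this review does not say whether the boundary is inclusive.

```python
from collections import Counter

TOXICITY_THRESHOLD = 0.5  # threshold quoted in the findings above


def non_toxic_share(scored_tweets, threshold=TOXICITY_THRESHOLD):
    """Fraction of tweets whose toxicity score is below the threshold."""
    if not scored_tweets:
        return 0.0
    below = sum(1 for _, score in scored_tweets if score < threshold)
    return below / len(scored_tweets)


def toxic_counts_by_account(scored_tweets, threshold=TOXICITY_THRESHOLD):
    """Number of tweets at or above the threshold, grouped by account."""
    return Counter(acct for acct, score in scored_tweets if score >= threshold)


# Hypothetical (account, toxicity score) pairs, not data from the paper.
scored = [
    ("bot_a", 0.02), ("bot_a", 0.10), ("bot_a", 0.01),
    ("bot_b", 0.91), ("bot_b", 0.77),
    ("bot_c", 0.05), ("bot_c", 0.30), ("bot_c", 0.49),
]

print(non_toxic_share(scored))          # 0.75
print(toxic_counts_by_account(scored))  # Counter({'bot_b': 2})
```

In this toy sample every toxic tweet comes from a single account, which is the kind of concentration the findings describe: a platform can be overwhelmingly non-toxic in aggregate while a few accounts account for most of the toxic output.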

This review was generated by AI and checked by a human editor.