QUICK REVIEW

[Paper Review] A Stylometric Inquiry into Hyperpartisan and Fake News

Martin Potthast, Johannes Kiesel|arXiv (Cornell University)|Feb 18, 2017

Authorship Attribution and Profiling | References: 19 | Cited by: 61

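One-Sentence Summary

This paper analyzes writing style to distinguish hyperpartisan news from mainstream news and satire, and evaluates style-based fake news detection on the BuzzFeed-Webis corpus, using Unmasking to compare style across categories. The results show that hyperpartisan style is distinguishable from mainstream style and that left-wing and right-wing news are stylistically similar, but that style alone struggles to detect fake news.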

ABSTRACT

This paper reports on a writing style analysis of hyperpartisan (i.e., extremely one-sided) news in connection to fake news. It presents a large corpus of 1,627 articles that were manually fact-checked by professional journalists from BuzzFeed. The articles originated from 9 well-known political publishers, 3 each from the mainstream, the hyperpartisan left-wing, and the hyperpartisan right-wing. In sum, the corpus contains 299 fake news, 97% of which originated from hyperpartisan publishers. We propose and demonstrate a new way of assessing style similarity between text categories via Unmasking---a meta-learning approach originally devised for authorship verification---, revealing that the style of left-wing and right-wing news have a lot more in common than any of the two have with the mainstream. Furthermore, we show that hyperpartisan news can be discriminated well by its style from the mainstream (F1=0.78), as can be satire from both (F1=0.81). Unsurprisingly, style-based fake news detection does not live up to scratch (F1=0.46). Nevertheless, the former results are important to implement pre-screening for fake news detectors.

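Motivation and Objectives

Investigate whether writing style can separate hyperpartisan news from mainstream news.
Explore whether left-wing and right-wing news share a similar writing style.
Assess whether style features alone can detect fake news, and how satire relates to fake and real news.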

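Proposed Method

Apply Unmasking, a meta-learning approach to style analysis originally devised for authorship verification, to compare sets of articles with different orientations (left, right, mainstream).
Extract and evaluate a broad set of style features: character n-grams, stop words, part-of-speech n-grams, readability scores, dictionary-based features, and domain-specific features such as quotations and external links.
Use feature selection to discard infrequent features, keeping the categories comparable.
Train random forest classifiers on style and topic features for hyperpartisan vs. mainstream classification, orientation prediction, and satire detection.
Define fake news as articles rated mostly false or a mixture of true and false.
Visualize style similarity through the slopes of Unmasking curves to interpret stylistic closeness across categories.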
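
The Unmasking step listed above can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes character trigram frequencies as the only style features, and it swaps in a leave-one-out nearest-centroid classifier for the linear classifier that Unmasking normally relies on. The function names (`char_ngrams`, `unmasking_curve`) and the parameters `rounds` and `drop` are hypothetical, chosen only for illustration.

```python
from collections import Counter


def char_ngrams(text, n=3):
    """Relative frequencies of the character n-grams in one text chunk."""
    grams = Counter(text[i:i + n] for i in range(len(text) - n + 1))
    total = sum(grams.values()) or 1
    return {g: c / total for g, c in grams.items()}


def centroid(vectors, features):
    """Mean value of each kept feature over a list of sparse vectors."""
    return {f: sum(v.get(f, 0.0) for v in vectors) / len(vectors)
            for f in features}


def sq_dist(vec, cen, features):
    """Squared Euclidean distance restricted to the kept features."""
    return sum((vec.get(f, 0.0) - cen[f]) ** 2 for f in features)


def loo_accuracy(a_vecs, b_vecs, features):
    """Leave-one-out accuracy of a nearest-centroid classifier.

    Assumption: this simple classifier stands in for the linear model
    usually used in Unmasking. Each class needs at least two chunks.
    """
    correct = 0
    for own, other in ((a_vecs, b_vecs), (b_vecs, a_vecs)):
        other_c = centroid(other, features)
        for i, vec in enumerate(own):
            own_c = centroid(own[:i] + own[i + 1:], features)
            if sq_dist(vec, own_c, features) < sq_dist(vec, other_c, features):
                correct += 1
    return correct / (len(a_vecs) + len(b_vecs))


def unmasking_curve(a_chunks, b_chunks, rounds=5, drop=2, n=3):
    """Accuracy per round while the most discriminative features are removed."""
    a_vecs = [char_ngrams(t, n) for t in a_chunks]
    b_vecs = [char_ngrams(t, n) for t in b_chunks]
    features = set().union(*a_vecs, *b_vecs)
    curve = []
    for _ in range(rounds):
        if not features:
            break
        curve.append(loo_accuracy(a_vecs, b_vecs, features))
        ca, cb = centroid(a_vecs, features), centroid(b_vecs, features)
        # Rank by how strongly a feature separates the two class means;
        # ties are broken alphabetically so the result is deterministic.
        ranked = sorted(features, key=lambda f: (-abs(ca[f] - cb[f]), f))
        features -= set(ranked[:drop])
    return curve


if __name__ == "__main__":
    left = ["aaaaaaa", "aaaaaab"]    # toy stand-ins for article chunks
    right = ["zzzzzzz", "zzzzzzy"]
    print(unmasking_curve(left, right, rounds=2))
```

A curve that collapses after a few rounds suggests the two sets differ only in a handful of surface features, while a curve that stays high points to deeper stylistic differences. Comparing how fast such curves fall is what the slope analysis above refers to.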

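Experimental Results

Research Questions

RQ1: Do hyperpartisan left-wing and right-wing news share common stylistic patterns?
RQ2: Can writing style alone distinguish hyperpartisan news from mainstream news, and satire from real news?
RQ3: Can style alone detect fake news, and what role does satire play in style-based detection?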

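Key Findings

Hyperpartisan left-wing and right-wing articles are markedly more similar in style to each other than either is to the mainstream, as the Unmasking curves confirm.
Style-based classifiers separate hyperpartisan news from mainstream news with notable accuracy and recall (best style-based result for hyperpartisan vs. mainstream: accuracy 0.75, hyperpartisan recall 0.89).
Topic-based (bag-of-words) models can outperform style models in some three-class orientation prediction settings, which suggests that topic signals matter for fine-grained classification.
Style features perform well on satire detection (accuracy 0.82, F1 0.81), and satire is stylistically distinct from both fake and real news.
Style-only fake news detection is mediocre (accuracy 0.55, F1 roughly 0.41–0.63 depending on the setting), which suggests that style can help with pre-screening but is not sufficient on its own.
Satire lies stylistically farther from fake and real news than they do from each other, so it can be separated more reliably by style.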
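
The findings above quote accuracy, recall, and F1 for different tasks, so it helps to recall how these scores relate. The sketch below uses only the standard definition F1 = 2PR / (P + R); the function names are hypothetical. As an illustration it combines the F1 of 0.78 from the abstract with the hyperpartisan recall of 0.89 listed above, assuming (the summary does not say so explicitly) that both describe the hyperpartisan class of the same classifier.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def precision_from_f1(f1, recall):
    """Solve F1 = 2PR / (P + R) for P, given F1 and R."""
    denominator = 2 * recall - f1
    if denominator <= 0:
        raise ValueError("F1 and recall are inconsistent")
    return f1 * recall / denominator


if __name__ == "__main__":
    # Assumption: both figures refer to the same class and classifier.
    implied = precision_from_f1(0.78, 0.89)
    print(round(implied, 2))  # about 0.69
```

Under that assumption, a recall of 0.89 with an F1 of 0.78 implies a precision near 0.69: the classifier catches most hyperpartisan articles but also flags a fair number of mainstream ones.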

This review was generated by AI and checked by a human editor.