QUICK REVIEW

[Paper Digest] Towards Automated Factchecking: Developing an Annotation Schema and Benchmark for Consistent Automated Claim Detection

Lev Konstantinovskiy, Oliver R. Price|arXiv (Cornell University)|Sep 21, 2018

Spam and Phishing Detection|References 43|Cited by 44

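One-Sentence Summary

The paper develops a 7-class annotation schema for claim detection, crowdsources annotations for a dataset of 5,571 sentences, and proposes a classifier (CNC) based on universal sentence representations that reaches an F1 of 0.83, outperforming ClaimBuster.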

ABSTRACT

In an effort to assist factcheckers in the process of factchecking, we tackle the claim detection task, one of the necessary stages prior to determining the veracity of a claim. It consists of identifying the set of sentences, out of a long text, deemed capable of being factchecked. This paper is a collaborative work between Full Fact, an independent factchecking charity, and academic partners. Leveraging the expertise of professional factcheckers, we develop an annotation schema and a benchmark for automated claim detection that is more consistent across time, topics and annotators than previous approaches. Our annotation schema has been used to crowdsource the annotation of a dataset with sentences from UK political TV shows. We introduce an approach based on universal sentence representations to perform the classification, achieving an F1 score of 0.83, with over 5% relative improvement over the state-of-the-art methods ClaimBuster and ClaimRank. The system was deployed in production and received positive user feedback.

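Research Motivation and Objectives

Develop an objective annotation schema for claim detection that stays consistent over time and minimizes personal bias.
Use the schema to crowdsource annotations for a large set of sentences from UK political TV shows.
Develop and evaluate a claim detection system based on universal sentence representations.
Benchmark the proposed method against existing state-of-the-art claim detection systems.
Provide a production-ready annotation framework and dataset to guide future work.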

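Proposed Method

Iteratively develop a 7-class annotation schema and map it to a binary claim vs. not-a-claim labeling.
Crowdsource annotations from 80 volunteers for 5,571 sentences taken from UK political TV subtitles (4 TV shows, 14 episodes).
Encode sentences with InferSent universal sentence representations, augment them with POS/NER counts where needed, and train a supervised classifier.
Evaluate with standard metrics (precision, recall, F1) under stratified 5-fold cross-validation, comparing against baselines such as ClaimBuster and ClaimRank.
Evaluate both the binary claim/not-claim task and its multiclass extension over the 7 classes.
Report production deployment considerations and an inter-annotator agreement analysis.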
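The evaluation protocol in the list above (stratified 5-fold cross-validation scored with precision, recall and F1) can be sketched in plain Python. This is a minimal illustration, not the authors' code: the function names `stratified_kfold` and `precision_recall_f1`, the round-robin fold assignment, and the toy labels are all assumptions made for the example.

```python
import random
from collections import defaultdict


def stratified_kfold(labels, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs whose folds preserve class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    folds = [[] for _ in range(k)]
    for idxs in by_class.values():
        rng.shuffle(idxs)
        # Deal each class's indices round-robin so every fold gets its share.
        for pos, i in enumerate(idxs):
            folds[pos % k].append(i)
    for f in range(k):
        test = sorted(folds[f])
        train = sorted(i for g in range(k) if g != f for i in folds[g])
        yield train, test


def precision_recall_f1(y_true, y_pred, positive=1):
    """Compute precision, recall and F1 for the positive (claim) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Toy labels (hypothetical): 1 = claim, 0 = not a claim.
labels = [1] * 20 + [0] * 30
for train_idx, test_idx in stratified_kfold(labels, k=5):
    n_claims = sum(labels[i] for i in test_idx)
    print(len(train_idx), len(test_idx), n_claims)  # 40 10 4 for every fold
```

In a real run, a classifier would be fitted on `train_idx` in each fold and its predictions on `test_idx` passed to `precision_recall_f1`; stratification keeps the claim/not-claim ratio the same in every fold, so per-fold scores are comparable.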

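Experimental Results

Research Questions

RQ1 What definition of a claim is objective and checkable across topics and annotators?
RQ2 Does a 7-class annotation schema, once reduced to binary classification, yield consistent annotations for claim detection?
RQ3 Do universal sentence representations outperform earlier feature-based approaches to claim detection?
RQ4 How does the proposed CNC model compare in F1 score with state-of-the-art systems such as ClaimBuster and ClaimRank?
RQ5 Is there evidence that crowdsourced annotation for claim detection is consistent and reliable?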

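Key Findings

An annotation schema with 7 classes was developed and used to crowdsource annotations for 5,571 sentences.
Mapping the classes to binary claim/not-claim improves inter-annotator agreement and supports high F1 performance.
The CNC model, using logistic regression over universal sentence representations, reaches F1 = 0.83, a relative improvement of just over 5% on ClaimBuster (F1 = 0.79).
GloVe-based embeddings are competitive, but CNC augmented with POS/NER features gives similar or better results.
ClaimRank has higher precision but lower recall; CNC offers a more favorable balance (F1 = 0.83).
The method outperforms the baselines and provides a production-ready benchmark dataset for future work.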
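The arithmetic behind two of these findings can be checked with a short sketch. The F1 values 0.83 and 0.79 come from the summary above; the precision/recall pairs used to illustrate the precision-recall trade-off are hypothetical numbers chosen for the example, not results from the paper.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def relative_improvement(new, baseline):
    """Improvement of `new` over `baseline`, as a fraction of the baseline."""
    return (new - baseline) / baseline


# Reported F1 scores: CNC 0.83 vs. ClaimBuster 0.79.
print(f"{relative_improvement(0.83, 0.79):.1%}")  # 5.1%

# Hypothetical illustration: high precision with low recall can still
# give a lower F1 than a more balanced system.
print(round(f1_score(0.90, 0.60), 2))  # 0.72
print(round(f1_score(0.80, 0.80), 2))  # 0.8
```

Because F1 is a harmonic mean, it is pulled toward the lower of the two inputs, which is why a system with a better precision-recall balance can come out ahead of one with higher precision alone.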


This digest was generated by AI and reviewed by human editors.