QUICK REVIEW

[Paper Review] All-in-one: Multi-task Learning for Rumour Verification

Elena Kochkina, Maria Liakata|arXiv (Cornell University)|Jun 10, 2018

Misinformation and Its Impacts | References: 32 | Cited by: 102

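One-sentence summary

The paper proposes a multi-task learning framework that trains veracity classification jointly with two auxiliary tasks (rumour detection and stance classification) to improve rumour verification on the RumourEval and PHEME datasets, and it analyses which dataset properties influence the gains from multi-task learning.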

ABSTRACT

Automatic resolution of rumours is a challenging task that can be broken down into smaller components that make up a pipeline, including rumour detection, rumour tracking and stance classification, leading to the final outcome of determining the veracity of a rumour. In previous work, these steps in the process of rumour verification have been developed as separate components where the output of one feeds into the next. We propose a multi-task learning approach that allows joint training of the main and auxiliary tasks, improving the performance of rumour verification. We examine the connection between the dataset properties and the outcomes of the multi-task learning models used.

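Research motivation and objectives

Frame rumour resolution as a multi-task learning problem in which veracity is the main task and auxiliary tasks help improve performance.
Investigate how training veracity jointly with stance and/or detection affects verification accuracy and macro-F score.
Assess how dataset properties (entropy, kurtosis, token-type ratio) relate to the gains from multi-task learning.
Compare the multi-task models against strong baselines, including a state-of-the-art veracity classifier and majority baselines.
Explore how different dataset splits (the RumourEval split and leave-one-event-out on PHEME) affect model performance.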

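Proposed method

Uses the sequential branch-LSTM architecture, which models each rumour conversation as branches of tweets.
Adopts hard parameter sharing in the multi-task setting, with task-specific output layers for veracity, stance and detection.
Trains with a joint loss that sums the task losses; for a given instance, the loss of any task without a label is skipped.
Evaluates with accuracy and macro-averaged F1, using macro-F as the primary metric because the data are imbalanced, and runs leave-one-event-out cross-validation on PHEME.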
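
The joint loss described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes one label per task per instance, uses plain softmax cross-entropy, and marks a missing label with None. The names TASKS, joint_loss and the example logits are hypothetical, and the shared branch-LSTM layers that would produce the logits are omitted.

```python
import math

# Task names follow the summary; the class counts used in the example
# logits below are illustrative assumptions.
TASKS = ("veracity", "stance", "detection")


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, label):
    """Negative log-probability of the gold class index."""
    return -math.log(softmax(logits)[label])


def joint_loss(task_logits, task_labels):
    """Sum the per-task losses, skipping tasks whose label is missing (None)."""
    total = 0.0
    for task in TASKS:
        label = task_labels.get(task)
        if label is None:
            continue  # no annotation for this task on this instance
        total += cross_entropy(task_logits[task], label)
    return total


# Hypothetical outputs of the three task-specific layers for one instance.
example_logits = {
    "veracity": [2.0, 0.5, -1.0],
    "stance": [0.1, 0.2, 0.3, 0.4],
    "detection": [1.0, -1.0],
}

full = joint_loss(example_logits, {"veracity": 0, "stance": 3, "detection": 0})
partial = joint_loss(example_logits, {"veracity": 0, "stance": None, "detection": 0})
print(f"all tasks labelled: {full:.4f}")
print(f"stance unlabelled:  {partial:.4f}")
```

Because the stance term is dropped when its label is missing, an instance annotated only for veracity and detection still contributes gradients to the shared layers through the remaining two terms.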

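Experimental results

Research questions

RQ1: Does multi-task learning that combines veracity with stance and/or detection outperform single-task veracity classification?
RQ2: Which auxiliary task configuration (stance, detection, or both) yields the largest improvement in veracity performance?
RQ3: How do dataset properties affect the effectiveness of multi-task learning for rumour verification?
RQ4: How does performance differ between RumourEval and the different PHEME event splits (5 vs 9 events)?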
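
The event-based evaluation behind RQ4 can be illustrated with a short sketch of leave-one-event-out folds scored by macro-F1. It is written under stated assumptions: the toy instances, the event names and the majority-class predictor are hypothetical stand-ins, and macro_f1 averages over every class that appears in either the gold or the predicted labels.

```python
from collections import Counter


def leave_one_event_out(instances):
    """Yield (held_out_event, train, test), one fold per distinct event."""
    events = sorted({inst["event"] for inst in instances})
    for held_out in events:
        train = [i for i in instances if i["event"] != held_out]
        test = [i for i in instances if i["event"] == held_out]
        yield held_out, train, test


def macro_f1(gold, pred):
    """Unweighted mean of per-class F1 over classes seen in gold or pred."""
    labels = sorted(set(gold) | set(pred))
    scores = []
    for c in labels:
        tp = sum(1 for g, p in zip(gold, pred) if g == c and p == c)
        fp = sum(1 for g, p in zip(gold, pred) if g != c and p == c)
        fn = sum(1 for g, p in zip(gold, pred) if g == c and p != c)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy data: three events with veracity labels.
toy = [
    {"event": "event_a", "label": "true"},
    {"event": "event_a", "label": "true"},
    {"event": "event_b", "label": "false"},
    {"event": "event_b", "label": "true"},
    {"event": "event_c", "label": "unverified"},
    {"event": "event_c", "label": "true"},
]

results = {}
for held_out, train, test in leave_one_event_out(toy):
    # Majority baseline: always predict the most frequent training label.
    majority = Counter(i["label"] for i in train).most_common(1)[0][0]
    gold = [i["label"] for i in test]
    pred = [majority] * len(gold)
    results[held_out] = macro_f1(gold, pred)

for event, score in results.items():
    print(f"{event}: macro-F1 = {score:.3f}")
```

On the toy data the majority baseline scores well only on the fold whose test labels all match the majority class, which shows why macro-F is a stricter measure than accuracy when classes are imbalanced.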

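Key findings

Multi-task models consistently outperform the single-task veracity classifier on both the PHEME and RumourEval datasets.
The three-task setting (veracity, stance, detection) gives the largest improvement over the single-task baseline.
MTL2 (Veracity+Stance or Veracity+Detection) outperforms the single-task branchLSTM, and MTL3 (all three tasks) provides a further improvement.
The results agree with prior work in showing that dataset properties (entropy, kurtosis) affect multi-task gains, particularly when the auxiliary task has lower kurtosis than the main task.
On RumourEval, multi-task learning surpasses the NileTMRG* and branchLSTM baselines; on PHEME, MTL3 achieves the best overall macro-F and accuracy among the configurations tested.
Performance on PHEME varies across events, with the Ferguson event being particularly challenging, and per-class predictions (true/false/unverified) also differ.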
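
The label-distribution properties mentioned in the findings can be computed along the following lines. This is a hedged sketch: the summary does not say how kurtosis is defined for a label distribution, so the code assumes excess kurtosis computed over the per-class counts, which is only one possible reading. The label lists are hypothetical toy data, not figures from the paper.

```python
import math
from collections import Counter


def label_entropy(labels):
    """Shannon entropy (in bits) of the empirical label distribution."""
    counts = Counter(labels)
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def excess_kurtosis(values):
    """Population excess kurtosis: fourth moment / variance squared, minus 3."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    if m2 == 0:
        return float("nan")  # undefined when all values are identical
    return m4 / (m2 ** 2) - 3.0


def label_kurtosis(labels):
    """Assumption: kurtosis taken over the per-class counts."""
    return excess_kurtosis(list(Counter(labels).values()))


# Hypothetical label lists for a main task and an auxiliary task.
main_labels = ["true"] * 6 + ["false"] * 3 + ["unverified"] * 1
aux_labels = ["comment"] * 7 + ["support"] * 2 + ["query"] * 1 + ["deny"] * 1

for name, labels in (("main", main_labels), ("auxiliary", aux_labels)):
    print(
        f"{name}: entropy = {label_entropy(labels):.3f} bits, "
        f"kurtosis = {label_kurtosis(labels):.3f}"
    )
```

Comparing these values between the main task and each auxiliary task is the kind of analysis the finding refers to; which definition of kurtosis the paper uses should be checked against the original text.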


This review was generated by AI and checked by human editors.