QUICK REVIEW

[Paper Review] Performance Impact Caused by Hidden Bias of Training Data for Recognizing Textual Entailment

Masatoshi Tsuchiya|arXiv (Cornell University)|Apr 22, 2018

Topic Modeling | References: 21 | Cited by: 123

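One-Sentence Summary

This paper proposes a two-phase method for detecting hidden bias in RTE corpora by comparing a Naive Bayes TE-label predictor against a baseline; it finds hidden bias in SNLI but not in SICK, and shows that this bias can distort the measured performance of neural NLP models on the RTE task.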

ABSTRACT

The quality of training data is one of the crucial problems when a learning-centered approach is employed. This paper proposes a new method to investigate the quality of a large corpus designed for the recognizing textual entailment (RTE) task. The proposed method, which is inspired by a statistical hypothesis test, consists of two phases: the first phase is to introduce the predictability of textual entailment labels as a null hypothesis which is extremely unacceptable if a target corpus has no hidden bias, and the second phase is to test the null hypothesis using a Naive Bayes model. The experimental result of the Stanford Natural Language Inference (SNLI) corpus does not reject the null hypothesis. Therefore, it indicates that the SNLI corpus has a hidden bias which allows prediction of textual entailment labels from hypothesis sentences even if no context information is given by a premise sentence. This paper also presents the performance impact of NN models for RTE caused by this hidden bias.

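Research Motivation and Goals

Assess the quality of large RTE corpora.
Formulate a null hypothesis that TE labels are predictable without context, that is, without the premise.
Develop a Naive Bayes TE-label prediction model.
Compare the SNLI and SICK corpora to reveal hidden bias.
Discuss the impact of hidden bias on neural network models for RTE.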

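Proposed Method

Define the predictability of TE labels in the absence of a premise as the null hypothesis.
Predict TE labels from hypothesis sentences alone using a multinomial Naive Bayes model with unigram features.
Use a baseline model that ignores both the premise and the hypothesis and always assigns the most frequent TE label in the corpus.
Test the null hypothesis with a sign test that compares the TE-label prediction model against the baseline.
Apply the method to the SNLI and SICK corpora to assess hidden bias.
Discuss the implications for NN-based RTE models and how the bias can masquerade as a learning signal.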
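
The steps above can be sketched in a few lines of Python. This is a minimal illustration, not the paper's implementation: the class and function names (`UnigramNaiveBayes`, `majority_label`, `sign_test_p_value`) are hypothetical, and the lowercased whitespace tokenization, add-one smoothing, and the two-sided exact binomial form of the sign test are all assumptions, since the summary does not specify them.

```python
import math
from collections import Counter, defaultdict


class UnigramNaiveBayes:
    """Multinomial Naive Bayes over unigrams (assumed: add-one smoothing)."""

    def fit(self, sentences, labels):
        self.label_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for sentence, label in zip(sentences, labels):
            # Assumed tokenization: lowercase, split on whitespace.
            self.word_counts[label].update(sentence.lower().split())
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, sentence):
        """Predict a TE label from the hypothesis sentence alone."""
        total = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label in sorted(self.label_counts):  # sorted for stable tie-breaking
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(self.label_counts[label] / total)
            for word in sentence.lower().split():
                if word in self.vocab:  # skip words never seen in training
                    score += math.log((counts[word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def majority_label(labels):
    """Baseline: the most frequent TE label, ignoring premise and hypothesis."""
    return Counter(labels).most_common(1)[0][0]


def sign_test_p_value(gold, pred_a, pred_b):
    """Two-sided exact sign test on examples where exactly one model is right."""
    a_only = sum(a == g and b != g for g, a, b in zip(gold, pred_a, pred_b))
    b_only = sum(b == g and a != g for g, a, b in zip(gold, pred_a, pred_b))
    n = a_only + b_only
    if n == 0:
        return 1.0  # the two models never disagree in correctness
    k = min(a_only, b_only)
    tail = sum(math.comb(n, i) for i in range(k + 1))
    return min(1.0, 2 * tail / 2 ** n)
```

In use, one would fit `UnigramNaiveBayes` on training hypotheses only, predict labels for the test hypotheses, build the baseline predictions from `majority_label` of the training labels, and pass both prediction lists with the gold labels to `sign_test_p_value`.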

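Experimental Results

Research Questions

RQ1: Do RTE corpora contain hidden bias that allows TE labels to be predicted without the premise?
RQ2: Can the NB TE-label prediction model outperform the corpus-majority baseline on hypothesis-only data?
RQ3: Is the hidden bias present in SNLI but absent in SICK?
RQ4: How does the detected bias affect the evaluation and learning behavior of NN models on RTE?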

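Main Findings

On SNLI, the TE-label prediction model reaches 63.3% accuracy on hypothesis sentences without premises, versus 34.3% for the baseline.
On SICK, the TE-label predictor and the baseline perform about the same (56.7%).
The difference between the two models on SNLI is statistically significant (p = 5.7e−202).
The hidden bias in SNLI makes TE labels predictable without context, whereas SICK shows no such bias (the null hypothesis of predictability is rejected for SICK but not for SNLI).
NN models for RTE perform markedly worse on the empirically hard test set, suggesting that they rely on the bias rather than on genuine understanding of context.
Replacing premise words with unknown tokens degrades the context information, yet NN models still score above chance on the empirically easy test set, indicating TE-label prediction rather than genuine RTE behavior.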
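
The last two findings rest on two manipulations that can be sketched as follows. This is an illustration under stated assumptions, not the paper's procedure: the summary does not define the empirically easy and hard test sets, so the sketch assumes an example is "easy" when a hypothesis-only predictor already labels it correctly and "hard" otherwise. It also assumes that every premise word is replaced, and the `<unk>` token and all function names are hypothetical.

```python
UNK = "<unk>"  # hypothetical placeholder for an unknown token


def mask_premise(premise, unk=UNK):
    """Replace every premise word with the unknown token (assumed: all words)."""
    return " ".join(unk for _ in premise.split())


def split_easy_hard(examples, hypothesis_only_predict):
    """Split (premise, hypothesis, label) triples into easy and hard subsets.

    Assumption: 'easy' means the hypothesis-only predictor gets the label
    right without seeing the premise; 'hard' means it gets the label wrong.
    """
    easy, hard = [], []
    for premise, hypothesis, label in examples:
        bucket = easy if hypothesis_only_predict(hypothesis) == label else hard
        bucket.append((premise, hypothesis, label))
    return easy, hard


def accuracy(examples, model_predict):
    """Accuracy of a full RTE model that takes (premise, hypothesis)."""
    if not examples:
        return float("nan")
    correct = sum(model_predict(p, h) == y for p, h, y in examples)
    return correct / len(examples)
```

Comparing `accuracy` on the easy and hard subsets, and again after passing each premise through `mask_premise`, gives the kind of contrast the findings describe: a model that stays above chance with masked premises cannot be drawing on the premise.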

This review was generated by AI and checked by human editors.