QUICK REVIEW

[Paper Review] A Comprehensive Comparison of Word Embeddings in Event & Entity Coreference Resolution.

Judicaël Poumay, Ashwin Ittoo|arXiv (Cornell University)|Nov 1, 2021

Topic Modeling|References: 29|Cited by: 2

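One-Sentence Summary

This study evaluates word embeddings from the static, contextual, and character families on Event Coreference Resolution (EvCR) and Entity Coreference Resolution (EnCR), using a state-of-the-art model as the framework. A model using only a character embedding reaches 86% of the performance of the full model (ELMo, GloVe, and character embeddings) at 1.2% of its size. ELMo outperforms BERT and GPT-2 on both tasks, while among static embeddings GloVe performs best on EvCR and FastText performs best on EnCR.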

ABSTRACT

Coreference Resolution is an important NLP task and most state-of-the-art methods rely on word embeddings for word representation. However, one issue that has been largely overlooked in literature is that of comparing the performance of different embeddings across and within families in this task. Therefore, we frame our study in the context of Event and Entity Coreference Resolution (EvCR & EnCR), and address two questions : 1) Is there a trade-off between performance (predictive & run-time) and embedding size? 2) How do the embeddings' performance compare within and across families? Our experiments reveal several interesting findings. First, we observe diminishing returns in performance with respect to embedding size. E.g. a model using solely a character embedding achieves 86% of the performance of the largest model (Elmo, GloVe, Character) while being 1.2% of its size. Second, the larger model using multiple embeddings learns faster overall despite being slower per epoch. However, it is still slower at test time. Finally, Elmo performs best on both EvCR and EnCR, while GloVe and FastText perform best in EvCR and EnCR respectively.

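Motivation and Objectives

Investigate the trade-off between model performance (predictive and run-time) and embedding size in coreference resolution.
Compare the predictive performance of word embeddings within and across the static, contextual, and character families.
Assess whether larger, more expressive embeddings always improve performance, or whether smaller alternatives can match or even exceed them.
Provide practical guidance for deploying coreference resolution systems that are both efficient and accurate.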

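Proposed Method

The experiments use the state-of-the-art coreference resolution model of Barhom et al. (2019) as the baseline framework.
Sixteen separate models were trained on different combinations of static (GloVe, FastText, Word2Vec), contextual (ELMo, BERT, GPT-2), and character embeddings.
Predictive performance was evaluated by F1 score on the EvCR and EnCR tasks using the ECB+ (EventCorefBank+) dataset.
Efficiency trade-offs were assessed by measuring model size, training time, inference speed, and memory usage.
Ablation studies isolated the contribution of each embedding family, both alone and in combination.
Experiments were run with and without additional embeddings to assess marginal gains and diminishing returns.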
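
The combination of embedding families described above can be pictured as building one token representation by concatenating one vector per family. The following is a minimal sketch under that assumption, not the authors' implementation: the function name `combine_embeddings`, the family labels, and the toy vectors and their dimensions are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Sequence


def combine_embeddings(
    token_vectors: Dict[str, Sequence[float]],
    families: Sequence[str],
) -> List[float]:
    """Concatenate the vectors of the selected embedding families, in order."""
    combined: List[float] = []
    for family in families:
        if family not in token_vectors:
            raise KeyError(f"no vector available for family {family!r}")
        combined.extend(token_vectors[family])
    return combined


# Toy vectors for a single token (hypothetical values and sizes);
# real static and contextual embeddings are far larger.
token = {
    "static": [0.1, 0.2, 0.3],
    "contextual": [0.4, 0.5, 0.6, 0.7],
    "character": [0.8, 0.9],
}

# "Full" configuration versus a character-only configuration.
full = combine_embeddings(token, ["static", "contextual", "character"])
char_only = combine_embeddings(token, ["character"])
print(len(full), len(char_only))
```

Selecting a subset of families, as in `char_only`, mirrors the idea of training one model per embedding combination: the input shrinks with the number and size of the families kept.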

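Experimental Results

Research Questions

RQ1: Is there a trade-off, tied to embedding size, between predictive performance and run-time efficiency in coreference resolution?
RQ2: How do embeddings within the same family (e.g., GloVe vs. FastText vs. Word2Vec) compare on the EvCR and EnCR tasks?
RQ3: How do the static, contextual, and character embedding families differ in predictive performance and efficiency?
RQ4: Does combining multiple embeddings yield significant performance gains, or does each addition bring diminishing returns?
RQ5: Can a character embedding alone achieve performance comparable to larger, multi-embedding models?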

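Key Findings

The model using only a character embedding reaches 86% of the F1 performance of the full model (ELMo, GloVe, and character embeddings) at 1.2% of its size.
The smallest model (character embedding only) scores about 10 F1 points higher than the Word2Vec-only model, even though it is only 4% of that model's size.
Despite being larger and more complex, the full model trained 21% faster overall than the character-only model (14 epochs vs. 24 epochs), which suggests that model size correlates only weakly with training time.
ELMo outperforms BERT and GPT-2 on both EvCR and EnCR, contradicting earlier work that found BERT to outperform ELMo on EnCR.
Among static embeddings, GloVe performs best on EvCR and FastText performs best on EnCR.
Adding more embeddings yields diminishing gains in predictive performance, so larger models do not improve accuracy in proportion to their size.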
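
The ratios reported above can be turned into a rough efficiency comparison. The sketch below uses only figures from the findings (86% of the F1 at 1.2% of the size; 14 vs. 24 epochs; 21% faster overall). Reading "21% faster" as "21% less total training time" is an assumption, and the helper names are hypothetical.

```python
def performance_per_size(relative_f1: float, relative_size: float) -> float:
    """Relative F1 obtained per unit of relative model size."""
    return relative_f1 / relative_size


def implied_epoch_time_ratio(
    epochs_a: int, epochs_b: int, total_time_ratio: float
) -> float:
    """Per-epoch time of model A relative to model B.

    Uses total_A = epochs_a * t_a and total_B = epochs_b * t_b, so
    t_a / t_b = (total_A / total_B) * epochs_b / epochs_a.
    """
    return total_time_ratio * epochs_b / epochs_a


# Character-only model relative to the full model (full model = 1.0 on both axes).
char_only_efficiency = performance_per_size(0.86, 0.012)

# Full model (14 epochs) vs. character-only model (24 epochs),
# assuming the full model needs 21% less total training time.
full_vs_char_epoch_time = implied_epoch_time_ratio(14, 24, 1 - 0.21)

print(round(char_only_efficiency, 1))
print(round(full_vs_char_epoch_time, 2))
```

Under these assumptions the character-only model delivers roughly 72 times more relative F1 per unit of size than the full model, and each epoch of the full model takes about 1.35 times as long as an epoch of the character-only model. The second figure agrees with the abstract's remark that the larger model is slower per epoch yet learns faster overall.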

This review was generated by AI and reviewed by a human editor.