QUICK REVIEW

[Paper Review] Harbsafe-162. A Domain-Specific Data Set for the Intrinsic Evaluation of Semantic Representations for Terminological Data

Susanne Arndt, Dieter Schnäpp|arXiv (Cornell University)|May 29, 2020
Linguistics and terminology studies | References: 31 | Cited by: 32
One-Sentence Summary

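Harbsafe-162 is a domain-specific data set for intrinsic evaluation. It is built from terminological entries in electrotechnical standards and is used to assess how well distributional semantic models support harmonization tasks.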

ABSTRACT

The article presents Harbsafe-162, a domain-specific data set for evaluating distributional semantic models. It originates from the Harbsafe project, a cooperation between Technische Universität Braunschweig and the German Commission for Electrical, Electronic & Information Technologies of DIN and VDE. One objective of the project is to apply distributional semantic models to terminological entries, that is, complex lexical data comprising one or several terms, term phrases and a definition. This application is needed to solve a more complex problem: the harmonization of terminologies of standards and standards bodies (i.e. resolution of doublettes and inconsistencies). Due to a lack of evaluation data sets for terminological entries, the creation of Harbsafe-162 was a necessary step towards harmonization assistance. Harbsafe-162 covers data from nine electrotechnical standards in the domain of functional safety, IT security, and dependability. An intrinsic evaluation method in the form of a similarity rating task has been applied in which two linguists and three domain experts from standardization participated. The data set is used to evaluate a specific implementation of an established sentence embedding model. This implementation proves to be satisfactory for the domain-specific data so that further implementations for harmonization assistance may be brought forward by the project. Considering recent criticism of intrinsic evaluation methods, the article concludes with an evaluation of Harbsafe-162 and joins a more general discussion about the nature of similarity rating tasks. Harbsafe-162 has been made available for the community.

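Motivation and Objectives

  • Motivate and enable the evaluation of distributional semantic models on domain-specific terminological data.
  • Create an intrinsic similarity rating data set (Harbsafe-162) for domain-specific terminological entries taken from standards.
  • Support harmonization tasks by evaluating how well semantic representations capture conceptual identity and relatedness in terminological data.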

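Proposed Method

  • Build a corpus of entries from electrotechnical standards covering functional safety, IT security, and dependability.
  • Design a five-point Likert similarity rating task for terminological entries (concepts) that takes both definitions and designations into account.
  • Select 152 pairs from 446 entries so that the pairs are evenly distributed across similarity categories.
  • Compute inter-annotator agreement and recruit external reviewers from academia, industry, and standardization to improve reliability.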
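The summary reports Krippendorff's α but does not say which distance metric was used. As a minimal sketch, assuming the five-point Likert ratings are treated as interval data, α = 1 − D_o/D_e can be computed from a list of rated pairs, each holding one score per annotator (None for a missing rating). The function name `krippendorff_alpha_interval` and the toy ratings are hypothetical, not taken from the paper.

```python
from itertools import permutations


def krippendorff_alpha_interval(ratings):
    """Krippendorff's alpha with the interval (squared-difference) metric.

    ratings: one list per rated pair (unit), holding one score per annotator;
    None marks a missing rating.
    """
    # Keep only pairable values: units rated by at least two annotators.
    units = [[v for v in unit if v is not None] for unit in ratings]
    units = [u for u in units if len(u) >= 2]
    values = [v for u in units for v in u]
    n = len(values)
    if n < 2:
        raise ValueError("need at least two pairable ratings")

    # Observed disagreement: squared differences within each unit,
    # weighted by 1 / (m_u - 1) and normalised by n.
    d_o = sum(
        sum((a - b) ** 2 for a, b in permutations(u, 2)) / (len(u) - 1)
        for u in units
    ) / n

    # Expected disagreement: squared differences over all pairs of pairable values.
    d_e = sum((a - b) ** 2 for a, b in permutations(values, 2)) / (n * (n - 1))
    if d_e == 0:
        raise ValueError("all ratings identical; alpha is undefined")
    return 1.0 - d_o / d_e


# Hypothetical toy data: four term pairs rated 1-5 by five annotators
toy = [
    [5, 5, 4, 5, 5],
    [1, 2, 1, 1, None],
    [3, 3, 4, 2, 3],
    [2, 1, 2, 2, 2],
]
print(round(krippendorff_alpha_interval(toy), 2))
```

Units rated by only one annotator carry no information about agreement, so the sketch drops them before computing either disagreement term.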

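Experimental Results

Research Questions

  • RQ1: Can a domain-specific similarity rating task for terminological data reliably evaluate semantic representations?
  • RQ2: How well does a given distributional model capture semantic similarity between terminological entries in domains such as functional safety and IT security?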
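A question like RQ2 is commonly answered by correlating model scores with human judgments. The summary does not state which correlation measure the authors report. The following minimal sketch therefore makes three assumptions: each pair has a gold score (for example the mean annotator rating), each pair has a model score (for example the cosine similarity of two entry embeddings), and Spearman's rank correlation is the metric. All function names and numbers are illustrative.

```python
import math


def average_ranks(xs):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the two rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


def cosine(u, v):
    """Cosine similarity, one common way to turn two embeddings into a score."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


# Hypothetical example: mean human ratings (1-5) and model scores for four pairs
gold = [4.8, 1.2, 3.0, 2.4]
model = [0.91, 0.15, 0.40, 0.62]
print(round(spearman(gold, model), 3))  # 0.8
```

A rank correlation fits here because Likert ratings and cosine scores lie on different scales. Only the ordering of the pairs is compared.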

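Key Findings

  • Harbsafe-162 uses a sample of 152 pairs drawn from 446 terminological entries in nine IEC standards.
  • Inter-annotator agreement (Krippendorff's α) is 0.78.
  • The rating task achieved a balanced distribution across categories (no single category dominates).
  • External raters were recruited after an initial internal rating round to validate the data set.
  • The evaluation demonstrates that the model of Arora, Liang, and Ma (2017) is applicable to domain-specific terminological data.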
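Arora, Liang, and Ma (2017) describe smooth inverse frequency (SIF) sentence embeddings. Each sentence vector is a weighted average of word vectors with weight a/(a + p(w)). The projection onto the first singular vector of the sentence-embedding matrix is then removed. The following sketch follows that recipe under stated assumptions. The summary does not say which word vectors, frequency estimates, or value of a the Harbsafe implementation used, so the toy vectors, the probabilities, and the default `a=1e-3` are hypothetical. The same applies to the function names.

```python
import numpy as np


def sif_weighted_average(sentences, word_vectors, word_prob, a=1e-3):
    """Weighted average of word vectors with weight a / (a + p(w)).

    sentences: list of token lists; word_vectors: token -> 1-D np.ndarray;
    word_prob: token -> estimated unigram probability p(w).
    Tokens without a vector are skipped; tokens missing from word_prob are
    treated as rare (p = 0), which is an assumption of this sketch.
    """
    dim = len(next(iter(word_vectors.values())))
    emb = np.zeros((len(sentences), dim))
    for i, tokens in enumerate(sentences):
        known = [t for t in tokens if t in word_vectors]
        if not known:
            continue  # leave an all-zero row for sentences with no known words
        weights = np.array([a / (a + word_prob.get(t, 0.0)) for t in known])
        vecs = np.stack([word_vectors[t] for t in known])
        emb[i] = weights @ vecs / len(known)
    return emb


def remove_first_component(emb):
    """Subtract each row's projection onto the first right singular vector."""
    _, _, vt = np.linalg.svd(emb, full_matrices=False)
    u = vt[0]
    return emb - np.outer(emb @ u, u)


# Hypothetical toy vectors and unigram probabilities
vectors = {
    "functional": np.array([0.9, 0.1, 0.0]),
    "safety": np.array([0.8, 0.3, 0.1]),
    "security": np.array([0.1, 0.9, 0.2]),
    "the": np.array([0.2, 0.2, 0.9]),
}
probs = {"functional": 1e-4, "safety": 2e-4, "security": 2e-4, "the": 5e-2}
sentences = [["the", "functional", "safety"], ["the", "security"], ["functional", "security"]]
sent_emb = remove_first_component(sif_weighted_average(sentences, vectors, probs))
print(sent_emb.shape)  # (3, 3)
```

The weighting sharply down-weights frequent words such as "the". Removing the first component strips a direction shared by all sentence vectors. Cosine similarity between rows of `sent_emb` then gives the model scores that are compared with human ratings.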


This review was generated by AI and reviewed by human editors.