QUICK REVIEW

[Paper Review] Harbsafe-162. A Domain-Specific Data Set for the Intrinsic Evaluation of Semantic Representations for Terminological Data

Susanne Arndt, Dieter Schnäpp | arXiv (Cornell University) | May 29, 2020

linguistics and terminology studies | References: 31 | Cited by: 32

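One-line summary

Harbsafe-162 is a domain-specific data set for intrinsic evaluation, built from terminological entries of electrotechnical standards and used to evaluate distributional semantic models for terminology harmonization tasks.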

ABSTRACT

The article presents Harbsafe-162, a domain-specific data set for evaluating distributional semantic models. It originates from the Harbsafe project, a cooperation between Technische Universität Braunschweig and the German Commission for Electrical, Electronic & Information Technologies of DIN and VDE. One objective of the project is to apply distributional semantic models to terminological entries, that is, complex lexical data comprising one or more terms, term phrases and a definition. This application is needed to solve a more complex problem: the harmonization of terminologies of standards and standards bodies (i.e. resolution of doublettes and inconsistencies). Due to a lack of evaluation data sets for terminological entries, the creation of Harbsafe-162 was a necessary step towards harmonization assistance. Harbsafe-162 covers data from nine electrotechnical standards in the domain of functional safety, IT security, and dependability. An intrinsic evaluation method in the form of a similarity rating task has been applied in which two linguists and three domain experts from standardization participated. The data set is used to evaluate a specific implementation of an established sentence embedding model. This implementation proves to be satisfactory for the domain-specific data so that further implementations for harmonization assistance may be brought forward by the project. Considering recent criticism of intrinsic evaluation methods, the article concludes with an evaluation of Harbsafe-162 and joins a more general discussion about the nature of similarity rating tasks. Harbsafe-162 has been made available for the community.

Motivation and Objectives

Motivate and enable the evaluation of distributional semantic models on domain-specific terminological data.
Create an intrinsic similarity rating data set (Harbsafe-162) of terminological entries from the standardization domain.
Support harmonization tasks by assessing how well semantic representations capture concept identity and relatedness in terminological data.

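Proposed Method

Build a corpus of entries from electrical and electronic standards on functional safety, IT security, and dependability.
Design a five-point Likert relatedness rating task for terminological entries (concepts) that combine designations and definitions.
Select 152 pairs from 446 entries, ensuring an even distribution across relatedness categories.
Compute inter-annotator agreement and involve external raters from academia, industry, and standardization to ensure reliability.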
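The agreement step can be sketched in a few lines. The review reports Krippendorff's alpha but not which distance metric the authors used, so this minimal sketch assumes the interval metric over the five-point Likert scores, with one row per entry pair and one rating per rater (None for a missing rating). The function name `krippendorff_alpha_interval` is hypothetical, not the authors' code.

```python
from itertools import permutations
from typing import Optional, Sequence


def krippendorff_alpha_interval(units: Sequence[Sequence[Optional[float]]]) -> float:
    """Krippendorff's alpha with the interval distance (c - k) ** 2.

    `units` holds one list per rated item (here: one entry pair) with one
    rating per rater; None marks a missing rating.
    """
    # Only units rated by at least two raters contribute ("pairable" values).
    pairable = []
    for ratings in units:
        values = [r for r in ratings if r is not None]
        if len(values) >= 2:
            pairable.append(values)

    all_values = [v for values in pairable for v in values]
    n = len(all_values)
    if n < 2:
        raise ValueError("need at least one unit with two ratings")

    # Observed disagreement: ordered pairs within each unit, weighted by 1/(m_u - 1).
    d_observed = sum(
        sum((a - b) ** 2 for a, b in permutations(values, 2)) / (len(values) - 1)
        for values in pairable
    ) / n

    # Expected disagreement: ordered pairs over all pairable values, ignoring units.
    d_expected = sum((a - b) ** 2 for a, b in permutations(all_values, 2)) / (n * (n - 1))

    if d_expected == 0:
        raise ValueError("alpha is undefined when all ratings are identical")
    return 1.0 - d_observed / d_expected
```

Alpha is 1 for perfect agreement and about 0 when raters agree no better than chance; the interval metric treats a 1-vs-5 disagreement as sixteen times worse than a 1-vs-2 one, which suits Likert scores read as evenly spaced.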

Experimental Results

Research Questions

RQ1: Can a relatedness rating task on domain-specific terminological data reliably evaluate semantic representations?
RQ2: How well does a given distributional model capture the semantic similarity of terminological entries across domains such as functional safety and IT security?
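A question like RQ2 is typically answered by correlating a model's similarity scores with the mean human ratings per pair. The review does not name the correlation measure; Spearman's rank correlation is the usual choice for similarity-rating data sets such as SimLex-999, so this sketch assumes it. The function names and the sample scores are hypothetical.

```python
import math
from typing import List, Sequence


def average_ranks(xs: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with xs[order[i]].
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(vx * vy)


def spearman_rho(human: Sequence[float], model: Sequence[float]) -> float:
    """Spearman's rho: the Pearson correlation of the average ranks."""
    if len(human) != len(model) or len(human) < 2:
        raise ValueError("need two equally long score lists with at least two items")
    return pearson(average_ranks(human), average_ranks(model))


if __name__ == "__main__":
    human = [4.6, 1.2, 3.0, 2.4]      # hypothetical mean ratings per entry pair
    model = [0.81, 0.10, 0.55, 0.60]  # hypothetical model similarity scores
    print(round(spearman_rho(human, model), 3))  # prints 0.8
```

Only the ordering of scores matters, so a model's cosine values need not be on the same scale as the 1 to 5 ratings.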

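Key Findings

Harbsafe-162 uses a sample of 152 pairs drawn from 446 terminological entries across nine IEC standards.
Inter-annotator agreement (Krippendorff's alpha) is 0.78.
The rating task achieves an even distribution across categories; no single category is overrepresented.
External raters were recruited after an initial in-house rating round to validate the data set.
The evaluation demonstrates that the model of Arora, Liang and Ma (2017) is applicable to domain-specific terminological data.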
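Arora, Liang and Ma (2017) build a sentence embedding as a weighted average of word vectors, giving a word with unigram probability p(w) the weight a/(a + p(w)), and then subtract each embedding's projection onto the first principal component of all embeddings. The review does not say how the authors applied this to entries (which word vectors, how designations and definitions are combined), so this standard-library sketch rests on assumptions: tokens are pre-split, `word_vectors` and `word_prob` are supplied by the caller, a defaults to 1e-3, and power iteration stands in for a library SVD. All function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]


def dot(u: Vector, v: Vector) -> float:
    return sum(x * y for x, y in zip(u, v))


def weighted_average(tokens: Sequence[str], word_vectors: Dict[str, Vector],
                     word_prob: Dict[str, float], a: float = 1e-3) -> Vector:
    """Average of word vectors, each weighted by a / (a + p(w))."""
    dim = len(next(iter(word_vectors.values())))
    known = [w for w in tokens if w in word_vectors]  # skip out-of-vocabulary tokens
    v = [0.0] * dim
    for w in known:
        weight = a / (a + word_prob.get(w, 0.0))  # frequent words get small weights
        for d in range(dim):
            v[d] += weight * word_vectors[w][d]
    return [x / len(known) for x in v] if known else v


def first_principal_direction(rows: Sequence[Vector], iters: int = 100) -> Vector:
    """Top right singular vector of the uncentered matrix `rows` (power iteration on X^T X)."""
    dim = len(rows[0])
    u = [1.0] * dim  # assumed start vector; fails only if orthogonal to the top direction
    for _ in range(iters):
        scores = [dot(r, u) for r in rows]  # X u
        w = [sum(s * r[d] for s, r in zip(scores, rows)) for d in range(dim)]  # X^T (X u)
        norm = math.sqrt(dot(w, w))
        if norm == 0.0:
            return [0.0] * dim  # degenerate input: nothing to remove
        u = [x / norm for x in w]
    return u


def remove_component(rows: Sequence[Vector], u: Vector) -> List[Vector]:
    """Subtract each row's projection onto the unit vector u: v - (u . v) u."""
    out = []
    for r in rows:
        p = dot(r, u)
        out.append([x - p * ud for x, ud in zip(r, u)])
    return out


def sif_embeddings(sentences: Sequence[Sequence[str]], word_vectors: Dict[str, Vector],
                   word_prob: Dict[str, float], a: float = 1e-3) -> List[Vector]:
    rows = [weighted_average(s, word_vectors, word_prob, a) for s in sentences]
    return remove_component(rows, first_principal_direction(rows))


def cosine(u: Vector, v: Vector) -> float:
    nu, nv = math.sqrt(dot(u, u)), math.sqrt(dot(v, v))
    return dot(u, v) / (nu * nv) if nu and nv else 0.0
```

The common component should be estimated over the whole collection of entries rather than per pair: with a single sentence, removing its own top direction would zero it out. Each pair can then be scored with `cosine` on the two resulting embeddings.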

This review was generated by AI and checked by a human editor.