QUICK REVIEW

[Paper Review] Harbsafe-162. A Domain-Specific Data Set for the Intrinsic Evaluation of Semantic Representations for Terminological Data

Susanne Arndt, Dieter Schnäpp | arXiv (Cornell University) | May 29, 2020

Linguistics and terminology studies | References: 31 | Citations: 32

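One-Line Summary

Harbsafe-162 is a domain-specific intrinsic evaluation data set that uses terminological entries from electrotechnical standards to evaluate distributional semantic models for terminology harmonization.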

ABSTRACT

The article presents Harbsafe-162, a domain-specific data set for evaluating distributional semantic models. It originates from the Harbsafe project, a cooperation between Technische Universität Braunschweig and the German Commission for Electrical, Electronic & Information Technologies of DIN and VDE. One objective of the project is to apply distributional semantic models to terminological entries, that is, complex lexical data comprising one or more terms, term phrases, and a definition. This application is needed to solve a more complex problem: the harmonization of terminologies of standards and standards bodies (i.e. resolution of doublettes and inconsistencies). Due to a lack of evaluation data sets for terminological entries, the creation of Harbsafe-162 was a necessary step towards harmonization assistance. Harbsafe-162 covers data from nine electrotechnical standards in the domain of functional safety, IT security, and dependability. An intrinsic evaluation method in the form of a similarity rating task has been applied, in which two linguists and three domain experts from standardization participated. The data set is used to evaluate a specific implementation of an established sentence embedding model. This implementation proves to be satisfactory for the domain-specific data, so that further implementations for harmonization assistance may be brought forward by the project. Considering recent criticism of intrinsic evaluation methods, the article concludes with an evaluation of Harbsafe-162 and joins a more general discussion about the nature of similarity rating tasks. Harbsafe-162 has been made available to the community.
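The abstract describes scoring a sentence embedding model against human similarity ratings. A common way to do this, assumed here because the review does not name the metric, is Spearman's rank correlation between the model's similarity score and the mean human rating for each entry pair. The sketch below uses only the standard library; the ratings and scores in the usage lines are invented for illustration.

```python
from statistics import mean


def average_ranks(values):
    """Rank values (1 = smallest); tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the tie-averaged ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical data: mean human rating (1-5) and model cosine similarity per pair.
human = [4.8, 1.2, 3.0, 2.2, 4.0]
model = [0.91, 0.35, 0.40, 0.62, 0.88]
print(spearman(human, model))
```

A rank correlation fits this setting because only the ordering of pairs matters: cosine similarities and 1-5 Likert ratings live on different scales.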

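Research Motivation and Objectives

Facilitate and enable the evaluation of distributional semantic models on domain-specific terminological data.
Create an intrinsic similarity-rating data set (Harbsafe-162) for terminological entries from the standardization domain.
Support harmonization work by assessing how well semantic representations capture the identity and relatedness of concept entries in terminological data.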

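Proposed Method

Build a corpus of standards-based entries from electrotechnical standards on functional safety, IT security, and dependability.
Design a 5-point Likert similarity rating task for terminological entries (concepts) that combine designations with a definition.
Select 152 pairs from the 446 entries so that they are evenly distributed across the similarity categories.
Compute inter-annotator agreement, and involve external raters from academia, industry, and standardization to improve reliability.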
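The review reports agreement as Krippendorff's alpha, which compares observed disagreement within each rated pair (D_o) with the disagreement expected by chance across all ratings (D_e): alpha = 1 - D_o / D_e. As a minimal sketch, assuming each entry pair's ratings are stored as a list of 1-5 Likert scores (missing ratings simply omitted) and assuming the interval distance (a - b)**2 (the review does not say which distance metric the authors used), it can be computed as follows. The ratings in the usage lines are hypothetical.

```python
from itertools import permutations


def krippendorff_alpha_interval(units):
    """Krippendorff's alpha with the interval metric (a - b) ** 2.

    `units` is a list of lists; each inner list holds the ratings one item
    (here: one entry pair) received. Raters who skipped an item are omitted.
    """
    # Units with fewer than two ratings carry no agreement information.
    pairable = [u for u in units if len(u) >= 2]
    values = [v for u in pairable for v in u]
    n = len(values)
    if n < 2:
        raise ValueError("need at least two pairable ratings")
    # Observed disagreement: ordered within-unit pairs, weighted by 1 / (m_u - 1).
    d_o = sum(
        sum((a - b) ** 2 for a, b in permutations(u, 2)) / (len(u) - 1)
        for u in pairable
    ) / n
    # Expected disagreement: all ordered pairs of pairable values, ignoring units.
    d_e = sum((a - b) ** 2 for a, b in permutations(values, 2)) / (n * (n - 1))
    if d_e == 0:
        raise ValueError("alpha is undefined when all ratings are identical")
    return 1.0 - d_o / d_e


# Hypothetical ratings for three entry pairs; one rater skipped the second pair.
ratings = [
    [5, 5, 4, 5, 5],
    [1, 2, 1, 1],
    [3, 4, 3, 2, 3],
]
print(round(krippendorff_alpha_interval(ratings), 3))
```

Treating Likert scores as interval data is a simplification; an ordinal metric would only change the distance function.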

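Experimental Results

Research Questions

RQ1: Can a domain-specific similarity rating task on terminological data reliably evaluate semantic representations?
RQ2: How well does a given distributional model capture the semantic similarity of terminological entries in domains such as functional safety and IT security?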

Key Findings

Harbsafe-162 uses a sample of 152 pairs drawn from 446 terminological entries across nine IEC standards.
Inter-annotator agreement (Krippendorff’s alpha) is 0.78.
The rating task achieves a balanced distribution across categories (no single category dominates).
External raters were recruited after initial in-house ratings to validate the dataset.
The evaluation demonstrates the applicability of the model of Arora, Liang, and Ma (2017) for domain-specific terminological data.
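Arora, Liang, and Ma (2017) build sentence embeddings with smooth inverse frequency (SIF) weighting: each word vector is scaled by a / (a + p(w)), the weighted vectors are averaged, and each sentence's projection onto the first singular vector of the sentence-embedding matrix is removed. The review does not describe the project's specific implementation, so the following is only a minimal pure-Python sketch. The toy vocabulary, the probabilities p(w), the default a = 1e-3, treating each terminological entry as one token sequence, and the use of power iteration in place of a library SVD are all assumptions for illustration; in practice one would load pre-trained word vectors and corpus word frequencies.

```python
import math


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def cosine(x, y):
    """Cosine similarity; 0.0 if either vector is all zeros."""
    nx, ny = math.sqrt(dot(x, x)), math.sqrt(dot(y, y))
    return dot(x, y) / (nx * ny) if nx and ny else 0.0


def weighted_average(tokens, word_vecs, word_prob, a=1e-3):
    """SIF-weighted average: each in-vocabulary vector is scaled by a / (a + p(w)),
    so frequent words contribute less. Out-of-vocabulary tokens are skipped."""
    dim = len(next(iter(word_vecs.values())))
    known = [t for t in tokens if t in word_vecs]
    v = [0.0] * dim
    for t in known:
        w = a / (a + word_prob.get(t, 0.0))
        v = [vi + w * xi for vi, xi in zip(v, word_vecs[t])]
    return [vi / len(known) for vi in v] if known else v


def first_singular_vector(rows, iters=100):
    """Top right singular vector of the matrix with these rows (power iteration on X^T X)."""
    dim = len(rows[0])
    u = [1.0 / math.sqrt(dim)] * dim
    for _ in range(iters):
        xu = [dot(r, u) for r in rows]  # X u
        new = [sum(c * r[j] for c, r in zip(xu, rows)) for j in range(dim)]  # X^T (X u)
        norm = math.sqrt(dot(new, new))
        if norm == 0.0:
            return [0.0] * dim  # no common direction to remove
        u = [x / norm for x in new]
    return u


def remove_common_component(embs):
    """Subtract each embedding's projection onto the first singular vector."""
    u = first_singular_vector(embs)
    return [[x - dot(e, u) * ui for x, ui in zip(e, u)] for e in embs]


# Hypothetical toy data: 3-d word vectors and unigram probabilities p(w).
word_vecs = {
    "safety": [0.9, 0.1, 0.2], "function": [0.7, 0.3, 0.1],
    "risk": [0.8, 0.2, 0.4], "harm": [0.6, 0.1, 0.5],
    "the": [0.2, 0.9, 0.1], "of": [0.1, 0.8, 0.2],
}
word_prob = {"safety": 0.001, "function": 0.002, "risk": 0.001,
             "harm": 0.0005, "the": 0.06, "of": 0.04}
entries = [
    ["safety", "of", "the", "function"],
    ["risk", "of", "harm"],
    ["the", "function", "of", "the", "risk"],
]
raw = [weighted_average(e, word_vecs, word_prob) for e in entries]
embs = remove_common_component(raw)
print(round(cosine(embs[0], embs[1]), 3))
```

The resulting cosine similarities per entry pair are what would then be compared with the human ratings.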


This review was generated by AI and reviewed by a human editor.