QUICK REVIEW

[Paper Review] The EcoLexicon Semantic Sketch Grammar: from Knowledge Patterns to Word Sketches

Pilar León-Araúz, Antonio San Martín|arXiv (Cornell University)|Apr 15, 2018
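Linguistics and Cultural Studies | Cited by 42
One-Sentence Summary

This paper presents the KP-based EcoLexicon Semantic Sketch Grammar (ESSG), implemented in Sketch Engine, which makes it possible to extract knowledge-rich contexts and word sketches from the EcoLexicon English Corpus. It also covers the grammar's public availability and an initial evaluation of its 64 rules.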

ABSTRACT

Many projects have applied knowledge patterns (KPs) to the retrieval of specialized information. Yet terminologists still rely on manual analysis of concordance lines to extract semantic information, since there are no user-friendly publicly available applications enabling them to find knowledge rich contexts (KRCs). To fill this void, we have created the KP-based EcoLexicon Semantic Sketch Grammar (ESSG) in the well-known corpus query system Sketch Engine. For the first time, the ESSG is now publicly available in Sketch Engine to query the EcoLexicon English Corpus. Additionally, reusing the ESSG in any English corpus uploaded by the user enables Sketch Engine to extract KRCs codifying generic-specific, part-whole, location, cause and function relations, because most of the KPs are domain-independent. The information is displayed in the form of summary lists (word sketches) containing the pairs of terms linked by a given semantic relation. This paper describes the process of building a KP-based sketch grammar with special focus on the last stage, namely, the evaluation with refinement purposes. We conducted an initial shallow precision and recall evaluation of the 64 English sketch grammar rules created so far for hyponymy, meronymy and causality. Precision was measured based on a random sample of concordances extracted from each word sketch type. Recall was assessed based on a random sample of concordances where known term pairs are found. The results are necessary for the improvement and refinement of the ESSG. The noise of false positives helped to further specify the rules, whereas the silence of false negatives allows us to find useful new patterns.

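Research Motivation and Objectives

  • Motivate the need for user-friendly tools that extract semantic information from specialized corpora without manual analysis of concordance lines.
  • Describe the construction of the KP-based sketch grammar (ESSG) and its integration into Sketch Engine.
  • Show how the ESSG makes it possible to extract knowledge-rich contexts (KRCs) that encode semantic relations, along with word sketches that link terms through those relations.
  • Provide an evaluation framework that refines the ESSG through precision and recall analysis.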

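Proposed Method

  • Develop a KP-based sketch grammar (ESSG) built on knowledge patterns, most of which are domain-independent.
  • Integrate the ESSG into Sketch Engine to query the EcoLexicon English Corpus and any English corpus uploaded by the user.
  • Extract KRCs and word sketches that encode generic-specific, part-whole, location, cause and function relations.
  • Display the results as word sketches, that is, summary lists of term pairs linked by a given semantic relation.
  • Run an initial shallow precision and recall evaluation of the 64 English sketch grammar rules for hyponymy, meronymy and causality.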
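
To make the idea of a knowledge pattern more concrete, the sketch below matches three toy patterns against plain sentences and groups the resulting term pairs by relation, loosely mimicking a word sketch. It is an illustration only: the real ESSG rules are written for Sketch Engine and operate on an annotated corpus, whereas the regular expressions, the `KNOWLEDGE_PATTERNS` table and the `extract_word_sketch` function here are hypothetical names invented for this example.

```python
import re
from collections import defaultdict

# Hypothetical, simplified knowledge patterns (KPs). These are plain regular
# expressions over lowercased text, NOT the actual ESSG rules.
DET = r"(?:the |a |an )?"            # optional determiner, not captured
TERM = r"([a-z]+(?: [a-z]+)?)"       # a term of one or two lowercase words

KNOWLEDGE_PATTERNS = {
    "generic-specific": re.compile(rf"\b{DET}{TERM} is a type of {DET}{TERM}\b"),
    "part-whole": re.compile(rf"\b{DET}{TERM} is part of {DET}{TERM}\b"),
    "cause": re.compile(rf"\b{DET}{TERM} is caused by {DET}{TERM}\b"),
}


def extract_word_sketch(sentences):
    """Group the term pairs matched by each toy KP under its relation name."""
    sketch = defaultdict(list)
    for sentence in sentences:
        text = sentence.lower()
        for relation, pattern in KNOWLEDGE_PATTERNS.items():
            for match in pattern.finditer(text):
                sketch[relation].append((match.group(1), match.group(2)))
    return dict(sketch)


# Toy sentences written for this example, not taken from the corpus.
toy_sentences = [
    "A groin is a type of coastal structure.",
    "Erosion is caused by wave action.",
    "The berm is part of the beach.",
]
toy_sketch = extract_word_sketch(toy_sentences)
```

Surface patterns like these over-generate and under-generate quickly on real text, which is the kind of noise and silence the paper's evaluation stage is meant to expose.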

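Experimental Results

Research Questions

  • RQ1: Can the KP-based ESSG reliably extract semantic relations (generic-specific, part-whole, location, cause, function) from English corpora?
  • RQ2: How effective are the 64 rules for hyponymy, meronymy and causality in terms of precision and recall?
  • RQ3: What insights for rule refinement can be gained from noise (false positives) and silence (false negatives)?
  • RQ4: To what extent can Sketch Engine users obtain useful KRCs and word sketches for terminological research?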

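Key Findings

  • The ESSG can retrieve KRCs and semantic-relation word sketches from the EcoLexicon English Corpus and from English corpora uploaded by the user.
  • The initial evaluation of the 64 rules shows that precision and recall results can guide the improvement of the rules.
  • Precision was measured on a random sample of concordances extracted from each word sketch type.
  • Recall was assessed on a random sample of concordances in which known term pairs occur.
  • The analysis of false positives (noise) helped to further specify the rules, whereas the analysis of false negatives (silence) revealed useful new patterns to add.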
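
The two measures described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' evaluation script: it assumes each sampled concordance has been manually judged as correct or not (for precision), and that every sampled concordance containing a known term pair does express the relation (for recall). The function names and the toy values are invented for the example and are not results from the paper.

```python
import random


def draw_sample(concordances, size, seed=0):
    """Draw a reproducible random sample of concordance lines."""
    rng = random.Random(seed)
    return rng.sample(list(concordances), min(size, len(concordances)))


def precision(judgements):
    """Share of sampled word sketch concordances judged correct.

    `judgements` holds one boolean per sampled concordance: True if the line
    really expresses the relation, False if it is noise (a false positive).
    """
    return sum(judgements) / len(judgements)


def recall(known_pair_lines, retrieved_lines):
    """Share of sampled known-pair concordances that the grammar retrieved.

    Lines in `known_pair_lines` that are missing from `retrieved_lines`
    are silence (false negatives).
    """
    gold = set(known_pair_lines)
    return len(gold & set(retrieved_lines)) / len(gold)


# Toy values for illustration only.
toy_precision = precision([True, True, False, True])
toy_recall = recall({1, 2, 3, 4, 5}, {2, 3, 9})
```

Inspecting the individual false positives and false negatives, rather than only the two ratios, is what supports rule refinement: the former show where a rule needs tighter constraints, the latter point to patterns that no rule covers yet.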


This review was generated by AI and checked by human editors.