QUICK REVIEW

[Paper Review] A Deep Patent Landscaping Model using Transformer and Graph Convolutional Network

Seokkyu Choi, Hyeonju Lee|arXiv (Cornell University)|Mar 14, 2019

Intellectual Property and Patents|References: 9|Cited by: 1

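One-Sentence Summary

This paper proposes a deep learning model that combines a modified Transformer for analyzing patent text with a graph convolutional network (GCN) for processing patent metadata, in order to automate patent landscaping. Evaluated on 12 newly curated benchmark datasets, the model achieves a state-of-the-art mean classification accuracy of 98%.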

ABSTRACT

Patent landscaping is a method that is employed for searching related patents during the process of a research and development (R&D) project. To avoid the risk of patent infringement and to follow the current trends of technology development, patent landscaping is a crucial task that needs to be conducted during the early stages of an R&D project. Generally, the process of patent landscaping requires several advanced resources and can be tedious. Furthermore, the patent landscaping process has to be repeated throughout the duration of an R&D project. Owing to such reasons, the demand for automated patent landscaping is gradually increasing. However, the shortage of well-defined benchmarking datasets and comparable models makes it difficult to find related research studies. In this paper, an automated patent landscaping model based on deep learning is proposed. The proposed model comprises a modified transformer structure for analyzing textual data present in patent documents and a graph convolutional network for analyzing patent metadata. Twelve patent landscaping benchmarking datasets, which were processed by the Korean patent attorney, are proposed for determining the resources required for comparing related research studies. Obtained results indicate that the proposed model with the proposed datasets can attain state-of-the-art performance, and mean classification accuracy of 98% can be achieved.

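Research Motivation and Objectives

Address the growing demand for automated patent landscaping in R&D projects, where it is used to avoid infringement and to track technology trends.
Address the lack of well-defined benchmark datasets and comparable models in patent landscaping research.
Develop a deep learning framework that effectively integrates the textual features and metadata features of patent documents.
Establish a reproducible and extensible benchmarking framework for future research on automated patent analysis.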

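Proposed Method

A modified Transformer architecture processes the text of patent documents and extracts semantic representations from it.
A graph convolutional network (GCN) models relationships between patents using structured metadata such as assignees, inventors, and technology classifications.
The text embeddings produced by the Transformer are fused with the graph-structured representations produced by the GCN to form a unified patent embedding.
The model is trained end to end and classifies patents into relevant technology categories based on the learned representations.
The 12 benchmark datasets were curated by a Korean patent attorney to standardize evaluation across technology domains.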
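The metadata and fusion steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses the standard symmetric-normalized GCN layer, ReLU(D^-1/2 (A + I) D^-1/2 H W), and plain concatenation as the fusion step, neither of which is confirmed by this summary. The adjacency matrix, feature values, weights, and the helper names `gcn_layer` and `fuse` are all hypothetical, and the text embeddings are hand-written stand-ins for Transformer output.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def normalized_adjacency(adj):
    """Return D^-1/2 (A + I) D^-1/2, where D is the degree matrix of A + I."""
    n = len(adj)
    # Add self-loops so every patent keeps its own features during propagation.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def gcn_layer(adj, features, weights):
    """One graph convolution: ReLU(A_norm @ H @ W)."""
    propagated = matmul(matmul(normalized_adjacency(adj), features), weights)
    return [[max(0.0, v) for v in row] for row in propagated]


def fuse(text_emb, graph_emb):
    """Concatenate per-patent text and graph embeddings (assumed fusion step)."""
    return [t + g for t, g in zip(text_emb, graph_emb)]


# Toy graph of 3 patents: patents 0 and 1 share a metadata link
# (for example, the same assignee); patent 2 is isolated.
adj = [[0, 1, 0],
       [1, 0, 0],
       [0, 0, 0]]
meta_features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # hypothetical metadata features
weights = [[1.0], [1.0]]                               # hypothetical 2 -> 1 projection
text_emb = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]        # stand-in for Transformer output

graph_emb = gcn_layer(adj, meta_features, weights)
patent_emb = fuse(text_emb, graph_emb)
print(patent_emb)
```

In this toy graph the two linked patents end up with the same graph embedding because each averages its features with its neighbor's, which is the relational signal a GCN contributes on top of the text representation. In a trained model the weights would be learned jointly with the text encoder and a classification head.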

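Experimental Results

Research Questions

RQ1: Can a deep learning model effectively combine natural language understanding with graph-based relational modeling to improve the accuracy of patent landscaping?
RQ2: How does the classification performance of the proposed model compare with existing methods on standardized patent datasets?
RQ3: To what extent does fusing textual features with metadata features improve the detection of relevant prior art in patent analysis?
RQ4: Are the proposed benchmark datasets suitable for evaluating and comparing future automated patent landscaping systems?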

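Key Findings

The proposed model achieves a mean classification accuracy of 98% across the 12 benchmark datasets, which the authors report as state-of-the-art performance.
Fusing Transformer-based text modeling with GCN-based metadata analysis significantly improves classification reliability compared with using either method alone.
The curated benchmark datasets provide a standardized and reliable evaluation framework for future research on automated patent landscaping.
The model's high accuracy, validated on expert-processed datasets, suggests strong generalization across diverse technology domains.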
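As a rough illustration of how a mean classification accuracy over several benchmark datasets might be computed, the sketch below scores each dataset separately and then averages the per-dataset scores with equal weight. This is an assumption: the summary does not say how the 98% figure was aggregated. The dataset names and labels are synthetic placeholders, not results from the paper.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("label lists must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_accuracy(datasets):
    """Average the per-dataset accuracy, giving each dataset equal weight."""
    scores = {name: accuracy(t, p) for name, (t, p) in datasets.items()}
    return sum(scores.values()) / len(scores), scores


# Synthetic labels for two hypothetical datasets (1 = relevant patent, 0 = not).
toy_datasets = {
    "dataset_a": ([1, 0, 1, 1], [1, 0, 1, 0]),  # 3 of 4 correct
    "dataset_b": ([0, 0, 1, 1], [0, 0, 1, 1]),  # 4 of 4 correct
}

mean_score, per_dataset = mean_accuracy(toy_datasets)
print(per_dataset, mean_score)
```

Averaging per dataset rather than pooling all patents keeps a large dataset from dominating the reported figure, which matters when the benchmark sets differ in size.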


This review was generated by AI and reviewed by a human editor.