QUICK REVIEW

[Paper Review] Text2Cypher: Bridging Natural Language and Graph Databases

Makbule Gülçin Özsoy, Leila Messallem|arXiv (Cornell University)|2024. 12. 13.

Semantic Web and Ontologies · Citations: 5

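One-line summary

This paper combines public sources into a large, cleaned Text2Cypher dataset, benchmarks a range of models on it, and shows that fine-tuning improves the accuracy of translating natural language into Cypher queries.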

ABSTRACT

Knowledge graphs use nodes, relationships, and properties to represent arbitrarily complex data. When stored in a graph database, the Cypher query language enables efficient modeling and querying of knowledge graphs. However, using Cypher requires specialized knowledge, which can present a challenge for non-expert users. Our work Text2Cypher aims to bridge this gap by translating natural language queries into Cypher query language and extending the utility of knowledge graphs to non-technical expert users. While large language models (LLMs) can be used for this purpose, they often struggle to capture complex nuances, resulting in incomplete or incorrect outputs. Fine-tuning LLMs on domain-specific datasets has proven to be a more promising approach, but the limited availability of high-quality, publicly available Text2Cypher datasets makes this challenging. In this work, we show how we combined, cleaned and organized several publicly available datasets into a total of 44,387 instances, enabling effective fine-tuning and evaluation. Models fine-tuned on this dataset showed significant performance gains, with improvements in Google-BLEU and Exact Match scores over baseline models, highlighting the importance of high-quality datasets and fine-tuning in improving Text2Cypher performance.

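Motivation and Goals

Make graph databases more accessible to non-experts by enabling translation of natural language into Cypher.
Combine public sources into a large, clean, ready-to-use Text2Cypher dataset.
Benchmark base and fine-tuned models on the Text2Cypher task.
Demonstrate that fine-tuning improves performance over the baselines.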

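Proposed Method

Aggregate 16 public Text2Cypher datasets and align them into a single format with the fields question, schema, cypher, data_source, database_reference, and instance_id.
Clean the data by checking syntax with EXPLAIN on a local Neo4j database, removing invalid queries, and inspecting instances manually.
Split the data into a training set (39,554 instances) and a test set (4,833 instances), for a total of 44,387 instances, and analyze their distribution.
Benchmark a range of base and fine-tuned models using a translation-based metric (Google-BLEU) and an execution-based metric (Exact Match).
Fine-tune selected models on the new dataset and compare them with their baselines to quantify the gains.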
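
The aggregation, cleaning, and splitting steps above can be sketched as a small pipeline. This is a minimal illustration under stated assumptions, not the authors' code: the alternative source keys ("text", "query", "database"), the deduplication rule, the helper names, and the 10% default test fraction are all hypothetical. Syntax validation is passed in as a callable; explain_validator shows how it might wrap EXPLAIN through the official neo4j Python driver, which asks Neo4j to plan a query without running it.

```python
import random
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class Text2CypherInstance:
    # Field names follow the unified format described in the review.
    question: str
    schema: str
    cypher: str
    data_source: str
    database_reference: Optional[str]
    instance_id: str


def normalize(raw: dict, source: str, idx: int) -> Text2CypherInstance:
    """Map one raw record onto the unified format (alternative keys are hypothetical)."""
    question = raw.get("question") or raw.get("text") or ""
    cypher = raw.get("cypher") or raw.get("query") or ""
    return Text2CypherInstance(
        question=" ".join(question.split()),  # collapse stray whitespace
        schema=(raw.get("schema") or "").strip(),
        cypher=cypher.strip(),  # only trim, so string literals stay intact
        data_source=source,
        database_reference=raw.get("database"),
        instance_id=f"{source}_{idx}",
    )


def clean(
    instances: Iterable[Text2CypherInstance],
    is_valid_cypher: Callable[[str], bool],
) -> List[Text2CypherInstance]:
    """Drop empty, duplicate (assumed rule), and syntactically invalid instances."""
    seen = set()
    kept = []
    for inst in instances:
        if not inst.question or not inst.cypher:
            continue
        key = (inst.question, inst.cypher)
        if key in seen or not is_valid_cypher(inst.cypher):
            continue
        seen.add(key)
        kept.append(inst)
    return kept


def split(
    instances: List[Text2CypherInstance], test_fraction: float = 0.1, seed: int = 42
) -> Tuple[List[Text2CypherInstance], List[Text2CypherInstance]]:
    """Shuffle deterministically and hold out a test fraction (ratio is assumed)."""
    shuffled = list(instances)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def explain_validator(driver) -> Callable[[str], bool]:
    """Validator that asks Neo4j to plan, not execute, each query via EXPLAIN."""
    from neo4j.exceptions import ClientError  # requires the `neo4j` package

    def is_valid(cypher: str) -> bool:
        try:
            driver.execute_query("EXPLAIN " + cypher)
            return True
        except ClientError:  # CypherSyntaxError is a subclass of ClientError
            return False

    return is_valid
```

With a live database, explain_validator(driver) would be passed as is_valid_cypher; offline, any string predicate can stand in, which keeps the cleaning logic testable without Neo4j. Labels that are missing from the database typically produce warnings rather than errors under EXPLAIN, so this check mainly catches syntax problems, which is one reason manual inspection still matters.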

Figure 1: A user wants to write a Cypher query for 'What are the movies of Tom Hanks'. A Text2Cypher model translates the input natural-language question into Cypher, i.e., 'MATCH (actor:Person {name: "Tom Hanks"})-[:ACTED_IN]->(movie:Movie) RETURN movie.title AS movies'.
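
Because each dataset instance pairs a question with a schema, a Text2Cypher model is typically prompted with both. The review does not give the paper's prompt, so the template below is a hypothetical sketch, as are the names build_prompt and extract_cypher. The second helper assumes the model may wrap its answer in a code fence and strips it before the query is executed or scored.

```python
import re

# Hypothetical prompt template; the paper's actual wording may differ.
PROMPT_TEMPLATE = (
    "Task: Generate a Cypher statement to query a graph database.\n"
    "Use only the node labels, relationship types and properties in the schema.\n"
    "Schema:\n{schema}\n\n"
    "Question: {question}\n"
    "Cypher:"
)

# Matches a fenced block (three backticks, optional "cypher" tag) in model output.
_FENCED = re.compile(r"`{3}(?:cypher)?\s*(.*?)`{3}", re.DOTALL | re.IGNORECASE)


def build_prompt(question: str, schema: str) -> str:
    """Combine the question with the database schema it should be answered against."""
    # Braces inside the schema are safe: format() only parses the template itself.
    return PROMPT_TEMPLATE.format(schema=schema.strip(), question=question.strip())


def extract_cypher(model_output: str) -> str:
    """Return the Cypher text, stripping a code fence if the model added one."""
    match = _FENCED.search(model_output)
    text = match.group(1) if match else model_output
    return text.strip()
```

For the Figure 1 question, build_prompt would be called with a schema containing Person and Movie nodes and an ACTED_IN relationship, and extract_cypher applied to the raw model reply.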

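Experimental Results

Research Questions

RQ1: Can a large, unified Text2Cypher dataset improve the performance of models that translate natural language into Cypher?
RQ2: Do fine-tuned models outperform their baselines on Text2Cypher translation and execution metrics?
RQ3: Which model families (open-weighted, closed-foundational) benefit most from fine-tuning for Text2Cypher?
RQ4: How do translation-based and execution-based evaluation compare for Text2Cypher models?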

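Key Findings

The final dataset contains 44,387 instances: 39,554 training samples and 4,833 test samples.
Fine-tuned models consistently outperform their baseline versions on both Google-BLEU and Exact Match.
Among the baselines, OpenAI/Gpt-4o and Gemini-1.5-Pro-001 led in certain settings, and larger models generally performed better.
Among the fine-tuned models, improvements over the baselines reach up to about 0.34 in Google-BLEU and about 0.11 in Exact Match.
The best fine-tuned results were achieved by Finetuned-OpenAI/Gpt4o, Finetuned-OpenAI/Gpt4o-mini, and Finetuned-GoogleAIStudio/Gemini-1.5-Flash-001.
Together, the dataset and the fine-tuning approach underline the importance of high-quality data and fine-tuning for Text2Cypher.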
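
The two metrics behind these numbers measure different things, which is the point of RQ4. The sketch below is illustrative, not the paper's evaluation code. Google-BLEU (GLEU) counts 1- to 4-gram overlaps between predicted and reference queries and takes the minimum of precision and recall; at corpus level, matches and totals are summed first, which to our understanding mirrors NLTK's corpus_gleu. The review does not say exactly how execution-based Exact Match compares results, so the sketch assumes both queries are run and their result rows are compared as unordered multisets of values, ignoring column aliases. The tokenizer and all function names are assumptions.

```python
import re
from collections import Counter
from typing import Dict, List, Sequence


def _tokens(text: str) -> List[str]:
    # Assumption: split into word and punctuation tokens.
    return re.findall(r"\w+|[^\w\s]", text)


def _ngrams(tokens: List[str], max_n: int = 4) -> Counter:
    # Count every 1- to max_n-gram in the token sequence.
    counts: Counter = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i : i + n])] += 1
    return counts


def corpus_google_bleu(predictions: Sequence[str], references: Sequence[str], max_n: int = 4) -> float:
    """Corpus GLEU: summed clipped matches / summed max(predicted, reference) n-gram totals."""
    matches = total = 0
    for pred, ref in zip(predictions, references):
        p, r = _ngrams(_tokens(pred), max_n), _ngrams(_tokens(ref), max_n)
        matches += sum((p & r).values())  # Counter '&' keeps the minimum count per n-gram
        total += max(sum(p.values()), sum(r.values()))  # min(P, R) = matches / max(totals)
    return matches / total if total else 0.0


def _canonical(rows: List[Dict]) -> Counter:
    # Assumption: ignore row order and column aliases; compare values only.
    return Counter(tuple(sorted(repr(v) for v in row.values())) for row in rows)


def execution_exact_match(pred_results: List[List[Dict]], ref_results: List[List[Dict]]) -> float:
    """Share of questions whose predicted query returned the same rows as the reference."""
    if not ref_results:
        return 0.0
    hits = sum(_canonical(p) == _canonical(r) for p, r in zip(pred_results, ref_results))
    return hits / len(ref_results)
```

The two can disagree: a prediction that aliases movie.title as movies instead of title scores below 1.0 on GLEU yet still matches on execution, while a near-identical string with a wrong property name scores high on GLEU but fails Exact Match.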

Figure 2: Relational databases use SQL-based query languages, while graph databases commonly use the Cypher query language. The figure shows an example representation of Person, Location, Gender, and Marriage entities and their relationships in a relational database and in a graph database.

This review was generated by AI and reviewed by a human editor.