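[Paper Explained] FlexKBQA: A Flexible LLM-Powered Framework for Few-Shot Knowledge Base Question Answering
FlexKBQA uses LLMs as program translators to generate synthetic data, then adds execution-guided self-training and inherent reasoning, achieving strong few-shot KBQA results across multiple datasets with minimal annotation.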
Knowledge base question answering (KBQA) is a critical yet challenging task due to the vast number of entities within knowledge bases and the diversity of natural language questions posed by users. Unfortunately, the performance of most KBQA models tends to decline significantly in real-world scenarios where high-quality annotated data is insufficient. To mitigate the burden associated with manual annotation, we introduce FlexKBQA by utilizing Large Language Models (LLMs) as program translators for addressing the challenges inherent in the few-shot KBQA task. Specifically, FlexKBQA leverages automated algorithms to sample diverse programs, such as SPARQL queries, from the knowledge base, which are subsequently converted into natural language questions via LLMs. This synthetic dataset facilitates training a specialized lightweight model for the KB. Additionally, to reduce the barriers of distribution shift between synthetic data and real user questions, FlexKBQA introduces an execution-guided self-training method to iteratively leverage unlabeled user questions. Furthermore, we explore harnessing the inherent reasoning capability of LLMs to enhance the entire framework. Consequently, FlexKBQA delivers substantial flexibility, encompassing data annotation, deployment, and being domain agnostic. Through extensive experiments on GrailQA, WebQSP, and KQA Pro, we observe that under the few-shot and even the more challenging zero-shot scenarios, FlexKBQA achieves impressive results with a few annotations, surpassing all previous baselines and even approaching the performance of supervised models, achieving a remarkable 93% performance relative to the fully-supervised models. We posit that FlexKBQA represents a significant advancement towards exploring better integration of large and lightweight models. The code is open-sourced.
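To make the "sample programs, then translate them" step concrete, here is a minimal Python sketch. It is not the authors' code: the one-hop SPARQL template, the function names `sample_programs` and `build_translation_prompt`, and the prompt wording are all illustrative assumptions, and the LLM call itself is left out.

```python
import random

# Hypothetical one-hop SPARQL template; the paper's actual templates are not shown here.
TEMPLATE = "SELECT ?x WHERE {{ <{entity}> <{relation}> ?x . }}"


def sample_programs(triples, n, seed=0):
    """Sample up to n distinct programs by grounding the template on KB triples."""
    rng = random.Random(seed)
    # Deduplicate (subject, predicate) pairs so every sampled program is unique.
    pairs = sorted({(s, p) for s, p, _ in triples})
    chosen = rng.sample(pairs, min(n, len(pairs)))
    return [TEMPLATE.format(entity=s, relation=p) for s, p in chosen]


def build_translation_prompt(program, demos):
    """Build a few-shot prompt asking an LLM to verbalize one program (assumed wording)."""
    lines = ["Translate each SPARQL query into a natural language question.", ""]
    for demo_program, demo_question in demos:
        lines += [f"Query: {demo_program}", f"Question: {demo_question}", ""]
    lines += [f"Query: {program}", "Question:"]
    return "\n".join(lines)


# Toy knowledge base, invented for illustration only.
triples = [
    ("ex:France", "ex:capital", "ex:Paris"),
    ("ex:France", "ex:currency", "ex:Euro"),
    ("ex:Japan", "ex:capital", "ex:Tokyo"),
]
demos = [("SELECT ?x WHERE { <ex:Italy> <ex:capital> ?x . }",
          "What is the capital of Italy?")]

programs = sample_programs(triples, n=2)
prompt = build_translation_prompt(programs[0], demos)
```

The resulting `prompt` string would be sent to a translation model; each (program, generated question) pair then becomes one synthetic training example.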
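Motivation and Goals
- Remove the data annotation bottleneck for KBQA across heterogeneous knowledge base schemas and query languages.
- Use LLMs to turn diverse, executable programs generated from KB templates into natural language questions.
- Bridge the distribution gap between synthetic data and real user questions through execution-guided self-training (EGST).
- Exploit the inherent reasoning ability of LLMs to improve KBQA performance.
- Demonstrate domain-agnostic, deployable KBQA that achieves strong few-shot results with a lightweight underlying model.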
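Proposed Method
- Automatically sample executable KB programs by step-wise grounding over a collection of templates.
- In low-resource settings, LLMs act as program translators that convert programs into natural language questions.
- Execution-guided self-training (EGST) iteratively labels unlabeled user questions in a teacher-student loop and filters out noisy pseudo-labels.
- Inherent reasoning draws on the internal knowledge of LLMs for data augmentation and provides a fallback when semantic parsing fails.
- Synthetic data, a small amount of annotated data, and unlabeled user questions are combined to train a lightweight KBQA model.
- The implementation builds on lightweight underlying models: RnG-KBQA for GrailQA/WebQSP and BART-SPARQL for KQA Pro, with gpt-3.5-turbo used for translation.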
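One EGST round and the inherent reasoning fallback can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the paper's implementation: the only filter shown drops pseudo-labels whose program fails to execute or returns no answers, and `teacher`, `execute`, and `llm_answer` are hypothetical callables standing in for the parser, the KB query engine, and an LLM.

```python
def egst_round(teacher, execute, unlabeled_questions):
    """Pseudo-label questions with the teacher; keep only programs that execute to answers."""
    kept = []
    for question in unlabeled_questions:
        program = teacher(question)
        try:
            answers = execute(program)
        except Exception:
            continue  # assumed filter: discard programs that raise at execution time
        if answers:  # assumed filter: discard programs with empty results
            kept.append((question, program))
    return kept


def answer_with_fallback(question, parser, execute, llm_answer):
    """Try semantic parsing first; fall back to the LLM's own answer if it fails."""
    try:
        answers = execute(parser(question))
    except Exception:
        answers = []
    return answers if answers else llm_answer(question)


# Toy stand-ins, invented for illustration only.
toy_kb = {"q_capital_fr": ["Paris"], "q_empty": []}


def toy_teacher(question):
    mapping = {"capital of France?": "q_capital_fr", "nothing?": "q_empty"}
    return mapping.get(question, "q_invalid")


def toy_execute(program):
    return toy_kb[program]  # raises KeyError for programs the toy KB cannot run


pseudo_labels = egst_round(
    toy_teacher, toy_execute, ["capital of France?", "nothing?", "gibberish"]
)
fallback = answer_with_fallback(
    "gibberish", toy_teacher, toy_execute, lambda q: ["LLM guess"]
)
```

In a full loop, the student would be trained on the synthetic data plus `pseudo_labels` and would then serve as the teacher for the next round.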
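Experimental Results
Research Questions
- RQ1: Can a flexible KBQA framework use LLMs as program translators to generate high-quality synthetic data for few-shot KBQA?
- RQ2: Can execution-guided self-training reduce the distribution shift between synthetic data and real user questions in KBQA?
- RQ3: How much does the inherent reasoning of LLMs improve KBQA performance in zero-shot and low-resource settings?
- RQ4: How does FlexKBQA perform with limited annotated data across different KBs (GrailQA, WebQSP, KQA Pro) and different program types (S-expression, SPARQL)?
Main Findings
- FlexKBQA achieves strong few-shot performance on GrailQA, WebQSP, and KQA Pro, outperforming the baselines and approaching supervised models.
- With only 25 annotated examples on GrailQA, FlexKBQA surpasses previous 100-shot methods and approaches fully supervised performance, reaching 93% of the fully supervised result.
- EGST raises F1 by 10.3 on GrailQA and by 7.1 on WebQSP, and raises accuracy by 10.2 on KQA Pro, evidence that it effectively mitigates distribution shift.
- Inherent reasoning brings additional gains, most notably on KQA Pro, in cases where entity linking is missing and the LLM can give the correct answer directly.
- FlexKBQA shows zero-shot potential and keeps its advantage as more annotated data becomes available, indicating good generalization and data augmentation benefits.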
This summary was generated by AI and reviewed by a human editor.