QUICK REVIEW

[Paper Review] AnalyticsGPT: An LLM Workflow for Scientometric Question Answering

Khang Ly, Georgios Cheirmpos|arXiv (Cornell University)|Feb 10, 2026
Topic Modeling | Cited by 0
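One-Sentence Summary

AnalyticsGPT is a sequential, LLM-driven workflow for scientometric question answering that uses retrieval-augmented generation and agentic planning, and outperforms a naive baseline on coverage and validity.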

ABSTRACT

This paper introduces AnalyticsGPT, an intuitive and efficient large language model (LLM)-powered workflow for scientometric question answering. This underrepresented downstream task addresses the subcategory of meta-scientific questions concerning the "science of science." When compared to traditional scientific question answering based on papers, the task poses unique challenges in the planning phase. Namely, the need for named-entity recognition of academic entities within questions and multi-faceted data retrieval involving scientometric indices, e.g. impact factors. Beyond their exceptional capacity for treating traditional natural language processing tasks, LLMs have shown great potential in more complex applications, such as task decomposition and planning and reasoning. In this paper, we explore the application of LLMs to scientometric question answering, and describe an end-to-end system implementing a sequential workflow with retrieval-augmented generation and agentic concepts. We also address the secondary task of effectively synthesizing the data into presentable and well-structured high-level analyses. As a database for retrieval-augmented generation, we leverage a proprietary research performance assessment platform. For evaluation, we consult experienced subject matter experts and leverage LLMs-as-judges. In doing so, we provide valuable insights on the efficacy of LLMs towards a niche downstream task. Our (skeleton) code and prompts are available at: https://github.com/lyvykhang/llm-agents-scientometric-qa/tree/acl.

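Motivation and Objectives

  • Address the challenges of answering scientometric questions, namely named-entity recognition (NER) of academic entities and multi-faceted data retrieval over scientometric indices.
  • Develop an end-to-end LLM workflow with modules for high-level planning, detailed planning, execution, writing, and visualization.
  • Evaluate the system against a naive RAG baseline on robustness, coverage, coherence, verifiability, and validity, using subject matter experts (SMEs) and LLM judges.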

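Proposed Method

  • Use a fixed sequential workflow (HLPM -> DPM -> AM -> WM -> VM), implemented in LangChain, to manage task decomposition and tool calling.
  • Apply retrieval-augmented generation (RAG) with a proprietary research analytics platform as the data source.
  • Perform NP-based entity recognition and ID resolution, mapping academic entities to database IDs before querying.
  • Have the Detailed Planning Module generate a low-level plan specifying tool names, subtasks, dependencies, and parameter contracts.
  • Execute actions in the Action Module, using rule-based query stitching to keep queries robust and syntactically correct.
  • Generate the final, fact-grounded text with inline citations in the Writing Module, and optionally create charts in the Visualization Module to support insights.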
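The plan-then-execute idea in the method steps above can be illustrated with a minimal sketch. Everything in it is hypothetical: the `Step` dataclass, the tool names `resolve_entity` and `fetch_metric`, the `ENTITY_IDS` lookup, and the query template are illustrative assumptions, not the authors' LangChain implementation or the proprietary platform's API. It only shows a low-level plan with dependencies whose queries are assembled from a fixed template instead of free-form generated text.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter

# Hypothetical lookup standing in for entity ID resolution against the database.
ENTITY_IDS = {"example university": "inst:0001"}


@dataclass
class Step:
    """One low-level plan step: a tool call with parameters and dependencies."""
    name: str
    tool: str
    params: dict
    depends_on: list = field(default_factory=list)


def resolve_entity(params, results):
    # Map a mention recognized in the question to a database ID.
    return ENTITY_IDS[params["mention"].lower()]


def fetch_metric(params, results):
    # Rule-based query stitching: fill a fixed template from validated parts.
    entity_id = results[params["entity_from"]]
    return f"metric={params['metric']}&entity={entity_id}&year={params['year']}"


TOOLS = {"resolve_entity": resolve_entity, "fetch_metric": fetch_metric}


def execute(plan):
    """Run steps in dependency order and return each step's result by name."""
    by_name = {step.name: step for step in plan}
    sorter = TopologicalSorter({s.name: set(s.depends_on) for s in plan})
    results = {}
    for name in sorter.static_order():
        step = by_name[name]
        results[name] = TOOLS[step.tool](step.params, results)
    return results


# The plan is listed out of order on purpose; dependencies decide the order.
plan = [
    Step("s2", "fetch_metric",
         {"metric": "citations", "entity_from": "s1", "year": 2023},
         depends_on=["s1"]),
    Step("s1", "resolve_entity", {"mention": "Example University"}),
]
results = execute(plan)
print(results["s2"])
```

Because the query string is built by a template and not by the model, a malformed query can only come from a bad parameter, which is easier to validate than free-form text.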
Figure 1: Overview of AnalyticsGPT, showing the main modules: High-Level Planning Module (HLPM), Detailed Planning Module (DPM), Action Module (AM), Writing Module (WM), and Visualization Module (VM). Each module, including user input semantics and the RAG interface, is further discussed separately.

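Experimental Results

Research Questions

  • RQ1: How effective is the LLM-driven multi-module workflow for scientometric question answering compared with a naive RAG baseline?
  • RQ2: How do planning and structured tool use affect coverage, coherence, verifiability, and validity on the SQA task?
  • RQ3: Can the system reliably reduce hallucinations and produce high-quality answers when retrieving and synthesizing data for complex, multi-entity scientometric questions?
  • RQ4: What effect do visualization enhancements have on user comprehension and time to insight?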

Key Findings

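| Metric          | Naive Baseline | AnalyticsGPT |
|-----------------|----------------|--------------|
| Resp. Tokens    | 624 ± 258      | 681 ± 322    |
| API Time (s)    | 14.2 ± 6.1     | 20.9 ± 12.3  |
| Critical Errors | 5/84           | 1/84         |
| Coverage        | 4.06 ± 1.13    | 4.40 ± 0.95  |
| Coherence       | 4.38 ± 1.00    | 4.59 ± 0.66  |
| Verifiability   | 4.07 ± 1.01    | 4.25 ± 0.70  |
| Validity        | 4.19 ± 1.15    | 4.56 ± 0.75  |
| Avg.            | 4.17           | 4.45         |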
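
As a quick sanity check on the table above, the Avg. row is consistent with the unweighted mean of the four quality criteria, and the critical-error counts can be read as rates. The snippet only recomputes figures already in the table. The variable names are illustrative, and the denominator 84 is taken directly from the Critical Errors row.

```python
# Mean scores copied from the table: (Naive Baseline, AnalyticsGPT).
scores = {
    "Coverage": (4.06, 4.40),
    "Coherence": (4.38, 4.59),
    "Verifiability": (4.07, 4.25),
    "Validity": (4.19, 4.56),
}

# Unweighted mean over the four criteria for each system.
baseline_avg = sum(b for b, _ in scores.values()) / len(scores)
system_avg = sum(s for _, s in scores.values()) / len(scores)

# Critical errors expressed as percentages of the 84 cases in the table.
baseline_err = 100 * 5 / 84
system_err = 100 * 1 / 84

print(f"Avg: {baseline_avg:.3f} vs {system_avg:.3f}")
print(f"Critical error rate: {baseline_err:.1f}% vs {system_err:.1f}%")
```

The baseline mean comes out at about 4.175, which agrees with the reported 4.17 up to rounding of the underlying scores.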
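  • AnalyticsGPT outperforms the naive baseline on coverage and validity across multiple question forms.
  • AnalyticsGPT scores higher than the baseline on coherence and verifiability in both SME and LLM evaluations.
  • Structured planning (HLPM/DPM) and rule-based query construction reduce critical data retrieval errors.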
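  • In the AM, independent steps run in parallel while dependent steps wait for their prerequisites, which improves efficiency on union-type questions.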
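  • The system delivers richer, well-structured final outputs with inline citations and, where applicable, visualizations that support insights.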
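
The point about parallel execution in the Action Module can be sketched as a scheduler that runs plan steps in waves. This is an assumption-laden illustration, not the paper's code: `run_parallel`, `run_step`, the step names, and the sample data are all hypothetical, and a thread pool is just one plausible way to overlap I/O-bound retrieval calls.

```python
from concurrent.futures import ThreadPoolExecutor
from graphlib import TopologicalSorter


def run_parallel(deps, run_step, max_workers=4):
    """Execute steps in waves; every step in a wave has all dependencies met."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()
    results, waves = {}, []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while sorter.is_active():
            ready = sorted(sorter.get_ready())
            waves.append(ready)
            # Steps within one wave do not depend on each other,
            # so they can safely run concurrently.
            values = list(pool.map(lambda name: run_step(name, results), ready))
            results.update(zip(ready, values))
            sorter.done(*ready)
    return results, waves


# Hypothetical union-type question: two independent retrievals, then a merge.
deps = {"papers_a": set(), "papers_b": set(), "union": {"papers_a", "papers_b"}}
data = {"papers_a": {"p1", "p2"}, "papers_b": {"p2", "p3"}}


def run_step(name, results):
    if name == "union":
        return results["papers_a"] | results["papers_b"]
    return data[name]  # stands in for a retrieval call


results, waves = run_parallel(deps, run_step)
print(waves)
print(sorted(results["union"]))
```

Here the two retrievals land in the first wave and the merge in the second, so the merge only starts once both inputs exist.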
Figure 2: Distribution of question forms by count in the evaluation set. Note that single-intent (SING_INT) is a custom definition and not part of DBLP-QuAD. We overrepresent the fact-based category to pad the dataset with ample base cases, as users often tried to ask more complex questions.


This review was generated by AI and reviewed by human editors.