QUICK REVIEW

[Paper Review] AnalyticsGPT: An LLM Workflow for Scientometric Question Answering

Khang Ly, Georgios Cheirmpos|arXiv (Cornell University)|Feb 10, 2026

Topic Modeling | Citations: 0

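One-line summary

AnalyticsGPT presents a sequential, LLM-driven workflow for scientometric question answering that combines retrieval-augmented generation with agentic planning, and it improves coverage and validity over a naive baseline.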

ABSTRACT

This paper introduces AnalyticsGPT, an intuitive and efficient large language model (LLM)-powered workflow for scientometric question answering. This underrepresented downstream task addresses the subcategory of meta-scientific questions concerning the "science of science." When compared to traditional scientific question answering based on papers, the task poses unique challenges in the planning phase. Namely, the need for named-entity recognition of academic entities within questions and multi-faceted data retrieval involving scientometric indices, e.g. impact factors. Beyond their exceptional capacity for treating traditional natural language processing tasks, LLMs have shown great potential in more complex applications, such as task decomposition and planning and reasoning. In this paper, we explore the application of LLMs to scientometric question answering, and describe an end-to-end system implementing a sequential workflow with retrieval-augmented generation and agentic concepts. We also address the secondary task of effectively synthesizing the data into presentable and well-structured high-level analyses. As a database for retrieval-augmented generation, we leverage a proprietary research performance assessment platform. For evaluation, we consult experienced subject matter experts and leverage LLMs-as-judges. In doing so, we provide valuable insights on the efficacy of LLMs towards a niche downstream task. Our (skeleton) code and prompts are available at: https://github.com/lyvykhang/llm-agents-scientometric-qa/tree/acl.

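Motivation and objectives

Address the challenges of scientometric question answering by enabling named-entity recognition of academic entities and multi-faceted data retrieval over scientometric indices.
Develop an end-to-end LLM workflow with modules for high-level planning, detailed planning, action execution, writing, and visualization.
Compare the workflow against a naive RAG baseline, using SMEs and LLM judges to evaluate robustness, coverage, coherence, verifiability, and validity.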

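Proposed method

A fixed-order workflow, HLPM -> DPM -> AM -> WM -> VM, implemented in LangChain, manages task decomposition and tool calls.
Retrieval-augmented generation (RAG) uses a proprietary research performance assessment platform as the data source.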
NP-based entity recognition and ID resolution map academic entities to database IDs before any query is executed.
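The Detailed Planning Module generates a low-level plan that specifies tool names, subtasks, dependencies, and parameter contracts.
The Action Module assembles queries with rule-based logic, which keeps them robust and syntactically correct.
The Writing Module produces a fact-grounded final text with inline references, and the Visualization Module optionally creates visualizations.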
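
The fixed module order described above can be sketched in plain Python. This is a minimal illustration, not the authors' implementation: it does not use LangChain, the LLM calls are replaced by hard-coded plans, and every name in it (`State`, `ENTITY_IDS`, `QUERY_TEMPLATE`, the two tools, the `$s1` reference syntax, the value 42) is a hypothetical placeholder.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical stand-ins: the real system uses LLM prompts and a proprietary database.
ENTITY_IDS = {"Example University": "inst:001"}
QUERY_TEMPLATE = "metric=pub_count&entity={entity_id}&year={year}"


@dataclass
class State:
    question: str
    high_level_plan: list = field(default_factory=list)
    detailed_plan: list = field(default_factory=list)
    results: dict = field(default_factory=dict)
    report: str = ""
    chart: Optional[dict] = None


def resolve_entity(name: str) -> str:
    # ID resolution: map an academic entity name to a database ID.
    return ENTITY_IDS[name]


def count_publications(entity_id: str, year: int) -> dict:
    # Rule-based assembly: the query is filled into a fixed template.
    query = QUERY_TEMPLATE.format(entity_id=entity_id, year=year)
    return {"query": query, "value": 42}  # dummy value, not real data


TOOLS = {"resolve_entity": resolve_entity, "count_publications": count_publications}


def hlpm(s: State) -> State:
    # High-Level Planning Module (an LLM call in the paper, hard-coded here).
    s.high_level_plan = ["resolve the institution", "fetch its publication count"]
    return s


def dpm(s: State) -> State:
    # Detailed Planning Module: tool name, dependencies, and parameters per step.
    s.detailed_plan = [
        {"id": "s1", "tool": "resolve_entity", "depends_on": [],
         "params": {"name": "Example University"}},
        {"id": "s2", "tool": "count_publications", "depends_on": ["s1"],
         "params": {"entity_id": "$s1", "year": 2024}},
    ]
    return s


def am(s: State) -> State:
    # Action Module: run steps in plan order, replacing "$step" with earlier results.
    for step in s.detailed_plan:
        params = {
            key: s.results[val[1:]] if isinstance(val, str) and val.startswith("$") else val
            for key, val in step["params"].items()
        }
        s.results[step["id"]] = TOOLS[step["tool"]](**params)
    return s


def wm(s: State) -> State:
    # Writing Module: final text with an inline reference to the supporting step.
    s.report = f"Example University published {s.results['s2']['value']} papers in 2024 [s2]."
    return s


def vm(s: State) -> State:
    # Visualization Module: emit a simple chart specification.
    s.chart = {"type": "bar", "x": ["2024"], "y": [s.results["s2"]["value"]]}
    return s


PIPELINE = [hlpm, dpm, am, wm, vm]  # fixed order: HLPM -> DPM -> AM -> WM -> VM


def run(question: str) -> State:
    state = State(question)
    for module in PIPELINE:
        state = module(state)
    return state


state = run("How many papers did Example University publish in 2024?")
print(state.results["s2"]["query"])
print(state.report)
```

Keeping the query template outside the planner means the plan only supplies parameter values, so a malformed plan can fail on a missing key but cannot produce a syntactically invalid query.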

Figure 1: Overview of AnalyticsGPT, showing the main modules: High-Level Planning Module (HLPM), Detailed Planning Module (DPM), Action Module (AM), Writing Module (WM), and Visualization Module (VM). Each module, including user input semantics and the RAG interface, is further discussed separately.

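Experimental results

Research questions

RQ1: How effective is an LLM-driven multi-module workflow for scientometric question answering (SQA) compared with a naive RAG baseline?
RQ2: How do planning and structured tool use affect coverage, coherence, verifiability, and validity on the SQA task?
RQ3: Can the system reliably retrieve and synthesize data for complex, multi-entity scientometric questions while suppressing hallucinations?
RQ4: How does visualization enhancement affect user comprehension and time to insight?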

Key findings

Metric	Naive Baseline	AnalyticsGPT
Resp. Tokens	624 ± 258	681 ± 322
API Time (s)	14.2 ± 6.1	20.9 ± 12.3
Critical Errors	5/84	1/84
Coverage	4.06 ± 1.13	4.40 ± 0.95
Coherence	4.38 ± 1.00	4.59 ± 0.66
Verifiability	4.07 ± 1.01	4.25 ± 0.70
Validity	4.19 ± 1.15	4.56 ± 0.75
Avg.	4.17	4.45
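
The Avg. row and the critical-error counts can be checked with a few lines of arithmetic. The sketch below assumes that Avg. is the unweighted mean of the four quality scores (Coverage, Coherence, Verifiability, Validity); the review does not state this, and the function name `summarize` is illustrative only.

```python
# Values copied from the table above; the averaging rule is an assumption.
SCORES = {
    "Naive Baseline": {"Coverage": 4.06, "Coherence": 4.38, "Verifiability": 4.07, "Validity": 4.19},
    "AnalyticsGPT": {"Coverage": 4.40, "Coherence": 4.59, "Verifiability": 4.25, "Validity": 4.56},
}
CRITICAL_ERRORS = {"Naive Baseline": (5, 84), "AnalyticsGPT": (1, 84)}


def summarize(system: str) -> tuple:
    """Return (mean quality score, critical error rate) for one system."""
    values = SCORES[system].values()
    mean = sum(values) / len(values)
    errors, total = CRITICAL_ERRORS[system]
    return mean, errors / total


for name in SCORES:
    mean, rate = summarize(name)
    print(f"{name}: mean score {mean:.3f}, critical error rate {rate:.1%}")
```

Under that assumption the means come out to 4.175 and 4.45, which agrees with the reported 4.17 and 4.45 up to rounding of the per-metric values, and the critical error rate drops from about 6.0% (5/84) to about 1.2% (1/84).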

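AnalyticsGPT outperforms the naive baseline on coverage (4.40 vs. 4.06) and validity (4.56 vs. 4.19), and the advantage holds across multiple question forms.
AnalyticsGPT also scores higher than the baseline on coherence and verifiability in the SME and LLM evaluations.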
Structured planning (HLPM/DPM) and rule-based query generation reduce critical data-retrieval errors.
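Separating independent from dependent steps in the AM allows conjunctive questions to be executed in parallel, which improves efficiency.
The system delivers richer, more structured final outputs, with inline references and optional visualizations, that support insight.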
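
The distinction between independent and dependent steps can be illustrated with a small scheduler that groups plan steps into dependency levels and runs each level concurrently. This is a sketch under stated assumptions rather than the paper's Action Module: the plan format, the function names `execution_levels` and `run_plan`, and the toy actions are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def execution_levels(deps: dict) -> list:
    """Group step IDs into levels; steps in one level do not depend on each other."""
    remaining = {step: set(d) for step, d in deps.items()}
    levels = []
    while remaining:
        ready = sorted(step for step, d in remaining.items() if not d)
        if not ready:
            raise ValueError("cyclic or unsatisfiable dependencies")
        levels.append(ready)
        for step in ready:
            del remaining[step]
        for d in remaining.values():
            d.difference_update(ready)
    return levels


def run_plan(deps: dict, actions: dict) -> dict:
    """Run each level in parallel; every action receives the results gathered so far."""
    results = {}
    with ThreadPoolExecutor() as pool:
        for level in execution_levels(deps):
            snapshot = dict(results)  # read-only view of earlier levels
            futures = {step: pool.submit(actions[step], snapshot) for step in level}
            for step, future in futures.items():
                results[step] = future.result()
    return results


# Conjunctive question: "Compare the output of A and B" (toy numbers, not real data).
deps = {"count_a": [], "count_b": [], "compare": ["count_a", "count_b"]}
actions = {
    "count_a": lambda r: 10,
    "count_b": lambda r: 7,
    "compare": lambda r: r["count_a"] - r["count_b"],
}
print(execution_levels(deps))
print(run_plan(deps, actions))
```

Here the two counting steps share a level and run concurrently, while the comparison waits for both. In a real deployment the gain would depend on how much of the latency comes from independent retrieval calls.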

Figure 2: Distribution of question forms by count in the evaluation set. Note that single-intent (SING_INT) is a custom definition and not part of DBLP-QuAD. We overrepresent the fact-based category to pad the dataset with ample base cases, as users often tried to ask more complex questions.


This review was generated by AI and checked by a human editor.