QUICK REVIEW

[Paper Review] The Pneuma Project: Reifying Information Needs as Relational Schemas to Automate Discovery, Guide Preparation, and Align Data with Intent

Muhammad Imam Luthfi Balaka, Ronield Fernandez|arXiv (Cornell University)|Jan 7, 2026
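Data Visualization and Analytics | Cited by 0
One-Sentence Summary

Pneuma-Seeker guides data discovery and preparation by expressing information needs as relational schemas, using a conductor-style planner and shared-state convergence to produce fit-for-purpose documents.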

ABSTRACT

Data discovery and preparation remain persistent bottlenecks in the data management lifecycle, especially when user intent is vague, evolving, or difficult to operationalize. The Pneuma Project introduces Pneuma-Seeker, a system that helps users articulate and fulfill information needs through iterative interaction with a language model-powered platform. The system reifies the user's evolving information need as a relational data model and incrementally converges toward a usable document aligned with that intent. To achieve this, the system combines three architectural ideas: context specialization to reduce LLM burden across subtasks, a conductor-style planner to assemble dynamic execution plans, and a convergence mechanism based on shared state. The system integrates recent advances in retrieval-augmented generation (RAG), agentic frameworks, and structured data preparation to support semi-automatic, language-guided workflows. We evaluate the system through LLM-based user simulations and show that it helps surface latent intent, guide discovery, and produce fit-for-purpose documents. It also acts as an emergent documentation layer, capturing institutional knowledge and supporting organizational memory.

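Motivation and Objectives

  • Provide a framework that turns vague, evolving user information needs into explicit relational schemas and SQL queries.
  • Enable semi-automatic, language-guided data discovery and preparation across heterogeneous data sources.
  • Reduce user burden by decomposing tasks into specialized contexts and planning actions dynamically.
  • Converge user intent and system state toward a usable document that satisfies the latent information need.
  • Capture organizational knowledge and institutional memory through grounded interactions and outputs.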

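Proposed Method

  • Introduce Pneuma-Seeker, which reifies an information need as a relational schema (T, Q) and iteratively converges toward a usable document.
  • Use context specialization to assign subtasks to specialized LLM contexts (Conductor, IR System, Materializer).
  • Adopt a conductor-style planner that assembles dynamic execution plans based on the real-time progress of the information need.
  • Maintain a shared state (T, Q) between the user and the system to guide convergence and iteration.
  • Integrate retrieval-augmented generation (RAG), agentic frameworks, and structured data preparation to support semi-automatic workflows.
  • Evaluate convergence and accuracy using LLM-based user simulation (LLM Sim) and archaeology/environment benchmarks.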
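
The planner-plus-shared-state idea above can be illustrated with a small sketch. This is not the authors' implementation: the `State` class, the hard-coded `ir_system`, `materializer`, and `query_writer` functions, the `purchases` table, and the stopping rule are all hypothetical stand-ins chosen for illustration, with ordinary Python functions in place of LLM contexts. It assumes only that T can be held as table definitions with rows and Q as SQL text, and it uses the standard-library `sqlite3` module to run Q over T.

```python
from __future__ import annotations

import sqlite3
from dataclasses import dataclass, field


@dataclass
class State:
    """Hypothetical shared state (T, Q): target tables T and SQL queries Q."""
    sources: list[tuple] = field(default_factory=list)        # raw rows found so far
    tables: dict[str, str] = field(default_factory=dict)      # T: name -> CREATE TABLE
    rows: dict[str, list[tuple]] = field(default_factory=dict)
    queries: list[str] = field(default_factory=list)          # Q: SQL text


def ir_system(state: State) -> None:
    """Stand-in for retrieval: pretend three matching source rows were found."""
    state.sources = [("acme", 120.0), ("acme", 30.0), ("globex", 75.0)]


def materializer(state: State) -> None:
    """Stand-in for preparation: shape the retrieved rows into a table in T."""
    state.tables["purchases"] = "CREATE TABLE purchases (vendor TEXT, amount REAL)"
    state.rows["purchases"] = list(state.sources)


def query_writer(state: State) -> None:
    """Stand-in for query drafting: add one SQL query to Q."""
    state.queries.append(
        "SELECT vendor, SUM(amount) FROM purchases GROUP BY vendor ORDER BY vendor"
    )


ACTIONS = {"ir_system": ir_system, "materializer": materializer,
           "query_writer": query_writer}


def conductor(state: State) -> str | None:
    """Choose the next action from the current state; None means converged."""
    if not state.sources:
        return "ir_system"
    if not state.tables:
        return "materializer"
    if not state.queries:
        return "query_writer"
    return None


def run(state: State, max_steps: int = 10) -> list[str]:
    """Build the plan dynamically, one step at a time, until the state converges."""
    trace = []
    for _ in range(max_steps):
        action = conductor(state)
        if action is None:
            break
        ACTIONS[action](state)
        trace.append(action)
    return trace


def execute(state: State) -> list[tuple]:
    """Materialize T in an in-memory SQLite database and run the latest query in Q."""
    conn = sqlite3.connect(":memory:")
    for name, ddl in state.tables.items():
        conn.execute(ddl)
        table_rows = state.rows.get(name, [])
        if table_rows:
            marks = ", ".join("?" * len(table_rows[0]))
            conn.executemany(f"INSERT INTO {name} VALUES ({marks})", table_rows)
    result = conn.execute(state.queries[-1]).fetchall()
    conn.close()
    return result
```

Calling `run(State())` yields the step trace, and `execute` on the resulting state returns the per-vendor totals. The point of the sketch is that the plan is not fixed in advance: the conductor re-inspects the shared state after every step, so a user edit to T or Q (for example, clearing `queries`) would simply cause the next step to be chosen again.
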
Figure 1. The Architecture of Pneuma-Seeker

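Experimental Results

Research Questions

  • RQ1: Can users realize their latent information needs when interacting with Pneuma-Seeker?
  • RQ2: How accurately does Pneuma-Seeker satisfy a given information need compared with baseline systems?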

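Key Findings

  • Pneuma-Seeker consistently achieves a higher convergence rate than the baselines in the reported simulations.
  • On both benchmark datasets, Pneuma-Seeker's answers are more accurate than those of competing baselines.
  • A dynamic, context-specialized architecture guided by a conductor-style planner outperforms static pipelines for data discovery and preparation.
  • The system surfaces latent information needs and helps express them as actionable schemas and queries.
  • Pneuma-Seeker enables emergent documentation of organizational knowledge through its interaction-driven outputs.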
Figure 2. Interface of Pneuma-Seeker, showing: [1] User Query (Clarification), [2] User-Facing Message, and [3] State View Page $(T,Q)$. Note: the numbers and values of $T$ shown here are not real for privacy reasons.


This review was generated by AI and reviewed by a human editor.