QUICK REVIEW

[Paper Review] The Pneuma Project: Reifying Information Needs as Relational Schemas to Automate Discovery, Guide Preparation, and Align Data with Intent

Muhammad Imam Luthfi Balaka, Ronield Fernandez|arXiv (Cornell University)|2026. 01. 07.

Data Visualization and Analytics | Citations: 0

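One-Line Summary

Pneuma-Seeker helps users express their evolving information needs as relational schemas, and guides data discovery and preparation by using a conductor-style planner and shared-state convergence to find and produce fit-for-purpose documents.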

ABSTRACT

Data discovery and preparation remain persistent bottlenecks in the data management lifecycle, especially when user intent is vague, evolving, or difficult to operationalize. The Pneuma Project introduces Pneuma-Seeker, a system that helps users articulate and fulfill information needs through iterative interaction with a language model-powered platform. The system reifies the user's evolving information need as a relational data model and incrementally converges toward a usable document aligned with that intent. To achieve this, the system combines three architectural ideas: context specialization to reduce LLM burden across subtasks, a conductor-style planner to assemble dynamic execution plans, and a convergence mechanism based on shared state. The system integrates recent advances in retrieval-augmented generation (RAG), agentic frameworks, and structured data preparation to support semi-automatic, language-guided workflows. We evaluate the system through LLM-based user simulations and show that it helps surface latent intent, guide discovery, and produce fit-for-purpose documents. It also acts as an emergent documentation layer, capturing institutional knowledge and supporting organizational memory.

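Research Motivation and Goals

Provide a framework that turns vague, evolving user information needs into explicit relational schemas and SQL queries.
Enable semi-automatic, language-guided data discovery and preparation across heterogeneous data sources.
Reduce the burden on the LLM by decomposing work into specialized contexts and assembling execution plans dynamically.
Converge the user's intent and the system state toward a usable document that satisfies the latent information need.
Capture organizational knowledge and institutional memory through documented interactions and artifacts.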
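
As a rough illustration of what reifying an information need as a relational schema and query might look like, the sketch below uses Python's built-in `sqlite3` module. The information need, the `purchases` table, its columns, and the sample rows are all hypothetical and do not come from the paper; the only idea taken from the source is that T stands for the target tables and Q for the query over them.

```python
import sqlite3

# Hypothetical information need (not from the paper):
# "Which suppliers had the highest total spend in 2024?"

# T: the target table(s) the need is reified into (illustrative names).
T = """
CREATE TABLE purchases (
    supplier TEXT NOT NULL,
    year     INTEGER NOT NULL,
    amount   REAL NOT NULL
);
"""

# Q: the query over T whose result answers the need.
Q = """
SELECT supplier, SUM(amount) AS total
FROM purchases
WHERE year = 2024
GROUP BY supplier
ORDER BY total DESC;
"""

def answer(rows):
    """Materialize T in an in-memory database, load rows, and run Q."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(T)
    conn.executemany("INSERT INTO purchases VALUES (?, ?, ?)", rows)
    result = conn.execute(Q).fetchall()
    conn.close()
    return result

rows = [
    ("Acme", 2024, 100.0),
    ("Acme", 2024, 50.0),
    ("Bolt", 2024, 120.0),
    ("Bolt", 2023, 999.0),  # filtered out by the year predicate
]
print(answer(rows))
```

Making T and Q explicit in this way gives the user something concrete to inspect and correct, for example by changing the year filter or adding a column, as the need evolves.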

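Proposed Method

Introduce Pneuma-Seeker, a system that reifies the information need as a relational schema (T, Q) and converges incrementally toward a useful document.
Use context specialization to split work across specialized LLM contexts (Conductor, IR system, Materializer).
Use a conductor-style planner to assemble dynamic execution plans based on real-time progress toward the information need.
Maintain a shared state (T, Q) between the user and the system to drive convergence and iteration.
Integrate Retrieval-Augmented Generation (RAG), agentic architectures, and structured data preparation to support semi-automatic workflows.
Evaluate convergence and accuracy using LLM-based user simulations (LLM Sim) and archaeology/environment benchmarks.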
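
The interplay between a conductor-style planner and a shared state can be sketched as a small loop. This is a minimal sketch under stated assumptions, not the paper's implementation: the component names Conductor, IR system, and Materializer come from the review, but the `State` class, the function bodies, and the stopping rule are hypothetical placeholders, and no LLM is actually called.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

@dataclass
class State:
    """Shared state (T, Q) visible to both the user and the system."""
    tables: Dict[str, List[str]] = field(default_factory=dict)  # T: table -> columns
    query: str = ""                                             # Q: SQL text
    log: List[str] = field(default_factory=list)                # actions taken so far

# Hypothetical stand-ins for the specialized contexts; real ones would call an LLM.
def ir_system(state: State) -> None:
    """Pretend to discover a relevant table and record it in T."""
    state.tables["purchases"] = ["supplier", "year", "amount"]
    state.log.append("ir")

def materializer(state: State) -> None:
    """Pretend to write a query Q over the tables currently in T."""
    state.query = "SELECT supplier, SUM(amount) FROM purchases GROUP BY supplier"
    state.log.append("materializer")

COMPONENTS: Dict[str, Callable[[State], None]] = {
    "ir": ir_system,
    "materializer": materializer,
}

def conductor(state: State) -> Optional[str]:
    """Choose the next action from the current state instead of a fixed pipeline."""
    if not state.tables:
        return "ir"
    if not state.query:
        return "materializer"
    return None  # assumed stopping rule: both T and Q are filled in

def run(state: State, max_steps: int = 10) -> State:
    """Loop until the conductor reports nothing left to do or the budget runs out."""
    for _ in range(max_steps):
        action = conductor(state)
        if action is None:
            break
        COMPONENTS[action](state)
    return state

final = run(State())
print(final.log, final.query)
```

Because the conductor decides from the state rather than from a fixed step order, a run that starts with T already populated (for example after a user edit) skips retrieval and goes straight to the Materializer.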

Figure 1. The Architecture of Pneuma-Seeker

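Experimental Results

Research Questions

RQ1: Can users arrive at their latent information need by interacting with Pneuma-Seeker?
RQ2: How accurately does Pneuma-Seeker resolve a given information need compared with baseline systems?

Key Results

Pneuma-Seeker consistently achieves a higher convergence rate than the baselines in the reported simulations.
Pneuma-Seeker produces accurate answers on two benchmark datasets when compared against competitive baselines.
The dynamic, context-specialized architecture with a conductor-based planner outperforms static pipelines at guiding data discovery and preparation.
The system helps surface latent information needs and express them as executable schemas and queries.
Pneuma-Seeker enables emergent documentation of organizational knowledge through artifacts produced during interaction.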

Figure 2. Interface of Pneuma-Seeker, showing: [1] User Query (Clarification), [2] User-Facing Message, and [3] State View Page $(T, Q)$. Note: the numbers and values of $T$ shown here are not real for privacy reasons.


This review was generated by AI and reviewed by a human editor.