QUICK REVIEW

[Paper Review] MultiVis-Agent: A Multi-Agent Framework with Logic Rules for Reliable and Comprehensive Cross-Modal Data Visualization

Jinwei Lu, Yuanfeng Song|arXiv (Cornell University)|2026. 01. 26.

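Data Visualization and Analytics|Citations: 0

One-Line Summary

MultiVis-Agent introduces a logic rule-enhanced multi-agent framework for reliable cross-modal visualization generation across four scenarios, and reports strong empirical gains over baselines on its benchmark.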

ABSTRACT

Real-world visualization tasks involve complex, multi-modal requirements that extend beyond simple text-to-chart generation, requiring reference images, code examples, and iterative refinement. Current systems exhibit fundamental limitations: single-modality input, one-shot generation, and rigid workflows. While LLM-based approaches show potential for these complex requirements, they introduce reliability challenges including catastrophic failures and infinite loop susceptibility. To address this gap, we propose MultiVis-Agent, a logic rule-enhanced multi-agent framework for reliable multi-modal and multi-scenario visualization generation. Our approach introduces a four-layer logic rule framework that provides mathematical guarantees for system reliability while maintaining flexibility. Unlike traditional rule-based systems, our logic rules are mathematical constraints that guide LLM reasoning rather than replacing it. We formalize the MultiVis task spanning four scenarios from basic generation to iterative refinement, and develop MultiVis-Bench, a benchmark with over 1,000 cases for multi-modal visualization evaluation. Extensive experiments demonstrate that our approach achieves 75.63% visualization score on challenging tasks, significantly outperforming baselines (57.54-62.79%), with task completion rates of 99.58% and code execution success rates of 94.56% (vs. 74.48% and 65.10% without logic rules), successfully addressing both complexity and reliability challenges in automated visualization generation.

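Motivation and Objectives

Extend Text-to-Vis to multi-modal inputs (text, images, code) and to iterative refinement, reflecting real-world workflows.
Ensure the reliability of LLM-based visualization through formal logic constraints and a centralized coordinator.
Formalize four visualization scenarios and release a benchmark (MultiVis-Bench) that includes executable Python code.
Demonstrate substantial empirical gains over baseline approaches in visualization quality, task completion, and code execution success.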

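Proposed Method

Propose a four-layer logic rule framework (CR, TE, EH, RC) that guides LLM reasoning rather than replacing it.
Implement a centralized coordinator agent that orchestrates the database/query, visualization implementation, and validation and evaluation agents.
Formalize the four MultiVis scenarios (Basic Generation, Image-Referenced Generation, Code-Referenced Generation, Iterative Refinement) and build MultiVis-Bench with 1,202 cases spanning 127 chart types and 141 databases.
Provide mathematical guarantees for parameter safety, error recovery, and termination through formal theorems.
Evaluate improvements in visualization score, task completion, and code execution success on the benchmark.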
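
The coordinator-plus-rules idea above can be illustrated with a minimal Python sketch. This is not the authors' implementation: the review does not expand the CR, TE, EH, and RC layers, so the sketch only mirrors the three guarantees it names (parameter safety, error recovery, termination). The names `Task`, `Result`, `check_parameters`, `coordinate`, and the `max_attempts` bound are all hypothetical, and the generator and validator are stand-in functions rather than real LLM agents.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, List, Optional


class Scenario(Enum):
    # The four MultiVis scenarios named in the review.
    BASIC = "Basic Generation"
    IMAGE_REFERENCED = "Image-Referenced Generation"
    CODE_REFERENCED = "Code-Referenced Generation"
    ITERATIVE_REFINEMENT = "Iterative Refinement"


@dataclass
class Task:
    scenario: Scenario
    query: str
    reference_image: Optional[str] = None  # hypothetical: path to a reference chart
    reference_code: Optional[str] = None   # hypothetical: example plotting code


@dataclass
class Result:
    code: Optional[str]
    status: str                            # "ok", "invalid_input", or "gave_up"
    attempts: int
    errors: List[str] = field(default_factory=list)


def check_parameters(task: Task) -> Optional[str]:
    """Assumed parameter-safety rule: reject malformed tasks before any agent runs."""
    if not task.query.strip():
        return "empty query"
    if task.scenario is Scenario.IMAGE_REFERENCED and not task.reference_image:
        return "image-referenced task without a reference image"
    if task.scenario is Scenario.CODE_REFERENCED and not task.reference_code:
        return "code-referenced task without reference code"
    return None


def coordinate(
    task: Task,
    generate: Callable[[Task, List[str]], str],
    validate: Callable[[str], Optional[str]],
    max_attempts: int = 3,
) -> Result:
    """Run generate/validate under a hard attempt bound, so the loop always ends."""
    problem = check_parameters(task)
    if problem is not None:
        return Result(None, "invalid_input", 0, [problem])
    errors: List[str] = []
    for attempt in range(1, max_attempts + 1):
        try:
            code = generate(task, errors)   # earlier errors are fed back as context
            feedback = validate(code)       # None means the code passed validation
        except Exception as exc:            # assumed recovery rule: record and retry
            errors.append(f"{type(exc).__name__}: {exc}")
            continue
        if feedback is None:
            return Result(code, "ok", attempt, errors)
        errors.append(feedback)
    return Result(None, "gave_up", max_attempts, errors)


def flaky_generate(task: Task, errors: List[str]) -> str:
    # Stand-in for an LLM agent: fails once, then succeeds.
    if not errors:
        raise RuntimeError("simulated LLM failure")
    return "print('chart')"


def simple_validate(code: str) -> Optional[str]:
    return None if "chart" in code else "no chart produced"


result = coordinate(Task(Scenario.BASIC, "sales by region"), flaky_generate, simple_validate)
print(result.status, result.attempts)
```

The fixed `max_attempts` bound is the simplest way to rule out infinite loops in a sketch like this; the paper's formal termination argument may rest on different machinery.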

Figure 1. Real-world visualization tasks require multi-modal inputs and iterative refinement. Current Text-to-Vis systems fail to support these scenarios.

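Experimental Results

Research Questions

RQ1: How much more reliable and higher-quality can a multi-agent framework with logic rules make multi-modal visualization generation?
RQ2: Which supplementary inputs (images, code) and iterative refinement workflows are realistically needed?
RQ3: Can formal logic constraints guarantee safe, terminating, and recoverable execution in LLM-driven visualization pipelines?
RQ4: How does MultiVis-Agent perform against baselines across the four defined MultiVis scenarios?
RQ5: What effect does the four-layer logic rule framework have on completion and execution success rates?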

Key Results

On the challenging image-referenced generation task, MultiVis-Agent achieves a visualization score of 75.63%.
On the same task, the baselines reach 62.79% (LLM Workflow) and 57.54% (Instructing LLM).
MultiVis-Agent reaches a task completion rate of 99.58%, versus 74.48% without logic rules.
Its code execution success rate is 94.56%, versus 65.10% without logic rules.
Logic rules contribute improvements of 17.58–31.70 points across all tasks.
MultiVis-Agent with logic rules outperforms the same framework without them in both completion and accuracy.
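
As a quick check on the figures above, the following sketch recomputes the point gaps from the numbers quoted in the review and abstract. The variable names are illustrative only. The review does not say which metrics the 17.58–31.70 point range refers to, so the comparison with that range is an observation, not a claim about the paper's tables.

```python
# Percentages quoted in the abstract: with vs. without logic rules.
with_rules = {"task completion": 99.58, "code execution": 94.56}
without_rules = {"task completion": 74.48, "code execution": 65.10}

# Point gain attributable to logic rules on each reliability metric.
gains = {name: round(with_rules[name] - without_rules[name], 2) for name in with_rules}

# Visualization score on the challenging task vs. the two baselines.
vis_score = 75.63
baselines = {"LLM Workflow": 62.79, "Instructing LLM": 57.54}
margins = {name: round(vis_score - score, 2) for name, score in baselines.items()}

for name, gain in gains.items():
    print(f"{name}: +{gain:.2f} points with logic rules")
for name, margin in margins.items():
    print(f"vs. {name}: +{margin:.2f} points")
```

Both reliability gains (25.10 and 29.46 points) fall inside the 17.58–31.70 point range quoted above.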

Figure 3. An example for the working process of MultiVis-Agent.

This review was generated by AI and checked by a human editor.