QUICK REVIEW

[Paper Review] Chart Specification: Structural Representations for Incentivizing VLM Reasoning in Chart-to-Code Generation

Minggui He, Mingchen Dai|arXiv (Cornell University)|2026. 02. 11.

Handwritten Text Recognition Techniques | Citations: 0

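One-Line Summary

This paper presents Chart Specification, a structure-focused intermediate representation that aligns visual structure with code to improve chart-to-code generation. Its structural rewards give fine-grained training signals, enabling both data-efficient training and reinforcement learning.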

ABSTRACT

Vision-Language Models (VLMs) have shown promise in generating plotting code from chart images, yet achieving structural fidelity remains challenging. Existing approaches largely rely on supervised fine-tuning, encouraging surface-level token imitation rather than faithful modeling of underlying chart structure, which often leads to hallucinated or semantically inconsistent outputs. We propose Chart Specification, a structured intermediate representation that shifts training from text imitation to semantically grounded supervision. Chart Specification filters syntactic noise to construct a structurally balanced training set and supports a Spec-Align Reward that provides fine-grained, verifiable feedback on structural correctness, enabling reinforcement learning to enforce consistent plotting logic. Experiments on three public benchmarks show that our method consistently outperforms prior approaches. With only 3K training samples, we achieve strong data efficiency, surpassing leading baselines by up to 61.7% on complex benchmarks, and scaling to 4K samples establishes new state-of-the-art results across all evaluated metrics. Overall, our results demonstrate that precise structural supervision offers an efficient pathway to high-fidelity chart-to-code generation. Code and dataset are available at: https://github.com/Mighten/chart-specification-paper

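Research Motivation and Goals

Identify the limits of surface-level token imitation in chart-to-code synthesis.
Propose Chart Specification as a minimal sufficient structural representation that links visual intent to code execution.
Build ChartStruct, a structurally balanced dataset that covers diverse chart topologies.
Develop the Spec-Align Reward, which provides fine-grained, verifiable feedback for reinforcement learning.
Demonstrate data-efficient, state-of-the-art performance on three public benchmarks.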

Proposed Method

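Define Chart Specification as a two-part schema, S = <S_sem, S_code>, where S_sem encodes global topology, coordinate system, data domain, and analytic expressions, and S_code pins down numerical facts through runtime interception.
Build ChartStruct, a structurally balanced training corpus spanning 55 structural categories (S_struct) across 20 chart families, using difficulty-aware sampling (ρ tiers of 90/72/54).
Introduce the Spec-Align Reward within the Group Relative Policy Optimization (GRPO) framework, organized as a hierarchical reward tree with two stages: Integrity (format, execution) and semantic/code fidelity (R_sem, R_code).
R_sem combines the Topology Gate, Coord, Domain, Series, and Data/Func components; R_code adds chart-family-specific metrics (Statistical, Relational, Vector, Auxiliary) based on runtime code comparison.
Fine-tune a Qwen2.5-VL-7B backbone on 4K ChartStruct instances under GRPO for 3 epochs with a batch size of 32, and evaluate on the ChartMimic, Plot2Code, and ChartX benchmarks.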
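
The reward tree and its use inside GRPO can be pictured with a small sketch. Everything here is illustrative rather than the authors' implementation: the `RolloutCheck` fields, the equal weights, the plain averaging of the R_sem components, and the choice to let the Topology Gate zero out only R_sem are all assumptions. The group-relative advantage follows the usual GRPO recipe of normalizing each reward by the mean and standard deviation of its rollout group (population standard deviation is used here; implementations differ).

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass
class RolloutCheck:
    """Hypothetical verification results for one generated script."""
    format_ok: bool      # response follows the expected output format
    executes: bool       # plotting code runs without raising an error
    topology_ok: bool    # Topology Gate: layout matches the reference spec
    sem_scores: dict     # Coord / Domain / Series / Data-Func scores in [0, 1]
    code_score: float    # chart-family-specific R_code score in [0, 1]


def spec_align_reward(c: RolloutCheck, w_sem: float = 0.5, w_code: float = 0.5) -> float:
    """Toy hierarchical reward: integrity gates first, then fidelity terms.

    Weights and averaging are assumptions made for illustration only.
    """
    # Stage 1 (Integrity): a badly formatted or non-executing script earns nothing.
    if not (c.format_ok and c.executes):
        return 0.0
    # Stage 2 (fidelity): the topology gate decides whether R_sem counts at all.
    r_sem = mean(list(c.sem_scores.values())) if c.topology_ok else 0.0
    return w_sem * r_sem + w_code * c.code_score


def group_relative_advantages(rewards, eps: float = 1e-6):
    """Normalize rewards within one rollout group, as GRPO does."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    scores = {"coord": 1.0, "domain": 0.5, "series": 1.0, "data_func": 0.5}
    group = [
        RolloutCheck(True, False, True, scores, 0.8),   # fails execution
        RolloutCheck(True, True, True, scores, 0.8),    # passes every gate
        RolloutCheck(True, True, False, scores, 0.8),   # wrong topology
    ]
    rewards = [spec_align_reward(c) for c in group]
    print(rewards)
    print(group_relative_advantages(rewards))
```

Gating before scoring means a rollout cannot collect partial credit for plausible-looking code that never runs, which is one way to read the "verifiable feedback" described above.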

Figure 1: Motivation for structure-aware chart reasoning. (Top) Direct chart-to-code models rely on surface-level imitation and often hallucinate structural dependencies. (Bottom) By explicitly modeling chart structure via Chart Specification, our approach enforces constraint-consistent plotting log

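Experimental Results

Research Questions

RQ1: How does a structure-focused intermediate representation affect structural fidelity in chart-to-code generation?
RQ2: Can a structurally balanced dataset contribute to data efficiency and generalization in chart reasoning tasks?
RQ3: Does Spec-Align provide meaningful, verifiable rewards that improve reinforcement learning for chart code generation?
RQ4: How does Chart Specification perform against state-of-the-art baselines on benchmark datasets?
RQ5: What are the limits and benefits of structure-aware supervision in reducing hallucination and improving code execution?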

Key Results

Model	Parameters	Execution Rate	Low-Level	High-Level	Overall
ChartSpec (4K)	7B	93.5%	-	-	82.4%
ChartCoder	-	-	-	-	75.7%
GPT-4o	-	-	-	-	81.2%

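ChartSpec achieves the leading performance among open-source MLLMs on ChartMimic, with an overall score of 79.9 using 3K samples and 82.4 using 4K samples.
With 4K samples, ChartSpec outperforms some proprietary models on ChartMimic (e.g., GPT-4o at 81.2) and narrows the gap to the top systems.
The method delivers the highest Execution Rate on the leaderboard (93.5%), suggesting more valid code and fewer runtime failures.
ChartStruct prioritizes structurally complex charts, which substantially improves data efficiency: it reaches state-of-the-art results at 4K samples while using far fewer samples than large-scale baselines.
The Spec-Align Reward provides dense, verifiable feedback through a multi-stage reward tree, improving structural fidelity and logical consistency over purely supervised baselines.
Experiments on three benchmarks (ChartMimic, Plot2Code, ChartX) show strong gains and data-efficient chart-to-code generation.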
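
The structurally balanced, difficulty-aware sampling credited above for ChartStruct's data efficiency could look roughly like the sketch below. It rests on a guess: that the ρ tiers of 90/72/54 are per-category sample quotas tied to a difficulty tier. The tier names, the `sample_balanced` helper, and the example category names are hypothetical and do not come from the paper.

```python
import random
from collections import defaultdict

# Assumed reading of the rho tiers: a per-category quota by difficulty.
# The tier labels are hypothetical.
TIER_QUOTA = {"hard": 90, "medium": 72, "easy": 54}


def sample_balanced(pool, category_tier, seed=0):
    """Draw up to TIER_QUOTA[tier] items from each structural category.

    pool: iterable of (category, item) pairs.
    category_tier: dict mapping category -> "hard" | "medium" | "easy".
    """
    rng = random.Random(seed)

    # Group candidate items by their structural category.
    by_category = defaultdict(list)
    for category, item in pool:
        by_category[category].append(item)

    selected = []
    for category, items in sorted(by_category.items()):
        quota = TIER_QUOTA[category_tier[category]]
        k = min(quota, len(items))  # small categories contribute all they have
        selected.extend((category, x) for x in rng.sample(items, k))
    return selected


if __name__ == "__main__":
    pool = [("bar_grouped", i) for i in range(200)]
    pool += [("pie_simple", i) for i in range(30)]
    tiers = {"bar_grouped": "hard", "pie_simple": "easy"}
    picked = sample_balanced(pool, tiers)
    print(len(picked))
```

Capping each category keeps abundant, simple chart types from dominating the corpus, while the larger quota for harder categories shifts the sampling budget toward structurally complex charts.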

Figure 2: The Overview of Our Framework. (A) Specification-Driven Data Curation: Adopting Chart Specification ( $\mathcal{S}$ ) to extract semantic intent ( $\mathcal{S}_{sem}$ ) and physical execution data ( $\mathcal{S}_{code}$ ) from raw scripts, and guiding the curation of the ChartStruct corpus


This review was generated by AI and reviewed by a human editor.