QUICK REVIEW

[Paper Review] System 1&2 Synergy via Dynamic Model Interpolation

Chenxu Yang, Qingyi Si | arXiv (Cornell University) | 2026-01-29

Multimodal Machine Learning Applications | Citations: 0

One-Line Summary

DAMI dynamically interpolates between System 1 (Instruct) and System 2 (Thinking) checkpoints to adjust reasoning depth per query, using training-based (DAMI-Pref) and training-free (DAMI-Conf) estimation to achieve higher accuracy at lower token cost on math benchmarks.

ABSTRACT

Training a unified language model that adapts between intuitive System 1 and deliberative System 2 remains challenging due to interference between their cognitive modes. Recent studies have thus pursued making System 2 models more efficient. However, these approaches focused on output control, limiting what models produce. We argue that this paradigm is misaligned: output length is merely a symptom of the model's cognitive configuration, not the root cause. In this work, we shift the focus to capability control, which modulates how models think rather than what they produce. To realize this, we leverage existing Instruct and Thinking checkpoints through dynamic parameter interpolation, without additional training. Our pilot study establishes that linear interpolation yields a convex, monotonic Pareto frontier, underpinned by representation continuity and structural connectivity. Building on this, we propose DAMI (DynAmic Model Interpolation), a framework that estimates a query-specific Reasoning Intensity $λ(q)$ to configure cognitive depth. For training-based estimation, we develop a preference learning method encoding accuracy and efficiency criteria. For zero-shot deployment, we introduce a confidence-based method leveraging inter-model cognitive discrepancy. Experiments on five mathematical reasoning benchmarks demonstrate that DAMI achieves higher accuracy than the Thinking model while remaining efficient, effectively combining the efficiency of System 1 with the reasoning depth of System 2.

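Motivation and Objectives

Calls for a paradigm shift from output control to capability control for efficient reasoning in LLMs.
Shows that linear parameter interpolation yields a monotonic, convex Pareto frontier between accuracy and efficiency.
Introduces DAMI (DynAmic Model Interpolation) to estimate a per-query Reasoning Intensity λ(q) for adaptive cognitive depth.
Provides two estimation strategies, DAMI-Pref and DAMI-Conf, suited to data-rich settings and zero-shot deployment respectively.
Demonstrates a favorable accuracy-efficiency trade-off across several mathematical reasoning benchmarks.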

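Proposed Method

Formalizes dynamic interpolation between the Instruct and Thinking checkpoints: Θ(M)(q) = λ(q)Θ(T) + (1−λ(q))Θ(I), where Θ(T) and Θ(I) are the Thinking and Instruct parameters and Θ(M)(q) is the merged model used for query q.
Demonstrates that interpolation yields a convex Pareto frontier and representation continuity along a smooth path in representation space.
Proposes two approaches to estimating λ(q): (1) DAMI-Pref, which uses preference learning to balance accuracy and efficiency; (2) DAMI-Conf, which uses confidence signals and inter-model discrepancy for zero-shot deployment.
DAMI-Pref uses a reward model to rank candidate coefficients by pairwise preferences over (Acc, Cost) and is trained with binary cross-entropy.
DAMI-Conf derives λ(q) from global ambiguity and cognitive discrepancy signals through a calibrated sigmoid mapping.
Compares against output-control and fixed-capability baselines on five mathematical reasoning benchmarks.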
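
The three ingredients named in the method (parameter interpolation, a sigmoid mapping to λ(q), and a pairwise binary cross-entropy objective) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the `Params` type, the function names, and the `scale` and `bias` calibration parameters are hypothetical, checkpoints are represented as plain lists of floats, and the review does not specify how the paper computes or calibrates its signals.

```python
import math
from typing import Dict, List

# Hypothetical stand-in for a checkpoint: parameter name -> flat list of weights.
Params = Dict[str, List[float]]


def interpolate(theta_instruct: Params, theta_thinking: Params, lam: float) -> Params:
    """Return lam * Thinking + (1 - lam) * Instruct, parameter by parameter."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    if theta_instruct.keys() != theta_thinking.keys():
        raise ValueError("checkpoints must share parameter names")
    return {
        name: [lam * t + (1.0 - lam) * i
               for i, t in zip(theta_instruct[name], theta_thinking[name])]
        for name in theta_instruct
    }


def lambda_from_signal(signal: float, scale: float = 1.0, bias: float = 0.0) -> float:
    """Map a scalar difficulty signal into (0, 1) with a sigmoid.

    `signal`, `scale`, and `bias` are illustrative assumptions, not the
    paper's calibration.
    """
    return 1.0 / (1.0 + math.exp(-(scale * signal + bias)))


def pairwise_bce(score_preferred: float, score_rejected: float) -> float:
    """Binary cross-entropy on one preference pair: -log sigmoid(s_pref - s_rej)."""
    margin = score_preferred - score_rejected
    return math.log1p(math.exp(-margin))


# Toy usage with made-up weights.
instruct = {"w": [0.0, 2.0]}
thinking = {"w": [1.0, 4.0]}
lam = lambda_from_signal(0.0)                   # sigmoid(0) = 0.5
merged = interpolate(instruct, thinking, lam)   # {"w": [0.5, 3.0]}
```

Setting `lam` to 0 recovers the Instruct weights and 1 recovers the Thinking weights, so a per-query λ(q) moves the merged model along the straight line between the two checkpoints.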

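Experimental Results

Research Questions

RQ1: Can parameter interpolation between System 1 and System 2 checkpoints provide a controllable and predictable mechanism for adaptive reasoning depth?
RQ2: Do the DAMI approaches (DAMI-Pref and DAMI-Conf) improve accuracy under token/cost constraints compared with existing output-control methods?
RQ3: Is the interpolation path between Instruct and Thinking continuous and monotonic in accuracy and efficiency across queries?
RQ4: How well does the DAMI framework generalize beyond text-only reasoning to multimodal tasks?
RQ5: How does query-dependent reasoning intensity affect the Thinking ratio and performance across benchmarks?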

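Key Results

Linear interpolation between Instruct and Thinking yields a convex, monotonic Pareto frontier with continuous representation transitions.
On Qwen3-4B, DAMI-Pref raises accuracy by up to 3.4 points and cuts token usage by up to 29%; DAMI-Conf achieves up to a 40% token reduction with a 2.5-point gain.
DAMI-Pref outperforms static merging, early-exit, and routing baselines on five math benchmarks.
DAMI-Conf achieves substantial efficiency gains with robust accuracy across model families and also generalizes to multimodal tasks.
Relative to Thinking, the DAMI methods achieve end-to-end speedups of 1.46x (DAMI-Pref) and 1.86x (DAMI-Conf), gained by reducing thinking time and output length.
DAMI-Routing and other baselines fall short of the continuous, query-adaptive improvements that DAMI provides.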
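
To put the reported speedups in more familiar terms, the short sketch below converts a speedup factor into the fraction of wall-clock time saved. It assumes the usual definition (speedup = baseline time / new time), which the review does not state explicitly; `time_reduction` is a hypothetical helper name.

```python
def time_reduction(speedup: float) -> float:
    """Fraction of wall-clock time saved, assuming speedup = t_baseline / t_new."""
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return 1.0 - 1.0 / speedup


# Speedup factors as reported in the review, relative to the Thinking model.
for name, factor in {"DAMI-Pref": 1.46, "DAMI-Conf": 1.86}.items():
    print(f"{name}: {factor:.2f}x speedup -> {time_reduction(factor):.1%} less time")
```

Under that assumption, 1.46x and 1.86x correspond to roughly 31.5% and 46.2% less end-to-end time than the Thinking model.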

This review was generated by AI and reviewed by a human editor.