QUICK REVIEW

[Paper Review] RADAR: Revealing Asymmetric Development of Abilities in MLLM Pre-training

Yunshuang Nie, Bingqian Lin|arXiv (Cornell University)|Feb 13, 2026
Topic Modeling | Cited by 0
One-Sentence Summary

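The paper shows that the perception and reasoning abilities of Multi-modal Large Language Models (MLLMs) develop asymmetrically during pre-training, and proposes RADAR, an ability-centric evaluation framework for analyzing this without fine-tuning.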

ABSTRACT

Pre-trained Multi-modal Large Language Models (MLLMs) provide a knowledge-rich foundation for post-training by leveraging their inherent perception and reasoning capabilities to solve complex tasks. However, the lack of an efficient evaluation framework impedes the diagnosis of their performance bottlenecks. Current evaluation primarily relies on testing after supervised fine-tuning, which introduces laborious additional training and autoregressive decoding costs. Meanwhile, common pre-training metrics cannot quantify a model's perception and reasoning abilities in a disentangled manner. Furthermore, existing evaluation benchmarks are typically limited in scale or misaligned with pre-training objectives. Thus, we propose RADAR, an efficient ability-centric evaluation framework for Revealing Asymmetric Development of Abilities in MLLM pRe-training. RADAR involves two key components: (1) Soft Discrimination Score, a novel metric for robustly tracking ability development without fine-tuning, based on quantifying nuanced gradations of the model preference for the correct answer over distractors; and (2) Multi-Modal Mixture Benchmark, a new 15K+ sample benchmark for comprehensively evaluating pre-trained MLLMs' perception and reasoning abilities in a 0-shot manner, where we unify authoritative benchmark datasets and carefully collect new datasets, extending the evaluation scope and addressing the critical gaps in current benchmarks. With RADAR, we comprehensively reveal the asymmetric development of perceptual and reasoning capabilities in pretrained MLLMs across diverse factors, including data volume, model size, and pretraining strategy. Our RADAR underscores the need for a decomposed perspective on pre-training ability bottlenecks, informing targeted interventions to advance MLLMs efficiently. Our code is publicly available at https://github.com/Nieysh/RADAR.

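Motivation and Objectives

  • Understand how MLLMs acquire perception and reasoning abilities during pre-training.
  • Identify and characterize the asymmetries in how these abilities develop.
  • Provide a framework for analyzing and interpreting MLLM pre-training dynamics without the cost of supervised fine-tuning.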

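Proposed Method

  • Proposes RADAR, an evaluation framework for dissecting the pre-training dynamics of MLLMs.
  • Introduces the Soft Discrimination Score, a metric that tracks ability development without fine-tuning by quantifying how strongly the model prefers the correct answer over distractors.
  • Builds the Multi-Modal Mixture Benchmark, a 15K+ sample benchmark for evaluating the perception and reasoning abilities of pre-trained MLLMs in a 0-shot manner.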
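
The abstract describes the Soft Discrimination Score only at a high level, so the snippet below is a minimal sketch of the general idea rather than the paper's definition. It assumes each benchmark item yields one log-likelihood for the correct answer and one per distractor, and it uses a softmax over those candidates as one possible "soft" preference measure. The function names and the log-likelihood values are hypothetical.

```python
import math


def soft_preference(correct_logprob, distractor_logprobs):
    """Softmax share of the correct answer among all candidates.

    Assumption: this softmax form is illustrative only; it is not
    taken from the RADAR paper.
    """
    scores = [correct_logprob] + list(distractor_logprobs)
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    return exps[0] / sum(exps)


def hard_accuracy(correct_logprob, distractor_logprobs):
    """1.0 only if the correct answer strictly beats every distractor."""
    return 1.0 if all(correct_logprob > d for d in distractor_logprobs) else 0.0


def mean_score(items, scorer):
    """Average a per-item scorer over (correct, distractors) pairs."""
    return sum(scorer(c, ds) for c, ds in items) / len(items)


# Hypothetical log-likelihoods from two pre-training checkpoints.
checkpoint_a = [(-2.0, [-1.0, -1.5]), (-3.0, [-2.5, -2.9])]
checkpoint_b = [(-1.2, [-1.0, -1.5]), (-2.6, [-2.5, -2.9])]

for name, items in [("a", checkpoint_a), ("b", checkpoint_b)]:
    hard = mean_score(items, hard_accuracy)
    soft = mean_score(items, soft_preference)
    print(f"checkpoint {name}: hard={hard:.3f} soft={soft:.3f}")
```

In this toy data both checkpoints pick a distractor on every item, so hard accuracy cannot tell them apart, while the soft score rises for the second checkpoint because the correct answer is closing the gap. That is the kind of graded signal a discrimination-style metric is meant to expose.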

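Experimental Results

Research Questions

  • RQ1: Do MLLMs develop different abilities asymmetrically during pre-training?
  • RQ2: Which metrics reveal the timing and order in which abilities emerge during MLLM pre-training?
  • RQ3: How can development dynamics be interpreted and compared across tasks and abilities?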

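Key Findings

  • Perceptual and reasoning capabilities develop asymmetrically during MLLM pre-training, across data volume, model size, and pre-training strategy.
  • RADAR gives insight into when and how certain abilities emerge relative to others.
  • The analysis helps explain uneven progress and can guide targeted pre-training interventions.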

This review was generated by AI and checked by human editors.