QUICK REVIEW

[Paper Review] OddGridBench: Exposing the Lack of Fine-Grained Visual Discrepancy Sensitivity in Multimodal Large Language Models

Tengjin Weng, Wenhao Jiang | arXiv (Cornell University) | March 10, 2026

Multimodal Machine Learning Applications | Citations: 0

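One-Line Summary

The paper introduces OddGridBench, a controllable benchmark of more than 1,400 grid images for evaluating sensitivity to fine-grained visual discrepancies, and OddGrid-GRPO, a reinforcement learning framework that uses curriculum learning and a distance-aware reward to improve perceptual discrimination.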

ABSTRACT

Multimodal large language models (MLLMs) have achieved remarkable performance across a wide range of vision language tasks. However, their ability in low-level visual perception, particularly in detecting fine-grained visual discrepancies, remains underexplored and lacks systematic analysis. In this work, we introduce OddGridBench, a controllable benchmark for evaluating the visual discrepancy sensitivity of MLLMs. OddGridBench comprises over 1,400 grid-based images, where a single element differs from all others by one or multiple visual attributes such as color, size, rotation, or position. Experiments reveal that all evaluated MLLMs, including open-source families such as Qwen3-VL and InternVL3.5, and proprietary systems like Gemini-2.5-Pro and GPT-5, perform far below human levels in visual discrepancy detection. We further propose OddGrid-GRPO, a reinforcement learning framework that integrates curriculum learning and distance-aware reward. By progressively controlling the difficulty of training samples and incorporating spatial proximity constraints into the reward design, OddGrid-GRPO significantly enhances the model's fine-grained visual discrimination ability. We hope OddGridBench and OddGrid-GRPO will lay the groundwork for advancing perceptual grounding and visual discrepancy sensitivity in multimodal intelligence. Code and dataset are available at https://wwwtttjjj.github.io/OddGridBench/.

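Motivation and Objectives

Argues that MLLMs should be evaluated on low-level visual perception, not only on high-level tasks.
Proposes OddGridBench to quantify sensitivity to perceptual discrepancies in color, size, rotation, and position.
Shows that the evaluated MLLMs lag behind humans on subtle visual variations.
Develops OddGrid-GRPO, which strengthens perceptual grounding through curriculum learning and a distance-aware reward.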

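Proposed Method

OddGridBench builds grid-based images with single-attribute and multi-attribute discrepancies, providing 1,400 test samples along with training and validation splits.
The icons are SVGs drawn from IconFont and Material Design Icons, which makes controlled perceptual manipulations possible.
OddGrid-GRPO combines curriculum learning with a distance-aware reward to guide RL optimization toward fine-grained spatial localization.
The distance reward decays with spatial distance following a Gaussian-like function, with an adaptive sigma and a small added bias; the overall reward mixes this term with a format penalty.
Curriculum-guided optimization splits samples into Easy, Medium, and Hard and trains in three progressive stages to stabilize learning.
The evaluation covers 19 MLLMs from open-source and proprietary families and includes baseline and ablation comparisons.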
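The two training ingredients described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the review does not give the exact formulas, so the way sigma adapts to the grid size, the constants `sigma_scale`, `bias`, and `format_penalty`, the use of `None` to signal an unparseable answer, and the difficulty thresholds are all assumptions introduced here.

```python
import math


def distance_reward(pred, target, rows, cols, sigma_scale=0.25, bias=0.05):
    """Gaussian-shaped reward that decays with the grid distance between the
    predicted cell and the true odd cell (both given as (row, col))."""
    # Assumed form of the "adaptive sigma": tolerance grows with the grid diagonal.
    sigma = sigma_scale * math.hypot(rows, cols)
    d = math.hypot(pred[0] - target[0], pred[1] - target[1])
    gaussian = math.exp(-(d * d) / (2.0 * sigma * sigma))
    # Assumed role of the small bias: a floor so far-off guesses still get a signal.
    return bias + (1.0 - bias) * gaussian


def total_reward(pred, target, rows, cols, format_penalty=-1.0):
    """Assumed mixing rule: unparseable answers (pred is None) receive the
    format penalty, well-formed answers receive the distance reward."""
    if pred is None:
        return format_penalty
    return distance_reward(pred, target, rows, cols)


def curriculum_stages(samples, difficulty, easy_max=1.0 / 3, hard_min=2.0 / 3):
    """Split samples into Easy, Medium, Hard pools by a hypothetical difficulty
    score in [0, 1] and return them in training order (three stages)."""
    easy, medium, hard = [], [], []
    for sample in samples:
        score = difficulty(sample)
        if score < easy_max:
            easy.append(sample)
        elif score >= hard_min:
            hard.append(sample)
        else:
            medium.append(sample)
    return [easy, medium, hard]
```

With a reward of this shape, a prediction one cell away from the odd element scores higher than one on the opposite corner, which gives the policy a gradient toward the right location instead of a flat zero for every miss.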

Figure 1 : Illustration of human perceptual visual discrepancy sensitivity, showing the transition from imperceptible to perceptible visual differences in color, rotation, and size.

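Experimental Results

Research Questions

RQ1: Can current MLLMs detect fine-grained visual discrepancies in grid-based scenes, beyond coarse differences?
RQ2: How does model performance vary across color, size, rotation, and position changes and across multi-attribute combinations?
RQ3: Can curriculum-guided RL with a distance-aware reward improve perceptual sensitivity and localization accuracy?
RQ4: How large is the perceptual gap between humans and MLLMs on OddGridBench tasks?
RQ5: Do the RL-based improvements generalize across model families and perceptual tasks?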

Main Results

Method	Color	Size	Rotation	Position	2-Type	3-Type	4-Type	Total
Baseline	23.00	5.00	12.50	7.00	19.00	22.50	31.00	17.14
GRPO	88.50	44.00	67.50	41.50	78.50	83.00	93.00	70.86
GSPO	70.00	55.00	81.50	59.00	85.50	85.50	95.00	75.93
OddGrid-GRPO (w/o distance reward rd)	87.50	44.50	67.00	45.50	80.50	91.00	91.50	72.50
OddGrid-GRPO (w/o Cur-Guided)	87.50	60.50	69.00	64.00	84.00	88.50	95.50	78.43
OddGrid-GRPO	89.50	64.50	80.50	64.50	90.50	91.50	97.50	82.64
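The last column of the table appears to be the unweighted mean of the seven category columns. That is an inference from the numbers, not something the review states, and the short check below tests it. The method labels are English renderings of the table rows, and `mean_accuracy`, `overall_matches`, and `gain` are helper names introduced only for this illustration.

```python
# Accuracy (%) per category, copied from the table:
# Color, Size, Rotation, Position, 2-Type, 3-Type, 4-Type.
ACCURACY = {
    "Baseline": [23.00, 5.00, 12.50, 7.00, 19.00, 22.50, 31.00],
    "GRPO": [88.50, 44.00, 67.50, 41.50, 78.50, 83.00, 93.00],
    "GSPO": [70.00, 55.00, 81.50, 59.00, 85.50, 85.50, 95.00],
    "OddGrid-GRPO (w/o rd)": [87.50, 44.50, 67.00, 45.50, 80.50, 91.00, 91.50],
    "OddGrid-GRPO (w/o Cur-Guided)": [87.50, 60.50, 69.00, 64.00, 84.00, 88.50, 95.50],
    "OddGrid-GRPO": [89.50, 64.50, 80.50, 64.50, 90.50, 91.50, 97.50],
}

# Last column of the table, as reported.
REPORTED_TOTAL = {
    "Baseline": 17.14,
    "GRPO": 70.86,
    "GSPO": 75.93,
    "OddGrid-GRPO (w/o rd)": 72.50,
    "OddGrid-GRPO (w/o Cur-Guided)": 78.43,
    "OddGrid-GRPO": 82.64,
}


def mean_accuracy(scores):
    """Unweighted mean over the category columns."""
    return sum(scores) / len(scores)


def overall_matches(tolerance=0.01):
    """True per method if the reported total equals the category mean
    up to two-decimal rounding."""
    return {
        name: abs(mean_accuracy(scores) - REPORTED_TOTAL[name]) < tolerance
        for name, scores in ACCURACY.items()
    }


def gain(method, reference):
    """Difference in reported total accuracy, in percentage points."""
    return round(REPORTED_TOTAL[method] - REPORTED_TOTAL[reference], 2)


if __name__ == "__main__":
    print(overall_matches())
    print(gain("OddGrid-GRPO", "GRPO"))
```

Under that reading, the full method gains 11.78 points over GRPO, and removing the distance reward or the curriculum costs 10.14 and 4.21 points respectively.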

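Human performance is far ahead of every evaluated MLLM on the color, size, rotation, position, and multi-attribute tasks.
Qwen3-VL-32B reaches the highest overall accuracy among the evaluated models at 68.07%, but every model trails humans (87.47%).
OddGrid-GRPO outperforms both GRPO and the baseline, reaching 82.64% total accuracy, with notable gains over GRPO in rotation (67.50 to 80.50) and position (41.50 to 64.50).
Removing the distance reward or the curriculum component lowers total accuracy to 72.50% and 78.43% respectively, which highlights the benefit of distance-aware feedback and progressive training.
Accuracy improves as the magnitude of the perceptual discrepancy grows; color shows the largest gain while rotation and position lag behind, which points to the limits of current models' fine-grained detection ability.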

Figure 2 : Overview of OddGridBench. OddGridBench encompasses four primary visual attributes, including color, size, rotation, and position, and supports both single-attribute and multi-attribute discrepancy compositions, providing a systematic framework for evaluating the perceptual discrepancy sen
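As a rough illustration of the benchmark design in this overview, the sketch below builds an abstract grid in which exactly one cell differs from the rest by one or more attributes. It is a hypothetical construction, not the OddGridBench generator: the `Cell` fields, the perturbation sizes in `PERTURB`, and the `magnitude` parameter are assumptions, and no SVG rendering is performed.

```python
import random
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Cell:
    hue: float       # color, in degrees on the hue wheel
    scale: float     # relative icon size
    rotation: float  # rotation in degrees
    dx: float        # horizontal offset as a fraction of the cell width


# Hypothetical perturbations; magnitude in (0, 1] controls how visible each one is.
PERTURB = {
    "color": lambda c, m: replace(c, hue=(c.hue + 60.0 * m) % 360.0),
    "size": lambda c, m: replace(c, scale=c.scale * (1.0 + 0.5 * m)),
    "rotation": lambda c, m: replace(c, rotation=c.rotation + 45.0 * m),
    "position": lambda c, m: replace(c, dx=c.dx + 0.2 * m),
}


def make_grid(rows, cols, attributes, magnitude, seed=0):
    """Return (grid, odd_index): identical cells except one odd cell that is
    perturbed along every attribute listed in `attributes`."""
    rng = random.Random(seed)
    base = Cell(hue=rng.uniform(0.0, 360.0), scale=1.0, rotation=0.0, dx=0.0)
    odd_index = (rng.randrange(rows), rng.randrange(cols))
    odd_cell = base
    for name in attributes:
        odd_cell = PERTURB[name](odd_cell, magnitude)
    grid = [
        [odd_cell if (r, c) == odd_index else base for c in range(cols)]
        for r in range(rows)
    ]
    return grid, odd_index


if __name__ == "__main__":
    grid, odd = make_grid(4, 5, ["color", "rotation"], magnitude=0.5, seed=1)
    print("odd cell at", odd, "->", grid[odd[0]][odd[1]])
```

Passing one attribute name gives a single-attribute sample, and passing two to four names gives the multi-attribute compositions; lowering `magnitude` makes the odd cell harder to spot.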


This review was generated by AI and reviewed by a human editor.