QUICK REVIEW

[Paper Review] Architecture-Agnostic Curriculum Learning for Document Understanding: Empirical Evidence from Text-Only and Multimodal

Mohammed Hamdan, Vincenzo Dentamaro | arXiv (Cornell University) | 2026-02-02

Topic Modeling | Citations: 0

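One-Line Summary

This study evaluates progressive data scheduling for curriculum learning (33%→67%→100%) on text-only BERT and multimodal LayoutLMv3 using FUNSD and CORD. It finds a consistent reduction in compute and architecture-dependent performance benefits, and its matched-compute analysis reveals a genuine scheduling benefit for capacity-constrained models.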

ABSTRACT

We investigate whether progressive data scheduling -- a curriculum learning strategy that incrementally increases training data exposure (33\%$\rightarrow$67\%$\rightarrow$100\%) -- yields consistent efficiency gains across architecturally distinct document understanding models. By evaluating BERT (text-only, 110M parameters) and LayoutLMv3 (multimodal, 126M parameters) on the FUNSD and CORD benchmarks, we establish that this schedule reduces wall-clock training time by approximately 33\%, commensurate with the reduction from 10.0 to 6.67 effective epoch-equivalents of data. To isolate curriculum effects from compute reduction, we introduce matched-compute baselines (Standard-7) that control for total gradient updates. On the FUNSD dataset, the curriculum significantly outperforms the matched-compute baseline for BERT ($\Delta$F1 = +0.023, $p=0.022$, $d_z=3.83$), constituting evidence for a genuine scheduling benefit in capacity-constrained models. In contrast, no analogous benefit is observed for LayoutLMv3 ($p=0.621$), whose multimodal representations provide sufficient inductive bias. On the CORD dataset, all conditions converge to equivalent F1 scores ($\geq$0.947) irrespective of scheduling, indicating a performance ceiling. Schedule ablations comparing progressive, two-phase, reverse, and random pacing confirm that the efficiency gain derives from reduced data volume rather than ordering. Taken together, these findings demonstrate that progressive scheduling is a reliable compute-reduction strategy across model families, with curriculum-specific benefits contingent on the interaction between model capacity and task complexity.

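Research Motivation and Goals

Evaluate whether progressive data scheduling yields efficiency gains across architecturally distinct document understanding models (text-only vs. multimodal).
Quantify the wall-clock time reduction of the three-phase curriculum relative to standard training.
Use a matched-compute baseline (Standard-7) to separate curriculum effects from compute reduction.
Run schedule ablations to determine whether ordering or data volume drives the improvement.
Provide cross-architecture and statistical analyses to guide practical training protocols.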

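Proposed Method

Implement a three-phase progressive data schedule over 10 full epochs, which amounts to 6.67 epoch-equivalents of data exposure.
Use a matched-compute baseline (Standard-7) to separate curriculum effects from the reduced number of gradient updates.
Compare BERT-base (text-only) and LayoutLMv3-base (multimodal) on the FUNSD and CORD benchmarks using entity-level F1 computed with seqeval.
Run schedule ablations (two-phase, reverse, random) to assess the importance of ordering.
Report paired nonparametric tests over three seeds, with Cohen's d_z as the effect size.
Provide an extended-domain evaluation on synthetic data to test the framework's generality.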
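
The 6.67 epoch-equivalent figure can be reproduced with a little arithmetic. The sketch below assumes three equal-length phases that together span 10 epochs and use 1/3, 2/3, and all of the training data; the exact phase boundaries are not stated in this review, so `phase_lengths` is an illustrative assumption, and `effective_epochs` is a hypothetical helper name.

```python
from fractions import Fraction

def effective_epochs(data_fractions, epochs_per_phase):
    """Total data exposure in epoch-equivalents: sum of fraction x epochs per phase."""
    return sum(f * e for f, e in zip(data_fractions, epochs_per_phase))

# Assumption: three phases of equal length that together span 10 epochs.
phase_fractions = [Fraction(1, 3), Fraction(2, 3), Fraction(1)]
phase_lengths = [Fraction(10, 3)] * 3

progressive = effective_epochs(phase_fractions, phase_lengths)
reverse = effective_epochs(phase_fractions[::-1], phase_lengths)
standard_10 = effective_epochs([Fraction(1)], [Fraction(10)])

# Fraction of Standard-10's data exposure that the curriculum avoids.
saving = 1 - progressive / standard_10

print(f"progressive: {float(progressive):.2f} epoch-equivalents")
print(f"reverse:     {float(reverse):.2f} epoch-equivalents")
print(f"saving vs Standard-10: {float(saving):.1%}")
```

Under this assumption the progressive and reverse schedules expose the model to the same total data volume (20/3 ≈ 6.67 epochs), one third less than Standard-10. Standard-7, at 7.0 epochs, is the nearest whole-epoch match.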

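Experimental Results

Research Questions

RQ1: Does progressive data scheduling reduce training time for both text-only and multimodal document understanding models?
RQ2: Does the curriculum provide benefits beyond compute reduction for both architectures, or are they architecture-dependent?
RQ3: Is the data ordering (33%→67%→100%) specifically beneficial beyond simple data subsampling?
RQ4: How does curriculum performance differ between FUNSD and CORD, and on the extended domains?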

Key Results

Dataset	Architecture	Condition	Effective epochs	Final loss	Entity F1	P / R	Time (s)	Time reduction
FUNSD	BERT	Standard-10	10.0	0.508±0.013	0.562±0.009	0.514/0.620	53.7±0.2	–
FUNSD	BERT	Curriculum-10	6.67	0.635±0.031	0.543±0.009	0.496/0.600	35.8±0.1	33.3%
FUNSD	BERT	Standard-7	7.0	0.733±0.006	0.521±0.010	0.469/0.585	37.5±0.0	30.2%
FUNSD	LayoutLMv3	Standard-10	10.0	0.075±0.004	0.821±0.009	0.806/0.836	139.8±1.4	–
FUNSD	LayoutLMv3	Curriculum-10	6.67	0.193±0.009	0.807±0.003	0.781/0.833	92.5±0.7	33.9%
FUNSD	LayoutLMv3	Standard-7	7.0	0.166±0.011	0.803±0.007	0.785/0.823	97.0±0.3	30.6%
CORD	BERT	Standard-10	10.0	0.021±0.002	0.947±0.003	0.951/0.943	277.8±0.3	–
CORD	BERT	Curriculum-10	6.67	0.040±0.001	0.949±0.007	0.952/0.945	185.2±0.1	33.3%
CORD	BERT	Standard-7	7.0	0.041±0.002	0.948±0.003	0.952/0.945	194.5±0.2	30.0%
CORD	LayoutLMv3	Standard-10	10.0	0.025±0.003	0.955±0.003	0.958/0.952	838.9±6.9	–
CORD	LayoutLMv3	Curriculum-10	6.67	0.059±0.003	0.953±0.009	0.958/0.947	557.8±1.2	33.5%
CORD	LayoutLMv3	Standard-7	7.0	0.041±0.003	0.959±0.005	0.963/0.955	584.0±1.7	30.4%
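
The last column of the table can be recomputed from the mean wall-clock times as a relative reduction against Standard-10. The sketch below copies the mean times from the table; `time_reduction_pct` is a hypothetical helper name, and because the inputs are already rounded, a recomputed value may differ from the reported one in the last digit.

```python
# Mean wall-clock times in seconds, copied from the table above.
times = {
    ("FUNSD", "BERT"): {"Standard-10": 53.7, "Curriculum-10": 35.8, "Standard-7": 37.5},
    ("FUNSD", "LayoutLMv3"): {"Standard-10": 139.8, "Curriculum-10": 92.5, "Standard-7": 97.0},
    ("CORD", "BERT"): {"Standard-10": 277.8, "Curriculum-10": 185.2, "Standard-7": 194.5},
    ("CORD", "LayoutLMv3"): {"Standard-10": 838.9, "Curriculum-10": 557.8, "Standard-7": 584.0},
}

def time_reduction_pct(baseline_s, condition_s):
    """Relative wall-clock reduction versus the baseline, in percent."""
    return 100.0 * (baseline_s - condition_s) / baseline_s

for (dataset, arch), row in times.items():
    base = row["Standard-10"]
    for condition in ("Curriculum-10", "Standard-7"):
        pct = time_reduction_pct(base, row[condition])
        print(f"{dataset:5s} {arch:10s} {condition:13s} {pct:.1f}%")
```

These figures are reductions in training time, not speedup factors: a reduction of about 33% corresponds to running roughly 1.5 times faster (for example, 53.7 s / 35.8 s for BERT on FUNSD).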

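Curriculum-10 reduces wall-clock training time by about 33% relative to Standard-10 for both BERT and LayoutLMv3.
On FUNSD, Curriculum-10 outperforms Standard-7 for BERT (ΔF1 = +0.023, p = 0.022, d_z = 3.83).
On FUNSD, LayoutLMv3 shows no F1 gain from Curriculum-10 over Standard-7 (p = 0.621).
On CORD, all conditions converge to similar F1 (≥ 0.947) regardless of scheduling, suggesting a performance ceiling.
Across architectures and datasets, Curriculum-10 reduces wall-clock time by about 33.3%–33.9% (mean about 33.5% over the four table entries).
Schedule ablations show no significant differences among progressive, two-phase, reverse, and random pacing at roughly 6.67 effective epochs, suggesting that the efficiency gain is driven by data volume rather than ordering.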
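
The reported effect size and p-value for BERT on FUNSD can be cross-checked against each other. The sketch below assumes, purely for illustration, a two-sided paired t-test over three seeds (n = 3, so 2 degrees of freedom); it is not a statement about the procedure the authors actually ran. It uses the identity t = d_z · √n and the closed-form t-distribution tail for 2 degrees of freedom. Both function names are hypothetical.

```python
import math

def paired_t_from_dz(d_z, n_pairs):
    """t statistic of a paired t-test implied by Cohen's d_z and the number of pairs."""
    return d_z * math.sqrt(n_pairs)

def two_sided_p_df2(t):
    """Two-sided p-value for Student's t with exactly 2 degrees of freedom.

    For df = 2 the CDF is F(t) = 1/2 + t / (2 * sqrt(t^2 + 2)),
    so the two-sided tail probability is 1 - |t| / sqrt(t^2 + 2).
    """
    return 1.0 - abs(t) / math.sqrt(t * t + 2.0)

t_stat = paired_t_from_dz(3.83, 3)  # d_z and seed count as reported above
p_value = two_sided_p_df2(t_stat)

print(f"t = {t_stat:.2f}, two-sided p = {p_value:.3f}")
```

Under that assumption the implied p-value rounds to 0.022, which agrees with the reported figure. With only three seeds, a very large d_z is needed to reach significance, so the result depends heavily on low variance across seeds.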

This review was generated by AI and reviewed by a human editor.