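[Paper Review] Intermediate-Task Transfer Learning with Pretrained Models for Natural Language Understanding: When and Why Does It Work?
This paper conducts a large-scale study of RoBERTa, using 110 intermediate-target task pairs and 25 probing tasks, to understand when and why intermediate-task training helps natural language understanding. It finds that high-level inference tasks are generally beneficial and that forgetting of pretraining knowledge can limit transfer.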
While pretrained models such as BERT have shown large gains across natural language understanding tasks, their performance can be improved by further training the model on a data-rich intermediate task, before fine-tuning it on a target task. However, it is still poorly understood when and why intermediate-task training is beneficial for a given target task. To investigate this, we perform a large-scale study on the pretrained RoBERTa model with 110 intermediate-target task combinations. We further evaluate all trained models with 25 probing tasks meant to reveal the specific skills that drive transfer. We observe that intermediate tasks requiring high-level inference and reasoning abilities tend to work best. We also observe that target task performance is strongly correlated with higher-level abilities such as coreference resolution. However, we fail to observe more granular correlations between probing and target task performance, highlighting the need for further work on broad-coverage probing benchmarks. We also observe evidence that the forgetting of knowledge learned during pretraining may limit our analysis, highlighting the need for further work on transfer learning methods in these settings.
Research Motivation and Objectives
- Investigate which intermediate tasks most benefit a wide range of target NLU tasks.
- Identify the linguistic skills learned during intermediate-task training that transfer to targets.
- Examine how probing task performance correlates with target-task improvements to explain transfer.
- Assess whether dataset size of intermediate tasks explains transfer differences.
- Explore potential limitations such as catastrophic forgetting during transfer learning.
Proposed Method
- Fine-tune RoBERTa on each of 11 intermediate tasks individually.
- Fine-tune the intermediate-task trained models on 10 target tasks and 25 probing tasks separately.
- Evaluate transfer by comparing target-task performance to baselines without intermediate training.
- Use 3 random restarts to obtain 1260 observations across tasks and baselines.
- Apply a hyperparameter sweep for learning rate and dropout, then fix best parameters per task.
- Utilize RoBERTa-Large and standard fine-tuning procedures consistent with prior work.
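The setup above can be sketched as an experiment grid. This is a minimal sketch, not the authors' code: the task names are placeholders (this review does not list them), and reading 1260 as (11 intermediate tasks + 1 baseline) × (10 target + 25 probing tasks) × 3 restarts is an assumption that happens to match the figures above.

```python
from itertools import product

# Placeholder identifiers: the actual task names are not given in this review.
INTERMEDIATE_TASKS = [f"intermediate_{i}" for i in range(1, 12)]  # 11 tasks
TARGET_TASKS = [f"target_{i}" for i in range(1, 11)]              # 10 tasks
PROBING_TASKS = [f"probe_{i}" for i in range(1, 26)]              # 25 tasks
NUM_RESTARTS = 3


def build_runs():
    """Enumerate (starting model, evaluation task, restart) triples.

    A starting model of None stands for the baseline that skips
    intermediate-task training (assumed layout, see lead-in).
    """
    starting_models = [None] + INTERMEDIATE_TASKS
    evaluation_tasks = TARGET_TASKS + PROBING_TASKS
    return list(product(starting_models, evaluation_tasks, range(NUM_RESTARTS)))


def transfer_gain(score_with_intermediate, baseline_score):
    """Positive values mean intermediate-task training helped the target task."""
    return score_with_intermediate - baseline_score


pairs = list(product(INTERMEDIATE_TASKS, TARGET_TASKS))
runs = build_runs()
print(len(pairs))  # 110 intermediate-target combinations
print(len(runs))   # 1260 runs under the assumed decomposition
```

Keeping the baseline as just another starting model means every evaluation task has a matching no-intermediate run to subtract in `transfer_gain`.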
Experimental Results
Research Questions
- RQ1: Which intermediate tasks broadly improve performance across diverse target tasks?
- RQ2: What linguistic skills do intermediate tasks teach that aid target tasks, as revealed by probing tasks?
- RQ3: How do probing-task performances relate to target-task improvements, and can they explain transfer benefits?
- RQ4: Does intermediate-task dataset size or forgetting of pretraining constrain transfer effects?
Key Findings
- Tasks requiring high-level inference and commonsense reasoning tend to be good intermediate tasks.
- MNLI and CosmosQA-like tasks show positive transfer across many targets; SocialIQA often yields negative transfer.
- Low-level input-preservation skills show little correlation with target-task performance, while higher-level abilities tied to MLM-like tasks correlate more.
- Probing correlations indicate that semantic and coreference-related probes correlate with target performance, whereas many SentEval probes do not.
- Catastrophic forgetting of pretraining may limit transfer; integrating MLM objectives during intermediate training could help mitigate forgetting.
- Degenerate runs are less likely with intermediate-task training, but highly negative transfer can increase degeneracy in some cases.
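The probing-versus-target correlations mentioned above can be illustrated with a rank correlation computed across runs. This is a hedged sketch: it assumes Spearman correlation as the statistic, the function names are hypothetical, and the scores in the usage example are made-up illustrative numbers, not results from the paper.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    return pearson(average_ranks(x), average_ranks(y))


# Illustrative only: one score per run for a probing task and a target task.
probe_scores = [0.61, 0.64, 0.70, 0.72, 0.75]
target_scores = [0.55, 0.58, 0.57, 0.66, 0.69]
rho = spearman(probe_scores, target_scores)
```

A `rho` near 1 would suggest that runs scoring higher on the probe also tend to score higher on the target; it says nothing about causation, which is consistent with the cautious reading of the probing results above.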
This review was generated by AI and reviewed by a human editor.