QUICK REVIEW

[Paper Review] Plex: Towards Reliability using Pretrained Large Model Extensions

Dustin Tran, Jeremiah Liu|arXiv (Cornell University)|Jul 15, 2022

Multimodal Machine Learning Applications|Cited by 38

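One-line summary

Plex introduces ViT-Plex and T5-Plex, pretrained large model extensions that improve reliability on vision and language tasks across uncertainty, robust generalization, and adaptation, without task-specific tuning. The paper shows that scaling model size and pretraining data, combined with ensembling and last-layer techniques, achieves state-of-the-art reliability across 40 datasets.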

ABSTRACT

A recent trend in artificial intelligence is the use of pretrained models for language and vision tasks, which have achieved extraordinary performance but also puzzling failures. Probing these models' abilities in diverse ways is therefore critical to the field. In this paper, we explore the reliability of models, where we define a reliable model as one that not only achieves strong predictive performance but also performs well consistently over many decision-making tasks involving uncertainty (e.g., selective prediction, open set recognition), robust generalization (e.g., accuracy and proper scoring rules such as log-likelihood on in- and out-of-distribution datasets), and adaptation (e.g., active learning, few-shot uncertainty). We devise 10 types of tasks over 40 datasets in order to evaluate different aspects of reliability on both vision and language domains. To improve reliability, we developed ViT-Plex and T5-Plex, pretrained large model extensions for vision and language modalities, respectively. Plex greatly improves the state-of-the-art across reliability tasks, and simplifies the traditional protocol as it improves the out-of-the-box performance and does not require designing scores or tuning the model for each task. We demonstrate scaling effects over model sizes up to 1B parameters and pretraining dataset sizes up to 4B examples. We also demonstrate Plex's capabilities on challenging tasks including zero-shot open set recognition, active learning, and uncertainty in conversational language understanding.

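Motivation and objectives

Define reliability as consistent performance across uncertainty, robust generalization, and adaptation tasks, without task-specific tuning.
Evaluate large pretrained models on 10 task types spanning 40 vision and language datasets.
Develop ViT-Plex and T5-Plex, and evaluate how scaling, ensembling, and last-layer techniques affect reliability.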
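
Selective prediction, one of the uncertainty tasks named in the abstract, can be illustrated with a minimal sketch. It assumes each prediction comes with a scalar confidence and that low-confidence examples are deferred. The function name `selective_accuracy` and the toy values are hypothetical and are not taken from the paper.

```python
from typing import List, Tuple


def selective_accuracy(
    confidences: List[float],
    correct: List[bool],
    threshold: float,
) -> Tuple[float, float]:
    """Return (coverage, accuracy on the accepted examples).

    An example is accepted when its confidence is >= threshold;
    everything else is deferred (for instance, to a human reviewer).
    """
    accepted = [ok for conf, ok in zip(confidences, correct) if conf >= threshold]
    coverage = len(accepted) / len(confidences)
    # With nothing accepted, accuracy is undefined; this sketch reports 0.0.
    accuracy = sum(accepted) / len(accepted) if accepted else 0.0
    return coverage, accuracy


# Toy values invented purely for illustration (not results from the paper).
confidences = [0.95, 0.90, 0.60, 0.55, 0.30]
correct = [True, True, False, True, False]

for threshold in (0.0, 0.5, 0.8):
    coverage, accuracy = selective_accuracy(confidences, correct, threshold)
    print(f"threshold={threshold:.1f} coverage={coverage:.2f} accuracy={accuracy:.2f}")
```

A model with useful uncertainty estimates should trade coverage for accuracy as the threshold rises, which is the behavior this kind of task is meant to measure.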

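Proposed method

Build ViT-Plex (vision) and T5-Plex (language) on the ViT and T5 architectures at three size scales (small, base, large).
Pretrain on large, diverse datasets (up to 4B images for vision, C4 text for language) and apply efficient ensembling (BatchEnsemble) across layers.
Incorporate last-layer modifications (Gaussian process last layer, heteroscedastic last layer) to capture uncertainty and label noise.
Evaluate reliability on 10 task types over 40 datasets, including the new uncertainty-focused datasets ImageNet ReaL-H and NaLUE.
Run experiments on the contributions of pretraining and fine-tuning, and analyze scaling trends in the reliability metrics.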
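
To make the BatchEnsemble step concrete, here is a minimal sketch of a single linear layer in the standard BatchEnsemble formulation. Each member k shares one weight matrix W and owns two small vectors r_k and s_k, so its effective weights are W multiplied elementwise by the outer product of r_k and s_k. The helper names, the layer sizes, and the random-sign initialization are assumptions for illustration; this review does not say how Plex configures these layers.

```python
import random
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(x: Vector, w: Matrix) -> Vector:
    """Compute x @ w for a row vector x (len d_in) and w of shape d_in x d_out."""
    d_out = len(w[0])
    return [sum(x[i] * w[i][j] for i in range(len(x))) for j in range(d_out)]


def batch_ensemble_forward(x: Vector, w: Matrix, r: Vector, s: Vector) -> Vector:
    """Forward pass of one member without materializing its weight matrix.

    Member weights are W_k = W * outer(r, s) (elementwise), so
    x @ W_k == ((x * r) @ W) * s.
    """
    scaled_in = [xi * ri for xi, ri in zip(x, r)]
    shared_out = matvec(scaled_in, w)
    return [yj * sj for yj, sj in zip(shared_out, s)]


def explicit_member_forward(x: Vector, w: Matrix, r: Vector, s: Vector) -> Vector:
    """Reference path: build W_k explicitly, then multiply."""
    w_k = [[w[i][j] * r[i] * s[j] for j in range(len(s))] for i in range(len(r))]
    return matvec(x, w_k)


random.seed(0)
d_in, d_out, n_members = 4, 3, 3  # hypothetical sizes
w = [[random.gauss(0.0, 1.0) for _ in range(d_out)] for _ in range(d_in)]
# Random-sign fast weights: one common initialization, assumed here.
members = [
    (
        [random.choice([-1.0, 1.0]) for _ in range(d_in)],
        [random.choice([-1.0, 1.0]) for _ in range(d_out)],
    )
    for _ in range(n_members)
]
x = [random.gauss(0.0, 1.0) for _ in range(d_in)]

# Each member sees the same input but applies its own rank-1 modulation.
outputs = [batch_ensemble_forward(x, w, r, s) for r, s in members]
mean_output = [sum(col) / n_members for col in zip(*outputs)]
print(mean_output)
```

The point of the factorization is cost: each extra member adds only d_in + d_out parameters per layer instead of a full d_in x d_out matrix, which is why the technique counts as efficient ensembling.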

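Experimental results

Research questions

RQ1: How reliably do large pretrained models perform across uncertainty, generalization, and adaptation benchmarks without task-specific tuning?
RQ2: How do model size, pretraining data size, and reliability-improving techniques (ensembling, GP last layer, Het last layer) affect reliability metrics in vision and language?
RQ3: Can pretraining signals predict downstream reliability performance across tasks?
RQ4: How do the pretraining and fine-tuning phases each contribute to Plex's reliability improvements?
RQ5: What scaling trends do the reliability metrics of ViT-Plex and T5-Plex show?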

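Key findings

Increasing model size improves reliability on both vision and language tasks.
Larger pretraining datasets (up to 4B examples) yield higher reliability than smaller ones.
Efficient ensembling (BatchEnsemble) and last-layer methods (GP or Het) consistently ranked highest in the reliability ablations.
Pretraining performance (e.g., on JFT) correlates strongly with downstream reliability scores and matters more than dataset size alone.
Plex achieves state-of-the-art reliability on many tasks and delivers it out of the box, without task-specific tuning.
T5-Plex L often outperforms T5-Plex B, showing the benefit of scale; the BE+GP and BE configurations perform notably well on MNLI and NaLUE.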
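
The findings credit ensembling with better reliability, and the abstract names log-likelihood as one of the proper scoring rules used for evaluation. The sketch below shows one common way to combine member predictions (averaging softmax probabilities) and to score the result with negative log-likelihood. The logits are invented for illustration, and the review does not state that Plex averages its members in exactly this way.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over one example's class logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def ensemble_probs(member_logits: List[List[float]]) -> List[float]:
    """Average the members' softmax probabilities (not their logits)."""
    probs = [softmax(logits) for logits in member_logits]
    n = len(probs)
    return [sum(p[c] for p in probs) / n for c in range(len(probs[0]))]


def nll(probs: List[float], label: int) -> float:
    """Negative log-likelihood of the true label; lower is better."""
    return -math.log(probs[label])


# Hypothetical logits from three members for a single 3-class example.
member_logits = [[2.0, 0.5, -1.0], [0.2, 1.5, 0.0], [1.0, 1.0, -0.5]]
label = 0

avg = ensemble_probs(member_logits)
ensemble_nll = nll(avg, label)
mean_member_nll = sum(nll(softmax(l), label) for l in member_logits) / len(member_logits)
print(f"ensemble NLL={ensemble_nll:.4f} mean member NLL={mean_member_nll:.4f}")
```

Because the negative logarithm is convex, the NLL of the averaged prediction can never exceed the average NLL of the individual members (Jensen's inequality). That guarantee concerns the average member only; it does not promise that the ensemble beats its best member.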

This review was generated by AI and checked by a human editor.