QUICK REVIEW

[Paper Review] Matryoshka Representation Learning

Aditya Kusupati, Gantavya Bhatt|arXiv (Cornell University)|May 26, 2022

Domain Adaptation and Few-Shot Learning|Cited by 24

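One-Line Summary

MRL learns nested, coarse-to-fine representations within a single embedding, enabling adaptive deployment at no additional inference cost. It maintains or improves accuracy across tasks and modalities while delivering large efficiency gains (e.g., up to 14x smaller representations and up to 14x faster retrieval).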

ABSTRACT

Learned representations are a central component in modern ML systems, serving a multitude of downstream tasks. When training such representations, it is often the case that computational and statistical constraints for each downstream task are unknown. In this context rigid, fixed capacity representations can be either over or under-accommodating to the task at hand. This leads us to ask: can we design a flexible representation that can adapt to multiple downstream tasks with varying computational resources? Our main contribution is Matryoshka Representation Learning (MRL) which encodes information at different granularities and allows a single embedding to adapt to the computational constraints of downstream tasks. MRL minimally modifies existing representation learning pipelines and imposes no additional cost during inference and deployment. MRL learns coarse-to-fine representations that are at least as accurate and rich as independently trained low-dimensional representations. The flexibility within the learned Matryoshka Representations offer: (a) up to 14x smaller embedding size for ImageNet-1K classification at the same level of accuracy; (b) up to 14x real-world speed-ups for large-scale retrieval on ImageNet-1K and 4K; and (c) up to 2% accuracy improvements for long-tail few-shot classification, all while being as robust as the original representations. Finally, we show that MRL extends seamlessly to web-scale datasets (ImageNet, JFT) across various modalities -- vision (ViT, ResNet), vision + language (ALIGN) and language (BERT). MRL code and pretrained models are open-sourced at https://github.com/RAIVNLab/MRL.

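Motivation and Objectives

Motivate a flexible representation that adapts to downstream tasks with varying computational constraints.
Propose a single embedding that encodes multiple granularities through nested dimensions.
Achieve near-optimal accuracy at multiple resource levels with no additional deployment cost.
Demonstrate the applicability of MRL to vision, vision + language, and language models, as well as to web-scale data.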

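Proposed Method

Define a d-dimensional representation z and a nesting set M of size O(log d).
For each m in M, train a separate linear classifier W^{(m)} on the first m dimensions of z, and combine the per-granularity losses weighted by importance c_m.
Optionally tie the classifier weights to reduce memory: W^{(m)} = W_{1:m} (MRL-E).
Apply MRL to supervised, contrastive, and masked language modeling frameworks with minimal changes.
Show that the coarse-to-fine representations interpolate between the explicitly optimized granularities (the m in M).
Provide two deployment modes: Adaptive Classification (AC) and Adaptive Retrieval (AR).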
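The training objective listed above can be sketched in a few lines. This is a minimal pure-Python illustration, not the authors' code: the helper names (`nesting_dims`, `mrl_loss`, `mrl_e_classifiers`), the smallest granularity of 8, the doubling schedule, the softmax cross-entropy loss, and the uniform importance weights c_m = 1 are all assumptions made for the example.

```python
import math

def nesting_dims(d, smallest=8):
    """Nesting set M = {smallest, 2*smallest, ..., d}, of size O(log d).
    The starting size and doubling schedule are assumptions for this sketch."""
    dims = []
    m = smallest
    while m < d:
        dims.append(m)
        m *= 2
    dims.append(d)
    return dims

def cross_entropy(logits, label):
    """Softmax cross-entropy for a single example, computed stably."""
    peak = max(logits)
    log_norm = peak + math.log(sum(math.exp(v - peak) for v in logits))
    return log_norm - logits[label]

def linear_logits(weights, z_prefix):
    """Bias-free linear classifier; each row of `weights` scores one class."""
    return [sum(w * x for w, x in zip(row, z_prefix)) for row in weights]

def mrl_loss(z, label, classifiers, importance):
    """Sum over m in M of c_m * CE(W^(m) z[:m], label)."""
    total = 0.0
    for m, weights in classifiers.items():
        logits = linear_logits(weights, z[:m])
        total += importance[m] * cross_entropy(logits, label)
    return total

def mrl_e_classifiers(shared_weights, dims):
    """MRL-E weight tying: W^(m) is the first m columns of one shared matrix."""
    return {m: [row[:m] for row in shared_weights] for m in dims}

# Hypothetical toy setup: a 64-dimensional embedding and 3 classes.
dims = nesting_dims(64)                                # [8, 16, 32, 64]
shared = [[0.01 * (c + 1)] * 64 for c in range(3)]     # made-up weights
classifiers = mrl_e_classifiers(shared, dims)
importance = {m: 1.0 for m in dims}                    # uniform c_m (assumed)
z = [0.1] * 64
loss = mrl_loss(z, 0, classifiers, importance)
```

Without weight tying, `classifiers` would instead hold one independently parameterized matrix per m; the loss function itself is unchanged. In a real pipeline the same sum would be computed on batches inside an autodiff framework so that gradients flow into the encoder that produces z.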

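Experimental Results

Research Questions

RQ1: How can a single representation support downstream tasks with different computational budgets?
RQ2: Do nested coarse-to-fine representations retain or improve accuracy compared with independently trained low-dimensional baselines?
RQ3: Can MRL scale to web-scale datasets and multiple modalities (vision, vision + language, language) without additional inference cost?
RQ4: What are the practical benefits of the adaptive classification and adaptive retrieval workflows?
RQ5: How do nested representations behave with respect to robustness and on downstream tasks such as few-shot and long-tail learning?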

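Key Findings

With Adaptive Classification, MRL reduces the embedding size for ImageNet-1K classification by up to 14x at the same accuracy.
MRL delivers up to 14x real-world speed-ups for large-scale retrieval on ImageNet-1K and 4K while maintaining comparable accuracy.
MRL achieves up to 2% accuracy improvements in long-tail few-shot and continual learning settings, with robustness on par with the original embeddings.
MRL generalizes across modalities: vision (ResNet, ViT), vision + language (ALIGN), and language (BERT), and scales to web-scale data (ImageNet, JFT, ALIGN).
The coarse-to-fine representations interpolate across dimensions, enabling flexible deployment at no additional inference cost.
Retrieval can be performed effectively by adaptive shortlisting and re-ranking over the nested representations, yielding substantial speed-ups.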
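The shortlist-then-re-rank idea in the last finding can be illustrated with a small sketch. It is a hedged toy example rather than the paper's retrieval system: the function names, the use of cosine similarity, the exhaustive scan instead of an approximate nearest-neighbor index, and the tiny database are all assumptions.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def top_k(query, database, candidate_ids, dim, k):
    """Rank candidates by similarity using only the first `dim` dimensions."""
    ranked = sorted(
        candidate_ids,
        key=lambda i: cosine(query[:dim], database[i][:dim]),
        reverse=True,
    )
    return ranked[:k]

def adaptive_retrieval(query, database, shortlist_dim, rerank_dim,
                       shortlist_size, k):
    """Cheap pass over the whole database with a short prefix of the embedding,
    then a more precise pass over the shortlist with a longer prefix."""
    shortlist = top_k(query, database, range(len(database)),
                      shortlist_dim, shortlist_size)
    return top_k(query, database, shortlist, rerank_dim, k)

# Hypothetical 4-dimensional nested embeddings.
database = [
    [1.0, 0.0, 0.0, 0.0],
    [1.0, 0.0, 0.1, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [1.0, 0.0, 1.0, 0.0],
]
query = [1.0, 0.0, 1.0, 0.0]
result = adaptive_retrieval(query, database, shortlist_dim=2, rerank_dim=4,
                            shortlist_size=3, k=2)   # [3, 1]
```

The first pass touches every database entry but only reads two dimensions, while the second pass reads all four dimensions for just three candidates. That asymmetry is where the speed-up would come from at scale, since the same stored embedding serves both passes.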


This review was generated by AI and checked by a human editor.