QUICK REVIEW

[Paper Review] Matryoshka Representation Learning

Aditya Kusupati, Gantavya Bhatt|arXiv (Cornell University)|2022. 05. 26.

Domain Adaptation and Few-Shot Learning | Citations: 24

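One-Line Summary

MRL learns nested coarse-to-fine representations within a single embedding, enabling adaptive deployment at no additional inference cost. It delivers large efficiency gains while maintaining or improving accuracy across tasks and modalities (e.g., representations up to 14x smaller and retrieval up to 14x faster).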

ABSTRACT

Learned representations are a central component in modern ML systems, serving a multitude of downstream tasks. When training such representations, it is often the case that computational and statistical constraints for each downstream task are unknown. In this context rigid, fixed capacity representations can be either over or under-accommodating to the task at hand. This leads us to ask: can we design a flexible representation that can adapt to multiple downstream tasks with varying computational resources? Our main contribution is Matryoshka Representation Learning (MRL) which encodes information at different granularities and allows a single embedding to adapt to the computational constraints of downstream tasks. MRL minimally modifies existing representation learning pipelines and imposes no additional cost during inference and deployment. MRL learns coarse-to-fine representations that are at least as accurate and rich as independently trained low-dimensional representations. The flexibility within the learned Matryoshka Representations offer: (a) up to 14x smaller embedding size for ImageNet-1K classification at the same level of accuracy; (b) up to 14x real-world speed-ups for large-scale retrieval on ImageNet-1K and 4K; and (c) up to 2% accuracy improvements for long-tail few-shot classification, all while being as robust as the original representations. Finally, we show that MRL extends seamlessly to web-scale datasets (ImageNet, JFT) across various modalities -- vision (ViT, ResNet), vision + language (ALIGN) and language (BERT). MRL code and pretrained models are open-sourced at https://github.com/RAIVNLab/MRL.

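Motivation and Objectives

Motivates flexible representations that adapt to downstream tasks with varying computational constraints.
Proposes a single embedding that encodes multiple granularities through nested dimensions.
Achieves near-optimal accuracy across resource levels at no additional deployment cost.
Demonstrates the applicability of MRL to vision, vision + language, and language models, including web-scale data.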

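Proposed Method

Defines a d-dimensional representation z and a set M of nested sizes, with |M| = O(log d).
For each m in M, trains a separate linear classifier W^{(m)} on the first m dimensions of z and sums the resulting losses, each weighted by c_m.
Optionally ties the classifier weights as W^{(m)} = W_{1:m} (MRL-E) to reduce memory.
Applies MRL to supervised learning, contrastive learning, and masked language modeling frameworks with minimal changes.
Demonstrates coarse-to-fine behavior by interpolating between the optimized granularities, that is, at dimensions that fall between the values of m in M.
Provides two deployment modes: Adaptive Classification (AC) and Adaptive Retrieval (AR).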
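
The training objective described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' implementation: the function names (`nesting_dims`, `mrl_loss`, `tie_classifiers`), the starting size of 8, the doubling schedule, and the default weights c_m = 1 are all assumptions made for the example. The encoder that produces z is omitted.

```python
import math

def nesting_dims(d, smallest=8):
    """Return nested sizes that double up to d, giving O(log d) entries.

    The starting size of 8 is an assumption chosen for illustration.
    """
    dims = []
    m = smallest
    while m < d:
        dims.append(m)
        m *= 2
    dims.append(d)
    return dims

def cross_entropy(logits, label):
    """Softmax cross-entropy for a single example, computed stably."""
    peak = max(logits)
    log_sum_exp = peak + math.log(sum(math.exp(v - peak) for v in logits))
    return log_sum_exp - logits[label]

def mrl_loss(z, label, classifiers, weights=None):
    """Weighted sum of classification losses over the nested prefixes of z.

    classifiers maps each size m to a matrix with one row of length m per class.
    weights maps each size m to its importance c_m (defaults to 1.0 for all m).
    """
    dims = sorted(classifiers)
    weights = weights or {m: 1.0 for m in dims}
    total = 0.0
    for m in dims:
        prefix = z[:m]  # only the first m dimensions are visible to W^(m)
        logits = [sum(w * x for w, x in zip(row, prefix)) for row in classifiers[m]]
        total += weights[m] * cross_entropy(logits, label)
    return total

def tie_classifiers(full_matrix, dims):
    """MRL-E style tying: W^(m) is the first m columns of one shared matrix."""
    return {m: [row[:m] for row in full_matrix] for m in dims}
```

For example, `nesting_dims(2048)` yields nine sizes from 8 to 2048, so the loss adds nine linear heads on top of one encoder. With `tie_classifiers`, those heads share a single d-column matrix instead of storing one matrix per size.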

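Experimental Results

Research Questions

RQ1: How can a single representation support downstream tasks with different computational budgets?
RQ2: Do nested coarse-to-fine representations maintain or improve accuracy compared with independently trained low-dimensional baselines?
RQ3: Can MRL scale to web-scale datasets and multiple modalities (vision, vision + language, language) without additional inference cost?
RQ4: What are the practical benefits in adaptive classification and adaptive retrieval workflows?
RQ5: How do nested representations behave with respect to robustness and on downstream tasks such as few-shot and long-tail learning?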

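Key Results

With Adaptive Classification, MRL reduces the embedding size for ImageNet-1K classification by up to 14x at the same accuracy.
In large-scale retrieval on ImageNet-1K and ImageNet-4K, it achieves real-world speed-ups of up to 14x while maintaining comparable accuracy.
In long-tail few-shot and continual learning settings, it improves accuracy by up to 2% and is as robust as the original embeddings.
MRL generalizes across modalities (ResNet, ViT, ALIGN, BERT) and web-scale data (ImageNet, JFT, ALIGN).
The coarse-to-fine representations interpolate across dimensions and enable flexible deployment at no additional inference cost.
Retrieval works well with adaptive shortlisting followed by re-ranking, which exploits the nested representations to deliver substantial speed-ups.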
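
The shortlist-then-re-rank idea behind Adaptive Retrieval can be illustrated with a small brute-force sketch. It is not the paper's pipeline: the function name `adaptive_retrieval`, its parameters, the use of Euclidean distance, and exhaustive search in place of an approximate nearest-neighbor index are all assumptions made for the example.

```python
import math

def l2_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def adaptive_retrieval(query, database, shortlist_dim, rerank_dim,
                       shortlist_size, top_k):
    """Two-stage retrieval over nested embeddings (illustrative sketch).

    Stage 1 ranks every item using only the first shortlist_dim dimensions.
    Stage 2 re-ranks the surviving shortlist using the first rerank_dim
    dimensions. Returns the indices of the top_k items after re-ranking.
    """
    # Stage 1: cheap pass over the whole database with a short prefix.
    coarse = sorted(
        range(len(database)),
        key=lambda i: l2_distance(query[:shortlist_dim],
                                  database[i][:shortlist_dim]),
    )
    shortlist = coarse[:shortlist_size]

    # Stage 2: more expensive pass, but only over the shortlist.
    fine = sorted(
        shortlist,
        key=lambda i: l2_distance(query[:rerank_dim],
                                  database[i][:rerank_dim]),
    )
    return fine[:top_k]

# Toy data: item 0 matches the query on the first two dimensions only.
query = [1.0, 0.0, 0.0, 0.0]
database = [
    [1.0, 0.0, 5.0, 5.0],
    [1.0, 0.1, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.9, 0.0, 0.0, 0.1],
]
result = adaptive_retrieval(query, database, shortlist_dim=2, rerank_dim=4,
                            shortlist_size=3, top_k=2)
```

Most distance computations happen on the short prefix, and the longer prefix is used only for the shortlist. The trade-off is that an item dropped in the first stage cannot be recovered in the second, so the shortlist size controls how much accuracy is exchanged for speed.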

This review was generated by AI and reviewed by a human editor.