QUICK REVIEW

[Paper Review] What Makes Good Examples for Visual In-Context Learning?

Yuanhan Zhang, Kaiyang Zhou | arXiv (Cornell University) | 2023-01-31

Multimodal Machine Learning Applications | Citations: 19

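One-Line Summary

This paper shows that the choice of in-context visual examples has a decisive effect on performance, and proposes a prompt retrieval framework, with unsupervised and supervised variants, that automatically selects beneficial prompts for visual in-context learning.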

ABSTRACT

Large-scale models trained on broad data have recently become the mainstream architecture in computer vision due to their strong generalization performance. In this paper, the main focus is on an emergent ability in large vision models, known as in-context learning, which allows inference on unseen tasks by conditioning on in-context examples (a.k.a. prompt) without updating the model parameters. This concept has been well-known in natural language processing but has only been studied very recently for large vision models. We for the first time provide a comprehensive investigation on the impact of in-context examples in computer vision, and find that the performance is highly sensitive to the choice of in-context examples. To overcome the problem, we propose a prompt retrieval framework to automate the selection of in-context examples. Specifically, we present (1) an unsupervised prompt retrieval method based on nearest example search using an off-the-shelf model, and (2) a supervised prompt retrieval method, which trains a neural network to choose examples that directly maximize in-context learning performance. The results demonstrate that our methods can bring non-trivial improvements to visual in-context learning in comparison to the commonly-used random selection.

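Motivation and Objectives

Investigate how the choice of in-context examples affects visual in-context learning performance.
Quantify how sensitive downstream-task performance in vision models is to prompt selection.
Develop automatic prompt retrieval methods that improve prompt quality over random selection.
Evaluate robustness under distribution shift and across several vision tasks.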

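Proposed Method

Adopts a model-agnostic visual in-context learning setting in which the prompt is a set of image-label pairs that guides prediction without updating the model parameters.
Proposes a score-based prompt retrieval framework, with scoring function f_theta(x_n, x_q), that selects the top-scoring prompts for a query x_q.
Implements unsupervised prompt retrieval (UnsupPR) as nearest-neighbor search over features from a frozen off-the-shelf model.
Implements supervised prompt retrieval (SupPR) by training the feature extractor with a contrastive objective so that retrieval maximizes in-context learning performance.
Trains SupPR with a contrastive loss that pulls the query toward prompts in the top-5 positive set and pushes it away from prompts in the negative set, where both sets are defined by in-context learning performance.
Evaluates on three downstream tasks (foreground segmentation, single object detection, and image colorization) using a pre-trained image inpainting model.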
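
The unsupervised variant described above amounts to ranking support examples by their similarity to the query in a frozen feature space. The following is a minimal sketch of that idea, not the authors' implementation: it assumes cosine similarity as the score f_theta, and the function names (`cosine_similarity`, `retrieve_prompts`) and the toy two-dimensional feature vectors are hypothetical stand-ins for embeddings from an off-the-shelf encoder.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat an all-zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def retrieve_prompts(query_feat, support_feats, k=1):
    """Return indices of the k support examples most similar to the query."""
    scores = [cosine_similarity(feat, query_feat) for feat in support_feats]
    ranked = sorted(range(len(support_feats)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]


# Toy features (made-up values) standing in for frozen-encoder embeddings.
support_feats = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
query_feat = [0.9, 0.1]

top = retrieve_prompts(query_feat, support_feats, k=2)
print(top)  # indices of the two nearest support examples
```

In practice the features would come from a pre-trained image encoder, and the retrieved image-label pairs would be placed in the prompt alongside the query image.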

Figure 1: (a) Different choices of in-context examples (outlined in green) often lead to significantly different results. Here we show 30 random query images (x-axis) from Pascal-$5^i$ (Shaban et al., 2017) split 0, and measure the performance range using 50 different in-context examples. (b) W

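Experimental Results

Research Questions

RQ1: How does the choice of in-context examples affect visual in-context learning performance across tasks?
RQ2: Can automatic prompt retrieval improve performance beyond random prompt selection?
RQ3: Are the unsupervised and supervised prompt retrieval strategies effective, and which one gives better results?
RQ4: How do prompt size, backbone, and retrieval set size affect the results and robustness to distribution shift?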

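Key Results

Prompt retrieval outperforms random prompt selection on foreground segmentation and object detection (by more than 6% and about 1% mIoU, respectively).
The supervised prompt retrieval method (SupPR) consistently outperforms both the unsupervised variant (UnsupPR) and the random baseline.
The gains from prompt retrieval persist across different backbones (CLIP, EVA, ViT), with minimal sensitivity to the backbone.
Under a distribution shift from Pascal to MSCOCO, SupPR is more robust than UnsupPR or Random, although the overall gains under shift are smaller than the in-distribution gains.
Increasing the number of in-context examples generally improves performance for all methods, and when high-quality examples are selected, their order has little effect.
SupPR tends to select prompts that are semantically and spatially closer to the query, suggesting that balancing semantic similarity and contextual similarity is beneficial.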

Figure 2: Overview of the supervised prompt retrieval method. The main idea is to compute the in-context learning result for each source example, and pick those with the highest/lowest results to form a positive/negative set for contrastive learning.
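
The procedure in the Figure 2 caption can be sketched in two steps: rank source examples by their in-context learning result to form positive and negative sets, then score a query against them with a contrastive loss. The sketch below is an illustration under stated assumptions rather than the paper's training code: the InfoNCE-style loss form, the temperature value, the function names (`split_pos_neg`, `contrastive_loss`), and the example scores are all hypothetical.

```python
import math


def split_pos_neg(icl_scores, k=5):
    """Indices of the k best (positive) and k worst (negative) source examples."""
    order = sorted(range(len(icl_scores)), key=lambda i: icl_scores[i], reverse=True)
    return order[:k], order[-k:]


def contrastive_loss(sim_pos, sim_negs, temperature=0.1):
    """InfoNCE-style loss for one positive similarity and several negatives.

    Computes -log( exp(s+/t) / (exp(s+/t) + sum_j exp(s_j-/t)) )
    using a numerically stable log-sum-exp.
    """
    logits = [sim_pos / temperature] + [s / temperature for s in sim_negs]
    m = max(logits)
    log_denominator = m + math.log(sum(math.exp(v - m) for v in logits))
    return log_denominator - logits[0]


# Made-up in-context learning results (e.g. IoU) for six source examples.
icl_scores = [0.62, 0.10, 0.55, 0.31, 0.05, 0.48]
positives, negatives = split_pos_neg(icl_scores, k=2)  # k=2 only for this toy set

# The loss is small when the query is close to the positive and far from negatives.
good = contrastive_loss(sim_pos=0.9, sim_negs=[0.1, 0.2])
bad = contrastive_loss(sim_pos=0.1, sim_negs=[0.9, 0.8])
print(positives, negatives, good < bad)
```

Minimizing such a loss over the feature extractor's parameters would move queries toward prompts that produced good in-context results, which matches the intent described for SupPR.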

This review was generated by AI and reviewed by a human editor.