QUICK REVIEW

[Paper Review] What Makes Good In-Context Examples for GPT-$3$?

Jiachang Liu, Dinghan Shen|arXiv (Cornell University)|2021. 01. 17.

Topic Modeling | Citations: 155

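One-Line Summary

This paper shows that retrieving semantically similar in-context examples (via KATE) substantially improves GPT-3's few-shot performance across sentiment analysis, table-to-text generation, and open-domain QA, and that sentence encoders fine-tuned on task-related data amplify the gains.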

ABSTRACT

GPT-$3$ has attracted lots of attention due to its superior performance across a wide range of NLP tasks, especially with its powerful and versatile in-context few-shot learning ability. Despite its success, we found that the empirical results of GPT-$3$ depend heavily on the choice of in-context examples. In this work, we investigate whether there are more effective strategies for judiciously selecting in-context examples (relative to random sampling) that better leverage GPT-$3$'s few-shot capabilities. Inspired by the recent success of leveraging a retrieval module to augment large-scale neural network models, we propose to retrieve examples that are semantically-similar to a test sample to formulate its corresponding prompt. Intuitively, the in-context examples selected with such a strategy may serve as more informative inputs to unleash GPT-$3$'s extensive knowledge. We evaluate the proposed approach on several natural language understanding and generation benchmarks, where the retrieval-based prompt selection approach consistently outperforms the random baseline. Moreover, it is observed that the sentence encoders fine-tuned on task-related datasets yield even more helpful retrieval results. Notably, significant gains are observed on tasks such as table-to-text generation (41.9% on the ToTTo dataset) and open-domain question answering (45.5% on the NQ dataset). We hope our investigation could help understand the behaviors of GPT-$3$ and large-scale pre-trained LMs in general and enhance their few-shot capabilities.

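Motivation and Objectives

Understand how sensitive GPT-3 is to the choice of in-context examples, which motivates the study.
Investigate whether retrieval-based selection outperforms random sampling.
Evaluate how task-related sentence encoders affect the quality of the retrieved examples and GPT-3's subsequent predictions.
Demonstrate the effectiveness of non-parametric retrieval augmentation (KATE) across multiple NLP tasks.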

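Proposed Method

Formalize in-context learning as conditional text generation in which the context C contains k in-context examples together with their target outputs.
Empirically compare random in-context sampling with retrieval-based selection that uses nearest neighbors in a sentence-embedding space.
Propose KATE (Knn-Augmented in-conText Example selection), which retrieves a test sample's k nearest neighbors from the training set and uses them as the in-context examples in the GPT-3 prompt.
Evaluate several sentence encoders fine-tuned on SNLI/MNLI and STS-B, including RoBERTa-based models, as retrieval embeddings.
Analyze factors that affect performance: the number of in-context examples, the size of the training set used for retrieval, and the order of the in-context examples.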
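
The retrieval step described above can be illustrated with a minimal sketch. It is not the authors' implementation. The function names (`embed`, `cosine`, `retrieve_knn`, `build_prompt`), the "Input:/Output:" prompt template, and the toy data are all hypothetical, and a bag-of-words vector stands in for the fine-tuned sentence encoders the paper evaluates, so that the example runs with the standard library only.

```python
import math
from collections import Counter

def embed(text):
    """Toy stand-in for a sentence encoder (assumption): bag-of-words counts."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve_knn(test_input, train_set, k):
    """Return the k (input, output) pairs closest to test_input, most similar first."""
    query = embed(test_input)
    ranked = sorted(train_set, key=lambda pair: cosine(query, embed(pair[0])), reverse=True)
    return ranked[:k]

def build_prompt(test_input, examples):
    """Concatenate the retrieved examples, then the unanswered test input (hypothetical template)."""
    blocks = [f"Input: {x}\nOutput: {y}" for x, y in examples]
    blocks.append(f"Input: {test_input}\nOutput:")
    return "\n\n".join(blocks)

# Hypothetical toy training set for a sentiment task.
train_set = [
    ("the movie was wonderful and moving", "positive"),
    ("the plot was dull and boring", "negative"),
    ("stock prices fell sharply today", "negative"),
    ("a wonderful cast in a moving story", "positive"),
]
test_input = "a moving and wonderful movie"

neighbors = retrieve_knn(test_input, train_set, k=2)
prompt = build_prompt(test_input, neighbors)
print(prompt)
```

In this sketch the most similar example comes first in the prompt. Since the review lists example order as one of the analyzed factors, a variant could reverse `neighbors` before calling `build_prompt`. The resulting string would then be sent to the language model as the few-shot prompt.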

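Experimental Results

Research Questions

RQ1: Does retrieval-based selection of in-context examples improve GPT-3's few-shot performance relative to random sampling?
RQ2: How does the semantic quality of the retrieved examples, as measured by sentence embeddings, affect GPT-3's results?
RQ3: Do task-related sentence encoders and larger retrieval sets yield bigger gains on tasks such as sentiment analysis, table-to-text generation, and QA?
RQ4: How do the number and order of in-context examples affect KATE's effectiveness?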

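Key Findings

Retrieval-based in-context example selection consistently outperforms random sampling across multiple tasks.
Sentence encoders fine-tuned on task-related data (NLI, STS-B, SST-2) produce stronger retrieval results and higher GPT-3 performance.
KATE achieves substantial gains over the random baseline on ToTTo table-to-text generation and on open-domain QA; the abstract reports 41.9% on ToTTo and 45.5% on NQ.
Increasing the number of retrieved examples generally improves performance, and using an encoder aligned with the task raises the gains further.
Retrieved examples give GPT-3 more detailed and relevant context, which reduces false statements and improves answer faithfulness.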

This review was generated by AI and reviewed by a human editor.