QUICK REVIEW

[Paper Review] Self-Discover: Large Language Models Self-Compose Reasoning Structures

Pei Zhou, Jay Pujara|arXiv (Cornell University)|2024. 02. 06.

Topic Modeling | Citations: 9

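One-Line Summary

Self-Discover lets an LLM discover a task-intrinsic reasoning structure on its own by composing atomic reasoning modules, improving results on challenging reasoning benchmarks while using less inference compute.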

ABSTRACT

We introduce SELF-DISCOVER, a general framework for LLMs to self-discover the task-intrinsic reasoning structures to tackle complex reasoning problems that are challenging for typical prompting methods. Core to the framework is a self-discovery process where LLMs select multiple atomic reasoning modules such as critical thinking and step-by-step thinking, and compose them into an explicit reasoning structure for LLMs to follow during decoding. SELF-DISCOVER substantially improves GPT-4 and PaLM 2's performance on challenging reasoning benchmarks such as BigBench-Hard, grounded agent reasoning, and MATH, by as much as 32% compared to Chain of Thought (CoT). Furthermore, SELF-DISCOVER outperforms inference-intensive methods such as CoT-Self-Consistency by more than 20%, while requiring 10-40x fewer inference compute. Finally, we show that the self-discovered reasoning structures are universally applicable across model families: from PaLM 2-L to GPT-4, and from GPT-4 to Llama2, and share commonalities with human reasoning patterns.

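Research Motivation and Goals

Propose a framework built on the premise that each task has its own intrinsic reasoning structure.
Develop a two-stage process in which the LLM first discovers a task-specific reasoning structure and then follows it to solve individual instances.
Show that self-discovered structures are more efficient and more interpretable than conventional prompting methods.
Show that the discovered structures transfer across model families and align with human reasoning patterns.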

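Proposed Method

Define a seed set of atomic reasoning modules described in natural language (e.g., critical thinking, step-by-step thinking).
Stage 1: self-discover a structure through three actions: SELECT the useful modules, ADAPT them to the task, and IMPLEMENT them as an actionable JSON-like structure.
Stage 2: solve each instance by following the self-discovered structure during decoding.
Express the discovered structure in key-value (JSON) format so that it guides decoding and stays interpretable.
Compare Self-Discover against zero-shot Direct Prompting, Chain-of-Thought (CoT), and Plan-and-Solve (PS), as well as inference-intensive baselines such as CoT-Self-Consistency.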
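The two stages described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the prompt wording, the three seed modules, the `LLM` callable type, and the `fake_llm` stub are all assumptions made for the example, and a real model's IMPLEMENT output may need cleaning before it parses as JSON.

```python
import json
from typing import Callable, List

# Hypothetical seed modules for illustration; not the paper's actual seed set.
SEED_MODULES = [
    "Use critical thinking to question assumptions.",
    "Think step by step.",
    "Break the problem into smaller sub-problems.",
]

# Assumption: an LLM is any function that maps a prompt string to a completion.
LLM = Callable[[str], str]


def discover_structure(llm: LLM, task_examples: List[str]) -> dict:
    """Stage 1: run SELECT, ADAPT, and IMPLEMENT once per task."""
    examples = "\n".join(task_examples)
    modules = "\n".join(SEED_MODULES)
    selected = llm(
        f"SELECT the modules useful for this task.\n"
        f"Modules:\n{modules}\nTask examples:\n{examples}"
    )
    adapted = llm(
        f"ADAPT the selected modules to the task.\n"
        f"Selected:\n{selected}\nTask examples:\n{examples}"
    )
    implemented = llm(
        f"IMPLEMENT the adapted modules as a JSON object with empty values.\n"
        f"Adapted:\n{adapted}\nTask examples:\n{examples}"
    )
    # Raises json.JSONDecodeError if the model did not return valid JSON.
    return json.loads(implemented)


def solve_instance(llm: LLM, structure: dict, instance: str) -> str:
    """Stage 2: one call per instance, following the discovered structure."""
    return llm(
        "Follow this reasoning structure, fill in each value, then answer.\n"
        f"Structure:\n{json.dumps(structure, indent=2)}\nTask:\n{instance}"
    )


# Stub model so the sketch runs without any API; it records every prompt.
calls: List[str] = []


def fake_llm(prompt: str) -> str:
    calls.append(prompt)
    if prompt.startswith("IMPLEMENT"):
        return json.dumps({
            "Identify the quantities involved": "",
            "Work through the steps in order": "",
            "Final answer": "",
        })
    return "stub output for: " + prompt.splitlines()[0]


if __name__ == "__main__":
    structure = discover_structure(fake_llm, ["What is 2 + 3?"])
    print(structure)
    print(solve_instance(fake_llm, structure, "What is 4 + 5?"))
```

Keeping the structure as a plain dictionary means the same object can be reused for every instance of the task, which is where the per-task (rather than per-instance) cost of Stage 1 comes from.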

Figure 1 : Self-Discover guides LLMs to self-discover and compose atomic reasoning modules into a reasoning structure to solve challenging tasks. Through testing on challenging reasoning benchmarks including Big Bench-Hard (BBH), agent reasoning (T4D), and MATH, we find that Self-Discover outperforms

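Experimental Results

Research Questions

RQ1: Can self-discovered reasoning structures improve LLM reasoning across diverse benchmarks (BBH, T4D, MATH)?
RQ2: Which task categories benefit most from self-discovered structures, and how does their efficiency compare with alternative prompting methods?
RQ3: Do self-discovered structures transfer across model families and across different LLMs?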

Main Results

Method	BBH	T4D	MATH
PaLM 2-L	56%	30%	45%
PaLM 2-L + CoT	60%	40%	42%
PaLM 2-L + PS	61%	42%	49%
PaLM 2-L + Self-Discover	67%	69%	50.5%
GPT-4	58%	51%	70.5%
GPT-4 + CoT	75%	52%	71%
GPT-4 + PS	73%	53%	70%
GPT-4 + Self-Discover	81%	85%	73%
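
The absolute gains can be recomputed directly from the table above. The snippet below is only a convenience sketch: the numbers are copied from the table, while the `RESULTS` layout, the label "Direct" for the unaugmented model rows, and the helper names are assumptions introduced here.

```python
# Accuracy (%) copied from the results table; "Direct" is a label chosen here
# for the rows that list the model name alone.
RESULTS = {
    "PaLM 2-L": {
        "Direct": {"BBH": 56, "T4D": 30, "MATH": 45},
        "CoT": {"BBH": 60, "T4D": 40, "MATH": 42},
        "PS": {"BBH": 61, "T4D": 42, "MATH": 49},
        "Self-Discover": {"BBH": 67, "T4D": 69, "MATH": 50.5},
    },
    "GPT-4": {
        "Direct": {"BBH": 58, "T4D": 51, "MATH": 70.5},
        "CoT": {"BBH": 75, "T4D": 52, "MATH": 71},
        "PS": {"BBH": 73, "T4D": 53, "MATH": 70},
        "Self-Discover": {"BBH": 81, "T4D": 85, "MATH": 73},
    },
}


def gain_over(model: str, baseline: str, benchmark: str) -> float:
    """Absolute gain (percentage points) of Self-Discover over one baseline."""
    scores = RESULTS[model]
    return scores["Self-Discover"][benchmark] - scores[baseline][benchmark]


def gain_over_best_baseline(model: str, benchmark: str) -> float:
    """Absolute gain of Self-Discover over the strongest baseline in the table."""
    scores = RESULTS[model]
    best = max(
        row[benchmark] for name, row in scores.items() if name != "Self-Discover"
    )
    return scores["Self-Discover"][benchmark] - best


if __name__ == "__main__":
    for model in RESULTS:
        for benchmark in ("BBH", "T4D", "MATH"):
            print(model, benchmark, gain_over_best_baseline(model, benchmark))
```

Comparing against the strongest baseline per benchmark gives the most conservative reading of the improvement.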

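Self-Discover improves the reasoning performance of PaLM 2-L and GPT-4 on BBH, T4D, and MATH, with gains of up to 32% over CoT in some settings.
Across the 23 BBH tasks, PaLM 2-L gains 7% absolute over CoT and 6% over PS, and GPT-4 shows similar gains.
On T4D, PaLM 2-L improves by at least 27% absolute over the baselines and GPT-4 by 32%, reaching accuracies of 69% (PaLM 2-L) and 85% (GPT-4).
On MATH, Self-Discover yields moderate gains of 1–7% for PaLM 2-L and 2–3% for GPT-4; most failures stem from computation errors rather than from the reasoning structure.
Self-Discover needs 10–40x fewer inference calls than inference-intensive alternatives such as CoT-Self-Consistency or majority voting, while matching or exceeding their performance.
Self-discovered structures transfer across model families (PaLM 2-L → GPT-4; GPT-4 → Llama-2-70B) and share commonalities with human reasoning patterns.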
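To see where a 10–40x reduction in inference calls could come from, here is a rough back-of-the-envelope sketch. It rests on assumptions not stated in this review: that Stage 1 costs three task-level calls (one each for SELECT, ADAPT, and IMPLEMENT), that Stage 2 costs one call per instance, and that the sampling-based baselines draw 10 or 40 samples per instance. The function names are hypothetical.

```python
def self_discover_calls(num_instances: int, task_level_calls: int = 3) -> int:
    """Assumed cost: a fixed number of task-level calls plus one call per instance."""
    return task_level_calls + num_instances


def sampling_calls(num_instances: int, samples_per_instance: int) -> int:
    """Assumed cost of a sampling baseline: several calls for every instance."""
    return samples_per_instance * num_instances


def call_ratio(num_instances: int, samples_per_instance: int) -> float:
    """How many times more calls the sampling baseline makes than Self-Discover."""
    return sampling_calls(num_instances, samples_per_instance) / self_discover_calls(
        num_instances
    )


if __name__ == "__main__":
    # Sample counts of 10 and 40 are illustrative choices for this sketch.
    for samples in (10, 40):
        print(samples, round(call_ratio(1000, samples), 2))
```

Under these assumptions the ratio approaches the number of samples per instance as the task grows, because the fixed task-level cost is amortized over all instances.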

Figure 2 : Illustration of using Self-Discover for problem-solving. Given a generative LM, task, and seed reasoning module descriptions, we guide LMs to generate a reasoning structure in key-value format to solve the task. Finally, models can follow the self-discovered structures to solve the every

This review was generated by AI and reviewed by a human editor.