QUICK REVIEW

[Paper Review] Multi-Step Reasoning with Large Language Models, a Survey

Aske Plaat, Annie Wong|arXiv (Cornell University)|2024. 07. 16.

Natural Language Processing Techniques | Citations: 12

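One-Line Summary

This survey reviews prompt-based multi-step reasoning in large language models, presents a three-part taxonomy (generation, evaluation, control), and summarizes benchmarks and future research directions.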

ABSTRACT

Large language models (LLMs) with billions of parameters exhibit in-context learning abilities, enabling few-shot learning on tasks that the model was not specifically trained for. Traditional models achieve breakthrough performance on language tasks, but do not perform well on basic reasoning benchmarks. However, a new in-context learning approach, Chain-of-thought, has demonstrated strong multi-step reasoning abilities on these benchmarks. The research on LLM reasoning abilities started with the question whether LLMs can solve grade school math word problems, and has expanded to other tasks in the past few years. This article reviews the field of multi-step reasoning with LLMs. We propose a taxonomy that identifies different ways to generate, evaluate, and control multi-step reasoning. We provide an in-depth coverage of core approaches and open problems, and we propose a research agenda for the near future. We find that multi-step reasoning approaches have progressed beyond math word problems, and can now successfully solve challenges in logic, combinatorial games, and robotics, sometimes by first generating code that is then executed by external tools. Many studies in multi-step methods use reinforcement learning for finetuning, external optimization loops, in-context reinforcement learning, and self-reflection.

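Research Motivation and Objectives

Evaluate how prompt-based methods enable multi-step reasoning in large language models (LLMs).
Provide a taxonomy for generating, evaluating, and controlling reasoning steps in prompts.
Summarize progress on benchmarks and identify the remaining open problems and a research agenda.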

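Proposed Method

Defines a three-stage reasoning pipeline: step generation, step evaluation, and control of the reasoning process.
Classifies generation methods into hand-written prompts, prompts based on external knowledge, and model-generated prompts.
Surveys evaluation strategies, including self-assessment, tool-based validation, and external critics.
Maps control strategies ranging from greedy selection to ensemble and search-based methods (BFS/DFS, RL).
Reviews domain applications beyond math word problems (coding, autonomous agents) and discusses grounding.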
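
The three stages listed above can be sketched on a toy problem. This is a minimal illustration under stated assumptions, not the survey's method: the number puzzle (reach a target from a start value using "+3" and "*2") and the names `generate_steps`, `evaluate`, `greedy_control`, and `bfs_control` are hypothetical, and the generator and evaluator are plain functions standing in for LLM calls, tools, or external models.

```python
from collections import deque
from typing import Callable, List, Optional, Tuple

Step = Tuple[str, Callable[[int], int]]


def generate_steps(state: int) -> List[Step]:
    """Generation: propose candidate next steps (hypothetical stand-in for an LLM)."""
    return [("+3", lambda x: x + 3), ("*2", lambda x: x * 2)]


def evaluate(state: int, target: int) -> float:
    """Evaluation: lower is better; overshooting the target is a dead end."""
    return float("inf") if state > target else float(target - state)


def greedy_control(start: int, target: int, max_steps: int = 10) -> Optional[List[str]]:
    """Control (greedy): always follow the single best-scoring step."""
    state, path = start, []
    for _ in range(max_steps):
        if state == target:
            return path
        scored = [(evaluate(fn(state), target), name, fn)
                  for name, fn in generate_steps(state)]
        score, name, fn = min(scored, key=lambda item: item[0])
        if score == float("inf"):
            return None  # every candidate is a dead end
        state = fn(state)
        path.append(name)
    return path if state == target else None


def bfs_control(start: int, target: int, max_depth: int = 10) -> Optional[List[str]]:
    """Control (breadth-first search): explore all non-pruned steps level by level."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        state, path = queue.popleft()
        if state == target:
            return path
        if len(path) >= max_depth:
            continue
        for name, fn in generate_steps(state):
            nxt = fn(state)
            # The evaluator prunes dead ends before they enter the queue.
            if nxt not in seen and evaluate(nxt, target) != float("inf"):
                seen.add(nxt)
                queue.append((nxt, path + [name]))
    return None
```

With a start of 1 and a target of 10, `greedy_control` commits to an early step that leads to a dead end and returns `None`, while `bfs_control` finds a three-step path. The same generator and evaluator are reused by both controllers; only the control strategy differs.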

Figure 1: Taxonomy of LLM-Reasoning Approaches: Prompt Generation, Evaluation, and Control

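Experimental Results

Research Questions

RQ1: Which prompt-based techniques enable effective multi-step reasoning in LLMs across domains?
RQ2: How can the generation, evaluation, and control of reasoning steps be organized to improve performance and robustness?
RQ3: Which benchmarks (e.g., GSM8K and related datasets) reveal the strengths and limitations of current reasoning approaches?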

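Key Findings

Chain-of-thought prompting yields substantial performance gains over direct answering on math word problems (e.g., GSM8K).
Zero-shot prompts such as "Let’s think step by step" improve reasoning on arithmetic, symbolic, and logical tasks.
Benchmarks differ considerably in difficulty, and current approaches behave differently across datasets (GSM8K, ASDiv, MAWPS, SVAMP, AQuA).
Automatically generated prompts match or outperform hand-written prompts on several benchmarks.
A variety of reasoning-control strategies (self-verification, majority voting, tool-based evaluation, BFS/DFS, RL) help mitigate error accumulation.
Reasoning research connects to self-reflection and metacognition, and is relevant to the path toward artificial general intelligence.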
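
Two of the techniques named in this list, the zero-shot step-by-step prompt and majority voting over several sampled reasoning chains, can be sketched as follows. This is an illustrative sketch only: the `Q:`/`A:` framing, the function names, the rule that the final answer is the last number in a completion, and the three sample completions are all assumptions written by hand, not model outputs or results from the paper.

```python
import re
from collections import Counter
from typing import List, Optional


def build_zero_shot_cot_prompt(question: str) -> str:
    """Append the step-by-step trigger phrase to a question (assumed Q:/A: layout)."""
    return f"Q: {question}\nA: Let's think step by step."


def extract_answer(completion: str) -> Optional[str]:
    """Assumption: the final answer is the last number that appears in the text."""
    numbers = re.findall(r"-?\d+(?:\.\d+)?", completion)
    return numbers[-1] if numbers else None


def majority_vote(completions: List[str]) -> Optional[str]:
    """Return the most frequent extracted answer across sampled reasoning chains."""
    answers = [a for a in map(extract_answer, completions) if a is not None]
    if not answers:
        return None
    return Counter(answers).most_common(1)[0][0]


# Hand-written stand-ins for sampled completions (hypothetical, not real model output).
samples = [
    "There are 3 boxes with 4 apples each, so 3 * 4 = 12. The answer is 12.",
    "4 + 4 + 4 = 12. The answer is 12.",
    "3 + 4 = 7. The answer is 7.",
]
prompt = build_zero_shot_cot_prompt("There are 3 boxes with 4 apples each. How many apples?")
final_answer = majority_vote(samples)
```

Here one of the three chains contains an arithmetic slip, and the vote still settles on the answer shared by the other two. In practice the completions would come from sampling a model several times on `prompt`, and the answer-extraction rule would need to match the output format actually used.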

Figure 2: Example of input and target for supervised learning on a long addition problem of adding two numbers. The carry is recorded in the C: digit. Comments (after #) are not part of the learning target (Nye et al., 2021).
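
The kind of intermediate trace the caption describes, where each digit position is processed in turn and the carry is recorded, can be sketched as below. The line layout here is an assumption made for illustration and does not reproduce the exact format of the figure; `addition_scratchpad` is a hypothetical helper and assumes non-negative integers.

```python
from typing import List


def addition_scratchpad(a: int, b: int) -> List[str]:
    """Build a digit-by-digit trace of a + b, recording the carry after each step."""
    xs, ys = str(a)[::-1], str(b)[::-1]  # least significant digit first
    carry = 0
    digits: List[str] = []
    lines: List[str] = []
    for i in range(max(len(xs), len(ys))):
        x = int(xs[i]) if i < len(xs) else 0
        y = int(ys[i]) if i < len(ys) else 0
        total = x + y + carry
        digit, new_carry = total % 10, total // 10
        digits.append(str(digit))
        lines.append(f"{x} + {y} + {carry} = {total} -> digit {digit}, C: {new_carry}")
        carry = new_carry
    if carry:
        digits.append(str(carry))  # leftover carry becomes the leading digit
    lines.append("result: " + "".join(reversed(digits)))
    return lines


for line in addition_scratchpad(29, 57):
    print(line)
```

Each line of the trace is a small, checkable step, which is the property that makes such traces usable as a supervised learning target.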

This review was generated by AI and reviewed by a human editor.