QUICK REVIEW

[Paper Review] TextGrad: Automatic "Differentiation" via Text

Mert Yüksekgönül, Federico Bianchi | arXiv (Cornell University) | June 11, 2024

Natural Language Processing Techniques | Citations: 12

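One-Line Summary

TextGrad backpropagates textual feedback from LLMs to improve the components of compound AI systems across coding, QA, chemistry, and medical tasks, enabling optimization analogous to automatic differentiation without modifying the framework.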

ABSTRACT

AI is undergoing a paradigm shift, with breakthroughs achieved by systems orchestrating multiple large language models (LLMs) and other complex components. As a result, developing principled and automated optimization methods for compound AI systems is one of the most important new challenges. Neural networks faced a similar challenge in its early days until backpropagation and automatic differentiation transformed the field by making optimization turn-key. Inspired by this, we introduce TextGrad, a powerful framework performing automatic "differentiation" via text. TextGrad backpropagates textual feedback provided by LLMs to improve individual components of a compound AI system. In our framework, LLMs provide rich, general, natural language suggestions to optimize variables in computation graphs, ranging from code snippets to molecular structures. TextGrad follows PyTorch's syntax and abstraction and is flexible and easy-to-use. It works out-of-the-box for a variety of tasks, where the users only provide the objective function without tuning components or prompts of the framework. We showcase TextGrad's effectiveness and generality across a diverse range of applications, from question answering and molecule optimization to radiotherapy treatment planning. Without modifying the framework, TextGrad improves the zero-shot accuracy of GPT-4o in Google-Proof Question Answering from 51% to 55%, yields 20% relative performance gain in optimizing LeetCode-Hard coding problem solutions, improves prompts for reasoning, designs new druglike small molecules with desirable in silico binding, and designs radiation oncology treatment plans with high specificity. TextGrad lays a foundation to accelerate the development of the next-generation of AI systems.

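Research Motivation and Goals

Promote principled, automated optimization for compound AI systems built from multiple components.
Present a framework that uses textual feedback as gradients for updating variables in a computation graph.
Demonstrate TextGrad on diverse tasks, including coding, reasoning, chemistry, and medical planning.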

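Proposed Method

Represent the AI system as a computation graph whose operations take variables as inputs and outputs.
Define a gradient operator (the textual gradient) that updates variables using natural language feedback provided by an LLM.
Use a Textual Gradient Descent (TGD) optimizer that updates variables based on their textual gradients.
Allow arbitrary objective functions: a natural language description, a code evaluation, a simulation, and so on.
Support both instance optimization (directly optimizing a solution) and prompt optimization (optimizing a prompt to improve model performance).
Provide an out-of-the-box implementation and PyTorch-like abstractions for ease of use.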
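
The loop described above (evaluate an objective, collect textual feedback as a "gradient", then let an optimizer rewrite the variable) can be sketched roughly as follows. This is a toy illustration of instance optimization, not the TextGrad library's API: `Variable`, `ToyTextLoss`, `ToyTGD`, and `optimize` are hypothetical names chosen to echo the PyTorch-style workflow, and `stub_critic` and `stub_editor` are rule-based stand-ins (an assumption made so the example runs offline) for what would be LLM calls in the real framework.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Variable:
    """A graph node: a piece of text plus the feedback accumulated for it."""
    value: str
    role: str
    requires_grad: bool = True
    gradients: List[str] = field(default_factory=list)


def stub_critic(text: str) -> List[str]:
    """Hypothetical stand-in for the LLM that evaluates the objective."""
    feedback = []
    if "TextGrad" not in text:
        feedback.append("mention TextGrad")
    if not text.endswith("."):
        feedback.append("end with a period")
    return feedback


def stub_editor(text: str, feedback: List[str]) -> str:
    """Hypothetical stand-in for the LLM that rewrites a variable from feedback."""
    if "mention TextGrad" in feedback:
        text = "TextGrad: " + text
    if "end with a period" in feedback:
        text = text + "."
    return text


class ToyTextLoss:
    """Objective whose 'backward' pass attaches textual feedback to a variable."""

    def __init__(self, critic: Callable[[str], List[str]]):
        self.critic = critic

    def backward(self, variable: Variable) -> int:
        feedback = self.critic(variable.value)
        if variable.requires_grad:
            variable.gradients.extend(feedback)
        return len(feedback)  # number of outstanding critiques


class ToyTGD:
    """Toy 'textual gradient descent': rewrites each variable using its feedback."""

    def __init__(self, parameters: List[Variable],
                 editor: Callable[[str, List[str]], str]):
        self.parameters = parameters
        self.editor = editor

    def zero_grad(self) -> None:
        for p in self.parameters:
            p.gradients.clear()

    def step(self) -> None:
        for p in self.parameters:
            if p.requires_grad and p.gradients:
                p.value = self.editor(p.value, p.gradients)


def optimize(variable: Variable, loss: ToyTextLoss, optimizer: ToyTGD,
             max_iters: int = 5) -> str:
    """Repeat feedback collection and rewriting until no critique remains."""
    for _ in range(max_iters):
        optimizer.zero_grad()
        if loss.backward(variable) == 0:
            break
        optimizer.step()
    return variable.value


answer = Variable("feedback is propagated as text", role="answer to refine")
result = optimize(answer, ToyTextLoss(stub_critic), ToyTGD([answer], stub_editor))
print(result)
```

Swapping the two stubs for real LLM calls would leave the loop unchanged, which is the point of the PyTorch-like abstraction: the objective and the update rule are pluggable, while the backward-then-step structure stays fixed.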

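Experimental Results

Research Questions

RQ1: Can textual feedback from LLMs be backpropagated through a computation graph to improve individual components of a compound AI system?
RQ2: How large are the practical performance gains from TextGrad across tasks such as coding, reasoning, chemistry, and medicine?
RQ3: How do instance optimization and prompt optimization compare when guided by textual gradients?
RQ4: What effect do batching, constraints, and momentum-style extensions have on TextGrad optimization?
RQ5: Can TextGrad work across domains without task-specific prompts or extensive hand-tuning?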

Key Results

Task	Method	Metric	Value
LeetCode Hard	Zero-shot	Completion Rate	0.26
LeetCode Hard	Reflexion (1 demonstration, 5 iterations)	Completion Rate	0.31 ± 0.012
LeetCode Hard	TextGrad (0 demonstrations, 5 iterations)	Completion Rate	0.36 ± 0.018
GPQA (Google-proof QA)	TextGrad	Accuracy (%)	55.0
MMLU-Machine Learning	TextGrad	Accuracy (%)	88.4
MMLU-College Physics	TextGrad	Accuracy (%)	95.1

Improved LeetCode Hard problem solutions: TextGrad achieves a 36% completion rate on LeetCode Hard without demonstrations, outperforming the zero-shot (26%) and Reflexion (31%) baselines.
Google-proof Question Answering: zero-shot accuracy improved from 51% to 55% with TextGrad on GPT-4o.
MMLU benchmarks: on the Machine Learning subset, TextGrad reaches 88.4% accuracy versus 85.7% for CoT; on College Physics, it reaches 95.1% versus 91.2% for CoT.
Radiotherapy and molecular design demonstrations show improvements when optimizing problem-specific objectives via textual gradients.
TextGrad provides a PyTorch-like API, making it accessible and general across tasks without tuning the framework's own components or prompts.

This review was generated by AI and reviewed by a human editor.