QUICK REVIEW

[Paper Digest] TextGrad: Automatic "Differentiation" via Text

Mert Yüksekgönül, Federico Bianchi|arXiv (Cornell University)|Jun 11, 2024

Natural Language Processing Techniques|Cited by 12

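One-Sentence Summary

TextGrad backpropagates textual feedback from large language models to perform an optimization analogous to automatic differentiation, improving the individual components of compound AI systems on coding, question answering, chemistry, and medicine tasks without any changes to the framework.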

ABSTRACT

AI is undergoing a paradigm shift, with breakthroughs achieved by systems orchestrating multiple large language models (LLMs) and other complex components. As a result, developing principled and automated optimization methods for compound AI systems is one of the most important new challenges. Neural networks faced a similar challenge in its early days until backpropagation and automatic differentiation transformed the field by making optimization turn-key. Inspired by this, we introduce TextGrad, a powerful framework performing automatic "differentiation" via text. TextGrad backpropagates textual feedback provided by LLMs to improve individual components of a compound AI system. In our framework, LLMs provide rich, general, natural language suggestions to optimize variables in computation graphs, ranging from code snippets to molecular structures. TextGrad follows PyTorch's syntax and abstraction and is flexible and easy-to-use. It works out-of-the-box for a variety of tasks, where the users only provide the objective function without tuning components or prompts of the framework. We showcase TextGrad's effectiveness and generality across a diverse range of applications, from question answering and molecule optimization to radiotherapy treatment planning. Without modifying the framework, TextGrad improves the zero-shot accuracy of GPT-4o in Google-Proof Question Answering from 51% to 55%, yields 20% relative performance gain in optimizing LeetCode-Hard coding problem solutions, improves prompts for reasoning, designs new druglike small molecules with desirable in silico binding, and designs radiation oncology treatment plans with high specificity. TextGrad lays a foundation to accelerate the development of the next-generation of AI systems.

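Motivation and Objectives

Motivate principled, automated, end-to-end optimization of compound AI systems built from multiple components.
Propose a framework that uses textual feedback as gradients to update variables in a computation graph.
Demonstrate TextGrad on diverse tasks spanning coding, reasoning, chemistry, and medical planning.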

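Proposed Method

Represent the AI system as a computation graph in which variables are the inputs and outputs of its operations.
Define a gradient operator that returns natural-language feedback from an LLM on how a variable should change (the textual gradient).
Update variables from their textual gradients with the Textual Gradient Descent (TGD) optimizer.
Allow arbitrary objective functions, including natural-language specifications, code evaluation, and simulation.
Support both instance optimization (directly optimizing a solution) and prompt optimization (optimizing a prompt to improve model performance).
Provide an out-of-the-box implementation with PyTorch-like abstractions for ease of use.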
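
The forward / backward / step loop described in this list can be illustrated with a toy sketch. This is not the TextGrad library API: `Variable`, `mock_model`, `mock_loss`, `backward`, `tgd_step`, and `optimize` are hypothetical names, and the LLM calls are replaced by hard-coded stand-ins so the example runs offline. It shows prompt optimization on a two-node graph (system prompt → answer), assuming a critique of the answer is the only training signal.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Variable:
    """A text-valued node in the computation graph (hypothetical toy class)."""
    value: str
    role: str
    requires_grad: bool = False
    gradients: List[str] = field(default_factory=list)


def mock_model(prompt: Variable, question: str) -> Variable:
    """Forward pass: hard-coded stand-in for an LLM call."""
    if "step by step" in prompt.value.lower():
        text = "6 * 7 = 42, so the answer is 42."
    else:
        text = "42"
    return Variable(text, role="answer")


def mock_loss(answer: Variable) -> str:
    """Objective expressed as a critique; an empty string means no complaints."""
    if "=" not in answer.value:
        return "The answer gives no intermediate reasoning."
    return ""


def backward(loss_feedback: str, prompt: Variable) -> None:
    """Pass the critique of the output back to the upstream variable as text."""
    if loss_feedback and prompt.requires_grad:
        prompt.gradients.append(
            f"Revise the {prompt.role} so this criticism no longer applies: "
            f"{loss_feedback}"
        )


def tgd_step(prompt: Variable) -> None:
    """Update the variable from its textual gradients, then clear them."""
    if prompt.gradients:
        # A real system would ask an LLM to rewrite the text using the
        # feedback; this sketch hard-codes a single edit instead.
        prompt.value = prompt.value.rstrip() + " Think step by step."
        prompt.gradients.clear()


def optimize(prompt: Variable, question: str, max_iters: int = 3) -> Variable:
    """Repeat forward, loss, backward, step until the critique is empty."""
    for _ in range(max_iters):
        answer = mock_model(prompt, question)
        feedback = mock_loss(answer)
        if not feedback:
            break
        backward(feedback, prompt)
        tgd_step(prompt)
    return mock_model(prompt, question)


system_prompt = Variable(
    "You are a helpful assistant.", role="system prompt", requires_grad=True
)
final_answer = optimize(system_prompt, "What is 6 times 7?")
print(system_prompt.value)
print(final_answer.value)
```

Instance optimization would follow the same loop with `requires_grad=True` on the answer itself, so the solution text is edited directly rather than the prompt that produced it.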

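Experimental Results

Research Questions

RQ1: Can textual feedback from LLMs be backpropagated through a computation graph to improve the individual components of a compound AI system?
RQ2: What practical performance gains does TextGrad deliver on diverse tasks spanning coding, reasoning, chemistry, and medicine?
RQ3: How do instance optimization and prompt optimization compare when guided by textual gradients?
RQ4: How do batching, constraints, and momentum-style extensions affect TextGrad optimization?
RQ5: Can TextGrad run across domains without task-specific prompts or extensive manual tuning?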

Key Findings

Task	Method	Metric	Value
LeetCode Hard	Zero-shot	Completion Rate	0.26
LeetCode Hard	Reflexion (1 demonstration, 5 iterations)	Completion Rate	0.31 ± 0.012
LeetCode Hard	TextGrad (0 demonstrations, 5 iterations)	Completion Rate	0.36 ± 0.018
GPQA (Google-proof QA)	TextGrad	Accuracy (%)	55.0
MMLU-Machine Learning	TextGrad	Accuracy (%)	88.4
MMLU-College Physics	TextGrad	Accuracy (%)	95.1

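LeetCode Hard: TextGrad reaches a 36% completion rate with no demonstrations, outperforming zero-shot (26%) and the Reflexion baseline (31%).
Google-proof QA (GPQA): with GPT-4o, TextGrad raises zero-shot accuracy from 51% to 55%.
MMLU: TextGrad reaches 88.4% accuracy on the Machine Learning subset versus 85.7% for chain-of-thought (CoT), and 95.1% on College Physics versus 91.2% for CoT.
Radiotherapy treatment planning and molecule design: the demonstrations show improvements when problem-specific objectives are optimized with textual gradients.
Usability: TextGrad exposes a PyTorch-like API that makes it accessible and general across tasks, with no framework-level prompt engineering or tuning required.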
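
To make the comparisons above easier to read side by side, the short sketch below computes the absolute gain in percentage points for each reported baseline/TextGrad pair. The numbers are copied from this digest (LeetCode Hard completion rates are converted from fractions to percent); the dictionary layout and the `point_gain` helper are illustrative assumptions, not part of the paper.

```python
# (baseline score, TextGrad score) in percent, as reported in this digest.
reported = {
    "GPQA, GPT-4o zero-shot": (51.0, 55.0),
    "MMLU-Machine Learning, CoT": (85.7, 88.4),
    "MMLU-College Physics, CoT": (91.2, 95.1),
    "LeetCode Hard, Reflexion": (31.0, 36.0),
}


def point_gain(baseline: float, improved: float) -> float:
    """Absolute improvement in percentage points, rounded to one decimal."""
    return round(improved - baseline, 1)


# Map each comparison to its gain and print a small report.
gains = {name: point_gain(b, t) for name, (b, t) in reported.items()}
for name, gain in gains.items():
    print(f"{name}: +{gain} points")
```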


This digest was generated by AI and reviewed by a human editor.