QUICK REVIEW

[Paper Review] Solving Math Word Problems by Combining Language Models With Symbolic Solvers

Joy He-Yueya, Gabriel Poesia | arXiv (Cornell University) | Apr 16, 2023
Text Readability and Simplification | Cited by 12
One-Sentence Summary

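This paper combines an LLM that incrementally formalizes word problems into variables and equations with an external symbolic solver to produce step-by-step solutions, achieving results comparable to PAL on GSM8k and an absolute improvement of about 20% on the Algebra dataset.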

ABSTRACT

Automatically generating high-quality step-by-step solutions to math word problems has many applications in education. Recently, combining large language models (LLMs) with external tools to perform complex reasoning and calculation has emerged as a promising direction for solving math word problems, but prior approaches such as Program-Aided Language model (PAL) are biased towards simple procedural problems and less effective for problems that require declarative reasoning. We propose an approach that combines an LLM that can incrementally formalize word problems as a set of variables and equations with an external symbolic solver that can solve the equations. Our approach achieves comparable accuracy to the original PAL on the GSM8K benchmark of math word problems and outperforms PAL by an absolute 20% on ALGEBRA, a new dataset of more challenging word problems extracted from Algebra textbooks. Our work highlights the benefits of using declarative and incremental representations when interfacing with an external tool for solving complex math word problems. Our data and prompts are publicly available at https://github.com/joyheyueya/declarative-math-word-problem.

Motivation and Objectives

  • Motivate automatic generation of high-quality step-by-step solutions for math word problems.
  • Address limitations of purely procedural LLM approaches (e.g., PAL) on problems that require declarative reasoning.
  • Propose a two-step approach: declarative, incremental formalization by an LLM and solving via an external symbolic solver.
  • Evaluate on GSM8k and a new Algebra-based dataset to test harder, declarative problems.
  • Publicly share data and prompts for reproducibility.

Proposed Method

  • Use an LLM to incrementally formalize the problem into variables and equations via a Declarative prompt.
  • Craft a Declarative prompt with principles ensuring each sentence declares a variable or an equation and that all quantities map to a single variable.
  • Append the target problem to the prompt and have the LLM produce a solution that interleaves natural language with formal declarations.
  • Pass the resulting system of equations to an external symbolic solver (SymPy) instead of relying on LLM arithmetic.
  • Compare multiple prompting variants including CoT, PAL, and Declarative prompts with and without SymPy.
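
The formalize-then-solve pipeline described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the `[[var ...]]` / `[[eq ...]]` markup, the sample word problem, and the helper name `solve_declarations` are assumptions made for this example, and the LLM call is replaced by a hard-coded string. Only the SymPy calls (`symbols`, `Eq`, `solve`, `parse_expr`) are real library functions.

```python
import re

from sympy import Eq, solve, symbols
from sympy.parsing.sympy_parser import parse_expr

# Hypothetical LLM output: each sentence declares a variable or an equation.
# The [[var ...]] / [[eq ...]] markup is an assumption for this sketch.
solution_text = """
Let a be Alice's age. [[var a]]
Let b be Bob's age. [[var b]]
Alice is twice as old as Bob. [[eq a = 2*b]]
Their ages sum to 36. [[eq a + b = 36]]
"""


def solve_declarations(text):
    """Extract declared variables and equations, then solve them with SymPy."""
    # One SymPy symbol per declared variable name.
    names = re.findall(r"\[\[var (\w+)\]\]", text)
    syms = {name: symbols(name) for name in names}

    # Each equation is split at its first '=' into left and right sides.
    equations = []
    for lhs, rhs in re.findall(r"\[\[eq (.+?)=(.+?)\]\]", text):
        equations.append(
            Eq(
                parse_expr(lhs.strip(), local_dict=syms),
                parse_expr(rhs.strip(), local_dict=syms),
            )
        )

    # The solver, not the LLM, does the algebra and arithmetic.
    return solve(equations, list(syms.values()), dict=True)


result = solve_declarations(solution_text)
print(result)
```

The point of the design is the division of labor: the model only has to state what is true about the quantities, in any order, and the solver works out how to compute the unknowns.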

Experimental Results

Research Questions

  • RQ1: Can incremental declarative formalization coupled with a symbolic solver match or exceed prior LLM-based methods on math word problems?
  • RQ2: Does declarative prompting handle harder, algebra-style problems better than procedural methods?
  • RQ3: What is the impact of incremental formalization versus one-shot or single-step formulations?
  • RQ4: How does the proposed approach perform compared to PAL and CoT across the GSM8k and Algebra datasets?

Key Findings

| Method | GSM8k (%) | Algebra (%) |
| --- | --- | --- |
| CoT_8-shot (original) | 62.5±0.16 | 45.3±0.56 |
| CoT_3-shot (ours) | 58.9±0.16 | 47.9±1.18 |
| PAL_8-shot (original) | 70.2±0.25 | 51.7±0.21 |
| PAL_3-shot (ours) | 73.3±0.13 | 56.2±0.21 |
| Declarative_8-shot+SymPy | 64.7 | - |
| Declarative_3-shot+SymPy | 66.0±0.33 | - |
| Declarative_3-shot+principles+SymPy | 69.4±0.65 | 76.3±0.93 |
| Declarative_3-shot+principles | 22.4±0.27 | - |
| One-step Declarative_3-shot+SymPy | 57.5±0.06 | - |

  • On GSM8k, the Declarative_3-shot+principles+SymPy setup, with SymPy solving the equations, is comparable to the original PAL_8-shot (69.4±0.65% vs 70.2±0.25%) and trails PAL_3-shot (73.3±0.13%).
  • On the Algebra dataset, Declarative_3-shot+principles+SymPy outperforms PAL_3-shot by about 20 absolute percentage points (76.3±0.93% vs 56.2±0.21%).
  • Using SymPy to solve the equations yields much better results on GSM8k than asking the LLM to solve them directly (69.4±0.65% for Declarative_3-shot+principles+SymPy vs 22.4±0.27% for Declarative_3-shot+principles).
  • Incremental declarative formalization improves over a one-step declarative approach on GSM8k (66.0±0.33% for Declarative_3-shot+SymPy vs 57.5±0.06% for One-step Declarative_3-shot+SymPy).
  • Declarative prompting is more effective than CoT and PAL on Algebra, consistent with the paper's claim that these problems call for declarative reasoning rather than procedural steps.
  • Overall, the approach demonstrates benefits of declarative and incremental representations when interfacing with external solvers.
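
As a quick check on the comparisons above, the absolute gaps can be recomputed from the table. The accuracies below are copied from the results table; the pairing of rows (which method is compared against which baseline) and the helper name `gap` are choices made for this sketch, not something the source specifies.

```python
# Mean accuracies (%) copied from the results table; standard deviations omitted.
gsm8k = {
    "PAL_8-shot (original)": 70.2,
    "Declarative_3-shot+SymPy": 66.0,
    "Declarative_3-shot+principles+SymPy": 69.4,
    "Declarative_3-shot+principles": 22.4,
    "One-step Declarative_3-shot+SymPy": 57.5,
}
algebra = {
    "PAL_3-shot (ours)": 56.2,
    "Declarative_3-shot+principles+SymPy": 76.3,
}


def gap(scores, method, baseline):
    """Absolute difference in percentage points, rounded to one decimal."""
    return round(scores[method] - scores[baseline], 1)


# Declarative + SymPy vs PAL on Algebra.
algebra_gain = gap(
    algebra, "Declarative_3-shot+principles+SymPy", "PAL_3-shot (ours)"
)
# Same prompt with and without the external solver on GSM8k.
solver_gain = gap(
    gsm8k, "Declarative_3-shot+principles+SymPy", "Declarative_3-shot+principles"
)
# Incremental vs one-step formalization on GSM8k (both without principles).
incremental_gain = gap(
    gsm8k, "Declarative_3-shot+SymPy", "One-step Declarative_3-shot+SymPy"
)
# Declarative + SymPy vs the original PAL on GSM8k.
pal_gap = gap(gsm8k, "Declarative_3-shot+principles+SymPy", "PAL_8-shot (original)")

print(algebra_gain, solver_gain, incremental_gain, pal_gap)
```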

This review was generated by AI and reviewed by human editors.