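[Paper Summary] Ape210K: A Large-Scale and Template-Rich Dataset of Math Word Problems
Ape210K is a large-scale dataset of 210K Chinese math word problems covering 56K templates, released together with a copy-augmented seq2seq baseline. It exposes how far models and benchmarks built on Math23K remain from human-level performance.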
Automatic math word problem solving has attracted growing attention in recent years. The evaluation datasets used by previous works have serious limitations in terms of scale and diversity. In this paper, we release a new large-scale and template-rich math word problem dataset named Ape210K. It consists of 210K Chinese elementary school-level math problems, which is 9 times the size of the largest public dataset Math23K. Each problem contains both the gold answer and the equations needed to derive the answer. Ape210K is also of greater diversity with 56K templates, which is 25 times more than Math23K. Our analysis shows that solving Ape210K requires not only natural language understanding but also commonsense knowledge. We expect Ape210K to be a benchmark for math word problem solving systems. Experiments indicate that state-of-the-art models on the Math23K dataset perform poorly on Ape210K. We propose a copy-augmented and feature-enriched sequence to sequence (seq2seq) model, which outperforms existing models by 3.2% on the Math23K dataset and serves as a strong baseline of the Ape210K dataset. The gap is still significant between human and our baseline model, calling for further research efforts. We make Ape210K dataset publicly available at https://github.com/yuantiku/ape210k
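Motivation and Goals
- Motivate the need for math word problem datasets that are larger and more diverse than existing benchmarks.
- Provide a dataset in which every problem carries both the gold answer and the equations needed to derive it, to support progress on solving techniques.
- Demonstrate diversity through a large number of templates, and analyze the commonsense knowledge the problems require.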
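Proposed Method
- Propose a copy-augmented and feature-enriched sequence-to-sequence (seq2seq) model.
- Show that this model outperforms existing models by 3.2% on Math23K.
- Evaluate on Ape210K as a benchmark and analyze the gap between current models and human performance.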
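To make the "copy-augmented" idea above concrete, here is a minimal sketch of one common way a decoder can mix generating a token from the vocabulary with copying a token from the problem text. It follows the generic pointer-generator style mixture and is an assumption for illustration, not necessarily the formulation used in the Ape210K paper. The function name `copy_augmented_distribution`, the gate `p_gen`, and all of the toy probabilities are hypothetical.

```python
def copy_augmented_distribution(p_vocab, attention, source_tokens, p_gen):
    """Mix a generation distribution with a copy distribution.

    p_vocab:       dict mapping vocabulary token -> generation probability
    attention:     attention weights over source positions (sums to 1)
    source_tokens: problem-text tokens aligned with `attention`
    p_gen:         probability of generating rather than copying (assumed gate)
    """
    # Scale the generation distribution by the gate.
    mixed = {tok: p_gen * p for tok, p in p_vocab.items()}
    # Add the copy mass: each source token receives its attention weight.
    for tok, weight in zip(source_tokens, attention):
        mixed[tok] = mixed.get(tok, 0.0) + (1.0 - p_gen) * weight
    return mixed


# Toy example: the numbers "48" and "6" appear only in the problem text,
# so they can only be produced through the copy path.
p_vocab = {"+": 0.5, "*": 0.3, "2": 0.2}
source_tokens = ["48", "apples", "6", "boxes"]
attention = [0.6, 0.05, 0.3, 0.05]

dist = copy_augmented_distribution(p_vocab, attention, source_tokens, p_gen=0.4)
best = max(dist, key=dist.get)
```

In this toy setting the most probable output token is a number taken from the problem text rather than from the fixed vocabulary, which is why copying is attractive for math word problems, where the operands vary from problem to problem.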
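Experimental Results
Research Questions
- RQ1: Can a larger, template-rich dataset improve models for solving math word problems?
- RQ2: Does solving Ape210K problems require both natural language understanding and commonsense knowledge?
- RQ3: How does the copy-augmented seq2seq model perform on a large-scale MWP benchmark compared with Math23K baselines?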
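Key Findings
- Ape210K contains 210K problems, 9 times the size of Math23K, the largest public dataset.
- Ape210K contains 56K templates, 25 times more than Math23K.
- Solving Ape210K requires not only natural language understanding but also commonsense knowledge.
- The proposed copy-augmented, feature-enriched seq2seq model outperforms existing models by 3.2% on Math23K and serves as a strong baseline on Ape210K.
- A significant gap remains between the baseline model and human performance on Ape210K.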
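The template counts above can be illustrated with a small sketch. It assumes that a "template" is an equation with its numeric literals replaced by placeholders, so that problems differing only in their numbers share one template. The exact definition used by the Ape210K authors may differ. The helper `to_template` and the sample equations are hypothetical and are not drawn from the dataset.

```python
import re
from collections import Counter

# Matches integers, decimals, and percentages such as 12, 3.5, 20%.
NUMBER = re.compile(r"\d+(?:\.\d+)?%?")


def to_template(equation: str) -> str:
    """Replace each distinct numeric literal with a placeholder n1, n2, ..."""
    mapping = {}

    def repl(match):
        token = match.group(0)
        if token not in mapping:
            mapping[token] = f"n{len(mapping) + 1}"
        return mapping[token]

    # Strip spaces first so formatting differences do not create new templates.
    return NUMBER.sub(repl, equation.replace(" ", ""))


# Hypothetical equations, for illustration only.
equations = ["x=3*5+2", "x = 4*7 + 9", "x=(10-4)/2", "x=3*3+2"]
templates = Counter(to_template(eq) for eq in equations)
num_templates = len(templates)
```

Here the first two equations collapse into the same template, `x=n1*n2+n3`, while the other two yield distinct ones. Counting distinct templates in this way gives a rough measure of how structurally diverse a dataset is, independent of its raw problem count.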
This summary was generated by AI and reviewed by a human editor.