QUICK REVIEW

[Paper Review] Ape210K: A Large-Scale and Template-Rich Dataset of Math Word Problems

Wei Zhao, Mingyue Shang|arXiv (Cornell University)|2020. 09. 24.

Topic Modeling · References: 26 · Citations: 35

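One-Line Summary

Ape210K is a large-scale Chinese math word problem dataset (210K problems) with a rich set of equation templates (56K), released together with a copy-augmented, feature-enriched seq2seq baseline. Experiments show that models that do well on Math23K perform poorly on Ape210K, and that a substantial gap to human performance remains.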

ABSTRACT

Automatic math word problem solving has attracted growing attention in recent years. The evaluation datasets used by previous works have serious limitations in terms of scale and diversity. In this paper, we release a new large-scale and template-rich math word problem dataset named Ape210K. It consists of 210K Chinese elementary school-level math problems, which is 9 times the size of the largest public dataset Math23K. Each problem contains both the gold answer and the equations needed to derive the answer. Ape210K is also of greater diversity with 56K templates, which is 25 times more than Math23K. Our analysis shows that solving Ape210K requires not only natural language understanding but also commonsense knowledge. We expect Ape210K to be a benchmark for math word problem solving systems. Experiments indicate that state-of-the-art models on the Math23K dataset perform poorly on Ape210K. We propose a copy-augmented and feature-enriched sequence to sequence (seq2seq) model, which outperforms existing models by 3.2% on the Math23K dataset and serves as a strong baseline of the Ape210K dataset. The gap is still significant between human and our baseline model, calling for further research efforts. We make Ape210K dataset publicly available at https://github.com/yuantiku/ape210k

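Motivation and Goals

Motivate the need for a math word problem dataset that is larger and more diverse than existing benchmarks.
Provide a dataset in which every problem includes both the gold answer and the equations needed to derive it, to support better solving methods.
Demonstrate the dataset's diversity through its large number of templates, and analyze the commonsense knowledge that solving it requires.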
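The review does not define what counts as a "template". A common convention in math word problem work is to abstract an equation by replacing each number taken from the problem text with an indexed slot, so that problems solved by the same equation shape share one template. The following is a minimal Python sketch under that assumption; the regex, `to_template`, and `count_templates` are hypothetical illustrations, not the Ape210K authors' procedure.

```python
import re

# Hypothetical pattern: integers, decimals, and percentages.
NUMBER_PATTERN = re.compile(r"\d+(?:\.\d+)?%?")

def to_template(problem: str, equation: str) -> str:
    """Replace numbers in `equation` that also appear in `problem` with slots n0, n1, ...

    Slots are indexed by first appearance in the problem text; constants that
    appear only in the equation (e.g. 1 or 100) are kept literally.
    """
    mapping: dict[str, str] = {}
    for token in NUMBER_PATTERN.findall(problem):
        if token not in mapping:
            mapping[token] = f"n{len(mapping)}"
    # Whole-token substitution, so "120" is not confused with "12".
    return NUMBER_PATTERN.sub(lambda m: mapping.get(m.group(0), m.group(0)), equation)

def count_templates(pairs: list[tuple[str, str]]) -> int:
    """Number of distinct templates among (problem, equation) pairs."""
    return len({to_template(problem, equation) for problem, equation in pairs})

# Toy examples (not taken from Ape210K).
pairs = [
    ("小明有12个苹果，吃了3个，还剩几个？", "x=12-3"),
    ("书架上有40本书，借走了15本，还剩多少本？", "x=40-15"),
    ("每盒有6支笔，4盒共有多少支？", "x=6*4"),
]
```

Under this definition, the two "subtract one given number from another" problems collapse into the single template `x=n0-n1`, so `count_templates(pairs)` is 2. This is why a template count measures structural diversity rather than raw size.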

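Proposed Method

Propose a copy-augmented and feature-enriched sequence-to-sequence (seq2seq) model.
Show that this model outperforms existing models on Math23K by 3.2%.
Evaluate models on Ape210K as a benchmark and analyze the gap between current models and human performance.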
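Copy augmentation lets the decoder emit tokens, chiefly problem-specific numbers, by copying them from the input instead of generating them from a fixed vocabulary. The sketch below shows a generic pointer-generator-style mixture, in which a gate `p_gen` blends the vocabulary distribution with attention over source tokens. It is an illustrative assumption: the function names and inputs are hypothetical, it omits the feature enrichment entirely, and it is not the exact formulation used in the paper.

```python
import math

def softmax(scores: list[float]) -> list[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def copy_augmented_distribution(
    vocab: list[str],
    vocab_logits: list[float],
    source_tokens: list[str],
    attn_logits: list[float],
    p_gen: float,
) -> dict[str, float]:
    """Mix generation and copy probabilities for one decoding step.

    P(w) = p_gen * P_vocab(w) + (1 - p_gen) * (attention mass on source positions holding w)
    """
    p_vocab = softmax(vocab_logits)
    attn = softmax(attn_logits)
    # Generation path: probability over the fixed output vocabulary.
    final = {w: p_gen * p for w, p in zip(vocab, p_vocab)}
    # Copy path: attention mass flows to source tokens, even out-of-vocabulary ones.
    for token, a in zip(source_tokens, attn):
        final[token] = final.get(token, 0.0) + (1.0 - p_gen) * a
    return final

# One toy step: operators are in the vocabulary; 12 and 3 appear only in the problem.
vocab = ["x", "=", "+", "-", "*", "/"]
dist = copy_augmented_distribution(
    vocab, [0.0] * len(vocab), ["12", "3"], [2.0, 0.0], p_gen=0.5
)
```

Because the numbers "12" and "3" are not in `vocab`, only the copy path can produce them. This is what allows a seq2seq solver to output the specific quantities of each problem.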

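Experimental Results

Research Questions

RQ1. Can a larger, template-rich dataset drive improvements in math word problem (MWP) solving models?
RQ2. Does solving Ape210K require both natural language understanding and commonsense knowledge?
RQ3. How does the copy-augmented seq2seq model perform on a large-scale MWP benchmark (Ape210K) compared with the Math23K benchmark?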
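Answering RQ3 means comparing models on both datasets with the same metric. MWP solvers are commonly scored by answer accuracy: the predicted equation is executed, and its value is compared with the gold answer. The Python sketch below assumes equations are strings of the form `x=<expression>` using `+ - * /` and parentheses, and that gold answers are numeric expressions. The function names and the tolerance are hypothetical choices, not the official Ape210K evaluation script.

```python
import ast
import operator
from fractions import Fraction

# Arithmetic operators the evaluator accepts; anything else is rejected.
BIN_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}

def evaluate(expr: str) -> Fraction:
    """Safely evaluate an arithmetic expression with exact rational arithmetic."""
    def walk(node: ast.AST) -> Fraction:
        if isinstance(node, ast.BinOp) and type(node.op) in BIN_OPS:
            return BIN_OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -walk(node.operand)
        if isinstance(node, ast.Constant) and type(node.value) in (int, float):
            # str() keeps decimals such as 0.8 exact, as Fraction(4, 5).
            return Fraction(str(node.value))
        raise ValueError(f"unsupported expression: {expr!r}")
    return walk(ast.parse(expr.strip(), mode="eval").body)

def answer_correct(pred_equation: str, gold_answer: str, tol: float = 1e-4) -> bool:
    """True if the predicted `x=...` equation evaluates to the gold answer within `tol`."""
    try:
        lhs, rhs = pred_equation.split("=", 1)
        if lhs.strip() != "x":
            return False
        diff = evaluate(rhs) - evaluate(gold_answer)
    except (ValueError, SyntaxError, ZeroDivisionError):
        return False  # Malformed or non-executable predictions count as wrong.
    return abs(diff) <= tol

def answer_accuracy(preds: list[str], golds: list[str]) -> float:
    """Fraction of problems whose predicted equation yields the gold answer."""
    return sum(answer_correct(p, g) for p, g in zip(preds, golds)) / len(golds)
```

Exact rational arithmetic means `x=1-1/4` matches a gold answer of `0.75` without floating-point noise; the tolerance only matters when gold answers are stored as rounded decimals. Answers written with `%` or as mixed numbers would need extra parsing under this sketch.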

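Key Findings

Ape210K contains 210K problems, 9 times the size of Math23K, the largest previously available public dataset.
Ape210K contains 56K templates, 25 times as many as Math23K.
Solving Ape210K requires both natural language understanding and commonsense knowledge, and state-of-the-art models on Math23K perform poorly on it.
The proposed copy-augmented, feature-enriched seq2seq model outperforms existing models on Math23K by 3.2% and serves as a strong baseline on Ape210K.
The gap between current models, including this baseline, and human performance on Ape210K remains large.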
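The size and diversity ratios above can be checked with quick arithmetic. The calculation below uses the rounded figures given in this review (210K, 23K, 56K), so the results are approximate. The Math23K template count is only what the stated 25x ratio implies, not a figure reported here.

```latex
\frac{210\text{K}}{23\text{K}} \approx 9.1,
\qquad
N_{\text{templates}}^{\text{Math23K}} \approx \frac{56\text{K}}{25} \approx 2.2\text{K}
```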


This review was generated by AI and reviewed by a human editor.