[Paper Review] DRCD: a Chinese Machine Reading Comprehension Dataset
The paper introduces DRCD, a traditional Chinese MRC dataset with 10,014 paragraphs from 2,108 Wikipedia articles and over 30,000 questions. It reports a baseline F1 of 89.59% against a human F1 of 93.30%.
In this paper, we introduce DRCD (Delta Reading Comprehension Dataset), an open-domain traditional Chinese machine reading comprehension (MRC) dataset. The dataset aims to be a standard Chinese MRC dataset that can serve as a source dataset in transfer learning. It contains 10,014 paragraphs from 2,108 Wikipedia articles and 30,000+ questions generated by annotators. We build a baseline model that achieves an F1 score of 89.59%, while human performance reaches an F1 score of 93.30%.
Motivation & Objective
- Provide a standard Chinese machine reading comprehension dataset for transfer learning.
- Offer a large, open-domain traditional Chinese MRC resource for benchmarking.
- Enable evaluation of models on Chinese MRC with a realistic paragraph and question mix.
Proposed method
- Collect and annotate traditional Chinese MRC data from open-domain sources.
- Assemble 10,014 paragraphs from 2,108 Wikipedia articles and 30,000+ questions.
- Establish a baseline model to benchmark F1 and compare with human performance.
- Report baseline F1 score of 89.59% and human performance of 93.30% on the dataset.
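The review does not say how the F1 score is computed. A common choice for span-extraction MRC is SQuAD-style F1 over overlapping tokens. For Chinese, characters are often used as tokens instead of segmented words. The sketch below assumes exactly that:

- Characters are the tokens.
- Whitespace and punctuation are stripped before comparison.
- Each question takes the best score over its reference answers.
- Scores are averaged over questions and scaled to a percentage.

The function names `char_f1`, `best_f1` and `corpus_f1` are hypothetical. This is an illustration, not DRCD's official evaluation script.

```python
import unicodedata
from collections import Counter


def normalize(text: str) -> str:
    # Assumption: drop whitespace and any Unicode punctuation (categories P*),
    # which covers both ASCII and full-width CJK punctuation such as "，" and "。".
    return "".join(
        ch for ch in text
        if not ch.isspace() and not unicodedata.category(ch).startswith("P")
    )


def char_f1(prediction: str, ground_truth: str) -> float:
    # Bag-of-characters overlap, i.e. SQuAD-style F1 with characters as tokens.
    pred_chars = list(normalize(prediction))
    gold_chars = list(normalize(ground_truth))
    common = Counter(pred_chars) & Counter(gold_chars)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_chars)
    recall = num_same / len(gold_chars)
    return 2 * precision * recall / (precision + recall)


def best_f1(prediction: str, ground_truths: list[str]) -> float:
    # A question may have several reference answers; keep the best match.
    return max(char_f1(prediction, gt) for gt in ground_truths)


def corpus_f1(predictions: dict[str, str], references: dict[str, list[str]]) -> float:
    # Average per-question F1 over all referenced questions, as a percentage.
    # A missing prediction is scored as an empty string (F1 = 0).
    scores = [best_f1(predictions.get(qid, ""), golds) for qid, golds in references.items()]
    return 100.0 * sum(scores) / len(scores)


if __name__ == "__main__":
    print(char_f1("臺北市", "臺北"))  # precision 2/3, recall 1
```

In this sketch, the prediction "臺北市" against the reference "臺北" scores 0.8: precision is 2/3 and recall is 1. Character-level tokens avoid depending on a particular Chinese word segmenter, at the cost of rewarding partial overlaps inside words.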
Experimental results
Research questions
- RQ1: How well can a baseline model perform on DRCD for traditional Chinese MRC?
- RQ2: What is the gap between model performance and human performance on DRCD?
- RQ3: Can DRCD serve as an effective source dataset for transfer learning in Chinese MRC?
- RQ4: What are the characteristics of the DRCD dataset in terms of size and source diversity?
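For RQ2, the reported figures fix the size of the gap. The values below are simple arithmetic on the two F1 scores stated in the review. They are not additional results from the paper.

```latex
\begin{aligned}
\Delta_{\text{abs}} &= F1_{\text{human}} - F1_{\text{baseline}} = 93.30 - 89.59 = 3.71 \ \text{points} \\
\frac{F1_{\text{baseline}}}{F1_{\text{human}}} &= \frac{89.59}{93.30} \approx 0.960
\end{aligned}
```

In other words, the baseline reaches about 96% of human F1, with a remaining gap of 3.71 points.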
Key findings
- Baseline model achieves an F1 score of 89.59%.
- Human performance on DRCD yields an F1 score of 93.30%.
- The dataset comprises 10,014 paragraphs from 2,108 Wikipedia articles and 30,000+ questions.
- DRCD serves as an open-domain traditional Chinese MRC resource suitable for benchmarking and transfer learning.
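Size statistics such as those in the key findings can be recomputed from the released files. The sketch below rests on an assumption: that the data uses a SQuAD v1.1-style JSON layout, with `data` → `paragraphs` → `context`/`qas` → `answers` (`text`, `answer_start`). The review itself does not describe the file format.

The sketch counts three things: articles, paragraphs and questions. It also checks that each answer span actually occurs at its stated offset. `DatasetStats`, `compute_stats` and `load_stats` are hypothetical names. If the data ships as several files, the counts would need to be summed across them.

```python
import json
from dataclasses import dataclass


@dataclass
class DatasetStats:
    articles: int = 0
    paragraphs: int = 0
    questions: int = 0
    misaligned_answers: int = 0  # answers whose offset does not match the context


def compute_stats(squad_like: dict) -> DatasetStats:
    # Assumes a SQuAD v1.1-style structure (an assumption, not confirmed by the review).
    stats = DatasetStats()
    for article in squad_like["data"]:
        stats.articles += 1
        for para in article["paragraphs"]:
            stats.paragraphs += 1
            context = para["context"]
            for qa in para["qas"]:
                stats.questions += 1
                for ans in qa.get("answers", []):
                    start = ans["answer_start"]
                    # Offsets index Python str characters, so CJK text slices per character.
                    if context[start:start + len(ans["text"])] != ans["text"]:
                        stats.misaligned_answers += 1
    return stats


def load_stats(path: str) -> DatasetStats:
    # Hypothetical file path supplied by the caller; read as UTF-8 for Chinese text.
    with open(path, encoding="utf-8") as f:
        return compute_stats(json.load(f))


if __name__ == "__main__":
    # Tiny in-memory sample in the assumed layout.
    sample = {"data": [{"title": "臺北", "paragraphs": [{
        "context": "臺北市是中華民國的首都。",
        "qas": [
            {"id": "1", "question": "中華民國的首都是哪裡？",
             "answers": [{"text": "臺北市", "answer_start": 0}]},
        ],
    }]}]}
    print(compute_stats(sample))
```

Checking offsets alongside the counts is cheap, and it catches encoding or preprocessing mistakes before any F1 number is computed.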
This review was created by AI and reviewed by human editors.