WDW Dataset

Who-did-What Dataset

Tags: reading comprehension, WDW, text dataset, natural language processing, text analysis

A reading comprehension dataset

Download points required: none (free download)

Dataset publisher: 数据集市

Published: November 30, 2020

Data size: 26 GB

Data details

Data introduction

We have constructed a new "Who-did-What" dataset of over 200,000 fill-in-the-gap (cloze) multiple choice reading comprehension problems constructed from the LDC English Gigaword newswire corpus. The WDW dataset has a variety of novel features. First, in contrast with the CNN and Daily Mail datasets (Hermann et al., 2015) we avoid using article summaries for question formation. Instead, each problem is formed from two independent articles: an article given as the passage to be read and a separate article on the same events used to form the question. Second, we avoid anonymization: each choice is a person named entity. Third, the problems have been filtered to remove a fraction that are easily solved by simple baselines, while remaining 84% solvable by humans. We report performance benchmarks of standard systems and propose the WDW dataset as a challenge task for the community.
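To make the problem format concrete, here is a minimal Python sketch of a cloze problem and of filtering with one simple baseline. It is an illustration under stated assumptions, not the dataset's actual schema or the authors' filtering procedure: the `ClozeProblem` fields, the `XXX` blank marker, the "most frequent name in the passage" baseline, and the sample texts are all hypothetical. The description above says only a fraction of easily solved problems was removed, whereas this sketch drops every problem the baseline gets right.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ClozeProblem:
    """Hypothetical in-memory form of one problem (not the official schema)."""
    passage: str        # article given as the text to be read
    question: str       # sentence from a second article, with a blank (XXX)
    choices: List[str]  # candidate answers, each a person name
    answer: str         # the correct choice


def most_frequent_choice(problem: ClozeProblem) -> str:
    """Assumed baseline: pick the choice mentioned most often in the passage.

    Ties are resolved in favour of the earlier choice, because max() returns
    the first maximal element.
    """
    return max(problem.choices, key=problem.passage.count)


def drop_baseline_solvable(problems: List[ClozeProblem]) -> List[ClozeProblem]:
    """Keep only the problems that the baseline answers incorrectly."""
    return [p for p in problems if most_frequent_choice(p) != p.answer]


# Invented examples, for illustration only.
easy = ClozeProblem(
    passage="Alice Smith met Bob Jones on Monday. Alice Smith later announced the deal.",
    question="XXX announced the deal with Bob Jones.",
    choices=["Alice Smith", "Bob Jones"],
    answer="Alice Smith",
)
hard = ClozeProblem(
    passage="Carol White praised Dan Brown. Dan Brown thanked Carol White, and Dan Brown left.",
    question="XXX praised Dan Brown in a speech.",
    choices=["Carol White", "Dan Brown"],
    answer="Carol White",
)

kept = drop_baseline_solvable([easy, hard])
```

In this sketch `easy` is discarded because the most frequently mentioned name is also the answer, while `hard` survives because the frequency baseline picks the wrong person.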

Data specifications

Release date	June 17, 2011
