A dataset for question answering and reading comprehension from a set of Wikipedia articles
The Stanford Question Answering Dataset (SQuAD) consists of questions posed by crowd workers on a set of Wikipedia articles, where the answer to each answerable question is a segment of text, or span, from the corresponding reading passage. Version 2.0 adds unanswerable questions to the dataset.
The "ContentElements" field offers eight options: "Dataset", "TrainingData", "ValidationData", "TrainingMetadata", "ValidationMetadata", "Data", "ColumnNames" and "ColumnDescriptions". "Dataset" contains the full dataset. Note that data marked "Validation" in the ValidationRole field can have multiple possible answers for each question. "TrainingData" and "ValidationData" are formatted for standard question answering usage: for every question, only the first answer from the full dataset is kept. "TrainingMetadata" and "ValidationMetadata" contain the title of the Wikipedia article to which each question ID corresponds. "Data" contains the full dataset structured as an association. "ColumnNames" and "ColumnDescriptions" provide more information about the columns of the dataset.
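The first-answer rule used for "TrainingData" and "ValidationData" can be illustrated with a minimal Python sketch. The record layout, the field names `question` and `answers`, and the use of `None` for an unanswerable question are all assumptions made for illustration; they are not the actual structure of this resource.

```python
from typing import List, Optional, Tuple


def first_answer(answers: List[str]) -> Optional[str]:
    """Return the first listed answer, or None if the question has no answers."""
    return answers[0] if answers else None


# Hypothetical toy records: a validation-style question with several
# acceptable answers, and an unanswerable question with none.
records = [
    {"question": "Where is the Eiffel Tower?", "answers": ["Paris", "in Paris"]},
    {"question": "Who designed the moon?", "answers": []},
]

# Reduce each record to a (question, single answer) pair.
qa_pairs: List[Tuple[str, Optional[str]]] = [
    (r["question"], first_answer(r["answers"])) for r in records
]
```

Keeping only the first answer gives one target per question, which is the shape most question answering training code expects; the full answer lists remain available through "Dataset".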
Modifications from the original dataset: Data marked "Training" in the ValidationRole field corresponds to the Training Set v2.0 subset of the original dataset. Data marked "Validation" in the ValidationRole field corresponds to the Dev Set v2.0 subset of the original dataset. The original dataset is 0-indexed; because the Wolfram Language is 1-indexed, 1 was added to each value of "AnswerPosition" so that positions are accurate in the Wolfram Language.
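The index shift can be sketched in Python, which is itself 0-indexed. This assumes that "AnswerPosition" is a character offset into the passage, as `answer_start` is in the original SQuAD JSON; the helper names and the sample passage are hypothetical.

```python
def to_one_indexed(answer_start: int) -> int:
    """Convert a 0-indexed character offset to a 1-indexed position."""
    return answer_start + 1


def span_from_one_indexed(context: str, answer_position: int, length: int) -> str:
    """Recover a span in 0-indexed Python from a 1-indexed start position."""
    start = answer_position - 1  # undo the shift before slicing
    return context[start:start + length]


# Hypothetical passage and answer, for illustration only.
context = "Paris is the capital of France."
answer = "France"

answer_start = context.index(answer)            # 0-indexed offset
answer_position = to_one_indexed(answer_start)  # 1-indexed position
recovered = span_from_one_indexed(context, answer_position, len(answer))
```

Anyone exporting this resource to a 0-indexed environment would need to subtract 1 again before slicing, as `span_from_one_indexed` does.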