Wolfram Research

Swadesh Lists

Word lists for common concepts in nearly 1200 languages

Japanese-English Legal Parallel Corpus

A parallel corpus for machine translation systems, information extraction and other language processing techniques

Europarl English-Spanish Machine Translation Dataset V7

A parallel corpus for machine translation from the proceedings of the European Parliament

Europarl English-German Machine Translation Dataset V7

A parallel corpus for machine translation from the proceedings of the European Parliament

Europarl English-Italian Machine Translation Dataset V7

A parallel corpus for machine translation from the proceedings of the European Parliament

Europarl English-French Machine Translation Dataset V7

A parallel corpus for machine translation from the proceedings of the European Parliament

FDIC Institution EntityStore

A Wolfram Language EntityStore with selected data on FDIC-insured institutions

Scripps National Spelling Bee Champions

Spelling Bee winners, final words, and sponsoring organizations

Geotagged Public Tweets (Europe, April 6-8 2016)

Public Twitter statuses

Atlantic Hurricane Data 1851-2017

A modification of the NOAA "HURDAT2" dataset on Atlantic hurricanes to facilitate use with the Wolfram Language

Minecraft Block Types

A Wolfram Language EntityStore with IDs and sample images for 150+ types of Minecraft blocks

Spoken Digit Commands

A dataset consisting of recordings of spoken digits

Kyoto Free Translation Task Data

A parallel corpus for the evaluation and development of Japanese-English machine translation systems

Irish-Viking Networks in 'Cogadh Gaedhel re Gallaibh'

Graph datasets for Irish and Viking character relationships in the medieval Irish text 'Cogadh Gaedhel re Gallaibh' ('The War of the Gaedhil with the Gaill')

SQuAD v1.1 Tokens Generated with WL

A list of isolated words and symbols from the SQuAD dataset, which consists of a set of Wikipedia articles labeled for question answering and reading comprehension

SQuAD v2.0 Tokens Generated with WL

A list of isolated words and symbols from the SQuAD dataset, which consists of a set of Wikipedia articles labeled for question answering and reading comprehension

SQuAD v1.1

A dataset for question answering and reading comprehension from a set of Wikipedia articles

Clinical Concepts from Massive Sources of Medical Data

A dataset of medical concepts

SQuAD v2.0

A dataset for question answering and reading comprehension from a set of Wikipedia articles

United States Supreme Court Decisions 1946-present

Datasets relating to United States Supreme Court cases from 1946 to the present