Spoken Digit Commands

A dataset consisting of recordings of spoken digits

The dataset contains 10,000 training and 1,000 test recordings in 10 classes, one for each spoken digit from 0 to 9, contributed by 997 speakers in total. It is a subset of the Speech Commands Dataset v0.01 released by Google, selected so that no speaker appears in both the training and test sets.
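
The speaker-disjoint split can be checked with plain set operations. The following is a minimal Python sketch, not the Wolfram Language workflow of this resource. It assumes, hypothetically, that each recording has been reduced to a `(speaker_id, label)` pair, and the speaker IDs in the example are made up.

```python
from typing import Iterable, Set, Tuple

# Hypothetical record format: (speaker_id, digit_label). The actual resource
# stores Audio objects; this pairing is an assumption for illustration.
Record = Tuple[str, int]


def speakers(records: Iterable[Record]) -> Set[str]:
    """Return the set of distinct speaker IDs in one split."""
    return {speaker for speaker, _label in records}


def is_speaker_disjoint(train: Iterable[Record], test: Iterable[Record]) -> bool:
    """True if no speaker contributes recordings to both splits."""
    return speakers(train).isdisjoint(speakers(test))


# Made-up toy splits.
train = [("spk_a", 3), ("spk_a", 7), ("spk_b", 0)]
test = [("spk_c", 3), ("spk_d", 9)]

print(is_speaker_disjoint(train, test))       # True
print(len(speakers(train) | speakers(test)))  # 4 speakers in total
```

The union of both speaker sets gives the overall speaker count, which is the quantity the description reports for the full dataset.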

Examples

Basic Examples

Retrieve a sample of the training dataset:

Retrieve a sample of the test dataset:
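
A reproducible random sample of either split can be drawn with a seeded generator. This is a Python sketch rather than the original Wolfram Language input; `sample_split`, `train_ids` and `test_ids` are hypothetical names, and the ID lists are placeholders sized like the 10,000 training and 1,000 test recordings.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def sample_split(split: Sequence[T], k: int, seed: int = 0) -> List[T]:
    """Draw k distinct items from a split, reproducibly for a given seed."""
    rng = random.Random(seed)  # local generator: global random state is untouched
    return rng.sample(split, k)


# Hypothetical stand-ins for the training and test recordings.
train_ids = [f"train_{i}" for i in range(10_000)]
test_ids = [f"test_{i}" for i in range(1_000)]

print(sample_split(train_ids, 3))
print(sample_split(test_ids, 3))
```

Using a local `random.Random` instance means the sample depends only on the seed, not on unrelated random calls elsewhere in a program.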

Statistics

Compute the number of examples per class:
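
Counting examples per class amounts to tallying labels. The Python sketch below assumes the labels have already been extracted as integers 0 to 9 (the `labels` list is hypothetical) and reports every class, including any with zero examples.

```python
from collections import Counter
from typing import Dict, Iterable


def examples_per_class(labels: Iterable[int]) -> Dict[int, int]:
    """Count recordings per digit class, reporting all ten classes 0-9."""
    counts = Counter(labels)
    # Counter returns 0 for missing keys, so empty classes still appear.
    return {digit: counts[digit] for digit in range(10)}


# Hypothetical label list, one digit per recording.
labels = [0, 1, 1, 2, 9, 9, 9]
print(examples_per_class(labels))
# {0: 1, 1: 2, 2: 1, 3: 0, 4: 0, 5: 0, 6: 0, 7: 0, 8: 0, 9: 3}
```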

Compute the number of distinct speakers in the training set:
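
If recordings are identified by file name, the speaker can be read off the name itself. In the original Speech Commands archive, files are stored as `<word>/<speaker-hash>_nohash_<n>.wav`. The Python sketch below assumes that convention carries over; `speaker_id` and `count_speakers` are hypothetical helpers, and the hashes in the example are invented.

```python
from pathlib import PurePosixPath
from typing import Iterable


def speaker_id(path: str) -> str:
    """Extract the speaker hash from a Speech Commands-style file name."""
    stem = PurePosixPath(path).stem  # e.g. "0a7c2a8d_nohash_0"
    speaker, sep, _index = stem.partition("_nohash_")
    if not sep:
        raise ValueError(f"unexpected file name: {path}")
    return speaker


def count_speakers(paths: Iterable[str]) -> int:
    """Number of distinct speakers among the given files."""
    return len({speaker_id(p) for p in paths})


# Invented example paths: two clips share a speaker hash.
paths = [
    "zero/0a7c2a8d_nohash_0.wav",
    "one/0a7c2a8d_nohash_1.wav",
    "two/1b4c9b89_nohash_0.wav",
]
print(count_speakers(paths))  # 2
```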

Inspect the sample rate and channel count of the Audio objects:
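
For WAV files, sample rate and channel count are stored in the file header and can be read with Python's standard `wave` module. In this sketch, `make_test_wav` builds a silent in-memory file as a stand-in for a real recording; its defaults (16 kHz, 16-bit PCM, mono) are chosen to resemble the original Speech Commands clips and are an assumption, not a readout of this resource.

```python
import io
import wave
from typing import Tuple


def make_test_wav(sample_rate: int = 16_000, channels: int = 1, n_frames: int = 16_000) -> bytes:
    """Build a silent 16-bit PCM WAV in memory (stand-in for a real file)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(channels)
        w.setsampwidth(2)  # 16-bit samples
        w.setframerate(sample_rate)
        w.writeframes(b"\x00\x00" * channels * n_frames)
    return buf.getvalue()


def audio_format(wav_bytes: bytes) -> Tuple[int, int]:
    """Return (sample_rate, channel_count) read from a WAV header."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as w:
        return w.getframerate(), w.getnchannels()


print(audio_format(make_test_wav()))  # (16000, 1)
```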

Plot a histogram of the durations of the Audio objects:
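
A clip's duration is its frame count divided by its sample rate. The Python sketch below computes durations from WAV headers and bins them into a text histogram; `make_wav` generates hypothetical silent clips of chosen lengths in place of real recordings.

```python
import io
import wave
from collections import Counter
from typing import Dict, Iterable


def make_wav(n_frames: int, sample_rate: int = 16_000) -> bytes:
    """Build a silent mono 16-bit WAV in memory (stand-in for a real clip)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(sample_rate)
        w.writeframes(b"\x00\x00" * n_frames)
    return buf.getvalue()


def duration_ms(wav_bytes: bytes) -> int:
    """Duration in whole milliseconds: frame count / frame rate."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as w:
        return w.getnframes() * 1000 // w.getframerate()


def duration_histogram(durations: Iterable[int], bin_ms: int = 100) -> Dict[int, int]:
    """Map each bin's lower edge (ms) to the number of clips falling in it."""
    counts = Counter(d // bin_ms * bin_ms for d in durations)
    return dict(sorted(counts.items()))


clips = [make_wav(n) for n in (16_000, 16_000, 12_000, 8_000)]
hist = duration_histogram(duration_ms(c) for c in clips)
for lower, n in hist.items():
    print(f"{lower:>5} ms  {'#' * n}")
```

Binning integer milliseconds avoids floating-point edge cases: in Python, `1.0 // 0.1` evaluates to `9.0`, which would put a one-second clip in the wrong bin.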

Visualization

Select an Audio object from the dataset:
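
Selecting an example usually means picking a recording of a given digit. This Python sketch assumes, hypothetically, that records are `(audio, label)` pairs; strings stand in for decoded audio, and `select_example` is an illustrative helper.

```python
import random
from typing import Optional, Sequence, Tuple, TypeVar

A = TypeVar("A")


def select_example(
    records: Sequence[Tuple[A, int]], digit: int, seed: Optional[int] = None
) -> Tuple[A, int]:
    """Pick one (audio, label) record whose label equals `digit`."""
    matches = [r for r in records if r[1] == digit]
    if not matches:
        raise LookupError(f"no recordings labelled {digit}")
    return random.Random(seed).choice(matches)


# Hypothetical records: strings stand in for decoded audio.
records = [("clip0", 7), ("clip1", 3), ("clip2", 7)]
print(select_example(records, 7, seed=0))
```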

Visualize the waveform:
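
A waveform plot shows sample amplitude against time. The Python sketch below assumes 16-bit little-endian PCM input (how WAV stores 16-bit samples) and normalizes it to floats in [-1, 1). The min/max `envelope` is one common way plotting code reduces many samples to a fixed number of columns; it is an illustration, not necessarily what the original visualization does.

```python
import struct
from typing import List, Sequence, Tuple


def pcm16_to_floats(frames: bytes) -> List[float]:
    """Decode little-endian 16-bit PCM to floats in [-1, 1)."""
    n = len(frames) // 2
    return [s / 32768 for s in struct.unpack(f"<{n}h", frames[: 2 * n])]


def envelope(samples: Sequence[float], n_columns: int) -> List[Tuple[float, float]]:
    """Min/max per column: what a waveform plot draws when samples outnumber pixels."""
    step = max(1, -(-len(samples) // n_columns))  # ceiling division
    return [
        (min(samples[i : i + step]), max(samples[i : i + step]))
        for i in range(0, len(samples), step)
    ]


frames = struct.pack("<4h", 0, 16384, -32768, 32767)
samples = pcm16_to_floats(frames)
print(samples[:3])  # [0.0, 0.5, -1.0]
print(envelope(samples, 2))
```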

Visualize the spectrum:
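
The spectrum shows how much energy the signal has at each frequency; bin `k` of an `N`-point DFT corresponds to frequency `k * fs / N`. This Python sketch uses a direct O(N²) DFT for clarity (a real clip would call for an FFT library) and checks it on a synthetic 1 kHz tone rather than a dataset recording.

```python
import cmath
import math
from typing import List, Sequence, Tuple


def magnitude_spectrum(samples: Sequence[float], sample_rate: float) -> List[Tuple[float, float]]:
    """One-sided magnitude spectrum as (frequency_hz, magnitude) pairs, via a direct DFT."""
    n = len(samples)
    result = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(samples))
        result.append((k * sample_rate / n, abs(acc)))
    return result


# Synthetic 1 kHz tone: 64 samples at 16 kHz, so 1000 Hz lands exactly on bin k = 4.
fs, n = 16_000, 64
tone = [math.sin(2 * math.pi * 1000 * t / fs) for t in range(n)]
spectrum = magnitude_spectrum(tone, fs)
peak_freq, peak_mag = max(spectrum, key=lambda p: p[1])
print(peak_freq)  # 1000.0
```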

Visualize the spectrogram:
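
A spectrogram repeats the spectrum computation on short, overlapping, windowed frames, giving magnitude as a function of time and frequency. This Python sketch uses a periodic Hann window, a 64-sample frame and a 32-sample hop; these parameters are illustrative assumptions, and the test signal is a synthetic tone that jumps from 1 kHz to 4 kHz.

```python
import cmath
import math
from typing import List, Sequence


def hann(n: int) -> List[float]:
    """Periodic Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]


def spectrogram(samples: Sequence[float], frame_len: int = 64, hop: int = 32) -> List[List[float]]:
    """Magnitude STFT: rows are time frames, columns are bins 0..frame_len//2."""
    window = hann(frame_len)
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        seg = [w * x for w, x in zip(window, samples[start : start + frame_len])]
        row = []
        for k in range(frame_len // 2 + 1):
            acc = sum(x * cmath.exp(-2j * math.pi * k * t / frame_len) for t, x in enumerate(seg))
            row.append(abs(acc))
        frames.append(row)
    return frames


def peak_bin(row: Sequence[float]) -> int:
    """Index of the strongest frequency bin in one frame."""
    return max(range(len(row)), key=row.__getitem__)


# 1 kHz for 256 samples, then 4 kHz; at 16 kHz with 64-sample frames these are bins 4 and 16.
fs = 16_000
signal = [math.sin(2 * math.pi * (1000 if t < 256 else 4000) * t / fs) for t in range(512)]
spec = spectrogram(signal)
print(peak_bin(spec[0]), peak_bin(spec[-1]))  # 4 16
```

Plotting `spec` as an image, with time on one axis and frequency bin on the other, gives the usual spectrogram picture.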

Bibliographic Citation

Wolfram Research, "Spoken Digit Commands" from the Wolfram Data Repository (2018)

License Information

Creative Commons Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)

Data Resource History

Date Created: 6 March 2018

Source Metadata

Title: Speech Commands: A Public Dataset for Single-Word Speech Recognition
Creator: Pete Warden
Date: 2017
Language: English
Source: http://download.tensorflow.org/data/speech_commands_v0.01.tar.gz

See Also

MNIST

Publisher Information

Prepared for the Wolfram Data Repository By: Wolfram Research
Publisher of Record: Wolfram Research