Spoken Digit Commands

A dataset consisting of recordings of spoken digits

The dataset contains 10,000 training and 1,000 test recordings in 10 classes, one for each spoken digit from 0 to 9, recorded by a total of 997 speakers. It is a subset of the Speech Commands Dataset v0.01 released by Google, selected so that no speaker appears in both the training and test sets.

Examples

Basic Examples

Retrieve a sample of the training dataset:

In[1]:=
RandomSample[ResourceData["Spoken Digit Commands"], 5]
Out[1]=

Retrieve a sample of the test dataset:

In[2]:=
RandomSample[
 ResourceData["Spoken Digit Commands", All]["TestDataset"], 5]
Out[2]=
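
The split sizes quoted in the description can be checked directly. This is a minimal, unevaluated sketch that retrieves the test split in the same way as the example above; the names train and test are illustrative, and the result should be {10000, 1000} if the description holds:

In[ ]:=
(* train and test are illustrative names, not part of the resource *)
train = ResourceData["Spoken Digit Commands"];
test = ResourceData["Spoken Digit Commands", All]["TestDataset"];
(* Length of a Dataset gives its number of rows *)
Length /@ {train, test}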

Statistics

Compute the number of examples per class:

In[62]:=
ResourceData["Spoken Digit Commands"][Counts, "Output"]
Out[62]=

Compute the total number of different speakers in the training set:

In[64]:=
Length[ResourceData["Spoken Digit Commands"][DeleteDuplicates, "SpeakerID"]]
Out[64]=
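
The description states that the speakers in the training and test sets do not overlap. The following unevaluated sketch shows one way this could be checked. It assumes that the test split also carries a "SpeakerID" column, which is not shown on this page; an empty list would confirm the claim:

In[ ]:=
(* assumption: the test split has a "SpeakerID" column like the training split *)
trainSpeakers = Normal[ResourceData["Spoken Digit Commands"][All, "SpeakerID"]];
testSpeakers = Normal[
   ResourceData["Spoken Digit Commands", All]["TestDataset"][All, "SpeakerID"]];
(* speakers present in both splits; expected to be {} *)
Intersection[trainSpeakers, testSpeakers]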

Inspect the sample rate and channel count of the Audio objects:

In[66]:=
ResourceData["Spoken Digit Commands"][{Counts@*AudioSampleRate, Counts@*AudioChannels}, "Input"]
Out[66]=

Plot the histogram of the durations of the Audio objects:

In[67]:=
ResourceData["Spoken Digit Commands"][
 Histogram[#, ScalingFunctions -> "Log"] &@*Duration, "Input"]
Out[67]=

Visualization

Select an Audio object from the dataset:

In[69]:=
a = RandomChoice[ResourceData["Spoken Digit Commands"]]["Input"]
Out[69]=

Visualize the waveform:

In[71]:=
AudioPlot[a]
Out[71]=

Visualize the spectrum:

In[72]:=
Periodogram[a]
Out[72]=

Visualize the spectrogram:

In[73]:=
Spectrogram[a]
Out[73]=

Wolfram Research, "Spoken Digit Commands" from the Wolfram Data Repository (2018)  

License Information

Creative Commons Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)
