A large-scale dataset of 44k natural language processing problems, inspired by the original Winograd Schema Challenge design

Examples

Basic Examples (3) 

Get the WinoGrande dataset:

In[1]:=
data = ResourceData["WinoGrande"]
Out[1]=

A sample row:

In[2]:=
RandomChoice[Normal[data]]
Out[2]=

Number of items in the dataset:

In[3]:=
Length[data]
Out[3]=

Get a random WinoGrande problem:

In[4]:=
q = Normal[RandomChoice[ResourceData["WinoGrande"]]]
Out[4]=

Test an LLM with the problem:

In[5]:=
LLMFunction[
  "In the sentence below, which option does _ correspond to? Reply only with one of the specified options and nothing else.

`Sentence`

Available options:
`Options`"][q]
Out[5]=

Verify:

In[6]:=
q["Answer"]
Out[6]=

Get a random sample of problems:

In[7]:=
problems = RandomSample[ResourceData["WinoGrande"], 10] // Normal;

Each sentence contains a "_", a blank that is meant to be filled in:

In[8]:=
Lookup[problems, "Sentence"]
Out[8]=

Each problem gives a set of multiple choice options:

In[9]:=
Lookup[problems, "Options"]
Out[9]=

The correct answer:

In[10]:=
Lookup[problems, "Answer"]
Out[10]=

Scope & Additional Elements (2) 

Get a larger version of the WinoGrande dataset:

In[11]:=
data = ResourceData["WinoGrande", "TrainingDatasetExtraLarge"]
Out[11]=
In[12]:=
Length[data]
Out[12]=

Get a test version of the WinoGrande dataset:

In[13]:=
ResourceData["WinoGrande", "TestDataset"]
Out[13]=

Analysis (5) 

Get a random sample of WinoGrande questions:

In[14]:=
q = Normal[RandomSample[ResourceData["WinoGrande"], 100]];
In[15]:=
text = StringTemplate[
    "In the sentence below, which option does _ correspond to? Reply only with one of the specified options and nothing else.

`Sentence`

Available options:
`Options`"] /@ q;
In[16]:=
correct = q[[All, "Answer"]]
Out[16]=

Check results using an older LLM:

In[17]:=
answers1 = LLMSynthesize[#, LLMEvaluator -> <|"Model" -> {"OpenAI", "gpt-3.5-turbo"}|>] & /@ text;
In[18]:=
MapThread[SameQ, {answers1, correct}] // Counts
Out[18]=

Compare with a more modern model:

In[19]:=
answers2 = LLMSynthesize[#, LLMEvaluator -> <|"Model" -> {"OpenAI", "gpt-4o"}|>] & /@ text
Out[19]=

Much better performance:

In[20]:=
MapThread[SameQ, {answers2, correct}] // Counts
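
The counts above can also be summarized as accuracy fractions for a direct comparison. A minimal sketch, reusing the answers1, answers2 and correct lists defined in the preceding cells (the helper name accuracy is illustrative, not part of the dataset):

```wl
accuracy[answers_] := N[Count[MapThread[SameQ, {answers, correct}], True]/Length[correct]];
{accuracy[answers1], accuracy[answers2]}
```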
Out[20]=

View a table comparing a sample of results:

In[21]:=
Style[TableForm[
   RandomSample[
    Transpose[{
      MapThread[If[#1 === #2, "✅ " <> #1, "❌ " <> #1] &, {answers1, correct}],
      MapThread[If[#1 === #2, "✅ " <> #1, "❌ " <> #1] &, {answers2, correct}],
      q[[All, "Sentence"]]}],
    10],
   TableHeadings -> {None, {"GPT-3.5", "GPT-4o", "Sentence"}}],
  "Text", FontSize -> 12]
Out[21]=

Wolfram Research, "WinoGrande" from the Wolfram Data Repository (2024)  

License Information

CC-BY

Data Resource History

Source Metadata

Publisher Information