LLMBenchmarks Data

Results from the Wolfram LLM Benchmarking Project

Examples

Basic Examples (1)

Obtain the benchmark data:

In[1]:=

ResourceData["LLMBenchmarks Data"]

Out[2]=

Visualizations (1)

Display a bar chart with the top 10 models:

In[3]:=

ResourceData["LLMBenchmarks Data"][BarChart[Reverse@Take[#, 10], BarOrigin -> Left] &, Labeled[#CorrectFunctionality, #Model] &]

Out[3]=

Analysis (4)

Get the top three models by code generation correctness:

In[4]:=

Query[TakeLargestBy["CorrectFunctionality", 3], "Model"][ResourceData["LLMBenchmarks Data"]]

Out[4]=

Select all models from Meta:

In[5]:=

Query[Select[#Vendor == "Meta" &], "Model"][ResourceData["LLMBenchmarks Data"]]

Out[5]=

Select the top model for each vendor:

In[6]:=

ResourceData["LLMBenchmarks Data"][GroupBy["Vendor"], TakeLargestBy["CorrectFunctionality", 1], "Model"]

Out[6]=

Sort the vendors by their average model score on generating valid Wolfram Language syntax:

In[7]:=

ResourceData["LLMBenchmarks Data"][ReverseSort@GroupBy[#, #Vendor & -> (#CorrectSyntax &), Mean] &]

Out[7]=

External Links

https://www.wolfram.com/llm-benchmarking-project/

Bibliographic Citation

Wolfram Research, "LLMBenchmarks Data" from the Wolfram Data Repository (2025)

Data Resource History

Date Created: 9 July 2024

Data Downloads

JSON
WL

Publisher Information

Prepared for the Wolfram Data Repository By: Wolfram Research
Publisher of Record: Wolfram Research