Sample Data for Query Book

Data to support the Wolfram Media book Query: Getting Information from Data with the Wolfram Language

Details

The various content elements of this Dataset are used in the book by Seth J. Chandler, Query: Getting Information from Data with the Wolfram Language. They span three basic categories (data on Titanic passengers, data on Major League Soccer players, and data on Solar System planets), along with some miscellaneous data.

(4 columns, 1309 rows)

Examples

Basic Examples (1) 

Retrieve the default content element, which is a Dataset on the Titanic:

In[1]:=
ResourceData["Sample Data for Query Book"]
Out[1]=

Scope & Additional Elements (6) 

Retrieve the content relating to the Dinghy Association:

In[2]:=
ResourceData["Sample Data for Query Book", "aAssociation"]
Out[2]=
In[3]:=
ResourceData["Sample Data for Query Book", "Dinghy Association of Associations"]
Out[3]=
In[4]:=
ResourceData["Sample Data for Query Book", "Dinghy Association of Associations Dataset"]
Out[4]=
In[5]:=
ResourceData["Sample Data for Query Book", "Dinghy List of Associations"]
Out[5]=
In[6]:=
ResourceData["Sample Data for Query Book", "Dinghy List of Associations Dataset"]
Out[6]=

Retrieve the content relating to the Titanic:

In[7]:=
ResourceData["Sample Data for Query Book", "Lifeboat List of Associations"] // ResourceFunction["AugmentedTerse"][5]
Out[7]=
In[8]:=
ResourceData["Sample Data for Query Book", "Lifeboat List of Associations Dataset"] // ResourceFunction["FormatDataset"][MaxItems -> 10]
Out[8]=
In[9]:=
ResourceData["Sample Data for Query Book", "Titanic Cabins 2 List of Associations"]
Out[9]=
In[10]:=
ResourceData["Sample Data for Query Book", "Titanic Cabins 2 List of Associations Dataset"]
Out[10]=
In[11]:=
ResourceData["Sample Data for Query Book", "Titanic Cabins List of Associations"]
Out[11]=
In[12]:=
ResourceData["Sample Data for Query Book", "Titanic Cabins List of Associations Dataset"]
Out[12]=
In[13]:=
ResourceData["Sample Data for Query Book", "Titanic List of Associations"][[1 ;; 20]]
Out[13]=
In[14]:=
ResourceData["Sample Data for Query Book", "Titanic List of Associations Dataset"]
Out[14]=

Retrieve the content relating to Major League Soccer:

In[15]:=
ResourceData["Sample Data for Query Book", "MLS List of Associations"][[1 ;; 5]]
Out[15]=
In[16]:=
ResourceData["Sample Data for Query Book", "MLS List of Associations Dataset"]
Out[16]=
In[17]:=
ResourceData["Sample Data for Query Book", "MLS List of Lists"][[1 ;; 5]]
Out[17]=
In[18]:=
ResourceData["Sample Data for Query Book", "MLS List of Lists Dataset"]
Out[18]=

Retrieve the content relating to planets:

In[19]:=
ResourceData["Sample Data for Query Book", "Planets Deeply Nested Structure"] // ResourceFunction["Terse"][5]
Out[19]=
In[20]:=
ResourceData["Sample Data for Query Book", "Planets Deeply Nested Structure Dataset"]
Out[20]=
In[21]:=
ResourceData["Sample Data for Query Book", "Mars Association"]
Out[21]=

Retrieve the content relating to Eastern cities:

In[22]:=
ResourceData["Sample Data for Query Book", "Eastern Cities List"]
Out[22]=
In[23]:=
ResourceData["Sample Data for Query Book", "Pops List"]
Out[23]=

Retrieve the content relating to IDWeight:

In[24]:=
ResourceData["Sample Data for Query Book", "IDWeight List of Associations"]
Out[24]=

Visualizations (4) 

In[25]:=
planets = ResourceData["Sample Data for Query Book", "Planets Deeply Nested Structure Dataset"];

Create a stack of cylinders showing the approximate circumference of all of the planetary moons, coloring them according to the planet they circle:

In[26]:=
Query[Values/*
   MapIndexed[{planet, index} |-> {Directive[{Opacity[0.5], ColorData[63][index[[1]]]}], planet}]/*(Graphics3D[#, BoxRatios -> {1, 1, 1}] &), "Moons", Values/*MapIndexed[{r, i} |-> Cylinder[{{0, 0, i[[1]] - 1}, {0, 0, i[[1]]}}, QuantityMagnitude@r]], #Radius &][planets]
Out[26]=

In[27]:=
mls = ResourceData["Sample Data for Query Book", "MLS List of Associations Dataset"];

Create a graphic showing the (log) salary trajectories of players on the Houston Dynamo:

In[28]:=
Query[Select[#Club == "HOU" &]/*
   GroupBy[{#LastName, #FirstName} &]/*(DateListPlot[#, PlotRange -> All, ScalingFunctions -> "Log"] &), All, {#Year, #GuaranteedCompensation} &][mls]
Out[28]=

Break down comparative survival on the Titanic by cabin class and sex:

In[29]:=
titanic = ResourceData["Sample Data for Query Book", "Titanic List of Associations"];
In[30]:=
Query[GroupBy[#class &], GroupBy[#sex &]/*KeySort, ResourceFunction["Proportions"]/*KeyDrop[False]/*
    Values/*First/*N, #survived &][titanic] // Dataset
Out[30]=
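
The same level-by-level breakdown can be sketched without the repository function. The block below is a minimal illustration, not the book's method: the rows in sample are made up (only the keys mirror the Titanic element), and survivalRate is a hypothetical helper that stands in for the Proportions pipeline by counting True values directly.

```wolfram
(* Hypothetical miniature sample: keys mirror the Titanic element, rows are made up *)
sample = {
   <|"class" -> "1st", "sex" -> "female", "survived" -> True|>,
   <|"class" -> "1st", "sex" -> "female", "survived" -> True|>,
   <|"class" -> "1st", "sex" -> "male", "survived" -> False|>,
   <|"class" -> "1st", "sex" -> "male", "survived" -> True|>,
   <|"class" -> "3rd", "sex" -> "female", "survived" -> True|>,
   <|"class" -> "3rd", "sex" -> "female", "survived" -> False|>,
   <|"class" -> "3rd", "sex" -> "male", "survived" -> False|>
   };

(* Hypothetical helper: fraction of True values in a list of Booleans *)
survivalRate = N[Count[#, True]/Length[#]] &;

(* Level 1: group rows by class.
   Level 2: group each class by sex, then sort the keys.
   Level 3: aggregate the "survived" values extracted at level 4. *)
rates = Query[GroupBy[#class &], GroupBy[#sex &] /* KeySort, survivalRate, #survived &][sample]
```

The result is a nested Association keyed by class and then by sex, which can be passed to Dataset for display in the same way as the query above.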

Break down comparative survival on the Titanic by cabin class, sex, and age decade, lumping together passengers aged 50 and over:

In[31]:=
styleDataset = (Dataset[#, ItemDisplayFunction -> (If[NumericQ[#], Round[#, 0.01], "-"] &), MaxItems -> {All, All, All}, ItemStyle -> 10, HeaderStyle -> 10, ItemSize -> 4, Background -> {Automatic, {ColorData["Pastel"][0.1], ColorData["Pastel"][0.9]}}] &);
In[32]:=
Query[GroupBy[#class &], GroupBy[#sex &]/*KeySort, GroupBy[ToString[Quotient[Min[#age, 50], 10]] /. "5" -> "5+" &]/*
    KeySort, ResourceFunction["Proportions"]/*KeyDrop[False]/*
    Values/*First/*N, #survived &][titanic] // styleDataset
Out[32]=

Get the Wolfram Knowledgebase entities corresponding to several eastern European cities, deleting any that WolframAlpha does not recognize:

In[33]:=
cityEntities = DeleteCases[
  Map[WolframAlpha[#, "Result"] &, ResourceData["Sample Data for Query Book", "Eastern Cities List"]],
   Null]
Out[33]=

Show the cities on a map:

In[34]:=
GeoListPlot[cityEntities, PlotMarkers -> GeoMarker]
Out[34]=

Show the cities on a relief map of eastern Europe, representing them as bubbles that correspond to their population:

In[35]:=
GeoBubbleChart[ResourceFunction["PairMap"][#["Population"] &, cityEntities, Rule],
 GeoRange -> EntityClass["Country", "EasternEurope"], GeoRangePadding -> Quantity[500, "Kilometers"], GeoBackground -> GeoStyling["ReliefMap"]]
Out[35]=

Analysis (4) 

Compute the mean mass of the moons of Mars:

In[36]:=
Query["Mars", "Moons", Mean, #Mass &][
 ResourceData["Sample Data for Query Book", "Planets Deeply Nested Structure Dataset"]]
Out[36]=
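
To see how a query of this shape walks a nested structure, here is a small self-contained sketch. The toy association and every name and number in it are placeholders invented for illustration; they are not values from the planets element.

```wolfram
(* Hypothetical nested structure: names and numbers are placeholders, not real data *)
toy = <|
   "PlanetA" -> <|"Moons" -> <|
       "MoonA1" -> <|"Mass" -> 2, "Radius" -> 10|>,
       "MoonA2" -> <|"Mass" -> 4, "Radius" -> 20|>|>|>,
   "PlanetB" -> <|"Moons" -> <|
       "MoonB1" -> <|"Mass" -> 9, "Radius" -> 30|>|>|>
   |>;

(* Descend into "PlanetA", then into "Moons"; extract each moon's Mass;
   Mean is applied to the extracted values on the way back up *)
meanMass = Query["PlanetA", "Moons", Mean, #Mass &][toy]
```

The two string operators select parts while descending, the final function is applied to each moon, and Mean aggregates the extracted values for the selected planet only.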

Compute the median guaranteed compensation of MLS players by club in 2017; restrict the output to 10 rows and color the data light green:

In[37]:=
ResourceFunction["DatasetQuery"][
  Select[#Year == DateObject[{2017}] &]/*GroupBy[#Club &]/*
   Query[ReverseSort], Median, #"GuaranteedCompensation" &, "Inheritance" -> <|"Additions" -> {MaxItems -> 10, Background -> Lighter@Green}|>][
 ResourceData["Sample Data for Query Book", "MLS List of Associations Dataset"]]
Out[37]=
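
The filter, group, and aggregate steps of this query can also be tried with the built-in Query on a small list. This is only a sketch under stated assumptions: the rows in pay are invented, the club codes are placeholders, and the year is stored as a plain integer rather than a DateObject. It leaves out the option handling that DatasetQuery adds.

```wolfram
(* Hypothetical rows: club codes, years, and amounts are made up for illustration *)
pay = {
   <|"Club" -> "AAA", "Year" -> 2017, "GuaranteedCompensation" -> 100|>,
   <|"Club" -> "AAA", "Year" -> 2017, "GuaranteedCompensation" -> 300|>,
   <|"Club" -> "AAA", "Year" -> 2016, "GuaranteedCompensation" -> 900|>,
   <|"Club" -> "BBB", "Year" -> 2017, "GuaranteedCompensation" -> 500|>
   };

(* Keep the 2017 rows, group them by club, take the median compensation
   within each club, then sort the clubs from highest to lowest median *)
medians = Query[
    Select[#Year == 2017 &] /* GroupBy[#Club &] /* Query[ReverseSort],
    Median, #GuaranteedCompensation &][pay]
```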

Join the "cabins" data with the passenger data to obtain a single Dataset that includes the square footage of the cabin occupied by each passenger and whether they had a window:

In[38]:=
JoinAcross[
  ResourceData["Sample Data for Query Book", "Titanic List of Associations"], ResourceData["Sample Data for Query Book", "Titanic Cabins List of Associations"], "class"] // Dataset/*Query[All, KeyDrop["decklocations"]]/*ResourceFunction["FormatDataset"][MaxItems -> 10]
Out[38]=
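
A join of this kind can be illustrated on two tiny lists of associations. Everything below is hypothetical: the variable names, the rows, and the "squarefeet" and "window" keys are placeholders chosen for the sketch and are not taken from the cabins element.

```wolfram
(* Hypothetical passenger rows *)
people = {
   <|"name" -> "P1", "class" -> "1st"|>,
   <|"name" -> "P2", "class" -> "3rd"|>,
   <|"name" -> "P3", "class" -> "3rd"|>
   };

(* Hypothetical per-class cabin attributes; the key names are assumptions *)
cabinInfo = {
   <|"class" -> "1st", "squarefeet" -> 200, "window" -> True|>,
   <|"class" -> "3rd", "squarefeet" -> 80, "window" -> False|>
   };

(* Each passenger row is merged with the cabin row that has the same "class" value *)
joined = JoinAcross[people, cabinInfo, "class"]
```

Because each class appears once in cabinInfo, the join yields one output row per passenger, each carrying the cabin attributes for that passenger's class.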

Use machine learning to create a model of the probability of survival on the Titanic:

Split the data into training and test sets:

In[39]:=
titanic = ResourceData["Sample Data for Query Book", "Titanic List of Associations"];
In[40]:=
{training, test} = (SeedRandom[20221012]; TakeDrop[RandomSample[titanic], Round[0.7*Length[titanic]]])
Out[40]=
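
The split idiom can be checked on a stand-in list before applying it to the real data. In this sketch rows, trainRows, and testRows are hypothetical names, and the integers 1 through 10 stand in for passenger records.

```wolfram
(* Fix the random seed so the shuffle is reproducible *)
SeedRandom[20221012];

(* Stand-in data: ten integers in place of passenger records *)
rows = Range[10];

(* Shuffle, then take the first 70% for training and leave the rest for testing *)
{trainRows, testRows} = TakeDrop[RandomSample[rows], Round[0.7*Length[rows]]];

Length /@ {trainRows, testRows}  (* {7, 3} *)
```

TakeDrop returns the taken elements and the remainder together, so each record lands in exactly one of the two sets.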

To prepare the data for classification, group the data by survival and drop the survived column:

In[41]:=
cl = Query[
   GroupBy[#survived &]/*
    Map[KeyDrop[
      "survived"]]/*(Classify[#, TrainingProgressReporting -> None] &)][training]
Out[41]=
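
The restructuring step can be inspected on its own, without training a classifier. The rows below are made up and the variable names are hypothetical. The sketch shows only the grouping and key removal, which produce an Association from each class label to its list of examples, one of the input forms that Classify accepts.

```wolfram
(* Hypothetical rows with the label stored under "survived" *)
labeled = {
   <|"age" -> 30, "sex" -> "female", "survived" -> True|>,
   <|"age" -> 45, "sex" -> "male", "survived" -> False|>,
   <|"age" -> 8, "sex" -> "male", "survived" -> True|>
   };

(* Group rows by the label, then drop the label key from every row in each group *)
grouped = Query[GroupBy[#survived &] /* Map[KeyDrop["survived"]]][labeled]
```

Removing the "survived" key from the examples keeps the label out of the features. It then appears only as the key of each group.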

Assess classifier performance:

In[42]:=
cmo = ClassifierMeasurements[cl, test -> "survived"]
Out[42]=

Seth J. Chandler, "Sample Data for Query Book" from the Wolfram Data Repository (2023)  

License Information

MIT License
