Languoid EntityStore

Entity store with information about the languages, dialects and families of the world (‘languoids’) and their genealogical classification.

Details

The data was obtained from https://glottolog.org/ and formatted for computation.
The default content is an EntityStore containing the following properties:
"Ancestors": list of nodes between the node and its root
"Bookkeeping": true if the language has been retired by other editors
"ChildDialectCount": number of dialect descendants
"ChildFamilyCount": number of family descendants
"ChildLanguageCount": number of language descendants
"Children": immediate descendants
"Coordinates": representative geographic coordinates
"Countries": countries in which the languoid is spoken
"Depth": distance (number of edges) between the node and its root
"Descendants": all nodes reachable from the node in its sub-tree
"Height": number of edges on the longest path from the node down to one of its descendants
"Id": unique identifier (Glottocode) given by Glottolog
"ISO639P3code": ISO 639-3 code
"Level": whether the languoid is a language, dialect or family
"Macroarea": an area of the globe of roughly continent size
"Name": name of the languoid
"NonGenealogicalQ": true if the languoid's classification is non-genealogical
"Parent": immediate ancestor
"ParentGlottocode": Glottocode of the parent
"Root": top node in the tree
"RootId": Glottocode of the root
"RootQ": true for root languoids
"Siblings": list of nodes with the same parent
"SubTree": a tree consisting of the node and all its descendants
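
The tree-shaped properties above ("Depth", "Height", "Children", "Siblings", "Descendants") can be illustrated with built-in graph functions. This is a minimal sketch on a hypothetical five-vertex tree, not on the real data: edges are assumed to point from parent to child, the helper names (toyDepth, toyHeight and so on) are illustrative only, and excluding the node itself from "Siblings" and "Descendants" is an assumption about the store's conventions.

```wolfram
(* Hypothetical toy tree; edges point from parent to child and the names are not real Glottocodes *)
toyTree = Graph[{"fam" -> "langA", "fam" -> "langB", "langA" -> "dialA1", "langA" -> "dialA2"}];

(* The root is the only vertex without an incoming edge *)
toyRoot = First@Select[VertexList[toyTree], VertexInDegree[toyTree, #] == 0 &];

(* Depth: number of edges between the node and the root *)
toyDepth[v_] := GraphDistance[toyTree, toyRoot, v];

(* Height: longest distance from the node to any vertex in its sub-tree (0 for a leaf) *)
toyHeight[v_] := Max[0, GraphDistance[toyTree, v, #] & /@ VertexOutComponent[toyTree, v]];

(* Children and Parent: read directly from the edge list *)
toyChildren[v_] := Cases[EdgeList[toyTree], DirectedEdge[v, c_] :> c];
toyParent[v_] := First@Cases[EdgeList[toyTree], DirectedEdge[p_, v] :> p];

(* Siblings: the parent's other children (assumed to exclude the node itself) *)
toySiblings[v_] := DeleteCases[toyChildren[toyParent[v]], v];

(* Descendants: everything reachable from the node, minus the node itself *)
toyDescendants[v_] := DeleteCases[VertexOutComponent[toyTree, v], v];

{toyDepth["dialA1"], toyHeight["fam"], toySiblings["dialA1"]}
(* {2, 2, {"dialA2"}} *)
```

In a tree there is exactly one path between a node and each of its descendants, so the largest GraphDistance inside the sub-tree is the same as the longest downward path used for "Height".
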
"Bookkeeping" keeps track of language-level languoids that other editors consider "languages based on misunderstanding". For more information, see https://glottolog.org/glottolog/glottologinformation#bookkeepinglanguoids.
"Coordinates" are only defined for language-level languoids. They often represent the geographical centre-point of the area where the speakers live, a historical location, the demographic centre-point or some other representative point. For more information, see https://glottolog.org/glottolog/glottologinformation#coordinates.
A Glottocode (the identifier used by "Id", "ParentGlottocode" and "RootId") is a string consisting of four alphanumeric characters followed by four decimal digits, for example "anci1242".
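
A minimal format check for Glottocodes, assuming (as in the sample entities) that the four leading characters are lowercase letters or digits; glottocodeQ is a hypothetical helper and not part of the EntityStore:

```wolfram
(* Assumption: lowercase letters or digits, then exactly four digits; StringMatchQ requires a whole-string match *)
glottocodeQ[s_String] := StringMatchQ[s, RegularExpression["[a-z0-9]{4}[0-9]{4}"]];
glottocodeQ[_] := False;

glottocodeQ /@ {"anci1242", "indo1319", "indo131", "Indo-European"}
(* {True, True, False, False} *)
```
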
"Macroarea" is only defined for language-level languoids. Macroareas form the optimal division of the globe (1) into 6 areas, (2) with at least 250 languages in each area, such that (3) the distance between the component parts inside each area is minimized and (4) the length of the intersections between pairs of macroareas is minimized. For more information, see https://glottolog.org/meta/glossary#macroarea.
The non-genealogical trees include: Sign Language, Unclassifiable, Pidgin, Unattested, Artificial Language, Mixed Language, Speech Register and Bookkeeping. For more information, see https://glottolog.org/glottolog/glottologinformation#principles.
"SubTree" returns a graph whose vertices are the glottocodes of the languoids in the sub-tree.

Examples

Sample entities

In[1]:=
{Entity["Languoid", "anci1242"], Entity["Languoid", "bibl1238"], Entity["Languoid", "caja1240"], Entity["Languoid", "indo1319"], Entity["Languoid", "japa1258"], Entity["Languoid", "mand1415"], Entity["Languoid", "sign1238"]}

Basic Examples (4) 

Retrieve the EntityStore and register it:

In[2]:=
EntityRegister[ResourceData["Languoid EntityStore"]]
Out[2]=

Number of "languoids" (families, languages and dialects):

In[3]:=
EntityValue["Languoid", "EntityCount"]
Out[3]=

List all available properties:

In[4]:=
EntityProperties["Languoid"]
Out[4]=

Find a property value for an entity:

In[5]:=
Entity["Languoid", "nucl1643"]["Children"]
Out[5]=

Retrieve a dataset of all available properties for an entity:

In[6]:=
Entity["Languoid", "anci1242"]["Dataset"]
Out[6]=

Properties (5) 

Get the genealogical classification of Biblical Hebrew:

In[7]:=
Entity["Languoid", "bibl1238"]["Ancestors"]
Out[7]=

Find the children of the Indo-European language family:

In[8]:=
Entity["Languoid", "indo1319"]["Children"]
Out[8]=

Keep only the language-level children:

In[9]:=
Entity["Languoid", "indo1319"]["Children", "Level" -> "Language"]
Out[9]=

Get all languages that share the same parent as Mandarin Chinese:

In[10]:=
Entity["Languoid", "mand1415"][
EntityProperty["Languoid", "Siblings", {"Level" -> "Language"}]]
Out[10]=

The sub-tree of a languoid is a tree consisting of the entity (as a root node) and all its descendants:

In[11]:=
Entity["Languoid", "caja1240"]["SubTree"] // LayeredGraphPlot[#, VertexLabels -> Placed["Name", Center, Entity["Languoid", #] &]] &
Out[11]=

Entity Classes (3) 

List all pre-defined entity classes:

In[12]:=
EntityClassList["Languoid"]
Out[12]=

Count the entities in the "RootLanguages" class:

In[13]:=
EntityValue[EntityClass["Languoid", "RootLanguages"], "EntityCount"]
Out[13]=

List all non-genealogical root families and their number of children:

In[14]:=
Length /@ EntityValue[EntityClass["Languoid", "NonGenealogicalRootFamilies"], "Children", "EntityAssociation"]
Out[14]=

Create an implicitly defined entity class consisting of all dialects within the Australia macroarea:

In[15]:=
EntityClass[
   "Languoid", {EntityProperty["Languoid", "Macroarea"] -> "Australia", EntityProperty["Languoid", "Level"] -> "Dialect"}] //
   EntityList // Shallow
Out[15]=
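
An implicitly defined class keeps only the entities for which every property -> value condition holds. This is a sketch of that filtering logic on hypothetical plain associations rather than real entities; records, conditions and inClassQ are illustrative names, and real EntityClass conditions may also use predicates instead of exact values:

```wolfram
(* Hypothetical records standing in for languoid entities *)
records = {
   <|"Id" -> "x1", "Macroarea" -> "Australia", "Level" -> "Dialect"|>,
   <|"Id" -> "x2", "Macroarea" -> "Australia", "Level" -> "Language"|>,
   <|"Id" -> "x3", "Macroarea" -> "Eurasia", "Level" -> "Dialect"|>};

(* A record belongs to the class only if every property -> value pair matches exactly *)
conditions = {"Macroarea" -> "Australia", "Level" -> "Dialect"};
inClassQ[r_Association] := AllTrue[conditions, r[First[#]] === Last[#] &];

Select[records, inClassQ][[All, "Id"]]
(* {"x1"} *)
```
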

Visualizations (2) 

Find the shortest path between two nodes that have the same root:

In[16]:=
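shortestPathGraph[node1_] := Module[{tree, spf},
   (* whole tree containing node1, with glottocodes as vertices *)
   tree = node1["Root"]["SubTree"];
   (* shortest paths from node1 to every vertex, ignoring edge direction so paths can pass through ancestors *)
   spf = FindShortestPath[UndirectedGraph@tree, node1["Id"], All];
   (* function of a target glottocode that returns the path as a subgraph of the tree *)
   Subgraph[tree, spf[#]] &
   ];
In[17]:=
shortestPathGraph[Entity["Languoid", "lati1261"]][
  Entity["Languoid", "stan1295"]["Id"]] // LayeredGraphPlot[#, VertexLabels -> Placed["Name", Center, Entity["Languoid", #] &], AspectRatio -> .75, ImageSize -> Large] &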
Out[17]=
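
Why the function above applies UndirectedGraph: if the edges of a sub-tree point from parent to child (an assumption consistent with these examples), two nodes on different branches are not linked by any directed path. A sketch on a hypothetical toy tree (pathTree and leafPath are illustrative names):

```wolfram
(* Hypothetical toy tree, edges pointing from parent to child *)
pathTree = Graph[{"root" -> "a", "root" -> "b", "a" -> "a1", "b" -> "b1"}];

(* Following edge directions, no path leads from one branch to the other *)
FindShortestPath[pathTree, "a1", "b1"]
(* {} *)

(* Ignoring directions, the path climbs to the common ancestor and descends again *)
leafPath = FindShortestPath[UndirectedGraph[pathTree], "a1", "b1"]
(* {"a1", "a", "root", "b", "b1"} *)

(* Subgraph keeps the original parent -> child orientation of those edges *)
Subgraph[pathTree, leafPath]
```
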

Highlight the ancestors and the sub-tree of a node:

In[18]:=
With[{tree = Entity["Languoid", "sanm1306"]["Root"]["SubTree"], id = Entity["Languoid", "sanm1306"]["Id"]},
 (* highlight the node's ancestors (in-component) together with its sub-tree (out-component) *)
 HighlightGraph[tree, Subgraph[tree, Flatten@Through[{VertexInComponent, VertexOutComponent}[tree, id]]], GraphLayout -> "RadialEmbedding"]
 ]
Out[18]=
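
With edges pointing from parent to child (the same assumption as above), VertexInComponent collects a node's ancestors and VertexOutComponent collects its sub-tree. A sketch on a hypothetical toy tree; Union is used here only to make sure the node itself is included and to sort the result:

```wolfram
(* Hypothetical toy tree, edges pointing from parent to child *)
hlTree = Graph[{"root" -> "a", "root" -> "b", "a" -> "a1", "a1" -> "a1x", "b" -> "b1"}];

(* Vertices with a directed path to "a1": its ancestors, plus the node *)
ancestorsAndSelf = Union[VertexInComponent[hlTree, "a1"], {"a1"}]
(* {"a", "a1", "root"} *)

(* Vertices reachable from "a1": the node and its sub-tree *)
subTreeVertices = Union[VertexOutComponent[hlTree, "a1"], {"a1"}]
(* {"a1", "a1x"} *)

HighlightGraph[hlTree, Subgraph[hlTree, Union[ancestorsAndSelf, subTreeVertices]]]
```
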

Analysis (3) 

The 10 root families with the highest (node) degree:

In[19]:=
BarChart[Reverse@
  TakeLargest[
   EntityValue[EntityClass["Languoid", "RootFamilies"], "Degree", "EntityAssociation"], 10], ChartLabels -> Automatic, BarOrigin -> Left]
Out[19]=

Plot the 10 countries in the Eurasia macroarea with the highest number of languages:

In[20]:=
EntityValue[
     EntityClass[
      "Languoid", {"Level" -> "Language", "Macroarea" -> "Eurasia"}], "Countries", "EntityAssociation"] // KeyValueMap[Thread[#2 -> #1] &] // GroupBy[Flatten@#, First -> Last, Length] & // TakeLargest[10] //
 GeoRegionValuePlot
Out[20]=
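
The pipeline above inverts a language -> countries association into per-country language counts. A step-by-step sketch on hypothetical toy data (the language and country names are placeholders, not Glottolog values):

```wolfram
(* Hypothetical input: each language mapped to the countries where it is listed *)
langCountries = <|"lang1" -> {"X", "Y"}, "lang2" -> {"X"}, "lang3" -> {"Y", "Z"}|>;

(* Step 1: one country -> language rule per pair; Flatten does not descend into Rule *)
pairs = Flatten@KeyValueMap[Thread[#2 -> #1] &, langCountries]
(* {"X" -> "lang1", "Y" -> "lang1", "X" -> "lang2", "Y" -> "lang3", "Z" -> "lang3"} *)

(* Step 2: group the rules by country (First), keep the languages (Last), count each group *)
counts = GroupBy[pairs, First -> Last, Length]
(* <|"X" -> 2, "Y" -> 2, "Z" -> 1|> *)

(* Step 3: keep the countries with the largest counts *)
TakeLargest[counts, 2]
```
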

A languoid with an ISO 639-3 code is not always classified as a language in Glottolog:

In[21]:=
GroupBy[Normal@
   EntityValue[EntityClass["Languoid", "ISOLanguages"], "Level", "EntityAssociation"], Last -> First, Length] // PieChart[#, ChartLabels -> Placed[Automatic, "VerticalCallout"], PlotLabel -> Row[{"\[NumberSign] of iso 639-3 languages: ", Total[#]}]] &
Out[21]=
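
The chart above tallies the "Level" values of the ISO-coded languoids. The same grouping on a hypothetical toy association, with Counts shown as a shorter equivalent for this case:

```wolfram
(* Hypothetical input: languoid -> level *)
levels = <|"l1" -> "Language", "l2" -> "Language", "d1" -> "Dialect", "f1" -> "Family"|>;

(* Normal gives rules languoid -> level; grouping by the level (Last) and counting gives a tally *)
tally = GroupBy[Normal@levels, Last -> First, Length]
(* <|"Language" -> 2, "Dialect" -> 1, "Family" -> 1|> *)

(* Counting the values directly produces the same association *)
Counts[Values@levels] === tally
(* True *)
```
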

Daniel Sanchez, "Languoid EntityStore" from the Wolfram Data Repository (2021)  
