Alpha Missense


Categorization of the pathogenicity of 89% of 71 million possible human missense variants

Details

These data were generated by the AlphaMissense model.
The default content is a Dataset summarizing each chromosome, with the following properties:
"Chromosome": string consisting of "Chromosome" followed by a number from 1 to 22 or one of the letters M, X, or Y
"RawDataSize": total number of rows for the chromosome in the original database
"TotalPositions": total number of positions for the chromosome
"TotalMutations": total number of missense mutations for the chromosome
"Genome": genome build
"TotalUniprotID": total number of UniProt IDs
"TotalTranscriptID": total number of transcript IDs
"TotalAminoAcidVariations": total number of amino acid changes
"PathogenicityQuartiles": pathogenicity quartiles (1/4, 2/4, 3/4)
"PathogenicityMean": mean pathogenicity
"PathogenicityLikelyBenign": percentage of "likely_benign" classifications
"PathogenicityLikelyPathogenic": percentage of "likely_pathogenic" classifications
"PathogenicityAmbiguous": percentage of "ambiguous" classifications
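The quartile, mean, and percentage properties can be sketched for any list of pathogenicity scores. The helper summarizeScores is a hypothetical name, the percentages apply the Classification thresholds documented for the chromosome tables, and Quartiles uses the Wolfram Language default convention, which may differ from the one used to build the resource.

```wolfram
(* Hypothetical helper: summary statistics for a list of pathogenicity scores *)
summarizeScores[scores_List] := Module[{n = Length[scores]},
  <|
   (* Wolfram Language default quartile convention; the resource may differ *)
   "PathogenicityQuartiles" -> Quartiles[scores],
   "PathogenicityMean" -> Mean[scores],
   (* percentage of scores in each class, using the documented cutoffs *)
   "PathogenicityLikelyBenign" -> 100.*Count[scores, s_ /; s < 0.34]/n,
   "PathogenicityLikelyPathogenic" -> 100.*Count[scores, s_ /; s > 0.564]/n,
   "PathogenicityAmbiguous" -> 100.*Count[scores, s_ /; 0.34 <= s <= 0.564]/n
   |>]

(* usage, e.g.: summarizeScores[Normal@chrX[All, "Pathogenicity"]] *)
```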
Additional content elements contain the original data for each chromosome, with the following columns:
"Chromosome": string consisting of "Chromosome" followed by a number from 1 to 22 or one of the letters M, X, or Y
"Position": genome position (1-based)
"ReferenceNucleotide": reference nucleotide (GRCh38.p13 for hg38)
"AlternativeNucleotide": alternative nucleotide
"Genome": genome build
"UniprotID": UniProtKB accession number of the protein in which the variant induces a single amino acid substitution (UniProt release 2021_02)
"TranscriptID": Ensembl transcript ID from GENCODE V32 (hg38)
"AminoAcidVariation": amino acid change induced by the alternative allele, in the format <reference amino acid><POS_aa><alternative amino acid> (e.g. V2L), where POS_aa is the 1-based position of the residue in the protein sequence
"Pathogenicity": predicted probability of the variant being clinically pathogenic
"Classification": derived from Pathogenicity using the following thresholds: "likely_benign" if Pathogenicity < 0.34, "likely_pathogenic" if Pathogenicity > 0.564, and "ambiguous" otherwise
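The Classification rule and the AminoAcidVariation format can be expressed as two small helpers. Both names are hypothetical, and the parser assumes one-letter residue codes around a decimal protein position, as in "V2L".

```wolfram
(* Hypothetical helper reproducing the documented Classification thresholds *)
classifyPathogenicity[p_?NumericQ] := Which[
  p < 0.34, "likely_benign",
  p > 0.564, "likely_pathogenic",
  True, "ambiguous"]

(* Hypothetical parser for strings such as "V2L": reference residue,
   1-based protein position, alternative residue *)
parseAminoAcidVariation[s_String] := Replace[
  StringCases[s,
   StartOfString ~~ ref : LetterCharacter ~~ pos : DigitCharacter .. ~~
     alt : LetterCharacter ~~ EndOfString :>
    <|"Reference" -> ref, "Position" -> FromDigits[pos], "Alternative" -> alt|>],
  (* exactly one match: return it; no match: return a Missing value *)
  {{assoc_} :> assoc, {} -> Missing["NotParsed", s]}]
```

The output of classifyPathogenicity /@ Normal@chrX[All, "Pathogenicity"] can then be compared with the Classification column.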

Examples

Basic Examples (5) 

Retrieve the first four rows of the chromosome summary:

In[1]:=
ResourceData["Alpha Missense"][[;; 4]]

Retrieve the data for ChromosomeX:

In[2]:=
chrX = ResourceData["Alpha Missense", "ChromosomeX"];
chrX[[;; 4]]
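Some summary properties can be cross-checked against a chromosome table such as chrX. This sketch assumes that RawDataSize is the row count and that TotalPositions, TotalTranscriptID and TotalUniprotID count distinct values; chromosomeTotals is a hypothetical name.

```wolfram
(* Hypothetical helper: recompute selected totals from a chromosome Dataset *)
chromosomeTotals[ds_Dataset] := <|
  "RawDataSize" -> Length[ds],
  (* assumption: the "Total" properties count distinct values *)
  "TotalPositions" -> Normal[ds[CountDistinct, "Position"]],
  "TotalTranscriptID" -> Normal[ds[CountDistinct, "TranscriptID"]],
  "TotalUniprotID" -> Normal[ds[CountDistinct, "UniprotID"]]
  |>

(* usage, e.g.: chromosomeTotals[chrX] *)
```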

Get a random sample of five distinct positions in ChromosomeX:

In[22]:=
RandomSample[Union@Normal@chrX[All, #Position &], 5]

Get a random sample of five distinct transcript IDs in ChromosomeX:

In[23]:=
RandomSample[Union@Normal@chrX[All, #TranscriptID &], 5]

Plot the distribution of ChromosomeX pathogenicity scores for all possible missense variants:

In[24]:=
BoxWhiskerChart[Normal@chrX[All, "Pathogenicity"], "Mean", PlotLabel -> "Pathogenicity level distribution", FrameLabel -> {"ChromosomeX", "Pathogenicity level"}]

Summarize the classifications of all possible ChromosomeX missense variants:

In[25]:=
PieChart[#[[All, 2]], ChartLegends -> #[[All, 1]]] &@
 Tally[Normal@chrX[All, #Classification &]]
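The same tally can be expressed as percentages, comparable to the PathogenicityLikelyBenign, PathogenicityLikelyPathogenic and PathogenicityAmbiguous summary properties. This sketch uses Counts instead of Tally and assumes those properties are plain row percentages; classificationPercentages is a hypothetical name.

```wolfram
(* Hypothetical helper: percentage of rows carrying each classification label,
   with keys sorted alphabetically *)
classificationPercentages[labels_List] :=
  KeySort[Counts[labels]]*(100./Length[labels])

(* usage, e.g.:
   pct = classificationPercentages[Normal@chrX[All, "Classification"]];
   PieChart[Values[pct], ChartLegends -> Keys[pct]] *)
```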

Scope & Additional Elements (2) 

Define a function that returns the pathogenicity information at a given position, and apply it to position 71765227 of ChromosomeX:

In[26]:=
PathogenicityData[chromosome_String, mutationPosition_Integer, "Pathogenicity"] := Module[{rows},
  (* keep only the rows at the requested genome position *)
  rows = ResourceData["Alpha Missense", chromosome][Select[#Position == mutationPosition &]];
  (* return one row per missense variant at that position *)
  Dataset[
   <|"Position" -> #Position, "Mutation" -> {#ReferenceNucleotide -> #AlternativeNucleotide}, "TranscriptID" -> #TranscriptID, "AminoAcidVariation" -> #AminoAcidVariation, "Pathogenicity" -> #Pathogenicity, "Classification" -> #Classification|> & /@ Normal[rows]]
  ]
In[27]:=
PathogenicityData["ChromosomeX", 71765227, "Pathogenicity"]

Extend the function to a list of positions and apply it to ChromosomeX:

In[28]:=
PathogenicityData[chromosome_String, mutationPosition_List, "Pathogenicity"] := Module[{rows},
  (* keep only the rows whose position is in the requested list *)
  rows = ResourceData["Alpha Missense", chromosome][Select[MemberQ[mutationPosition, #Position] &]];
  (* return one row per missense variant at those positions *)
  Dataset[
   <|"Position" -> #Position, "Mutation" -> {#ReferenceNucleotide -> #AlternativeNucleotide}, "TranscriptID" -> #TranscriptID, "AminoAcidVariation" -> #AminoAcidVariation, "Pathogenicity" -> #Pathogenicity, "Classification" -> #Classification|> & /@ Normal[rows]]
  ]
In[29]:=
PathogenicityData["ChromosomeX", {20193531, 20175194, 19372662, 31444512, 24179596}, "Pathogenicity"]
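Each call to PathogenicityData loads the chromosome table and scans every row. When many positions will be queried, one alternative is to index the rows by position once with GroupBy and then use Lookup. The names positionIndex and lookupPositions are hypothetical, and the index keeps the whole chromosome in memory.

```wolfram
(* Hypothetical helper: group rows (a list of associations) by genome position *)
positionIndex[rows_List] := GroupBy[rows, #Position &]

(* Hypothetical helper: fetch all rows for the given positions;
   positions absent from the index contribute no rows *)
lookupPositions[index_Association, positions_List] :=
  Dataset[Flatten[Lookup[index, positions, {}], 1]]

(* usage, e.g.:
   index = positionIndex[Normal[chrX]];
   lookupPositions[index, {20193531, 20175194, 19372662}] *)
```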

Visualizations (1) 

Plot the pathogenicity distribution over all positions of every chromosome:

In[30]:=
(* retrieves all 25 chromosome tables, which may take a long time *)
data = Table[
   Normal[ResourceData["Alpha Missense", "Chromosome" <> ToString[i]][All, "Pathogenicity"]],
   {i, Join[Range[22], {"M", "X", "Y"}]}];
In[31]:=
BoxWhiskerChart[data, "Mean", ChartLabels -> Join[Table["chr" <> ToString[i], {i, 22}], {"chrM", "chrX", "chrY"}], PlotLabel -> "Pathogenicity level distribution", FrameLabel -> {"", "Pathogenicity level"}]
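The chart can be complemented by a table of mean pathogenicity per chromosome. This sketch assumes data is ordered as in In[30] (chromosomes 1 to 22, then M, X, Y); meanByChromosome and chromosomeLabels are hypothetical names.

```wolfram
(* labels in the same order as the chromosome lists in data *)
chromosomeLabels = Join[Table["chr" <> ToString[i], {i, 22}], {"chrM", "chrX", "chrY"}];

(* Hypothetical helper: association from chromosome label to mean score *)
meanByChromosome[scoreLists_List, labels_List] :=
  AssociationThread[labels, Mean /@ scoreLists]

(* usage, e.g.: TakeLargest[meanByChromosome[data, chromosomeLabels], 3] *)
```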

Wolfram Research, "Alpha Missense" from the Wolfram Data Repository (2024)  
