Human Protein Atlas

Data from The Human Protein Atlas

Details

The data is based on The Human Protein Atlas version 23.0 and Ensembl version 109.
Each dataset in the UMAP coordinates for cells is an Association of the form <|cluster id1 -> {{cell id11, {UMAP_x11, UMAP_y11}}, {cell id12, {UMAP_x12, UMAP_y12}}, …, {cell id1N, {UMAP_x1N, UMAP_y1N}}}, …, cluster idM -> {{cell idM1, {UMAP_xM1, UMAP_yM1}}, {cell idM2, {UMAP_xM2, UMAP_yM2}}, …, {cell idMK, {UMAP_xMK, UMAP_yMK}}}|>, where M, N, K are positive integers.
Uniform Manifold Approximation and Projection (UMAP) is a method for reducing the dimensionality of a data set (Becht E et al. (2018)).
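As a minimal sketch of working with this structure, the toy Association below mimics the documented form. The cluster ids, cell ids and coordinates are made up for illustration and are not taken from the dataset.

```wolfram
(* Hypothetical toy data in the documented form:
   cluster id -> {{cell id, {UMAP_x, UMAP_y}}, ...} *)
toyUMAP = <|
   0 -> {{"cellA", {1.0, 2.0}}, {"cellB", {1.5, 2.5}}},
   1 -> {{"cellC", {-3.0, 0.5}}}
   |>;

(* Keep only the coordinate pair of every cell, cluster by cluster *)
coords = #[[All, 2]] & /@ toyUMAP

(* Number of cells in each cluster *)
cellCounts = Length /@ toyUMAP
```

Mapping over the Association keeps the cluster ids as keys, so the coordinates stay grouped by cluster and can be passed to a plotting function one cluster at a time.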
Gene and protein expression levels for different datasets are expressed as transcripts per million ("TPM"), protein-coding transcripts per million ("pTPM") and normalized expression ("nTPM").
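For orientation, TPM is conventionally computed by dividing each gene's read count by its transcript length and rescaling so that the values sum to one million. The sketch below applies that standard definition to made-up counts and lengths. It does not reproduce the Human Protein Atlas pipeline, and the normalization behind nTPM is not shown.

```wolfram
(* Hypothetical read counts and transcript lengths (in kilobases) for three genes *)
counts = {100, 300, 600};
lengthsKb = {1, 3, 2};

(* Reads per kilobase, then rescale so that the values sum to one million *)
rpk = counts/lengthsKb;
tpm = 10^6*rpk/Total[rpk]
```

Because the values are rescaled to a fixed total, TPM expresses the relative share of each gene within a sample rather than an absolute amount.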
The default content is an Association containing the expression levels (nTPM) of genes in different human tissues. The following additional data are available:
"TissueAtlas Gene co-expression network"    graph of co-expressing genes in tissues
"TissueAtlas maximum expression location"    maximum expression location of genes
"BrainAtlas gene expression (TPM)"    expression levels (TPM) of genes in human brain
"BrainAtlas gene expression (pTPM)"    expression levels (pTPM) of genes in human brain
"BrainAtlas gene expression (nTPM)"    expression levels (nTPM) of genes in human brain
"BrainAtlas Gene co-expression network"    graph of co-expressing genes in human brain
"PathologyAtlas"    data about roles of genes in different cancers
"SingleCellAtlas expression (nTPM)"    expression levels (nTPM) of genes in different cell types
"SingleCellAtlas cell clusters"    description of cell clusters
"SingleCellAtlas expression in cell clusters(nTPM)"    expression levels (nTPM) of genes in different cell types and clusters
"SingleCellAtlas UMAP coordinates in tissue "<>tissue    UMAP coordinates for cells in clusters for different tissues
"SubCellularAtlas"    expression of genes in different subcellular regions
"Ensembl ID gene name association"    Association of Ensembl IDs of genes and common gene names
"Ensembl ID gene description UniProtID association"    Association of Ensembl IDs of genes to gene description and UniProtID
"Organ tissue association"    Association of organs and tissues belonging to an organ
The tissue in the UMAP coordinates key can be adipose_tissue, bone_marrow, brain, breast, bronchus, colon, endometrium, esophagus, eye, fallopian_tube, heart_muscle, kidney, liver, lung, lymph_node, ovary, pancreas, pbmc, placenta, prostate, rectum, salivary_gland, skeletal_muscle, skin, small_intestine, spleen, stomach, testis, thymus, tongue or vascular.

(20163 elements)

Examples

Basic Examples (2) 

Location (tissues) of maximum RNA expression:

In[1]:=
Dataset[ResourceData["Human Protein Atlas", "TissueAtlas maximum expression location"]]
Out[1]=

Expression of 50 strongest expressing genes in human tissues:

In[2]:=
RNAExpressionData = ResourceData["Human Protein Atlas"];
(* the first entry holds the tissue names, the remaining entries hold one gene each *)
tissues = RNAExpressionData[[1]];
ensemblIDGeneAssoc = ResourceData["Human Protein Atlas", "Ensembl ID gene name association"];
ensemblIDs = Keys@RNAExpressionData[[2 ;; -1]];
genes = ensemblIDGeneAssoc /@ ensemblIDs;
v = Values@RNAExpressionData;
(* total expression of each gene over all tissues, counting "NA" as 0 *)
totExp = Table[Total[v[[i]] /. "NA" -> 0], {i, 2, Length@genes + 1}];
numGenes = 50;
(* positions of the numGenes largest totals, in increasing order of total *)
pos = Ordering[totExp, -numGenes];
highestExpressionGenes = ensemblIDs[[pos]];
highestExpressionGeneNames = Map[ensemblIDGeneAssoc, highestExpressionGenes];
ftx = MapThread[{#1, #2} &, {Range@Length@highestExpressionGeneNames, highestExpressionGeneNames}];
fty = MapThread[{#1, Rotate[Capitalize@#2, Pi/2]} &, {Range@Length@tissues, tissues}];
(* offset by 1 because the first entry of v is the tissue list *)
array = v[[pos + 1]];
(* legend range, counting "NA" as 0 *)
mA = {Min[#], Max[#]} &[array /. "NA" -> 0];
ArrayPlot[array, ColorFunction -> "BlueGreenYellow", FrameTicks -> {ftx, fty}, ImageSize -> 1000, Frame -> True, FrameStyle -> Directive[Black, 12], PlotLegends -> BarLegend[{"BlueGreenYellow", mA}], PlotLabel -> Style["Highest expressing genes in human tissues (nTPM)", Black, Bold, 20]]
Out[18]=
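
The selection of the strongest expressing genes can be sketched on toy data. The rows, gene names and values below are hypothetical. The sketch only illustrates ranking rows by their total while counting "NA" as 0.

```wolfram
(* Hypothetical expression rows: one row per gene, "NA" marks a missing value *)
toyRows = {{1, 2, "NA"}, {10, "NA", 5}, {3, 3, 3}, {0, 1, 0}};
toyNames = {"geneA", "geneB", "geneC", "geneD"};

(* Replace "NA" by 0 before summing each row *)
toyTotals = Total /@ (toyRows /. "NA" -> 0);

(* Positions of the 2 largest totals, in increasing order of total *)
topPos = Ordering[toyTotals, -2];
toyNames[[topPos]]
```

Ordering returns positions directly, so ties between totals cannot produce duplicate positions and exactly the requested number of genes is selected.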

Visualizations (5) 

Subgraph containing 20 chosen genes from the co-expression network:

In[19]:=
RNACoExpressionNetwork = ResourceData["Human Protein Atlas", "TissueAtlas Gene co-expression network"];
networkGenes = DeleteDuplicates@VertexList[RNACoExpressionNetwork];
geneList = {"IFT140", "GJB7", "ZBTB16", "ZP4", "OR11G2", "CLDN14", "KLHL28", "ZNF557", "ZNF546", "C4orf3", "SPOP", "PGF", "LRRC45", "CSDE1", "CUX2", "GOT2", "TGM5", "CDO1", "STOML1", "JSRP1"};
subGraphPattern = Alternatives @@ Join[Map[UndirectedEdge[_, #] &, geneList], Map[UndirectedEdge[#, _] &, Echo@geneList]];
graph = DeleteDuplicates@
   Cases[RNACoExpressionNetwork, subGraphPattern];
Graph[graph, GraphLayout -> "GravityEmbedding"]
Out[20]=
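
The edge selection used above can be tried on a small example. The sketch assumes, as the use of Cases suggests, that the network is stored as a list of undirected edges. The edge list and the chosen gene are hypothetical.

```wolfram
(* Hypothetical edge list standing in for the co-expression network *)
toyEdges = {UndirectedEdge["A", "B"], UndirectedEdge["B", "C"],
   UndirectedEdge["C", "D"], UndirectedEdge["D", "E"]};
chosen = {"B"};

(* Pattern matching every edge that has a chosen gene at either end *)
edgePattern = Alternatives @@ Join[
    UndirectedEdge[_, #] & /@ chosen,
    UndirectedEdge[#, _] & /@ chosen];

toySub = Cases[toyEdges, edgePattern]
```

Only the two edges touching "B" are kept, so the resulting subgraph contains the chosen genes together with their direct neighbors.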

Expression patterns of randomly chosen pair of connected genes in the co-expression network:

In[21]:=
RNAExpressionData = ResourceData["Human Protein Atlas"];
RNACoExpressionNetwork = ResourceData["Human Protein Atlas", "TissueAtlas Gene co-expression network"];
ensemblIDGeneAssoc = ResourceData["Human Protein Atlas", "Ensembl ID gene name association"];
ensemblIDs = Keys@RNAExpressionData[[2 ;; -1]];
genes = ensemblIDGeneAssoc /@ ensemblIDs;
genePairs = Echo@VertexList[Graph@RandomChoice[RNACoExpressionNetwork]];
array = (RNAExpressionData[#]/Max[RNAExpressionData[#]] &) /@ ensemblIDs[[Flatten[Position[genes, #] & /@ genePairs]]];
ListLinePlot[array, PlotLegends -> {genePairs[[1]], genePairs[[2]]}, PlotRange -> All]
Out[29]=

Correlation of expression:

In[30]:=
Correlation @@ array
Out[30]=
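
The profiles above are divided by their maximum before plotting. That scaling does not affect the correlation, because the Pearson correlation is unchanged when either input is multiplied by a positive constant. A minimal check on hypothetical profiles:

```wolfram
(* Hypothetical expression profiles of two genes across five tissues *)
g1 = {2., 4., 6., 8., 20.};
g2 = {1., 3., 2., 5., 9.};

(* Pearson correlation of the raw and of the max-scaled profiles *)
rawCorr = Correlation[g1, g2];
scaledCorr = Correlation[g1/Max[g1], g2/Max[g2]];

(* The difference vanishes up to floating-point rounding *)
Chop[rawCorr - scaledCorr]
```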

Cell cluster plot:

In[31]:=
CellClusterPlot[organ_] := Module[{assoc, clusters, L, color, clusterList},
  assoc = ResourceData["Human Protein Atlas", "SingleCellAtlas UMAP coordinates in tissue " <> organ];
  clusters = Keys@assoc;
  clusterList = StringJoin["Cluster ", ToString@#] & /@ clusters;
  L = Length@clusters;
  color = ColorData["Rainbow"][#] & /@ (Range[L]/L);
  Legended[
   ListPlot[Table[Style[assoc[[i, All, -1]], color[[i]]], {i, L}], Axes -> False, FrameLabel -> {"UMAP_x", "UMAP_y"}, FrameStyle -> Directive[Black, Thickness[0.0025], 15], AspectRatio -> 1, PlotLabel -> Style[Capitalize@organ, Black, 18],
     Frame -> True, PlotStyle -> PointSize[Small]], SwatchLegend[color, clusterList, LegendMarkers -> "Bubble"]]]
In[32]:=
CellClusterPlot["brain"]
Out[32]=

Cluster plot for all tissues:

In[33]:=
CellClusterPlotBasic[organ_] := Module[{assoc, clusters, L, color, clusterList},
  assoc = ResourceData["Human Protein Atlas", "SingleCellAtlas UMAP coordinates in tissue " <> organ];
  clusters = Keys@assoc;
  clusterList = StringJoin["Cluster ", ToString@#] & /@ clusters;
  L = Length@clusters;
  color = ColorData["Rainbow"][#] & /@ (Range[L]/L);
  ListPlot[Table[Style[assoc[[i, All, -1]], color[[i]]], {i, L}], Axes -> False, FrameLabel -> {"UMAP_x", "UMAP_y"}, FrameStyle -> Directive[Black, Thickness[0.0025], 15], AspectRatio -> 1, PlotLabel -> Style[Capitalize@organ, Black, 18], Frame -> True, PlotStyle -> PointSize[Small]]]
In[34]:=
tissueList = {"adipose_tissue", "bone_marrow", "brain", "breast", "bronchus", "colon", "endometrium", "esophagus", "eye", "fallopian_tube", "heart_muscle", "kidney", "liver", "lung", "lymph_node", "ovary", "pancreas", "pbmc", "placenta", "prostate", "rectum", "salivary_gland", "skeletal_muscle", "skin", "small_intestine", "spleen", "stomach", "testis", "thymus", "tongue", "vascular"};
In[35]:=
plots = CellClusterPlotBasic[#] & /@ tissueList;
In[36]:=
GraphicsGrid[Partition[plots, UpTo@5], ImageSize -> 1500]
Out[36]=

Analysis (1) 

Power law dependence of VertexDegree in the human gene co-expression network:

In[37]:=
RNACoExpressionNetwork = ResourceData["Human Protein Atlas", "TissueAtlas Gene co-expression network"];
vd = VertexDegree@RNACoExpressionNetwork;
sortedVDAssoc = Association[SortBy[Normal@Counts[vd], First -> Last]];
power = -1/2;
power2 = -1;
buff = 500;
buff2 = 4000;
vals = {#, buff*#^power} & /@ (Keys@sortedVDAssoc);
vals2 = {#, buff2*#^power2} & /@ (Keys@sortedVDAssoc);
ListLogLogPlot[{sortedVDAssoc, vals, vals2}, PlotRange -> All, ImageSize -> Large, FrameLabel -> {"VertexDegree", "Frequency"}, AspectRatio -> 2/3, Frame -> True, FrameStyle -> Directive[Black, Thickness[0.002], 15], PlotLegends -> {"VertexDegree", "~VertexDegree^{-1/2}", "~VertexDegree^{-1}"}]
Out[38]=
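
The plot above compares the degree distribution with two reference power laws by eye. As a rough alternative, an exponent can be estimated with a straight-line fit in log-log space. The sketch below uses hypothetical data that follow an exact inverse law, and it assumes the symbol x has no value assigned. A least-squares fit of this kind is only a crude estimator for real degree distributions.

```wolfram
(* Hypothetical degree and frequency data following an exact k^-1 law *)
toyDegrees = N@Range[1, 20];
toyFreq = 100./toyDegrees;

(* Straight-line fit of Log[frequency] against Log[degree] *)
logData = Transpose[{Log[toyDegrees], Log[toyFreq]}];
line = Fit[logData, {1, x}, x];

(* The coefficient of x estimates the power-law exponent *)
exponent = Coefficient[line, x]
```

For these toy data the fitted exponent is close to -1, matching the law used to generate them.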

WolframChemistry, "Human Protein Atlas" from the Wolfram Data Repository (2024)  
