Wolfram Data Repository
Immediate Computable Access to Curated Contributed Data
Splice-junction Gene Sequences for Primate DNA
Splice junctions are points on a DNA sequence at which "superfluous" DNA is removed during the process of protein creation in higher organisms. The problem posed in this dataset is to recognize, given a sequence of DNA, the boundaries between exons (the parts of the DNA sequence retained after splicing) and introns (the parts of the DNA sequence that are spliced out). In the biological community, intron/exon borders are referred to as "acceptors", while exon/intron borders are referred to as "donors".
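The donor/acceptor terminology can be illustrated with a small sketch. It is written in Python only as a language-neutral illustration and is not part of the resource; the function name `junctions` and the per-base annotation string (one `E` per exon base, one `I` per intron base) are hypothetical, and the toy annotation is made up rather than taken from the dataset.

```python
def junctions(annotation):
    """Return (position, kind) for each exon/intron boundary.

    `annotation` is a hypothetical per-base labelling: 'E' marks an exon
    base and 'I' marks an intron base. The reported position is the index
    of the first base after the boundary.
    """
    found = []
    for i in range(1, len(annotation)):
        pair = annotation[i - 1] + annotation[i]
        if pair == "EI":
            # exon followed by intron: a donor site
            found.append((i, "donor"))
        elif pair == "IE":
            # intron followed by exon: an acceptor site
            found.append((i, "acceptor"))
    return found


toy_annotation = "EEEEIIIIIIEEE"  # made-up example, not real data
print(junctions(toy_annotation))
```

The classification task in the dataset is the harder inverse problem: the annotation is not given, and the boundary type has to be inferred from the bases themselves.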
Retrieve the resource:
Retrieve the default content:
Shuffle the dataset randomly:
Create a training dataset using 80% of the original dataset:
Create a testing dataset using the remaining 20% of the original dataset:
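The shuffle and 80/20 split described in the last three steps can be sketched as follows. This is a generic Python illustration under stated assumptions, not the Wolfram Language input used on this page: the helper `shuffle_and_split`, its `seed` parameter, and the placeholder rows are all hypothetical.

```python
import random


def shuffle_and_split(rows, train_fraction=0.8, seed=0):
    """Shuffle a copy of `rows`, then cut it into training and testing parts."""
    rng = random.Random(seed)      # fixed seed so the split is reproducible
    shuffled = list(rows)          # copy, so the caller's list is left untouched
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Placeholder rows standing in for (sequence, class) pairs.
rows = [(f"seq{i}", "label") for i in range(10)]
train, test = shuffle_and_split(rows)
print(len(train), len(test))
```

Shuffling before cutting matters when the stored rows are grouped by class; without it, the test set could end up containing classes the classifier never saw during training.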
Train a classifier:
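The page does not say which method the trained classifier uses. As one possible illustration of training on labelled sequences, the sketch below swaps in a position-wise naive Bayes model with add-one smoothing, written in Python. The function names, the four-letter alphabet, and the toy sequences are assumptions made for the example; none of them come from the resource.

```python
import math
from collections import Counter, defaultdict

ALPHABET = "ACGT"  # assumed alphabet, used only for the smoothing denominator


def train_naive_bayes(examples):
    """Count classes and per-position bases.

    `examples` is a list of (sequence, label) pairs whose sequences all
    have the same length.
    """
    class_counts = Counter(label for _, label in examples)
    # position_counts[label][position][base] -> number of occurrences
    position_counts = defaultdict(lambda: defaultdict(Counter))
    for seq, label in examples:
        for pos, base in enumerate(seq):
            position_counts[label][pos][base] += 1
    return class_counts, position_counts


def predict(model, seq):
    """Return the label with the highest smoothed log-probability."""
    class_counts, position_counts = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n in class_counts.items():
        score = math.log(n / total)  # class prior
        for pos, base in enumerate(seq):
            count = position_counts[label][pos][base]
            # add-one (Laplace) smoothing avoids log(0) for unseen bases
            score += math.log((count + 1) / (n + len(ALPHABET)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Made-up training sequences, not taken from the dataset.
toy_train = [
    ("AGGTAA", "donor"),
    ("CGGTGA", "donor"),
    ("TTAGGC", "acceptor"),
    ("CCAGAT", "acceptor"),
]
model = train_naive_bayes(toy_train)
print(predict(model, "AGGTGA"))
```

A per-position model is a natural first baseline here because splice sites sit at a fixed offset within each example window, so the same position carries comparable information across examples.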
Obtain general information about the classifier:
Generate a ClassifierMeasurementsObject of the classifier with the test set:
Visualize the accuracy of the classifier:
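What the measurement steps compute can be sketched by hand. The Python below is a rough illustration of accuracy and confusion counts on a held-out set, not the ClassifierMeasurementsObject itself; the helper names and the label lists are hypothetical and made up for the example.

```python
from collections import Counter


def accuracy(actual, predicted):
    """Fraction of test examples whose predicted label equals the true label."""
    correct = sum(a == p for a, p in zip(actual, predicted))
    return correct / len(actual)


def confusion_counts(actual, predicted):
    """Count each (true label, predicted label) pair."""
    return Counter(zip(actual, predicted))


# Made-up labels standing in for a test set and a classifier's predictions.
actual = ["donor", "acceptor", "donor", "acceptor", "donor"]
predicted = ["donor", "acceptor", "acceptor", "acceptor", "donor"]

print(accuracy(actual, predicted))
for (true_label, guess), count in sorted(confusion_counts(actual, predicted).items()):
    print(true_label, "->", guess, ":", count)
```

The confusion counts show which classes are mistaken for which, information that a single accuracy figure hides.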
Wolfram Research, "Sample Data: Gene Sequences" from the Wolfram Data Repository (2018)
Creative Commons Public Domain Mark