Protein sequences of the SARS-CoV-2 virus (the virus associated with the COVID-19 disease, formerly known as 2019-nCoV), including location, collection time and similar supporting data. (This data was imported and made computable at 6 am CST on February 25, 2021.)
This data is imported from the National Center for Biotechnology Information (NCBI) and formatted for computation.
Properties provided with each sequence include: "Accession", "Length", "Authors", "Publications", "GeographicLocation", "DetailedGeographicLocation", "USState", "Host", "Sequence", "CollectionDate", "ReleaseDate", "InclusionDate", "GenBankTitle", "Protein", "SequenceType", "ProteinStatus", "IsolationSource" and "BioSample".
Most of these protein sequences are collected from humans, but not all:
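A host tally of this kind can be sketched in Python, assuming the records have been exported as a list of dictionaries keyed by the property names listed above. The records, accession values and the `tally_hosts` helper below are hypothetical and serve only to illustrate the counting step.

```python
from collections import Counter

# Hypothetical toy records; the real dataset has one entry per sequence.
records = [
    {"Accession": "TOY_0001", "Host": "Homo sapiens"},
    {"Accession": "TOY_0002", "Host": "Homo sapiens"},
    {"Accession": "TOY_0003", "Host": "Felis catus"},
    {"Accession": "TOY_0004", "Host": None},  # host not recorded
]


def tally_hosts(rows):
    """Count sequences per host, grouping missing values under 'Unknown'."""
    return Counter(row.get("Host") or "Unknown" for row in rows)


host_counts = tally_hosts(records)
for host, count in host_counts.most_common():
    print(f"{host}: {count}")
```

Grouping missing hosts under a single label keeps the totals consistent with the number of records.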
Scope & Additional Elements
Get a date plot of collection dates:
See a date histogram of release dates:
See a timeline plot of inclusion dates:
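The date views above all rest on the same step: binning a date property ("CollectionDate", "ReleaseDate" or "InclusionDate") into intervals. A minimal Python sketch of monthly binning follows; the dates and the `count_by_month` helper are hypothetical, not values from the dataset.

```python
from collections import Counter
from datetime import date

# Hypothetical collection dates, for illustration only.
collection_dates = [
    date(2020, 1, 15),
    date(2020, 3, 2),
    date(2020, 3, 28),
    date(2020, 4, 9),
]


def count_by_month(dates):
    """Return {(year, month): count}, ordered chronologically."""
    counts = Counter((d.year, d.month) for d in dates)
    return dict(sorted(counts.items()))


monthly = count_by_month(collection_dates)
for (year, month), n in monthly.items():
    # One '#' per sequence gives a crude text histogram.
    print(f"{year}-{month:02d} {'#' * n}")
```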
Show the locations where the sequences were gathered:
Most of the provided protein sequences come from the United States and Australia:
When we look at the geographic locations providing protein sequences with the most common title, “surface glycoprotein”, these proportions are largely maintained:
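One way to compare these proportions is to compute each location's share of all sequences, then repeat the calculation on the subset with a given "GenBankTitle". The Python sketch below assumes records exported as dictionaries; the records and the `location_shares` helper are hypothetical.

```python
from collections import Counter

# Hypothetical toy records, for illustration only.
records = [
    {"GeographicLocation": "United States", "GenBankTitle": "surface glycoprotein"},
    {"GeographicLocation": "United States", "GenBankTitle": "nucleocapsid phosphoprotein"},
    {"GeographicLocation": "Australia", "GenBankTitle": "surface glycoprotein"},
    {"GeographicLocation": "United States", "GenBankTitle": "surface glycoprotein"},
]


def location_shares(rows, title=None):
    """Return each location's fraction of rows, optionally filtered by title."""
    if title is not None:
        rows = [r for r in rows if r["GenBankTitle"] == title]
    counts = Counter(r["GeographicLocation"] for r in rows)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {loc: n / total for loc, n in counts.most_common()}


all_shares = location_shares(records)
spike_shares = location_shares(records, title="surface glycoprotein")
print(all_shares)
print(spike_shares)
```

Comparing `all_shares` with `spike_shares` shows whether the filtered subset keeps roughly the same geographic mix as the full set.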
Most of the provided sequences come from regions in the United States and Australia:
When we look at the detailed geographic locations providing protein sequences with the most common title, “surface glycoprotein”, these proportions are again largely maintained:
By gathering all of the titles by their protein label, we can see that the same proteins are submitted under a wide variety of names:
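The grouping step can be sketched in Python as a dictionary from each "Protein" label to the set of "GenBankTitle" values seen with it. The label and title values below are hypothetical examples, as is the `titles_by_protein` helper.

```python
from collections import defaultdict

# Hypothetical label/title pairs, for illustration only.
records = [
    {"Protein": "S", "GenBankTitle": "surface glycoprotein"},
    {"Protein": "S", "GenBankTitle": "spike protein"},
    {"Protein": "S", "GenBankTitle": "surface glycoprotein"},
    {"Protein": "N", "GenBankTitle": "nucleocapsid phosphoprotein"},
]


def titles_by_protein(rows):
    """Map each protein label to the sorted list of distinct titles used for it."""
    grouped = defaultdict(set)
    for row in rows:
        grouped[row["Protein"]].add(row["GenBankTitle"])
    return {label: sorted(titles) for label, titles in grouped.items()}


grouped_titles = titles_by_protein(records)
for label, titles in grouped_titles.items():
    print(f"{label}: {len(titles)} distinct title(s): {titles}")
```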
We can plot where these proteins are found along the reference SARS-CoV-2 genome. To properly find an alignment, we align the protein reference sequences with the translation of each potential frame shift and choose the best alignment:
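The frame search can be illustrated with a small Python sketch. It translates a nucleotide sequence in each of the three forward reading frames using the standard genetic code, scores a protein against each translation with a basic Smith-Waterman local alignment, and keeps the highest-scoring frame. The scoring values (match 2, mismatch -1, gap -2), the helper names and the toy sequences are assumptions made for illustration; a substitution matrix would normally replace the simple match/mismatch scores, and this is not necessarily the procedure used to produce the original plot.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate(nucleotides, frame):
    """Translate from offset `frame` (0, 1 or 2); unknown codons become 'X'."""
    seq = nucleotides.upper().replace("U", "T")
    codons = (seq[i:i + 3] for i in range(frame, len(seq) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def local_alignment(query, target, match=2, mismatch=-1, gap=-2):
    """Smith-Waterman score; returns (best_score, end position in target)."""
    best, best_end = 0, 0
    prev = [0] * (len(target) + 1)
    for q in query:
        curr = [0]
        for j, t in enumerate(target, start=1):
            diag = prev[j - 1] + (match if q == t else mismatch)
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            if score > best:
                best, best_end = score, j
        prev = curr
    return best, best_end


def best_frame(genome, protein):
    """Align `protein` to all three forward frames and keep the best one."""
    candidates = []
    for frame in range(3):
        score, end = local_alignment(protein, translate(genome, frame))
        candidates.append((score, frame, end))
    score, frame, end = max(candidates)
    # Convert the amino-acid end position back to a nucleotide offset.
    return {"frame": frame, "score": score, "nucleotide_end": frame + 3 * end}


# Toy sequences, not real data: the coding region starts at offset 2.
toy_genome = "GG" + "ATGTTTGTTTTTCTTGTT" + "TAAC"
toy_protein = "MFVFLV"
result = best_frame(toy_genome, toy_protein)
print(result)
```

Only the forward strand is searched here, which fits a positive-sense RNA genome. The quadratic cost of the alignment means a full-length genome and protein set would call for an optimized alignment library rather than this pure-Python loop.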
"Protein Sequences for the SARS-CoV-2 Coronavirus"
from the Wolfram Data Repository
Data Resource History
Updated: 25 February 2021
Title: Severe acute respiratory syndrome coronavirus 2 data hub: Search, retrieve, and analyze SARS-CoV-2 GenBank data.
Creator: National Center for Biotechnology Information