Definition

The Species Core Genome (SCG) is an artificially generated genome sequence that contains the set of core genes of a species.

How to construct SCG

An SCG is constructed using the following procedure:

  1. Select all complete genomes belonging to a target species. Taxonomic identification of each genome is confirmed using the genome-based identification algorithm (Chun et al. 2018).
  2. Generate a phylogenomic tree using the UBCG pipeline with the maximum likelihood method. Select the representative genomes manually from the resulting tree. This step avoids phylogenetic bias in the set of representative genomes.
  3. Calculate the set of core gene clusters or orthologous groups using the Roary pipeline. The selected core genes are considered for whole genome Multilocus Sequence Typing (wgMLST).
  4. To construct the SCG, take one representative gene from each core gene cluster and append it to the SCG. When choosing the representative gene, priority is given to genomes with greater historical significance. For instance, genes of the well-known E. coli K-12 are considered first because it is the most extensively studied strain. In essence, the SCG is a concatenation of core genes that may come from different strains. Intergenic regions are not included in the SCG.
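
Step 4 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the actual pipeline: the function name build_scg, the input layout (cluster name, then genome name, then gene sequence), the strain names, and the toy sequences are all hypothetical. The sketch picks the member from the highest-priority genome in each cluster and concatenates the picks without intergenic sequence. The second toy cluster deliberately lacks a K12 member to show the fallback to the next genome in the priority list.

```python
from typing import Dict, List, Tuple

def build_scg(
    clusters: Dict[str, Dict[str, str]],
    priority: List[str],
) -> Tuple[str, List[Tuple[str, str, int, int]]]:
    """Concatenate one representative gene per core gene cluster.

    clusters: cluster name -> {genome name -> gene sequence} (hypothetical layout)
    priority: genome names, most preferred first (hypothetical input)
    Returns the SCG string and, per cluster, (cluster, genome, start, end)
    with 0-based, end-exclusive coordinates on the SCG.
    """
    rank = {genome: i for i, genome in enumerate(priority)}
    parts: List[str] = []
    layout: List[Tuple[str, str, int, int]] = []
    offset = 0
    for cluster_id in sorted(clusters):
        members = clusters[cluster_id]
        # Prefer the genome ranked highest in the priority list; genomes not
        # listed rank last, and ties are broken by genome name.
        genome = min(members, key=lambda g: (rank.get(g, len(rank)), g))
        seq = members[genome].upper()
        parts.append(seq)
        layout.append((cluster_id, genome, offset, offset + len(seq)))
        offset += len(seq)
    return "".join(parts), layout

# Toy data, invented for illustration only.
clusters = {
    "dnaA": {"K12": "ATGACC", "O157": "ATGACT"},
    "recA": {"O157": "ATGGCT", "CFT073": "ATGGCA"},
}
scg, layout = build_scg(clusters, priority=["K12", "O157"])
print(scg)
print(layout)
```

Keeping the layout alongside the sequence lets a position on the SCG be traced back to the gene cluster and source strain it came from.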

Usage

The SCG contains the core part of the genome of a given species, and can be used for the following purposes:

1. SNP-based phylogenomic treeing
Single nucleotide polymorphisms (SNPs) can be called for any genome of the target species against the SCG in a standardized way; NUCmer is highly recommended. The SNP calls from multiple genomes are then combined into a multiple sequence alignment, which can be used for phylogenetic analyses.
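
As a rough sketch of how SNP calls against a shared SCG can be combined into an alignment, the Python below builds one column per variable site. It assumes the calls have already been parsed into a dictionary per genome (0-based SCG position mapped to the called base); that data layout, the function name snp_alignment, and the toy values are hypothetical. It also assumes every genome covers every site, so a genome with no call at a site receives the SCG base. A real analysis would need to handle uncovered regions and indels.

```python
from typing import Dict, List, Tuple

def snp_alignment(
    scg: str,
    calls: Dict[str, Dict[int, str]],
) -> Tuple[List[int], Dict[str, str]]:
    """Build a SNP-only alignment from per-genome SNP calls.

    scg:   the reference SCG sequence
    calls: genome name -> {0-based SCG position -> called base} (hypothetical layout)
    Returns the sorted variable sites and one aligned string per genome.
    """
    # A site is a column of the alignment if any genome has a call there.
    sites = sorted({pos for genome_calls in calls.values() for pos in genome_calls})
    alignment = {
        # No call at a site is treated as "same as the SCG" (an assumption).
        genome: "".join(genome_calls.get(pos, scg[pos]) for pos in sites)
        for genome, genome_calls in calls.items()
    }
    return sites, alignment

# Toy data, invented for illustration only.
scg = "ATGACCATGGCT"
calls = {
    "genome1": {2: "A", 7: "C"},
    "genome2": {7: "C"},
    "genome3": {},
}
sites, alignment = snp_alignment(scg, calls)
for name, row in alignment.items():
    print(name, row)
```

Because every genome is compared with the same SCG, the columns line up without a separate alignment step, and the rows can be written out as FASTA for a tree-building program.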

2. SNP-based rapid searching against the genome database
SNP calls against the same SCG can be used to search a query genome against a database of multiple genomes. Because the SNP calls for the database genomes can be precalculated before searching, this process is efficient when a newly sequenced genome is searched against a large database (e.g. the E. coli/Shigella group has >10,000 genomes).
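
The search idea can be illustrated with a minimal Python sketch. It assumes each database genome already has its SNP calls stored as a dictionary (0-based SCG position mapped to the called base) and ranks genomes by the number of sites where their calls disagree with the query. The distance measure, the function names snp_distance and search, and the toy data are assumptions made for illustration, not the method used by any particular database.

```python
from typing import Dict, List, Tuple

SnpCalls = Dict[int, str]  # 0-based SCG position -> called base (hypothetical layout)

def snp_distance(a: SnpCalls, b: SnpCalls) -> int:
    """Count sites where two genomes disagree relative to the same SCG.

    A site called in only one genome counts as a difference, because the
    other genome is assumed to carry the SCG base there.
    """
    return sum(1 for pos in a.keys() | b.keys() if a.get(pos) != b.get(pos))

def search(query: SnpCalls, database: Dict[str, SnpCalls], top: int = 3) -> List[Tuple[int, str]]:
    """Return the `top` closest database genomes as (distance, name) pairs."""
    ranked = sorted((snp_distance(query, calls), name) for name, calls in database.items())
    return ranked[:top]

# Precalculated calls for the database genomes (toy data, invented for illustration).
database = {
    "genome1": {2: "A", 7: "C"},
    "genome2": {7: "C"},
    "genome3": {},
}
# Only the query genome needs new SNP calling at search time.
query = {7: "C", 9: "T"}
hits = search(query, database, top=2)
print(hits)
```

Only the query needs SNP calling at search time; each comparison afterwards touches just the variable sites of two genomes rather than their full sequences.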


Last updated on Feb. 3
