Species Core Genome (SCG)

Definition

The Species Core Genome (SCG) is an artificially constructed genome sequence that contains the set of core genes of a species.

How to construct an SCG

An SCG is constructed using the following procedure:

  1. Select all complete genomes belonging to the target species. The taxonomic identification of each genome is confirmed using the genome-based identification algorithm (Chun et al. 2018).
  2. Generate a phylogenomic tree using the UBCG pipeline with the maximum likelihood method, then manually select representative genomes from the resulting tree. This step avoids phylogenetic bias among the representative genomes.
  3. Calculate the set of core gene clusters (orthologous groups) using the Roary pipeline. The selected core genes are also considered for whole genome Multilocus Sequence Typing (wgMLST).
  4. To construct the SCG, take one representative gene from each core gene cluster and append it to the SCG. Priority is given to genes from genomes with greater historical significance. For instance, genes of the well-known E. coli K-12 are considered first because it has been the most extensively studied. In essence, the SCG is a concatenation of core genes drawn from different strains. Intergenic regions are not included in the SCG.
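
The concatenation in step 4 can be illustrated with a minimal Python sketch. It is not the actual pipeline code: the function name `build_scg`, the dictionary layout of the clusters, the strain labels, and the alphabetical cluster order are all assumptions made for illustration. The sketch assumes the core gene clusters have already been computed (step 3) and that a priority list of genomes is supplied by the user.

```python
from typing import Dict, List


def build_scg(clusters: Dict[str, Dict[str, str]], priority: List[str]) -> str:
    """Concatenate one representative gene per core gene cluster.

    clusters: cluster id -> {genome id: gene sequence} (hypothetical layout).
    priority: genome ids ordered from most to least preferred.
    """
    parts = []
    # Alphabetical cluster order is an assumption; the source does not
    # specify the order in which genes are appended.
    for cluster_id in sorted(clusters):
        members = clusters[cluster_id]
        # Take the gene from the highest-priority genome present in the cluster.
        chosen = next((g for g in priority if g in members), None)
        if chosen is None:
            # No prioritized genome in this cluster: fall back deterministically.
            chosen = sorted(members)[0]
        parts.append(members[chosen])
    # Intergenic regions are never added, so the SCG is genes only.
    return "".join(parts)


# Toy data: two clusters, three hypothetical strains.
clusters = {
    "geneA": {"K12": "ATGAAA", "strain2": "ATGAAG"},
    "geneB": {"strain2": "ATGCCC", "strain3": "ATGCCT"},
}
scg = build_scg(clusters, priority=["K12", "strain2", "strain3"])
print(scg)
```

In this toy input the K12 copy is used for geneA, while geneB falls back to the next genome in the priority list because K12 has no member in that cluster.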

Usage

The SCG contains the core part of the genome of a given species, and can be used for the following purposes:

1. SNP-based phylogenomic treeing
Single nucleotide polymorphisms (SNPs) can be called for any genome of the target species against the SCG in a standardized way; NUCmer is highly recommended. The SNP calls from multiple genomes are then combined to generate a multiple sequence alignment, which can be used for phylogenetic analyses.
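
As a rough illustration of how per-genome SNP calls might be combined into an alignment, the Python sketch below builds one aligned string per genome over the variable positions only. The input format (0-based SCG position mapped to the observed base) and the function name `snp_alignment` are assumptions, not the output format of NUCmer. The sketch also assumes that any position without a call matches the SCG, which ignores uncovered regions and indels.

```python
from typing import Dict, List, Tuple


def snp_alignment(
    reference: str, calls: Dict[str, Dict[int, str]]
) -> Tuple[List[int], Dict[str, str]]:
    """Build an alignment restricted to variable sites.

    reference: the SCG sequence.
    calls: genome id -> {0-based SCG position: observed base} (hypothetical layout).
    """
    # Every position where at least one genome differs from the SCG.
    positions = sorted({pos for snps in calls.values() for pos in snps})
    alignment = {
        # Use the called base if present, otherwise the SCG base (an assumption).
        genome: "".join(snps.get(pos, reference[pos]) for pos in positions)
        for genome, snps in calls.items()
    }
    return positions, alignment


reference = "ATGAAAATGCCC"
calls = {
    "g1": {2: "A", 7: "C"},
    "g2": {7: "C"},
    "g3": {},
}
positions, alignment = snp_alignment(reference, calls)
for genome, row in alignment.items():
    print(genome, row)
```

Each row has the same length, so the result can be written out as FASTA and passed to a tree-building program.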

2. SNP-based rapid searching against the genome database
SNP calls made against the same SCG can be used to search a query genome against a database of many genomes. Because the SNP calls for the database genomes can be precalculated before searching, this process is efficient when a newly sequenced genome is searched against a large database (e.g. the E. coli/Shigella group has >10,000 genomes).
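
One way such a search could work is sketched below in Python. The distance measure (the number of SCG positions at which two SNP profiles disagree) and the names `snp_distance` and `search` are assumptions chosen for illustration; the source does not specify how profiles are compared. Database profiles are assumed to be precalculated in the same hypothetical format, 0-based SCG position mapped to observed base.

```python
from typing import Dict, List, Tuple

Profile = Dict[int, str]  # 0-based SCG position -> observed base (hypothetical)


def snp_distance(a: Profile, b: Profile) -> int:
    """Count SCG positions where the two profiles disagree.

    A position missing from a profile is treated as matching the SCG.
    """
    return sum(1 for pos in a.keys() | b.keys() if a.get(pos) != b.get(pos))


def search(query: Profile, database: Dict[str, Profile]) -> List[Tuple[str, int]]:
    """Rank database genomes by SNP distance to the query, closest first."""
    hits = [(genome, snp_distance(query, profile)) for genome, profile in database.items()]
    # Ties are broken by genome id so the ranking is deterministic.
    return sorted(hits, key=lambda hit: (hit[1], hit[0]))


# Precalculated profiles for the database genomes (toy data).
database = {
    "g1": {2: "A", 7: "C"},
    "g2": {7: "C"},
    "g3": {},
}
query = {7: "C", 9: "T"}
hits = search(query, database)
print(hits)
```

Only the query genome needs new SNP calls at search time; the comparison itself is a lookup over stored profiles.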


Last updated on Feb. 3
