Pan-genome vs. core-genome

Both the pan-genome and the core-genome are derived from multiple genomes in a data set. Each is a collection of orthologs or orthologous groups, which EzCgDb calls “pan-genome orthologous groups (POGs)”.

Suppose you have 100 genomes in a data set. The strictest core-genome is obtained by setting the cutoff at 100% (i.e., only POGs that are present in all genomes). With this setting, even POGs that are present in 99 genomes are excluded. In many cases, however, we prefer a less stringent cutoff for the following reasons:

  • Not all genomes are completely sequenced, so some CDSs may be missing from the final contigs (assemblies).
  • The gene prediction process (software that finds the locations of CDSs) may miss a genuine CDS.
  • For CDS products that are not essential (i.e., not housekeeping), the function or role is often carried out by other CDSs with very different sequences. Because orthologs are detected with a sequence-based approach, these CDSs with divergent sequences will be missed (see here for more details).

A popular cutoff is 95%, which picks up orthologs that are present in 95-100 of the 100 genomes. Core-genomes can be obtained at different cutoffs, and the pan-genome is effectively the core-genome obtained with a 0% cutoff: it includes POGs that are found in even a single genome.
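The cutoff logic described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how EzCgDb is implemented: the function name core_genome, the POG and genome names, and the toy presence table are all hypothetical, and the sketch assumes ortholog detection has already assigned every CDS to a POG.

```python
import math


def core_genome(pog_to_genomes, n_genomes, cutoff_percent):
    """Return the POGs present in at least cutoff_percent of the genomes.

    pog_to_genomes maps a POG name to the set of genomes that contain it
    (a hypothetical input format chosen for this sketch).
    """
    # Minimum number of genomes a POG must occur in. The floor of 1 means a
    # 0% cutoff returns every POG seen in any genome, i.e. the pan-genome.
    min_genomes = max(1, math.ceil(n_genomes * cutoff_percent / 100))
    return {
        pog
        for pog, genomes in pog_to_genomes.items()
        if len(genomes) >= min_genomes
    }


# Toy data set: 100 genomes and five made-up POGs with different occupancy.
genomes = [f"genome_{i}" for i in range(100)]
pog_to_genomes = {
    "POG_A": set(genomes),       # present in all 100 genomes
    "POG_B": set(genomes[:99]),  # present in 99 genomes
    "POG_C": set(genomes[:95]),  # present in 95 genomes
    "POG_D": set(genomes[:2]),   # present in 2 genomes
    "POG_E": set(genomes[:1]),   # present in a single genome
}

strict_core = core_genome(pog_to_genomes, len(genomes), 100)
relaxed_core = core_genome(pog_to_genomes, len(genomes), 95)
pan_genome = core_genome(pog_to_genomes, len(genomes), 0)

print(len(strict_core), len(relaxed_core), len(pan_genome))  # prints: 1 3 5
```

In this toy example the 100% cutoff keeps only POG_A, the 95% cutoff also recovers POG_B and POG_C, and the 0% cutoff returns all five POGs, including the one found in a single genome.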

The following chart in EzCgDb is based on 100 genomes of Acinetobacter baumannii. It highlights that the pan-genome contains more POGs without known homologs in the database (the X category in the chart below), implying that accessory genes are new to us. Today, however, it is well known that accessory genes present in only 1-2 genomes are mostly from mobile genetic elements such as bacteriophages.

Updated on May 17th 2016 (EK)
