Gene frequency plot in pan-genome

After pan-genome calculation, all potential orthologous protein-coding genes (CDSs) are clustered into non-redundant gene sets called “Pan-genome Orthologous Groups (POGs)”.

A core part of the genome, containing essential or housekeeping genes, is found in all genomes, whereas less essential genes are found less frequently. Some genes are detected in only a single genome. A “gene frequency plot” gives a general overview of how frequently genes occur across a whole genome set. A typical plot is U-shaped: most genes are detected either in all genomes or in a single genome.
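The counting behind such a plot can be sketched in a few lines of Python. This is a minimal illustration, not the EzBioCloud implementation: the function name `gene_frequency`, the input layout (a mapping from POG to the set of genomes containing it), and the toy data are all assumptions made for the example.

```python
from collections import Counter

def gene_frequency(pog_to_genomes, n_genomes):
    """Return a list whose element k-1 is the number of POGs found in exactly k genomes."""
    counts = Counter(len(genomes) for genomes in pog_to_genomes.values())
    return [counts.get(k, 0) for k in range(1, n_genomes + 1)]

# Hypothetical toy data: 4 genomes (A-D) and 7 POGs.
toy_pogs = {
    "POG1": {"A", "B", "C", "D"},  # core gene, present in every genome
    "POG2": {"A", "B", "C", "D"},
    "POG3": {"A", "B", "C", "D"},
    "POG4": {"A", "B", "C"},
    "POG5": {"A"},                 # found in a single genome only
    "POG6": {"C"},
    "POG7": {"D"},
}

freq = gene_frequency(toy_pogs, n_genomes=4)

# Text rendering of the plot: one bar per "number of genomes" bin.
for k, n in enumerate(freq, start=1):
    print(f"{k} genome(s): {'#' * n} ({n})")
```

Even in this toy set the bars are tallest at the two ends (1 genome and 4 genomes), which is the U shape described above.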

In the following example, 31 Vibrio vulnificus genomes were analyzed, and 13,220 non-redundant genes were found among a total of 144,931 genes. Please note that the figure below shows the number of genes on a log scale.
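As a rough sanity check on these numbers, the short sketch below derives the average gene count per genome and compares it with the pan-genome size. The three input values come from the text; the variable names and the derived ratio are only illustrative. The last loop shows why a log scale is useful here: counts that differ by orders of magnitude become evenly spaced.

```python
import math

n_genomes = 31           # from the text
total_genes = 144_931    # from the text
pan_genome_size = 13_220 # non-redundant genes, from the text

# Average number of genes per genome in the set.
avg_genes_per_genome = total_genes / n_genomes

# How many "average genomes" worth of distinct genes the pan-genome holds.
pan_to_avg_ratio = pan_genome_size / avg_genes_per_genome

print(f"average genes per genome: {avg_genes_per_genome:.0f}")
print(f"pan-genome size / average genome: {pan_to_avg_ratio:.2f}")

# On a log10 axis, each tenfold increase in gene count adds one unit of height.
for count in (10, 100, 1000):
    print(count, math.log10(count))
```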

The figure below visualizes the same data as the one above, except that the genes are classified into 4 major functional categories and the number of genes is NOT shown on a log scale.
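A stacked version of the plot needs one count per frequency bin and per category. The sketch below assumes each POG has already been reduced to a pair of (number of genomes containing it, functional category). The category labels `cat_a` to `cat_d` are placeholders, since the four categories are not named here, and `frequency_by_category` is a hypothetical helper.

```python
from collections import defaultdict

def frequency_by_category(pogs, n_genomes):
    """pogs: iterable of (k, category) pairs, one per POG, where k is the
    number of genomes containing that POG.
    Returns {category: [number of POGs in exactly 1, 2, ..., n_genomes genomes]}."""
    table = defaultdict(lambda: [0] * n_genomes)
    for k, category in pogs:
        table[category][k - 1] += 1
    return dict(table)

# Hypothetical toy data for 4 genomes; labels are placeholders.
toy = [
    (4, "cat_a"), (4, "cat_a"), (4, "cat_b"),
    (3, "cat_b"),
    (1, "cat_c"), (1, "cat_d"), (1, "cat_d"),
]

stacked = frequency_by_category(toy, n_genomes=4)
for category, counts in sorted(stacked.items()):
    print(category, counts)
```

Summing the per-category lists bin by bin gives back the plain gene frequency counts, so each stacked bar has the same total height as its unstacked counterpart on a linear scale.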

Here are more pan-genome examples of other species. Can you tell how and why the shapes of the charts are different?

  • Chlamydia psittaci (obligate parasitic bacterium with a small genome)

  • Corynebacterium diphtheriae (pathogenic bacterium belonging to the phylum Actinobacteria)

This type of figure has been used in many publications, including:

  1. Lefebure, T. & Stanhope, M.J. Evolution of the core and pan-genome of Streptococcus: positive selection, recombination, and genome composition. Genome Biol 8, R71 (2007).
  2. Touchon, M. et al. Organised genome dynamics in the Escherichia coli species results in highly diverse adaptive paths. PLoS Genet 5, e1000344 (2009).

Updated on April 28th, 2016 (EK)
