Gene frequency plot in pan-genome

After the pan-genome calculation, all potentially orthologous protein-coding genes (CDSs) are clustered into non-redundant gene sets called “Pan-genome Orthologous Groups (POGs)”.

The core part of the genome, which contains essential or housekeeping genes, is found in all genomes, while less essential genes are found less frequently. Some genes are detected in only a single genome. A “gene frequency plot” gives a general overview of how many genomes each gene occurs in across the whole genome set. A typical plot is U-shaped: most genes are detected either in all genomes or in a single genome.
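The counts behind such a plot can be sketched in a few lines of Python. This is only an illustration, not the pipeline used to produce the figures on this page: the `pogs` dictionary, the genome names and the `gene_frequency` helper are all hypothetical, and each POG is assumed to be available as the set of genomes in which it was detected.

```python
from collections import Counter


def gene_frequency(pogs):
    """Return a Counter mapping 'number of genomes containing a POG'
    to 'number of POGs with that frequency'."""
    return Counter(len(genomes) for genomes in pogs.values())


# Hypothetical toy data: POG identifier -> set of genomes containing it.
pogs = {
    "POG1": {"g1", "g2", "g3", "g4"},  # present in every genome (core)
    "POG2": {"g1", "g2", "g3", "g4"},
    "POG3": {"g1", "g2", "g3", "g4"},
    "POG4": {"g1", "g3"},              # present in some genomes
    "POG5": {"g2"},                    # present in a single genome
    "POG6": {"g4"},
    "POG7": {"g3"},
}

n_genomes = 4
freq = gene_frequency(pogs)

# Crude text histogram: one '#' per POG at each frequency.
for k in range(1, n_genomes + 1):
    print(f"{k} genome(s): {'#' * freq[k]} ({freq[k]})")
```

Even in this toy data set the two ends of the histogram (1 genome and all 4 genomes) hold most of the POGs, which is the U shape described above.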

In the following example, 31 Vibrio vulnificus genomes were analyzed, and 13,220 non-redundant genes were identified from a total of 144,931 genes. Please note that the figure below shows the number of genes on a log scale.

The figure below visualizes the same data as the one above, except that the genes are classified into 4 major functional categories and the number of genes is NOT shown on a log scale.

Here are more pan-genome examples from other species. Can you tell how and why the shapes of the charts differ?

  • Chlamydia psittaci (obligate parasitic bacterium with a small genome)

  • Corynebacterium diphtheriae (pathogenic bacterium belonging to the phylum Actinobacteria)

This type of figure has been used in many publications, including:

  1. Lefebure, T. & Stanhope, M.J. Evolution of the core and pan-genome of Streptococcus: positive selection, recombination, and genome composition. Genome Biol 8, R71 (2007).
  2. Touchon, M. et al. Organised genome dynamics in the Escherichia coli species results in highly diverse adaptive paths. PLoS Genet 5, e1000344 (2009).

Updated on April 28th, 2016 (EK)
