How to calculate 16S rRNA sequence similarity values for bacterial taxonomy: Why BLAST should be avoided

Nucleotide sequence similarity values are widely used by bacterial taxonomists to identify and describe novel species. Many different algorithms are available for calculating the similarity between two gene sequences, and their results are easy to misinterpret. Below is the recommended method for obtaining nucleotide sequence similarity values for taxonomic purposes.

The calculation of sequence similarity between two genes consists of two steps:

(i) pairwise sequence alignment and
(ii) calculation of the similarity value.

Pairwise sequence alignment can be performed with either a global or a local alignment algorithm. It is recommended to use a global alignment algorithm and to avoid local alignment algorithms (please see here for details). BLAST (Basic Local Alignment Search Tool) is a local alignment method, which is why a BLAST-series program should NOT be used for calculating similarities. Even so, BLAST is still the best tool for identifying the most similar sequences within a large database of sequences.
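Step (i) can be illustrated with a minimal sketch of global pairwise alignment. This uses the textbook Needleman-Wunsch algorithm with a linear gap penalty, which is an assumption made for brevity and not the Myers & Miller implementation used by EzBioCloud. The function name `global_align` and the scoring values (`match`, `mismatch`, `gap`) are hypothetical choices for illustration only.

```python
def global_align(a, b, match=1, mismatch=-1, gap=-2):
    """Globally align strings a and b (Needleman-Wunsch, linear gap penalty).

    Returns the two aligned strings, padded with '-' where gaps were inserted.
    Scoring values are illustrative assumptions, not recommended parameters.
    """
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j] end to end
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def sub(i, j):
        # substitution score for pairing a[i-1] with b[j-1]
        return match if a[i - 1] == b[j - 1] else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i, j),  # pair the two bases
                score[i - 1][j] + gap,            # gap in b
                score[i][j - 1] + gap,            # gap in a
            )

    # Trace back from the bottom-right corner, so both sequences are
    # covered over their full length (this is what makes it "global").
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1]); out_b.append(b[j - 1])
            i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-")
            i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


aligned_a, aligned_b = global_align("ACGT", "AGT")
print(aligned_a)
print(aligned_b)
```

Unlike a local alignment, the traceback always starts at the end of both sequences and runs to their beginning, so every base of both sequences appears in the result. For full-length 16S rRNA genes a memory-efficient implementation would be preferable to this quadratic-memory sketch.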

In the EzBioCloud server, the closest neighboring taxa are first identified using the BLASTN program, and then a rigorous pairwise sequence alignment algorithm (Myers & Miller, 1988) is used to calculate sequence similarity. Gaps are not considered when the similarity is calculated. Using pairwise sequence alignment instead of multiple sequence alignment ensures that the similarity calculation is reproducible. For example, if you obtain the sequence similarity between A and B from a pairwise sequence alignment, the value will always be the same. However, the values between A and B calculated from a multiple sequence alignment of A, B, and C and from a multiple sequence alignment of A, B, and D may differ, because a multiple sequence alignment algorithm tries to find the optimal solution among all sequences, not just between A and B.
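Step (ii) can be sketched as follows. This example assumes that "gaps are not considered" means alignment columns containing a gap in either sequence are left out of both the match count and the total; that reading is an assumption, and the function name `similarity_excluding_gaps` is hypothetical. The input is a pair of already aligned sequences of equal length.

```python
def similarity_excluding_gaps(aligned_a, aligned_b, gap_char="-"):
    """Percent similarity between two aligned sequences, skipping gap columns.

    Assumption: a column with a gap in either sequence is excluded from
    both the number of matches and the number of compared positions.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")

    matches = 0
    compared = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == gap_char or y == gap_char:
            continue  # gap column: not considered
        compared += 1
        if x == y:
            matches += 1

    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return 100.0 * matches / compared


# One mismatch in four gap-free columns
print(similarity_excluding_gaps("ACGT", "AGGT"))      # 75.0
# Two gap columns are skipped; the four remaining columns all match
print(similarity_excluding_gaps("ACGT-A", "AC-TTA"))  # 100.0
```

Because the function depends only on the two aligned sequences, the same pairwise alignment always yields the same value, which is the reproducibility property described above.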

These recommendations are also described in the following publications:

1. Kim, M., Oh, H.S., Park, S.C. & Chun, J. Towards a taxonomic coherence between average nucleotide identity and 16S rRNA gene sequence similarity for species demarcation of prokaryotes. Int J Syst Evol Microbiol 64, 346-51 (2014).
2. Kim, O.S. et al. Introducing EzTaxon-e: a prokaryotic 16S rRNA gene sequence database with phylotypes that represent uncultured species. Int J Syst Evol Microbiol 62, 716-21 (2012).
3. Tindall, B.J., Rosselló-Móra, R., Busse, H.J., Ludwig, W. & Kämpfer, P. Notes on the characterization of prokaryote strains for taxonomic purposes. Int J Syst Evol Microbiol 60, 249-66 (2010).

By Jon Jongsik Chun (CEO of CJ Bioscience, Inc. & Professor at Seoul National Univ.)

Updated on April 4th 2016.
