Completeness of 16S rRNA gene sequence

For all bioinformatics calculations, a complete 16S rRNA gene sequence is defined as the DNA sequence region between the universal PCR primers 27F and 1492R for Bacteria (Lane, 1991), and between PCR primers A25F and U1492R for Archaea (Dojka et al., 1998). This allows sequence similarity to be calculated fairly between PCR-derived and genome-derived reference sequences. The use of these regions in the EzBioCloud database was described by Kim et al. (2012); a later study proposed the 16S cutoff for species delineation, 98.7% similarity, on the basis of this region as the full-length 16S (Kim et al., 2014).

To reiterate more succinctly:

A complete 16S rRNA gene sequence is the DNA between PCR primers 27F and 1492R for Bacteria, and between PCR primers A25F and U1492R for Archaea.
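As a rough illustration of this definition, the sketch below trims a sequence to the region lying between a forward primer site and a reverse primer site. The function names and the toy primers in the usage lines are hypothetical; they are not the real 27F/1492R or A25F/U1492R sequences, which should be taken from the cited references. Exact matching with IUPAC degenerate codes is also a simplifying assumption, since real primer sites often contain mismatches.

```python
import re

# IUPAC nucleotide codes mapped to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement(seq):
    """Return the reverse complement, keeping IUPAC degenerate codes valid."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def primer_regex(primer):
    """Turn a (possibly degenerate) primer into a regular expression."""
    return "".join(IUPAC[base] for base in primer.upper())


def region_between_primers(seq, forward, reverse):
    """Return the sequence between the two primer sites, or None if either is missing.

    The reverse primer anneals to the opposite strand, so its reverse
    complement is what appears on the strand given in `seq`.
    """
    seq = seq.upper()
    fwd = re.search(primer_regex(forward), seq)
    if fwd is None:
        return None
    downstream = seq[fwd.end():]
    rev = re.search(primer_regex(reverse_complement(reverse)), downstream)
    if rev is None:
        return None
    return downstream[:rev.start()]


# Toy sequence and toy primers (hypothetical, for illustration only).
print(region_between_primers("GGACGATTTTCCCCGCAAAA", "ACGM", "TTRC"))  # TTTTCCCC
```

The primer sites themselves are excluded from the returned region, following the "between primers" wording of the definition.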

The complete 16S rRNA gene sequence serves as a reference against which partial 16S rRNA gene sequences (obtained from high-throughput sequencing) can be compared. Complete 16S rRNA gene lengths vary by species, and a complete or nearly complete sequence is generally required for taxonomic analyses.

How, then, do we determine whether a 16S rRNA gene segment sequenced from a sample is complete or nearly complete? We use a measure called completeness.

Completeness is an objective measure of the degree of coverage of a query 16S rRNA gene sequence with respect to the full-length, complete 16S rRNA gene sequence.

Mathematically, completeness is defined as (Kim et al., 2012):

Completeness (%) = (L / C) × 100

where L is the length of the query sequence and C is the length of the most similar sequence that is regarded as complete (using the definition above). The most similar sequence in the database of complete sequences is identified using the USEARCH program.
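The calculation can be sketched in a few lines. This assumes completeness is the ratio of L to C expressed as a percentage, which is consistent with the worked example on this page (606 bp against 1439 bp giving 42.1%). The function name is hypothetical, and the search for the most similar complete sequence is not shown; only the two lengths are needed here.

```python
def completeness(query_length, complete_length):
    """Completeness in percent: query length L over complete reference length C."""
    if query_length < 0 or complete_length <= 0:
        raise ValueError("lengths must be non-negative, and C must be positive")
    return 100.0 * query_length / complete_length


# Lengths taken from the Nocardia carnea example on this page.
print(round(completeness(606, 1439), 1))  # 42.1
```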

The suggested minimum threshold for using a 16S rRNA gene sequence for taxonomic purposes is 95% completeness, as incomplete or partial sequences with low completeness scores will have insufficient resolving power, resulting in erroneous identification results.
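A minimal sketch of applying this threshold to a batch of queries follows. The names `MIN_COMPLETENESS` and `passes_completeness` are hypothetical, and the "near_full" lengths are made up for illustration; the "partial_read" lengths come from the example on this page.

```python
MIN_COMPLETENESS = 95.0  # suggested minimum from the text, in percent


def passes_completeness(query_length, complete_length, threshold=MIN_COMPLETENESS):
    """True if the query covers at least `threshold` percent of the complete sequence."""
    return 100.0 * query_length / complete_length >= threshold


# (query length L, length C of the most similar complete sequence)
queries = {
    "partial_read": (606, 1439),   # 42.1% complete
    "near_full": (1400, 1439),     # hypothetical lengths, about 97.3% complete
}

usable = [name for name, (q, c) in queries.items() if passes_completeness(q, c)]
print(usable)  # ['near_full']
```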

Example

Consider a partial 16S rRNA gene sequence from Nocardia carnea that is 606 bp long (accession AY756546.1).

Completeness is 42.1% (606 / 1439 × 100) because the query 16S rRNA sequence spans only positions 19 to 625 of the complete 16S rRNA sequence, which is 1439 bp long.

References

  • Lane, D. J. (1991). 16S/23S rRNA sequencing. In Nucleic Acid Techniques in Bacterial Systematics. Edited by E. Stackebrandt and M. Goodfellow. Chichester: Wiley.

Last Updated on April 1, 2018
