A tetra-nucleotide is a fragment of DNA sequence with 4 bases (e.g. AGTC or TTGG). Pride et al. (2003) showed that the frequencies of tetra-nucleotides in bacterial genomes contain useful, albeit weak, phylogenetic signals. Even though tetra-nucleotide analysis (TNA) uses information from the whole genome, it cannot replace alignment-based phylogenetic methods such as OrthoANI or 16S rRNA phylogeny. However, TNA can be useful for phylogenetic characterization when whole-genome or 16S rRNA gene information is not available. For example, a partial genomic fragment obtained from a metagenome can be identified by TNA (Teeling et al., 2004). TNA is also fast enough to be used as a search engine against a large genome database.
In essence, the information contained in a genome sequence can be transformed into an array of tetra-nucleotide frequencies (see the figure below).
The information in each genome sequence is now stored as the counts of 256 tetra-nucleotides (4 bases at each of 4 positions give 4^4 = 256 possible combinations). The more similar two genome sequences are, the more correlated their tetra-nucleotide patterns are. Therefore, a statistical measure of the correlation between the tetra-nucleotide frequencies of two genome sequences can be used as a rough indicator of their relatedness.
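The transformation of a sequence into 256 counts can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used by any particular tool: the names `TETRAMERS`, `INDEX` and `tetra_counts` are hypothetical, and the choices to count overlapping windows on a single strand and to skip windows containing ambiguous bases such as N are assumptions.

```python
from itertools import product

# All 256 tetra-nucleotides in a fixed alphabetical order: AAAA, AAAC, ..., TTTT.
TETRAMERS = ["".join(p) for p in product("ACGT", repeat=4)]
INDEX = {tetra: i for i, tetra in enumerate(TETRAMERS)}


def tetra_counts(seq):
    """Return a list of 256 counts of overlapping tetra-nucleotides in seq."""
    seq = seq.upper()
    counts = [0] * len(TETRAMERS)
    # Slide a window of width 4 along the sequence, one base at a time.
    for i in range(len(seq) - 3):
        idx = INDEX.get(seq[i:i + 4])  # None if the window contains N or another symbol
        if idx is not None:
            counts[idx] += 1
    return counts


# Toy sequence for illustration only (not real genomic data).
counts = tetra_counts("AGTCAGTCNAGTC")
print(counts[INDEX["AGTC"]])  # number of times AGTC occurs
```

Keeping the tetra-nucleotides in a fixed order means that position i of the array always refers to the same tetra-nucleotide, so arrays from different genomes can be compared element by element.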
The tetra-nucleotide correlation coefficient ranges from 0 to 1, and two identical genomes produce a value of 1.0.
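As a rough illustration of comparing two tetra-nucleotide patterns, the sketch below computes a plain Pearson correlation between two count arrays. The text above does not say which statistic is actually used, so Pearson's r on raw counts is an assumption here, as are the hypothetical names `tetra_profile` and `pearson` and the made-up toy sequences. Note that Pearson's r can in principle be negative, so this sketch should not be read as reproducing the 0 to 1 range described above.

```python
from collections import Counter
from itertools import product
from math import sqrt

# Fixed ordering of the 256 tetra-nucleotides so that profiles are comparable.
TETRAMERS = ["".join(p) for p in product("ACGT", repeat=4)]


def tetra_profile(seq):
    """Count overlapping tetra-nucleotides and return them as a 256-element list."""
    seq = seq.upper()
    seen = Counter(seq[i:i + 4] for i in range(len(seq) - 3))
    return [seen[tetra] for tetra in TETRAMERS]  # Counter yields 0 for absent keys


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return sxy / sqrt(sxx * syy)


# Toy sequences for illustration only (not real genomes).
genome_a = "AGTCAGTCTTGGAGTCCATG" * 5
genome_b = "TTGGCCAATTGGCCAAGGTT" * 5

r_same = pearson(tetra_profile(genome_a), tetra_profile(genome_a))
r_diff = pearson(tetra_profile(genome_a), tetra_profile(genome_b))
print(r_same, r_diff)
```

Comparing a sequence with itself gives a coefficient of 1 (up to floating-point rounding), while the two different toy sequences give a lower value.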
- Pride, D. T., Meinersmann, R. J., Wassenaar, T. M. & Blaser, M. J. Evolutionary implications of microbial genome tetranucleotide frequency biases. Genome Res 13, 145-158 (2003).
- Teeling, H., Meyerdierks, A., Bauer, M., Amann, R. & Glöckner, F. O. Application of tetranucleotide frequencies for the assignment of genomic fragments. Environ Microbiol 6, 938-947 (2004).
Last updated on April 28th, 2016 (EK)