Genome-based Identification for Improving Reference Databases


Misidentified or incompletely identified bacterial genome sequences appear frequently in public reference databases. These databases can be significantly improved by genome-based identification against an up-to-date, systematically curated reference database that covers as many species as possible.

Using a combination of curated reference databases and optimized algorithms, TrueBac ID can correctly identify not only genome sequences from sample data but also genomes already included in public databases and other reference microbiological resources.

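This document highlights a few selected cases in the following categories: misidentified genomes, genomes further identified at the species level, genomes further identified at the subspecies level, and genomes identified as genomospecies.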


Misidentified

The following genomes are misidentified at the species level in public databases. TrueBac ID identifies them correctly using an updated, curated reference genome database.

Bacillus cereus ATCC 10987 (GCA_000008005.1)

This genome is labeled as Bacillus cereus on the NCBI and ATCC websites. It is also included in the microbiome standards from ATCC and the Human Microbiome Project. However, it is a strain of a recently described species, Bacillus pacificus, with an ANI value of 99.84%. The TrueBac ID result is accessible here.

Identification of ATCC 10987 by TrueBac ID

ATCC 10987 in NCBI

ATCC 10987 in ATCC

Ruminococcus sp. 5_1_39BFAA (GCA_000159975.2)

This genome is part of the early reference genome database for the Human Microbiome Project (HMP). Because it is labeled as Ruminococcus sp. 5_1_39BFAA in NCBI, it can give the misleading impression that the genus Ruminococcus is abundant in the human gut microbiota. TrueBac ID precisely identifies this genome as Blautia wexlerae. There are 10 genomes of Blautia wexlerae available in the EzBioCloud database [Learn more]. The TrueBac ID result is accessible here and you can read more about this story here.

Identification of Ruminococcus sp. 5_1_39BFAA by TrueBac ID

Enterobacter cloacae FDAARGOS_69 (GCA_000783835.2)

This genome is labeled as Enterobacter cloacae in NCBI, but it does not belong to that species. The highest ANI value (99.02%) is obtained for Enterobacter hormaechei subsp. steigerwaltii, so it should be assigned to this subspecies. There are 70 genomes available for this subspecies [Learn more] at the time of this writing. Interestingly, this strain carries 29 genes or determinants for antibiotic resistance. The TrueBac ID result is accessible here.

Identification of Enterobacter cloacae FDAARGOS_69 by TrueBac ID

FDAARGOS_69 in NCBI


Further identified at the species level

The following genomes are labeled only at the genus level or higher in public databases. TrueBac ID identifies them correctly at the species level.

Staphylococcus sp. FDAARGOS_39 (GCF_001019115.2)

This genome is labeled as a strain of the genus Staphylococcus without species information. TrueBac ID identified it at the species level as Staphylococcus warneri with a very high ANI value (99.5%). The TrueBac ID result is accessible here.

Identification of FDAARGOS_39 by TrueBac ID

FDAARGOS_39 in NCBI


Further identified at the subspecies level

The following genomes are labeled at the species level in public databases. TrueBac ID identifies them correctly at the subspecies level as well.

Pasteurella multocida FDAARGOS_261 (GCA_002083205.2)

This genome can be identified at the subspecies level because the type strain genome of Pasteurella multocida subsp. septica has been added to our reference database. The TrueBac ID result is accessible here.

Identification of FDAARGOS_261 by TrueBac ID

FDAARGOS_261 in NCBI


Identified as genomospecies

Genomospecies are novel species that are tentatively named in the EzBioCloud and TrueBac databases [Learn more].

Actinomyces odontolyticus ATCC 17982 (GCA_000154225.1)

Actinomyces odontolyticus ATCC 17982 is not a strain of Actinomyces odontolyticus (ANI = 89.06%) but represents a novel species, which we named DS264586_s. It is also included in the microbiome standards from ATCC and the Human Microbiome Project. The TrueBac ID result is accessible here.

Identification of Actinomyces odontolyticus ATCC 17982 by TrueBac ID

ATCC 17982 in NCBI

ATCC 17982 in ATCC

Providencia rettgeri FDAARGOS_330 (GCF_002984195.1)

FDAARGOS_330 is not a strain of Providencia rettgeri (ANI = 85.32%) but represents a novel species, which is tentatively named CP017671_s in the EzBioCloud database. This new species is likely a human pathogen, as its strains were isolated from patients in the US and Colombia [Learn more]. The TrueBac ID result is accessible here.

Identification of Providencia rettgeri FDAARGOS_330 by TrueBac ID
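
The ANI values quoted in the cases above fall into two clear groups, which a short Python sketch can make explicit. This is an illustration only, not the TrueBac ID algorithm: the 95% species cutoff is an assumption based on the commonly cited 95-96% ANI species boundary and is not stated in this article, and the names SPECIES_ANI_CUTOFF, REPORTED_ANI, same_species, and summarize are hypothetical. The ANI values themselves are the ones reported above.

```python
# Assumed species-level ANI cutoff (commonly cited range: 95-96%).
# This value is an assumption, not a documented TrueBac ID parameter.
SPECIES_ANI_CUTOFF = 95.0

# (strain, taxon the genome was compared against, ANI %) as reported in this article.
REPORTED_ANI = [
    ("ATCC 10987", "Bacillus pacificus", 99.84),
    ("FDAARGOS_69", "Enterobacter hormaechei subsp. steigerwaltii", 99.02),
    ("FDAARGOS_39", "Staphylococcus warneri", 99.5),
    ("ATCC 17982", "Actinomyces odontolyticus", 89.06),
    ("FDAARGOS_330", "Providencia rettgeri", 85.32),
]


def same_species(ani_percent, cutoff=SPECIES_ANI_CUTOFF):
    """Return True when the ANI value meets the assumed species cutoff."""
    return ani_percent >= cutoff


def summarize(records, cutoff=SPECIES_ANI_CUTOFF):
    """Map each strain to a short verdict string based on its ANI value."""
    verdicts = {}
    for strain, taxon, ani in records:
        if same_species(ani, cutoff):
            verdicts[strain] = f"consistent with {taxon}"
        else:
            verdicts[strain] = f"not {taxon}; candidate novel species"
    return verdicts


if __name__ == "__main__":
    for strain, verdict in summarize(REPORTED_ANI).items():
        print(f"{strain}: {verdict}")
```

Under this assumed cutoff, the three genomes with ANI values above 99% are consistent with the taxa named above, while ATCC 17982 and FDAARGOS_330 fall well below the boundary, matching their treatment as genomospecies.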


By the TrueBac ID team. To test-drive TrueBac genome-based ID, please visit www.truebacid.com.

Should you have any queries or require further information, please do not hesitate to contact us at info@chunlab.com.

Last updated on Nov. 2, 2018