Misidentified or incompletely identified bacterial genome sequences appear frequently in public reference databases. These databases can be significantly improved by genome-based identification against an up-to-date, systematically curated reference database that covers as many species as possible.
Using a combination of curated reference databases and optimized algorithms, TrueBac ID can not only correctly identify genome sequences from sample data, but it can also be used to correctly identify genomes included in public databases and other reference microbiological resources.
This document highlights a few selected cases in the following categories:
- Misidentified
  - Bacillus cereus ATCC 10987 → Bacillus pacificus (included in the Microbiome Standards)
  - Ruminococcus sp. 5_1_39BFAA → Blautia wexlerae (a major human gut species reported in many shotgun metagenomics studies)
  - Enterobacter cloacae FDAARGOS_69 → Enterobacter hormaechei subsp. steigerwaltii
- Further identified at the species level
- Further identified at the subspecies level
- Identified as genomospecies
  - Actinomyces odontolyticus ATCC 17982 → DS264586_s (included in the Microbiome Standards)
  - Providencia rettgeri FDAARGOS_330 → CP017671_s
Misidentified
The following genomes were misidentified at the species level. However, TrueBac ID can correctly identify them using an updated, curated reference genome database.
Bacillus cereus ATCC 10987 (GCA_000008005.1)
This genome is labeled as Bacillus cereus on the NCBI and ATCC websites. It is also included in the microbiome standards from ATCC and the Human Microbiome Project. However, it is a strain of a recently described species, Bacillus pacificus, with an ANI value of 99.84%. The TrueBac ID result is accessible here.



Ruminococcus sp. 5_1_39BFAA (GCA_000159975.2)
This genome is part of the early reference genome database for the Human Microbiome Project (HMP). Because it is labeled as Ruminococcus sp. 5_1_39BFAA in NCBI, it gives the misleading impression that the genus Ruminococcus is abundant in the human gut microbiota. TrueBac ID precisely identifies this genome as Blautia wexlerae. There are 10 genomes of Blautia wexlerae available in the EzBioCloud database [Learn more]. The TrueBac ID result is accessible here and you can read more about this story here.

Enterobacter cloacae FDAARGOS_69 (GCA_000783835.1)
This genome is labeled as Enterobacter cloacae in NCBI, but it does not belong to Enterobacter cloacae. The highest ANI value (99.02%) is obtained for Enterobacter hormaechei subsp. steigerwaltii, so it should be assigned to this subspecies. There are 70 genomes available for this subspecies [Learn more] at the time of this writing. Interestingly, this strain showed 29 genes or determinants for antibiotic resistance. The TrueBac ID result is accessible here. Please note that this entry has since been updated to GCA_000783835.2.


Further identified at the species level
The following genomes were identified only at the genus level or higher. However, TrueBac ID can correctly identify them at the species level.
Staphylococcus sp. FDAARGOS_39 (GCF_001019115.2)
This genome is labeled as a strain of the genus Staphylococcus without species information. TrueBac ID identified it at the species level as Staphylococcus warneri with a very high ANI value (99.5%). The TrueBac ID result is accessible here.


Further identified at the subspecies level
The following genomes were identified only at the species level. However, TrueBac ID can correctly identify them at the subspecies level as well.
Pasteurella multocida FDAARGOS_261 (GCA_002083205.2)
Identification of this genome can be made at the subspecies level as the type strain genome of Pasteurella multocida subsp. septica has been added to our reference database. The TrueBac ID result is accessible here.


Identified as genomospecies
Genomospecies are novel species that are tentatively named in the EzBioCloud and TrueBac databases [Learn more].
Actinomyces odontolyticus ATCC 17982 (GCA_000154225.1)
Actinomyces odontolyticus ATCC 17982 is not a strain of Actinomyces odontolyticus (ANI = 89.06%) but represents a novel species, which we named DS264586_s. It is also included in the microbiome standards from ATCC and the Human Microbiome Project. The TrueBac ID result is accessible here.



Providencia rettgeri FDAARGOS_330 (GCF_002984195.1)
FDAARGOS_330 is not a strain of Providencia rettgeri (ANI = 85.32%) but represents a novel species, which was tentatively named CP017671_s in the EzBioCloud database. This new species is likely a human pathogen, as its strains were isolated from patients in the US and Colombia [Learn more]. The TrueBac ID result is accessible here.
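
The ANI values quoted in the cases above can be read against a species-level cutoff. The following is a minimal sketch, not TrueBac ID's actual method: it assumes the commonly cited species boundary of about 95% ANI, which this document does not state, and the names `ASSUMED_SPECIES_ANI_CUTOFF` and `same_species` are hypothetical and used only for illustration.

```python
# Assumption: ~95% ANI is a commonly cited species boundary.
# The cutoff actually used by TrueBac ID is not stated in this document.
ASSUMED_SPECIES_ANI_CUTOFF = 95.0


def same_species(ani_percent, cutoff=ASSUMED_SPECIES_ANI_CUTOFF):
    """Return True if the ANI to a reference genome meets the assumed cutoff."""
    return ani_percent >= cutoff


# ANI values quoted in this document: (genome, reference compared against, ANI %)
cases = [
    ("Bacillus cereus ATCC 10987", "Bacillus pacificus", 99.84),
    ("Enterobacter cloacae FDAARGOS_69",
     "Enterobacter hormaechei subsp. steigerwaltii", 99.02),
    ("Staphylococcus sp. FDAARGOS_39", "Staphylococcus warneri", 99.5),
    ("Actinomyces odontolyticus ATCC 17982", "Actinomyces odontolyticus", 89.06),
    ("Providencia rettgeri FDAARGOS_330", "Providencia rettgeri", 85.32),
]

# Map each genome to whether it falls within the compared reference species.
results = {genome: same_species(ani) for genome, _, ani in cases}

for genome, reference, ani in cases:
    verdict = "within" if results[genome] else "outside"
    print(f"{genome}: {verdict} the assumed species boundary of {reference} (ANI {ani}%)")
```

Under this assumed cutoff, the first three genomes fall within the species they were reassigned to, while the last two fall well outside the species they are labeled as, which is consistent with their treatment as genomospecies.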

By the TrueBac ID team. To test-drive TrueBac genome-based ID, please visit www.truebacid.com.
Should you have any queries or require any further information, please do not hesitate to contact us at bs.ngs@cj.net.
Last updated on FEB. 18, 2020