Genome-based Identification for Improving Reference Databases

Misidentified or incompletely identified bacterial genome sequences appear frequently in public reference databases. These databases can be significantly improved by genome-based identification against an up-to-date, systematically curated reference database that covers as many species as possible.

Using a combination of curated reference databases and optimized algorithms, TrueBac ID can correctly identify not only genome sequences from sample data but also genomes included in public databases and other reference microbiological resources.

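This document highlights a few selected cases in four categories: misidentified genomes, genomes further identified at the species level, genomes further identified at the subspecies level, and genomes identified as genomospecies.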


Misidentified

The following genomes were misidentified at the species level. However, TrueBac ID can correctly identify them using an updated, curated reference genome database.

Bacillus cereus ATCC 10987 (GCA_000008005.1)

This genome is labeled as Bacillus cereus on the NCBI and ATCC websites. It is also included in the microbiome standards from ATCC and the Human Microbiome Project. However, it is a strain of the recently described species Bacillus pacificus, with an ANI value of 99.84%. The TrueBac ID result is accessible here.

Identification of ATCC 10987 by TrueBac ID
ATCC 10987 in NCBI
ATCC 10987 in ATCC

Ruminococcus sp. 5_1_39BFAA (GCA_000159975.2)

This genome is part of the early reference genome database for the Human Microbiome Project (HMP). Because it is labeled as Ruminococcus sp. 5_1_39BFAA in NCBI, it gives the misleading impression that the genus Ruminococcus is abundant in the human gut microbiota. TrueBac ID precisely identifies this genome as Blautia wexlerae. There are 10 genomes of Blautia wexlerae available in the EzBioCloud database [Learn more]. The TrueBac ID result is accessible here and you can read more about this story here.

Identification of Ruminococcus sp. 5_1_39BFAA by TrueBac ID

Enterobacter cloacae FDAARGOS_69 (GCA_000783835.1)

This genome is labeled as Enterobacter cloacae in NCBI, but it does not belong to that species. The highest ANI value (99.02%) is obtained for Enterobacter hormaechei subsp. steigerwaltii, so it should be assigned to this subspecies. There are 70 genomes available for this subspecies [Learn more] at the time of this writing. Interestingly, this strain carries 29 genes or determinants for antibiotic resistance. The TrueBac ID result is accessible here. Please note that this entry has since been updated to GCA_000783835.2.

Identification of Enterobacter cloacae FDAARGOS_69 by TrueBac ID
FDAARGOS_69 in NCBI

Further identified at the species level

The following genomes were identified only at the genus or higher level. However, TrueBac ID can identify them correctly at the species level.

Staphylococcus sp. FDAARGOS_39 (GCF_001019115.2)

This genome is labeled as a strain of the genus Staphylococcus without species information. TrueBac ID identified it at the species level as Staphylococcus warneri with a very high ANI value (99.5%). The TrueBac ID result is accessible here.

Identification of FDAARGOS_39 by TrueBac ID
FDAARGOS_39 in NCBI

Further identified at the subspecies level

The following genomes were identified only at the species level. However, TrueBac ID can identify them correctly at the subspecies level as well.

Pasteurella multocida FDAARGOS_261 (GCA_002083205.2)

Identification of this genome can be made at the subspecies level as the type strain genome of Pasteurella multocida subsp. septica has been added to our reference database. The TrueBac ID result is accessible here.

Identification of FDAARGOS_261 by TrueBac ID
FDAARGOS_261 in NCBI

Identified as genomospecies

Genomospecies are novel species that are tentatively named in the EzBioCloud and TrueBac databases [Learn more].

Actinomyces odontolyticus ATCC 17982 (GCA_000154225.1)

Actinomyces odontolyticus ATCC 17982 is not a strain of Actinomyces odontolyticus (ANI = 89.06%) but represents a novel species, which we tentatively named DS264586_s. It is also included in the microbiome standards from ATCC and the Human Microbiome Project. The TrueBac ID result is accessible here.

Identification of Actinomyces odontolyticus ATCC 17982 by TrueBac ID
ATCC 17982 in NCBI
ATCC 17982 in ATCC

Providencia rettgeri FDAARGOS_330 (GCF_002984195.1)

FDAARGOS_330 is not a strain of Providencia rettgeri (ANI = 85.32%) but represents a novel species, which was tentatively named CP017671_s in the EzBioCloud database. This new species is likely a human pathogen, as its strains were isolated from patients in the US and Colombia [Learn more]. The TrueBac ID result is accessible here.

Identification of Providencia rettgeri FDAARGOS_330 by TrueBac ID
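
The ANI values quoted in the cases above can be read against a species-level cutoff. The following is a minimal sketch, assuming a cutoff of 95% ANI, which lies within the roughly 95 to 96% range widely used as a species boundary. This document does not state the threshold or decision rules that TrueBac ID actually applies, and the function and variable names are hypothetical, chosen only for illustration.

```python
# Assumed species-level ANI cutoff in percent. This value is an assumption
# for illustration; the document does not state the cutoff TrueBac ID uses.
SPECIES_ANI_CUTOFF = 95.0


def same_species(ani_percent, cutoff=SPECIES_ANI_CUTOFF):
    """Return True if the ANI value meets the assumed species cutoff."""
    return ani_percent >= cutoff


# (genome, species it was compared against, ANI in percent), as quoted above.
cases = [
    ("ATCC 10987", "Bacillus pacificus", 99.84),
    ("FDAARGOS_69", "Enterobacter hormaechei subsp. steigerwaltii", 99.02),
    ("FDAARGOS_39", "Staphylococcus warneri", 99.5),
    ("ATCC 17982", "Actinomyces odontolyticus", 89.06),
    ("FDAARGOS_330", "Providencia rettgeri", 85.32),
]

for genome, species, ani in cases:
    verdict = "same species" if same_species(ani) else "different species"
    print(f"{genome} vs {species}: ANI {ani}% -> {verdict}")
```

Under this assumed cutoff, the first three genomes fall within the species they were matched to, while ATCC 17982 and FDAARGOS_330 fall well below the boundary for the species named in their labels, which is consistent with their treatment as genomospecies.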

By the TrueBac ID team. To test-drive TrueBac genome-based ID, please visit www.truebacid.com.

Should you have any queries or require any further information please do not hesitate to contact us at bs.ngs@cj.net.

Last updated on FEB. 18, 2020
