Genome-based Identification for Improving Reference Databases

Misidentified or incompletely identified bacterial genome sequences appear frequently in public reference databases. These databases can be significantly improved by genome-based identification against an up-to-date, systematically curated reference database that covers as many species as possible.

Using a combination of curated reference databases and optimized algorithms, TrueBac ID can correctly identify not only genome sequences from sample data but also genomes included in public databases and other reference microbiological resources.

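This document highlights a few selected cases in the following categories: misidentified genomes, genomes further identified at the species level, genomes further identified at the subspecies level, and genomes identified as genomospecies.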

 


Misidentified

The following genomes were misidentified at the species level. However, TrueBac ID can correctly identify them using an updated, curated reference genome database.

 

Bacillus cereus ATCC 10987 (GCA_000008005.1)

This genome is labeled as Bacillus cereus on the NCBI and ATCC websites, and it is included in the microbiome standards from ATCC and the Human Microbiome Project. However, it is a strain of the recently described species Bacillus pacificus, with an ANI value of 99.84%. The TrueBac ID result is accessible here.

 

Identification of ATCC 10987 by TrueBac ID

 

ATCC 10987 in NCBI

 

ATCC 10987 in ATCC

Ruminococcus sp. 5_1_39BFAA (GCA_000159975.2)

This genome is part of the early reference genome database for the Human Microbiome Project (HMP). Because it is labeled as Ruminococcus sp. 5_1_39BFAA in NCBI, it contributes to the misunderstanding that the genus Ruminococcus is abundant in the human gut microbiota. TrueBac ID precisely identifies this genome as Blautia wexlerae. There are 10 genomes of Blautia wexlerae available in the EzBioCloud database [Learn more]. The TrueBac ID result is accessible here and you can read more about this story here.

 

Identification of Ruminococcus sp. 5_1_39BFAA by TrueBac ID

 

 

Flavobacterium aquidurense DSM 18293 (GCA_002217195.1)

This genome is labeled as the type strain of Flavobacterium aquidurense in NCBI (see below).

 

Flavobacterium aquidurense DSM 18293 in NCBI

However, this genome sequence does not match the known phylogenetic marker sequences of the type strain, so it must be incorrectly labeled. The following table shows the differences between this genome and the marker genes.

| Gene | Accession | Type strain | Gene length (bp) | Identity (%) | Mismatches (bp) | Aligned length (bp) |
| --- | --- | --- | --- | --- | --- | --- |
| 16S | AM177392 | WB 1.1-56 | 1494 | 98.79 | 18 | 1493 (99%) |
| rpoC | JX657165 | DSM 18293 | 590 | 96.44 | 21 | 590 (100%) |
| gyrB | JX444249 | DSM 18293 | 556 | 90.11 | 55 | 556 (100%) |
| dnaK | JX430384 | DSM 18293 | 592 | 91.05 | 53 | 592 (100%) |
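
The identity values in the table above can be reproduced from the mismatch counts and aligned lengths. The following is a minimal sketch, assuming identity is computed as the fraction of matching positions over the aligned length; the function name `percent_identity` is hypothetical and is not part of TrueBac ID.

```python
def percent_identity(aligned_length: int, mismatches: int) -> float:
    """Return sequence identity (%) as matching positions / aligned positions.

    Assumption: every aligned position that is not a mismatch counts as a match.
    """
    if aligned_length <= 0:
        raise ValueError("aligned_length must be positive")
    if not 0 <= mismatches <= aligned_length:
        raise ValueError("mismatches must be between 0 and aligned_length")
    return 100.0 * (aligned_length - mismatches) / aligned_length


# (aligned length in bp, mismatches in bp) taken from the table above
marker_genes = {
    "16S": (1493, 18),
    "rpoC": (590, 21),
    "gyrB": (556, 55),
    "dnaK": (592, 53),
}

for gene, (aligned, mismatched) in marker_genes.items():
    print(f"{gene}: {percent_identity(aligned, mismatched):.2f}%")
```

Under this assumption, the computed values round to the identities listed in the table (98.79, 96.44, 90.11, and 91.05).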

 

TrueBac ID identified this genome successfully as a strain of Flavobacterium frigidimaris with a very high ANI (99.94%).

 

Identification of Flavobacterium aquidurense DSM 18293 by TrueBac ID

 

 

Enterobacter cloacae FDAARGOS_69 (GCA_000783835.2)

This genome is labeled as Enterobacter cloacae in NCBI, but it does not belong to that species. The highest ANI value (99.02%) is obtained for Enterobacter hormaechei subsp. steigerwaltii, so it should be assigned to this subspecies. There are 70 genomes available for this subspecies [Learn more] at the time of this writing. Interestingly, this strain carries 29 genes or determinants for antibiotic resistance. The TrueBac ID result is accessible here.

 

Identification of Enterobacter cloacae FDAARGOS_69 by TrueBac ID

 

FDAARGOS_69 in NCBI

 


Further identified at the species level

The following genomes were previously identified only at the genus level or higher. However, TrueBac ID can correctly identify them at the species level.

Staphylococcus sp. FDAARGOS_39 (GCF_001019115.2)

This genome is labeled as a strain of the genus Staphylococcus without species information. TrueBac ID identified it at the species level as Staphylococcus warneri with a very high ANI value (99.5%). The TrueBac ID result is accessible here.

Identification of FDAARGOS_39 by TrueBac ID

 

FDAARGOS_39 in NCBI

 


Further identified at the subspecies level

The following genomes were previously identified only at the species level. However, TrueBac ID can also correctly identify them at the subspecies level.

 

Pasteurella multocida FDAARGOS_261 (GCA_002083205.2)

This genome can be identified at the subspecies level because the type strain genome of Pasteurella multocida subsp. septica has been added to our reference database. The TrueBac ID result is accessible here.

Identification of FDAARGOS_261 by TrueBac ID

 

FDAARGOS_261 in NCBI

 


Identified as genomospecies

Genomospecies are novel species that are tentatively named in the EzBioCloud and TrueBac databases [Learn more].

 

Actinomyces odontolyticus ATCC 17982 (GCA_000154225.1)

Actinomyces odontolyticus ATCC 17982 is not a strain of Actinomyces odontolyticus (ANI = 89.06%) but represents a novel species, which we named DS264586_s. It is also included in the microbiome standards from ATCC and the Human Microbiome Project. The TrueBac ID result is accessible here.

 

Identification of Actinomyces odontolyticus ATCC 17982 by TrueBac ID

 

ATCC 17982 in NCBI

ATCC 17982 in ATCC

 

 

Providencia rettgeri FDAARGOS_330 (GCF_002984195.1)

FDAARGOS_330 is not a strain of Providencia rettgeri (ANI = 85.32%) but represents a novel species, which was tentatively named CP017671_s in the EzBioCloud database. This new species is likely a human pathogen, as its strains were isolated from patients in the US and Colombia [Learn more].
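
The ANI values quoted in this document separate into two groups: values above 99% for genomes that match a known species, and values below 90% for genomes that represent a novel species. The sketch below illustrates that distinction with a simple threshold check. It assumes a species boundary of about 95% ANI, a commonly cited approximate cutoff. This document does not state the threshold or the algorithm TrueBac ID actually uses, so the constant `ASSUMED_SPECIES_ANI_CUTOFF` and the function `same_species` are hypothetical.

```python
# Assumption: ~95% ANI is a commonly cited approximate species boundary.
# The cutoff actually used by TrueBac ID is not stated in this document.
ASSUMED_SPECIES_ANI_CUTOFF = 95.0


def same_species(ani_percent: float, cutoff: float = ASSUMED_SPECIES_ANI_CUTOFF) -> bool:
    """Return True if the ANI value meets or exceeds the assumed species cutoff."""
    if not 0.0 <= ani_percent <= 100.0:
        raise ValueError("ANI must be a percentage between 0 and 100")
    return ani_percent >= cutoff


# (genome label, compared species, ANI in %) as quoted in this document
cases = [
    ("Bacillus cereus ATCC 10987", "Bacillus pacificus", 99.84),
    ("Flavobacterium aquidurense DSM 18293", "Flavobacterium frigidimaris", 99.94),
    ("Enterobacter cloacae FDAARGOS_69", "Enterobacter hormaechei subsp. steigerwaltii", 99.02),
    ("Staphylococcus sp. FDAARGOS_39", "Staphylococcus warneri", 99.5),
    ("Actinomyces odontolyticus ATCC 17982", "Actinomyces odontolyticus", 89.06),
    ("Providencia rettgeri FDAARGOS_330", "Providencia rettgeri", 85.32),
]

for genome, species, ani in cases:
    verdict = "same species" if same_species(ani) else "different species"
    print(f"{genome} vs {species}: ANI {ani}% -> {verdict}")
```

With the assumed cutoff, the first four genomes are assigned to the compared species, while ATCC 17982 and FDAARGOS_330 fall below it, consistent with their treatment here as genomospecies.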

 

Identification of Providencia rettgeri FDAARGOS_330 by TrueBac ID

 

 

 


By the TrueBac ID team. To test-drive TrueBac genome-based ID, please visit https://www.truebacid.com/.

If you have any questions or need further information, please contact us at info@chunlab.com.

Last updated on Nov. 2, 2018