TrueBac ID Demonstrations
TrueBac ID is designed to definitively identify bacteria from whole genome sequence data. Here we have run TrueBac ID on several publicly available datasets to highlight the accuracy of the system. Because taxonomy and the TrueBac Reference Database are constantly being updated, the results presented here reflect their state at the time of analysis.
Case #1: NCBI bacterial genome database
The input dataset contains 99,078 bacterial genomes from pure cultures (metagenome and single-cell assemblies were excluded). Contaminated genomes were also excluded using the ContEst16S tool. All TrueBac ID identification results are provided at www.ezbiocloud.net. The identification was carried out on May 15, 2018.
Case #2: An unbiased collection of clinical isolates
A team at the University of Washington Medical Center published genome data for >1,200 bacterial strains isolated from an Intensive Care Unit over the course of a year (Roach et al., 2015; PLOS Genetics 11:e1005413). TrueBac ID was used to re-analyze the same dataset. The identification was carried out on May 15, 2018.
Detailed identification results are available here.
Case #3: Accurate identification of a gut bacterium fails using MALDI-TOF and other conventional methods
A team at Harvard University isolated a potentially therapeutic strain from the human gut. This strain could not be identified by MALDI-TOF or other conventional methods, so it was tentatively proposed as a novel species, ‘Clostridium immunis’ (Nature 2017; 14;552(7684):244-247). TrueBac ID successfully identified this strain as Clostridium symbiosum, with an Average Nucleotide Identity (ANI) of 98.48%.
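As a rough illustration of how an ANI value such as the 98.48% reported above can be read, the sketch below compares an ANI percentage against a species-level cutoff. The 95% cutoff is a commonly cited species boundary and is an assumption here, not a documented TrueBac ID parameter; the function and constant names are hypothetical.

```python
# Assumption: 95% ANI is a widely used species boundary. TrueBac ID's actual
# cutoff and decision logic are not described in this document.
SPECIES_ANI_CUTOFF = 95.0


def same_species_by_ani(ani_percent: float, cutoff: float = SPECIES_ANI_CUTOFF) -> bool:
    """Return True when the ANI meets or exceeds the species-level cutoff."""
    if not 0.0 <= ani_percent <= 100.0:
        raise ValueError("ANI must be a percentage between 0 and 100")
    return ani_percent >= cutoff


# The Case #3 strain against Clostridium symbiosum (value taken from the text).
print(same_species_by_ani(98.48))
```

Under this assumed cutoff, the Case #3 strain falls within the same species as its Clostridium symbiosum reference, which is consistent with the identification reported above.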
Categories of TrueBac ID results against the original species/subspecies designations in the database or publications
Further identified to the species level:
The original name of the genome has the correct genus name but does not contain a specific epithet. In this example, Sulfitobacter sp. NAS-14.1 (GCA_000152645.1) is identified as Sulfitobacter pontiacus.
Further identified to the subspecies level:
The original name of the genome has the correct species name but does not contain a subspecies name. In this example, Pasteurella multocida FDAARGOS_384 (GCF_002393385.1) is further identified as Pasteurella multocida subsp. septica.
Identified as a genomospecies:
A genomospecies is a potentially novel species that has been given a tentative name in the EzBioCloud/TrueBac databases. In this example, Actinomyces odontolyticus ATCC 17982 (GCF_000154225.1) is not a strain of Actinomyces odontolyticus but represents a novel species, which we named as a genomospecies (DS264586_s).
Not identified at the species level:
The genome cannot be confidently assigned to a known species, either because it represents a novel species or because the reference genome data are insufficient. In this example, Haemophilus parainfluenzae strain 1209_HPAR (GCA_001053035.1) is identified as a novel species; the closest known species is Haemophilus parainfluenzae.
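The four categories above can be sketched as a small comparison between the name originally attached to a genome and the name returned by identification. This is only an illustration of the category definitions, not TrueBac ID's implementation: the function name, the short category labels, the use of None for "no species-level hit", and the convention that genomospecies names end in "_s" (inferred from the single example DS264586_s) are all assumptions.

```python
from typing import Optional


def categorize(original_name: str, identified: Optional[str]) -> str:
    """Map an (original name, identified name) pair to one of the categories.

    Hypothetical helper; `identified` is None when no known species matched.
    """
    if identified is None:
        return "not_identified_at_species_level"
    orig = original_name.split()
    ident = identified.split()
    if not orig or not ident:
        raise ValueError("names must not be empty")
    # Assumption: genomospecies labels end in "_s", as in DS264586_s.
    if identified.endswith("_s"):
        return "genomospecies"
    same_genus = orig[0] == ident[0]
    # Original name has the genus but only "sp." instead of a specific epithet.
    if same_genus and len(orig) > 1 and orig[1] == "sp.":
        return "further_identified_species"
    # Original name has genus and epithet, and the result adds a subspecies.
    same_species = same_genus and orig[1:2] == ident[1:2]
    if same_species and "subsp." in ident and "subsp." not in orig:
        return "further_identified_subspecies"
    return "other"  # cases not covered by the four categories in the text


# The four examples given in the text.
examples = [
    ("Sulfitobacter sp. NAS-14.1", "Sulfitobacter pontiacus"),
    ("Pasteurella multocida FDAARGOS_384", "Pasteurella multocida subsp. septica"),
    ("Actinomyces odontolyticus ATCC 17982", "DS264586_s"),
    ("Haemophilus parainfluenzae strain 1209_HPAR", None),
]
for original, result in examples:
    print(original, "->", categorize(original, result))
```

The sketch compares names token by token, so it only works for names written in the form used in the examples above (genus, then epithet or "sp.", then an optional strain designation).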
The EzBioCloud team / Last edited on May 22, 2018