The analysis of any bacterial genome starts with genome annotation. This process can be divided into two steps: gene finding and functional annotation.

The gene-finding step scans the genome sequence for the patterns that mark where each gene starts and ends, and the functional annotation step assigns a function to each gene through sequence search. Results can vary slightly between researchers because of differences in the software, databases, and parameters used, but our pipeline follows the most common practice in academia, so its results do not differ greatly from those of pipelines used by other databases. In EzBioCloud, the software and databases listed below are used to perform genome annotation, and also comparative genomics, for all genomes. As of Sept 2017, more than 90,000 genomes had been annotated with this method and are provided through www.ezbiocloud.net.
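As a toy illustration of what "finding start and end locations" means, the sketch below scans the forward strand for open reading frames that begin at ATG, GTG, or TTG and end at the first in-frame stop codon. It is a deliberately naive example, not the algorithm used by the gene finder in this pipeline, which also scores candidate genes and searches both strands. The function name `find_orfs` and the `min_codons` threshold are hypothetical.

```python
# Naive forward-strand ORF scan: an illustration only, not the pipeline's gene finder.
START_CODONS = {"ATG", "GTG", "TTG"}   # start codons commonly used in bacteria
STOP_CODONS = {"TAA", "TAG", "TGA"}    # stop codons of translation table 11


def find_orfs(seq, min_codons=3):
    """Return sorted (start, end) pairs, 0-based and half-open, stop codon included."""
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon in START_CODONS:
                start = i                      # first start codon opens the ORF
            elif start is not None and codon in STOP_CODONS:
                end = i + 3                    # first in-frame stop closes it
                if (end - start) // 3 >= min_codons:
                    orfs.append((start, end))
                start = None
    return sorted(orfs)


if __name__ == "__main__":
    print(find_orfs("CCATGAAATAGCC"))
```

Real gene finders weigh many more signals (ribosome binding sites, codon usage, overlapping candidates), which is why a dedicated program is used for the CDS step.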

More detailed information on each pipeline step and its run description:
Finding tRNA genes

Program: tRNAscan-SE version 1.3.1

Run Parameter: tRNAscan-SE --bact [Fasta File]

Finding rRNA genes

Program: INFERNAL version 1.0.2 (cmsearch)

Database: Rfam 12.0

Run Parameter: cmsearch -E 1.0E-5 -Z 700 --noali rfam12.0/rRNA_bact.cm [Fasta File]

Finding CRISPR

Program: PilerCR version 1.06

Run Parameter: pilercr -in [Fasta File] -out [Output File]

Program: CRT version 1.2

Run Parameter: java -cp CRT1.2-CLI.jar crt [Input Fasta File]

Finding ncRNA

Program: INFERNAL version 1.0.2 (cmsearch)

Database: Rfam 12.0

Run Parameter: cmsearch -E 1.0E-5 -Z 700 --noali rfam12.0/RNase_bact.cm [Fasta File]

Run Parameter: cmsearch -E 1.0E-5 -Z 700 --noali rfam12.0/Gene_bact.cm [Fasta File]

Finding CDS

Program: PRODIGAL version 2.6.2

Run Parameter: -i [Input Fasta File] -o [Output GFF File] -f gff -m -c -g 11 -a [Output Protein Fasta File]

Functional annotation

Program: USEARCH 64-bit version 8.0.1517

Database:

- KEGG (Date: 2015.12.10)
- eggNOG version 4.1
- Swiss-Prot (Date: 2015.12.10)
- SEED subsystems (Date: 2015.12.10)

Run Parameter: -ublast [Input Fasta File] -db [DB File] -maxaccepts 1 -evalue 1.0E-5 -accel 1.0 -ka_dbsize 700000000 -alnout [Output File]
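The run parameters listed above can be gathered into one place, for example as argument lists ready to pass to `subprocess.run`. The following is a minimal sketch rather than the EzBioCloud implementation. It assumes that the page's long options are `--bact` and `--noali` (double hyphens), that the executables are named `prodigal` and `usearch`, and that the protein FASTA written by the CDS step is the input to the ublast search. The output file names, the `build_commands` function, and the `protein_db` argument are hypothetical.

```python
import shlex


def build_commands(fasta, protein_db, out_prefix="genome",
                   rfam_dir="rfam12.0", usearch_bin="usearch"):
    """Return {step: argv list} mirroring the run parameters listed above.

    Output file names, executable names for Prodigal and USEARCH, and the
    chaining of Prodigal's protein output into ublast are assumptions.
    """
    # Options shared by every cmsearch call in the list.
    cmsearch = ["cmsearch", "-E", "1.0E-5", "-Z", "700", "--noali"]
    proteins = f"{out_prefix}.faa"  # protein FASTA written by the CDS step
    return {
        "tRNA": ["tRNAscan-SE", "--bact", fasta],
        "rRNA": cmsearch + [f"{rfam_dir}/rRNA_bact.cm", fasta],
        "CRISPR (PilerCR)": ["pilercr", "-in", fasta,
                             "-out", f"{out_prefix}.pilercr.txt"],
        "CRISPR (CRT)": ["java", "-cp", "CRT1.2-CLI.jar", "crt", fasta],
        "ncRNA (RNase)": cmsearch + [f"{rfam_dir}/RNase_bact.cm", fasta],
        "ncRNA (Gene)": cmsearch + [f"{rfam_dir}/Gene_bact.cm", fasta],
        "CDS": ["prodigal", "-i", fasta, "-o", f"{out_prefix}.gff",
                "-f", "gff", "-m", "-c", "-g", "11", "-a", proteins],
        "Functional annotation": [
            usearch_bin, "-ublast", proteins, "-db", protein_db,
            "-maxaccepts", "1", "-evalue", "1.0E-5", "-accel", "1.0",
            "-ka_dbsize", "700000000",
            "-alnout", f"{out_prefix}.ublast.txt",
        ],
    }


if __name__ == "__main__":
    # Print the commands instead of running them; none of the tools is invoked here.
    for step, argv in build_commands("genome.fasta", "proteins.udb").items():
        print(f"{step}: {shlex.join(argv)}")
```

Keeping each command as a list avoids shell-quoting problems with file names, and printing the commands first makes it easy to compare them against the parameters listed above before anything is executed.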

Last updated on Sept 10, 2017 (JC)
