How to use EzBioCloud 16S database with QIIME

QIIME is one of the most widely used open-source bioinformatics pipelines for microbiome analysis (http://qiime.org/). In this document, we provide a guide to using EzBioCloud’s 16S database with the QIIME pipeline.

Unlike other public databases, EzBioCloud’s 16S database can be used for species-level identification of OTUs and is freely available for academic, not-for-profit purposes. To request access, please fill out our request form.

Version: 1.5

Updated: 2017.01

0. Preparation

To begin microbiome analysis using EzBioCloud’s 16S database with QIIME, complete the following preparatory steps:

Install QIIME and USEARCH
Obtain a FASTQ file:
- ensure that sequence barcodes are removed
- merge paired-end reads, and do not use unpaired FASTQ files
- ensure that sequences have already been filtered by quality, length, etc.
Obtain EzBioCloud’s 16S database, which contains:
- FASTA file
- MAPPING file
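Before starting, it can help to confirm that the prepared FASTQ file is structurally sound. The sketch below is a minimal awk check, not a full FASTQ validator: it assumes every record occupies exactly four lines (header starting with “@”, sequence, “+” line, quality string) and flags records whose sequence and quality lengths differ. The function name check_fastq is hypothetical.

```shell
# check_fastq FILE
# Minimal structural check (a sketch, not a full validator).
# Assumes each FASTQ record is exactly four lines: @header, sequence, +line, quality.
# Prints "<records> records, <n> malformed" and returns non-zero if anything is off.
check_fastq() {
  awk '
    NR % 4 == 1 { records++; if (substr($0, 1, 1) != "@") bad++ }  # header line
    NR % 4 == 2 { seqlen = length($0) }                            # sequence line
    NR % 4 == 3 { if (substr($0, 1, 1) != "+") bad++ }             # separator line
    NR % 4 == 0 { if (length($0) != seqlen) bad++ }                # quality must match sequence length
    END {
      if (NR % 4 != 0) bad++                                       # truncated final record
      printf("%d records, %d malformed\n", records, bad)
      exit (bad > 0)
    }
  ' "$1"
}
# Usage (hypothetical file name): check_fastq example.fastq
```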

1. Indexing sequences

All sequences in the FASTQ file must be indexed for the analysis to proceed. For example, the sequences of a sample named “A” must be indexed as “A_1”, “A_2”, …, “A_n”.

In other words, each sequence name must begin with “[sample name]_[number]”. You can apply such indexing with other programs, but we recommend using QIIME’s “split_libraries_fastq.py” script, which writes the indexed sequences to a FASTA file (seqs.fna in the output directory).
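To illustrate the “[sample name]_[number]” format, here is a minimal awk sketch of the kind of “other program” that could apply such indexing. It is an assumption-laden illustration, not a replacement for split_libraries_fastq.py: it assumes four-line FASTQ records, numbers reads from 1 to mirror the “A_1”, “A_2” example (QIIME’s own numbering may differ), keeps the original read ID after a space, and performs no quality filtering. The function name fastq_to_indexed_fasta is hypothetical.

```shell
# fastq_to_indexed_fasta SAMPLE < in.fastq > out.fasta
# Sketch only: converts four-line FASTQ records to FASTA and renames each read
# to "<SAMPLE>_<n>", keeping the original read ID (first word of the header).
fastq_to_indexed_fasta() {
  awk -v sample="$1" '
    NR % 4 == 1 { n++; printf(">%s_%d %s\n", sample, n, substr($1, 2)) }  # header: drop "@"
    NR % 4 == 2 { print }                                                  # sequence line
  '
}
# Usage (hypothetical file names): fastq_to_indexed_fasta A < A.fastq > A.indexed.fasta
```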

Command line

$ split_libraries_fastq.py -i example.fastq -o example_index --barcode_type='not-barcoded' --sample_ids=example --phred_offset=33 -q 0 -p 0.00001

Option description

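-i : the sequence read FASTQ file(s) (comma-separated if more than one)
-o : directory to store output files
--barcode_type : type of barcode used (this can be an integer); use 'not-barcoded' when barcodes have already been removed
--sample_ids : comma-separated list of sample IDs to be applied to all sequences (must be one per input file path)
--phred_offset : the ASCII offset to use when decoding Phred scores
-q : the maximum unacceptable Phred quality score
-p : minimum number of consecutive high-quality base calls to include a read, as a fraction of the input read length

Because the reads were already quality-filtered during preparation, -q 0 and -p 0.00001 keep this script’s own quality filtering to a minimum.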
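When several samples are processed together, -i and --sample_ids must be comma-separated lists with one sample ID per input file. The helper below is a hypothetical sketch that builds both lists from a set of FASTQ paths, assuming a naming convention (also hypothetical) in which each file is named <sample>.fastq. It only uses the options listed above.

```shell
# join_fastq_inputs FILE...
# Builds the comma-separated -i and --sample_ids values for split_libraries_fastq.py,
# one sample ID per input file, in the same order.
# Assumption: each file is named <sample>.fastq, so the ID is the base name minus ".fastq".
join_fastq_inputs() {
  inputs=""
  ids=""
  for f in "$@"; do
    sample=$(basename "$f" .fastq)
    inputs="${inputs:+$inputs,}$f"   # append with a comma except for the first item
    ids="${ids:+$ids,}$sample"
  done
  printf '%s\n' "-i $inputs --sample_ids=$ids"
}
# Usage (hypothetical paths; names must not contain spaces or commas):
#   split_libraries_fastq.py $(join_fastq_inputs raw/*.fastq) -o example_index \
#     --barcode_type='not-barcoded' --phred_offset=33 -q 0 -p 0.00001
```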

Related QIIME site
http://qiime.org/scripts/split_libraries_fastq.html

2. Identifying and filtering out chimeric sequences

This step detects chimeric sequences in the indexed sequence file (example_index/seqs.fna) with a chimera detection method such as usearch61, and then filters them out.

Command line

$ identify_chimeric_seqs.py -m usearch61 -i example_index/seqs.fna --suppress_usearch61_ref -o chimera

$ filter_fasta.py -f example_index/seqs.fna -s chimera/chimeras.txt -n -o example.non_chimera.fasta

Option description

identify_chimeric_seqs.py

 -i : path to the input fasta file
 -m : chimera detection method. Choices: blast_fragments or ChimeraSlayer or usearch61
 --suppress_usearch61_ref : use to suppress reference based chimera detection with usearch61
 -o : path to store output: an output file path for blast_fragments and ChimeraSlayer, or a directory for usearch61

filter_fasta.py

 -f : path to the input fasta file
 -s : a list of sequence identifiers (or tab-delimited lines with a seq identifier in the first field)
 -n : discard passed seq ids rather than keep passed seq ids
 -o : The output fasta filepath
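With -n, filter_fasta.py keeps every sequence whose ID is not in the chimera list. The awk sketch below mimics that behaviour for illustration; it is not QIIME’s code. It assumes the ID list has the sequence ID as the first whitespace-separated field of each line (matching the -s description above) and that a FASTA ID is the first word of its header. It also handles multi-line sequences and an empty chimera list. The function name remove_listed_seqs is hypothetical.

```shell
# remove_listed_seqs IDS_FILE FASTA_FILE > filtered.fasta
# Sketch of "filter_fasta.py -s IDS_FILE -n": drop FASTA records whose ID is listed.
remove_listed_seqs() {
  awk '
    FILENAME == ARGV[1] { drop[$1] = 1; next }        # first file: IDs to discard
    /^>/ { id = substr($1, 2); keep = !(id in drop) }  # header: decide for this record
    keep                                              # print header and sequence lines of kept records
  ' "$1" "$2"
}
# Usage: remove_listed_seqs chimera/chimeras.txt example_index/seqs.fna > example.non_chimera.fasta
```

Testing FILENAME against ARGV[1], rather than the common NR == FNR idiom, keeps the function correct when no chimeras were found and the ID file is empty.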

Related QIIME site
http://qiime.org/scripts/identify_chimeric_seqs.html
http://qiime.org/scripts/filter_fasta.html

3. Perform open-reference OTU picking process

QIIME’s tutorial describes an open-reference OTU picking process as follows:
“In an open-reference OTU picking process, reads are clustered against a reference sequence collection and any reads which do not hit the reference sequence collection are subsequently clustered de novo.”

In this guide, we use EzBioCloud’s 16S database as the reference for the open-reference protocol. However, EzBioCloud’s 16S database is not QIIME’s default database (QIIME defaults to the Greengenes database), so the taxonomy assigned during this step is based on Greengenes IDs. We therefore pick OTUs against EzBioCloud’s reference sequences here, and in section 4 manually set the path to EzBioCloud’s taxonomy-ID mapping file so that the default Greengenes-based assignments are overwritten. To pick OTUs:

Command line

$ pick_open_reference_otus.py -i example.non_chimera.fasta -r db_files/eztaxon_qiime_full.fasta -o results -a -O 4

Option description

-i : the input sequences filepath or comma-separated list of filepaths
-r : the reference sequences
-o : the output directory
-a : run in parallel where available
-O : number of jobs to start
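Because this step and section 4 rely on EzBioCloud’s reference FASTA and taxonomy mapping file matching each other, a quick pre-flight check can list reference IDs that have no taxonomy line. This is a minimal sketch, assuming the mapping file is tab-separated with the sequence ID in the first column (the usual QIIME id-to-taxonomy layout) and that each FASTA ID is the first word of its header. The function name missing_taxonomy is hypothetical.

```shell
# missing_taxonomy REF_FASTA ID_TO_TAXONOMY
# Prints every reference sequence ID that has no line in the id-to-taxonomy file.
missing_taxonomy() {
  awk -F '\t' '
    FILENAME == ARGV[1] { mapped[$1] = 1; next }    # mapping file: remember IDs
    /^>/ {
      id = substr($0, 2)
      sub(/[ \t].*/, "", id)                       # keep only the first word of the header
      if (!(id in mapped)) print id
    }
  ' "$2" "$1"
}
# Usage: missing_taxonomy db_files/eztaxon_qiime_full.fasta db_files/eztaxon_id_taxonomy.txt | head
```

An empty result means every reference sequence can be mapped to a taxonomy.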

Related QIIME site
http://qiime.org/scripts/pick_open_reference_otus.html
http://qiime.org/tutorials/otu_picking.html

4. Overwriting OTU table

Because of the issue described in section 3, “parallel_assign_taxonomy_uclust.py” must be run again with EzBioCloud’s 16S database and taxonomy mapping file. The new taxonomy assignments are then added to the OTU table with “biom add-metadata”, overwriting the previously generated BIOM table.

Two additional options enable species-level identification of OTUs: --min_consensus_fraction=1 and --uclust_max_accepts=1 (described at the end of the option list for this script). Once you add these options to the “parallel_assign_taxonomy_uclust.py” command line, you can get species-level identification results for OTUs.

Command line

$ parallel_assign_taxonomy_uclust.py -i results/rep_set.fna -r db_files/eztaxon_qiime_full.fasta -t db_files/eztaxon_id_taxonomy.txt -o results/uclust_assigned_taxonomy -T -O 4

$ biom add-metadata -i results/otu_table_mc2.biom --observation-metadata-fp results/uclust_assigned_taxonomy/rep_set_tax_assignments.txt  -o results/otu_table_mc2_w_tax.biom  --sc-separated taxonomy   --observation-header OTUID,taxonomy

Option description

parallel_assign_taxonomy_uclust.py

-i : full path to fasta file containing query sequences
-r : ref seqs to search against
-t : full path to id_to_taxonomy mapping file
-o : path to store output files
-T : poll directly for job completion rather than running poller as a separate job
-O : Number of jobs to start
--min_consensus_fraction=1 : Minimum fraction of database hits that must have a specific taxonomic assignment to assign that taxonomy to a query [default: 0.51]
--uclust_max_accepts=1 : number of database hits to consider when making an assignment [default: 3]
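Why these two options yield species-level names can be seen with a simplified model of consensus assignment. The awk sketch below is written for illustration and is not QIIME’s implementation (details such as tie-breaking may differ): it reads one semicolon-separated lineage per database hit and keeps descending through the ranks only while the most common taxon at that rank is shared by at least the given fraction of all hits. With the defaults (up to 3 hits, 0.51), three hits from different species of one genus stop at the genus; with a single hit and a fraction of 1, that hit’s full lineage, including the species, is reported. The lineages in the example are made up, and the function name consensus_taxonomy is hypothetical.

```shell
# consensus_taxonomy MIN_FRACTION < hits.txt
# Simplified consensus sketch: one semicolon-separated lineage per hit on stdin.
# A rank is kept while its most common taxon is shared by >= MIN_FRACTION of hits;
# ranks beyond the shortest lineage are not considered.
consensus_taxonomy() {
  awk -v min="$1" '
    {
      nhits++
      depth = split($0, t, ";")
      if (nhits == 1 || depth < mindepth) mindepth = depth
      for (i = 1; i <= depth; i++) rank[nhits, i] = t[i]
    }
    END {
      lineage = ""
      for (lvl = 1; lvl <= mindepth; lvl++) {
        split("", count)                     # reset the per-rank tally
        best = ""; bestn = 0
        for (h = 1; h <= nhits; h++) {
          name = rank[h, lvl]
          if (++count[name] > bestn) { bestn = count[name]; best = name }
        }
        if (bestn / nhits < min) break       # consensus too weak: stop at the previous rank
        if (lineage == "") lineage = best
        else lineage = lineage ";" best
      }
      if (lineage == "") lineage = "Unassigned"
      print lineage
    }
  '
}
# Example with three made-up hits from different species of one genus:
#   printf 'k__Bacteria;g__X;s__A\nk__Bacteria;g__X;s__B\nk__Bacteria;g__X;s__C\n' | consensus_taxonomy 0.51
#   -> k__Bacteria;g__X
```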

biom add-metadata

-i : the input BIOM table
-o : the output BIOM table
--observation-metadata-fp : the observation metadata mapping file
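--sc-separated : comma-separated list of the metadata fields whose values are semicolon-separated (here, taxonomy) and should be split into lists
--observation-header : comma-separated list of the observation metadata field names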
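Before attaching the assignments to the BIOM table, a quick tally of assignment depth in results/uclust_assigned_taxonomy/rep_set_tax_assignments.txt can show whether the species-level options took effect. This sketch assumes the file is tab-separated with the taxonomy string in the second column and that unassigned OTUs are labelled “Unassigned”; the number of ranks that corresponds to species level depends on EzBioCloud’s taxonomy format and is not assumed here. The function name rank_depth_counts is hypothetical.

```shell
# rank_depth_counts FILE
# Prints "<number of ranks><TAB><number of OTUs>" for each assignment depth,
# counting semicolon-separated ranks in column 2 ("Unassigned" counts as depth 0).
rank_depth_counts() {
  awk -F '\t' '
    { depth = ($2 == "Unassigned") ? 0 : split($2, parts, ";"); n[depth]++ }
    END { for (d in n) print d "\t" n[d] }
  ' "$1" | sort -n
}
# Usage: rank_depth_counts results/uclust_assigned_taxonomy/rep_set_tax_assignments.txt
```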

Related QIIME site
http://qiime.org/scripts/parallel_assign_taxonomy_uclust.html
http://biom-format.org/documentation/adding_metadata.html

Edited by Mikael Hwang (Jan 2017)

Powered by Precision,
Driven by Quality
