How to use EzBioCloud 16S database with MOTHUR

MOTHUR is a widely used, open-source bioinformatics pipeline for microbiome analysis (https://www.mothur.org/). In this document, we provide a guide for using EzBioCloud’s 16S database with MOTHUR’s pipeline (https://www.mothur.org/wiki/MiSeq_SOP).

Unlike other public databases, EzBioCloud’s 16S database can be used for species-level identification of OTUs and is freely available for academic, not-for-profit use. To request it, please visit our request form.

0. Preparation

To begin microbiome analysis using EzBioCloud’s 16S database with MOTHUR, a few preparatory steps are required:

An installation of mothur
A FASTA file of your sequences
- sequence barcodes must already be removed
- paired-end reads must already be merged; do not use unmerged paired FASTQ files
- sequences must already be filtered by quality, length, etc.
EzBioCloud’s 16S database
- aligned FASTA file
- mapping file

Figure 1. Overview of mothur pipeline based on MiSeq SOP

1. Unique sequences

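Before running the rest of the pipeline, remove duplicate sequences from the quality-controlled FASTA file with the unique.seqs command. Along with the deduplicated FASTA file, the command writes a name file (example.names) that records which sequences each representative stands for; later steps use this file.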

Command line

unique.seqs(fasta=example.fasta)

Option description
fasta=path to the input fasta file

Related mothur site
https://www.mothur.org/wiki/Unique.seqs
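
To make the effect of unique.seqs concrete, the following Python sketch collapses identical sequences and records which reads each representative stands for, similar in spirit to the name file. It is only an illustration, not mothur's implementation; the function name collapse_duplicates and the sample reads are hypothetical.

```python
def collapse_duplicates(records):
    """Group (name, sequence) pairs by identical sequence.

    Returns a list of (representative_name, sequence, member_names),
    keeping the first read seen for each sequence as the representative.
    """
    groups = {}
    for name, seq in records:
        groups.setdefault(seq, []).append(name)
    return [(names[0], seq, names) for seq, names in groups.items()]


if __name__ == "__main__":
    # Hypothetical reads: r1 and r2 are identical, r3 differs.
    reads = [("r1", "ACGT"), ("r2", "ACGT"), ("r3", "ACGA")]
    for rep, seq, members in collapse_duplicates(reads):
        # One row per unique sequence: representative, members, sequence.
        print(rep, ",".join(members), seq, sep="\t")
```

Working on unique sequences only means every later step processes each distinct sequence once, while the member list keeps the read counts recoverable.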

2. Align sequences

Align the unique sequences against EzBioCloud’s aligned 16S reference file with align.seqs. Alignment can be slow for large data sets; to speed it up, run it on multiple processors by adding the processors parameter.

Command line

align.seqs(fasta=example.unique.fasta, reference=eztaxon_full.align)

Option description
fasta=path to the input fasta file
reference=path to the aligned reference fasta file (EzBioCloud’s 16S database)
processors=integer : number of processors to use (optional)

Related mothur site
https://www.mothur.org/wiki/Align.seqs
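
If you generate mothur commands from a script, a small helper can add optional parameters such as processors without hand-editing long command strings. The sketch below only builds the command text; the helper name mothur_command and the processor count of 4 are assumptions for illustration.

```python
def mothur_command(name, **params):
    """Format a mothur command string such as align.seqs(fasta=..., ...)."""
    args = ", ".join(f"{key}={value}" for key, value in params.items())
    return f"{name}({args})"


if __name__ == "__main__":
    cmd = mothur_command(
        "align.seqs",
        fasta="example.unique.fasta",
        reference="eztaxon_full.align",
        processors=4,  # assumed value; set this to the cores you have
    )
    print(cmd)
```

The printed line can be pasted at the mothur prompt or written into a batch file.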

3. Clean alignment

3.1. Screen sequences

The screen.seqs command enables you to keep sequences that fulfill certain user defined criteria.

Command line

screen.seqs(fasta=example.unique.align,name=example.names,optimize=start-end-maxhomop,criteria=95)

Option description
fasta=path to the input fasta file
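name=The name file
optimize, criteria=The optimize and criteria parameters allow you to set the start, end, maxambig, maxhomop, minlength, maxlength, minoverlap, ostart, oend, mismatches, maxn, minscore, maxinsert and minsim parameters relative to your set of sequences. With optimize=start-end-maxhomop and criteria=95, mothur chooses start, end and maxhomop thresholds that 95% of your sequences satisfy.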

Related mothur site
https://www.mothur.org/wiki/Screen.seqs
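
The idea behind optimize and criteria can be illustrated for maxhomop: measure the longest homopolymer in each sequence, then pick the threshold that a given percentage of sequences stay at or below. The Python sketch below is a rough illustration under that reading; mothur's exact percentile arithmetic may differ, and both function names are hypothetical.

```python
def longest_homopolymer(seq):
    """Length of the longest run of one repeated base, ignoring gaps."""
    best = run = 0
    previous = None
    for base in seq.upper():
        if base in "-.":
            continue  # alignment gaps do not break or extend a run
        run = run + 1 if base == previous else 1
        previous = base
        best = max(best, run)
    return best


def cutoff_keeping(values, criteria=95):
    """Smallest threshold that at least `criteria` percent of values are <= to."""
    ordered = sorted(values)
    # Ceiling of len * criteria / 100 using integer arithmetic.
    needed = -(-len(ordered) * criteria // 100)
    return ordered[max(needed, 1) - 1]


if __name__ == "__main__":
    # Hypothetical aligned reads.
    reads = ["AC--GGGGTA", "ACGTACGTAC", "AAAAAAAACG"]
    homopolymers = [longest_homopolymer(r) for r in reads]
    print(homopolymers, cutoff_keeping(homopolymers, 95))
```

Sequences whose value exceeds the chosen threshold would be the ones screened out.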

3.2. Filter sequences

filter.seqs removes columns from alignments based on criteria defined by the user.

Command line

filter.seqs(fasta=example.unique.good.align,vertical=T,trump=.)

Option description
fasta=path to the input fasta file
vertical=T : any column that contains only gap characters (i.e. '-' or '.') is removed.
trump=. : a column is removed if the trump character (here '.') is found at that position in any sequence of the alignment.

Related mothur site
https://www.mothur.org/wiki/Filter.seqs
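
The two column rules can be shown on a toy alignment. The following Python sketch applies them as described in the option list; it is a simplified illustration rather than mothur's code, and the function name filter_columns and the sample alignment are hypothetical.

```python
def filter_columns(alignment, vertical=True, trump=None):
    """Drop alignment columns in the spirit of filter.seqs.

    alignment: dict of name -> aligned sequence (all the same length).
    vertical:  drop columns made up only of gap characters ('-' or '.').
    trump:     drop columns where any sequence has this character.
    """
    seqs = list(alignment.values())
    lengths = {len(s) for s in seqs}
    if len(lengths) > 1:
        raise ValueError("all sequences must have the same aligned length")
    keep = []
    for i in range(lengths.pop() if lengths else 0):
        column = [s[i] for s in seqs]
        if vertical and all(c in "-." for c in column):
            continue  # column holds gaps only
        if trump is not None and trump in column:
            continue  # at least one sequence has the trump character
        keep.append(i)
    return {name: "".join(s[i] for i in keep) for name, s in alignment.items()}


if __name__ == "__main__":
    toy = {"a": ".AC-GT", "b": "AAC-G."}
    print(filter_columns(toy, vertical=True, trump="."))
```

With trump=. every remaining column is covered by all sequences, so the filtered reads span the same region of the gene.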

3.3. Unique sequences filtered

Screening and filtering can leave sequences that were previously distinct identical to one another. This step runs unique.seqs again to find and merge those new duplicates.

Command line

unique.seqs(fasta=example.unique.good.filter.fasta,name=example.good.names)

Option description
fasta=path to the input fasta file
name=The name file is used to show the relationship between a representative sequence and the sequences it represents.

Related mothur site
https://www.mothur.org/wiki/Unique.seqs

4. Pre-cluster sequences

The pre.cluster command implements a pseudo-single linkage algorithm with the goal of removing sequences that are likely due to sequencing errors.

Command line

pre.cluster(fasta=example.unique.good.filter.unique.fasta,name=example.unique.good.filter.names,diffs=2)

Option description
fasta=path to the input fasta file
name=The name file
diffs=2 : the pre.cluster command will look for sequences that are within 2 mismatches of the sequence being considered.

Related mothur site
https://www.mothur.org/wiki/Pre.cluster
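
The diffs parameter is easier to picture with a small example. The Python sketch below processes sequences from most to least abundant and folds rarer sequences into a more abundant one when they differ at no more than diffs aligned positions. It is a simplified illustration of the idea, not mothur's implementation; the function names and read counts are hypothetical.

```python
def mismatches(a, b):
    """Number of aligned positions at which two equal-length sequences differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


def pre_cluster(counts, diffs=2):
    """Merge rare sequences into more abundant ones within `diffs` mismatches.

    counts: dict of aligned sequence -> number of reads.
    Returns a new dict with the merged read counts.
    """
    # Most abundant first; ties broken by sequence for reproducibility.
    ordered = sorted(counts, key=lambda s: (-counts[s], s))
    merged = {}
    absorbed = set()
    for i, seq in enumerate(ordered):
        if seq in absorbed:
            continue
        total = counts[seq]
        for other in ordered[i + 1:]:
            if other not in absorbed and mismatches(seq, other) <= diffs:
                total += counts[other]
                absorbed.add(other)
        merged[seq] = total
    return merged


if __name__ == "__main__":
    reads = {"AAAAAAAA": 100, "AAAAAAAT": 3, "AAAAAATT": 1, "CCCCCCCC": 5}
    print(pre_cluster(reads, diffs=2))
```

The rationale is that a rare sequence one or two bases away from a very abundant one is more likely an error copy of it than a real variant.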

5. Detect chimeric sequences

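The chimera.uchime command reads a fasta file and outputs potentially chimeric sequences. It can check sequences against a reference file; in the command below no reference is given, and the name file supplies the sequence abundances instead.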

Command line

chimera.uchime(fasta=example.unique.good.filter.unique.precluster.fasta,name=example.unique.good.filter.unique.precluster.names)

Option description
fasta=path to the input fasta file
name=The name file

Related mothur site
https://www.mothur.org/wiki/Chimera.uchime

6. Remove chimeric sequences

The remove.seqs command takes a list of sequence names and a fasta file to generate a new file that does not contain sequences on the list.

Command line

remove.seqs(fasta=example.unique.good.filter.unique.precluster.fasta,name=example.unique.good.filter.unique.precluster.names,accnos=example.unique.good.filter.unique.precluster.uchime.accnos)

Option description
fasta=path to the input fasta file
name=The name file
accnos=the file listing the names of the chimeric sequences to remove

Related mothur site
https://www.mothur.org/wiki/Remove.seqs
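
Conceptually, this step is a set difference between the sequences in the fasta file and the names in the accnos file. A minimal Python sketch, assuming the accnos file holds one sequence name per line; the function names and sample records are hypothetical, and the sketch does not update a name file the way mothur does.

```python
def read_accnos(text):
    """Parse accnos content: one sequence name per line, blanks ignored."""
    return {line.strip() for line in text.splitlines() if line.strip()}


def remove_listed(records, unwanted):
    """Return the (name, sequence) records whose name is not in `unwanted`."""
    return [(name, seq) for name, seq in records if name not in unwanted]


if __name__ == "__main__":
    chimeras = read_accnos("r2\nr5\n")
    reads = [("r1", "ACGT"), ("r2", "AAAA"), ("r3", "ACGA")]
    print(remove_listed(reads, chimeras))
```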

7. Classify sequences

The classify.seqs command allows the user to use several different methods to assign their sequences to the taxonomy outline of their choice.

Command line

classify.seqs(fasta=example.unique.good.filter.unique.precluster.pick.fasta,name=example.unique.good.filter.unique.precluster.pick.names,template=eztaxon_full.align,taxonomy=eztaxon_id_taxonomy.tax,method=knn,numwanted=1)

Option description
fasta=path to the input fasta file
name=The name file
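template=The reference database fasta file (here, EzBioCloud’s aligned 16S file)
taxonomy=The taxonomy id file
method=knn : k-Nearest Neighbor algorithm
numwanted=1 : the number of nearest reference sequences to use for the knn method; with 1, each sequence takes the taxonomy of its single closest match.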

Related mothur site
https://www.mothur.org/wiki/Classify.seqs

8. Remove non-bacteria sequences

The remove.lineage command reads a taxonomy file and a list of taxa, and generates new files that contain only the sequences that do not belong to any of the listed taxa.

Command line

remove.lineage(fasta=example.unique.good.filter.unique.precluster.pick.fasta,name=example.unique.good.filter.unique.precluster.pick.names,taxonomy=example.unique.good.filter.unique.precluster.pick..taxonomy,taxon=Mitochondria-Chloroplast-Archaea-Eukaryota-unknown)

Option description
fasta=path to the input fasta file
name=The name file
taxonomy=The taxonomy file produced by classify.seqs
taxon=The taxon parameter lists the taxa you would like to remove, separated by dashes, and is required

Related mothur site
https://www.mothur.org/wiki/Remove.lineage
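
The filtering logic can be sketched in Python as follows. The sketch assumes each taxonomy line is a sequence name, a tab, and a semicolon-separated lineage, and it compares whole rank names, which is a simplification of how mothur matches taxa. The function name and the sample lineages are hypothetical.

```python
import re


def remove_lineage(tax_lines, taxon):
    """Keep taxonomy lines whose lineage contains none of the given taxa.

    tax_lines: iterable of "name<TAB>rank1;rank2;...;" strings.
    taxon:     dash-separated taxa, as in the taxon= option.
    """
    unwanted = set(taxon.split("-"))
    kept = []
    for line in tax_lines:
        name, lineage = line.rstrip("\n").split("\t", 1)
        # Strip confidence scores such as "(100)" before comparing.
        ranks = {re.sub(r"\(\d+\)$", "", r) for r in lineage.split(";") if r}
        if ranks.isdisjoint(unwanted):
            kept.append(line)
    return kept


if __name__ == "__main__":
    lines = [
        "s1\tBacteria;Firmicutes;Bacilli;",
        "s2\tBacteria;Cyanobacteria;Chloroplast;",
        "s3\tArchaea;Euryarchaeota;",
    ]
    taxa = "Mitochondria-Chloroplast-Archaea-Eukaryota-unknown"
    print(remove_lineage(lines, taxa))
```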

9. Calculate uncorrected pairwise distances

The dist.seqs command will calculate uncorrected pairwise distances between aligned DNA sequences.

Command line

dist.seqs(fasta=example.unique.good.filter.unique.precluster.pick.pick.fasta,cutoff=0.15)

Option description
fasta=path to the input fasta file
cutoff=If you know that you are not going to form OTUs with distances larger than 0.15, you can tell mothur to not save any distances larger than 0.15.

Related mothur site
https://www.mothur.org/wiki/Dist.seqs
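
An uncorrected pairwise distance is the fraction of compared positions at which two aligned sequences differ, with no substitution model applied. The Python sketch below skips any position where either sequence has a gap, which is a simplifying assumption; mothur offers several gap-handling modes and its default may give different values. The function names and the toy alignment are hypothetical.

```python
GAPS = "-."


def uncorrected_distance(a, b):
    """Fraction of mismatches over positions where both sequences have a base."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = differences = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in GAPS or y in GAPS:
            continue  # simplification: positions with a gap are skipped
        compared += 1
        differences += x != y
    return differences / compared if compared else 1.0


def pairwise_distances(alignment, cutoff=0.15):
    """Yield (name1, name2, distance) for pairs at or below the cutoff."""
    names = sorted(alignment)
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            d = uncorrected_distance(alignment[first], alignment[second])
            if d <= cutoff:
                yield first, second, d


if __name__ == "__main__":
    toy = {"a": "ACGTACGTAC", "b": "ACGTACGTAA", "c": "TTTTACGTAC"}
    for row in pairwise_distances(toy, cutoff=0.15):
        print(*row, sep="\t")
```

Dropping pairs above the cutoff keeps the distance file small, since the number of pairs grows with the square of the number of sequences.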

10. Assign sequences to OTUs

Once a distance matrix gets read into mothur, the cluster command can be used to assign sequences to OTUs.

Command line

cluster(column=example.unique.good.filter.unique.precluster.pick.pick.dist,name=example.unique.good.filter.unique.precluster.pick.pick.names)

Option description
column=path to the column-formatted distance matrix from dist.seqs; when you read in a column-formatted matrix you must also provide the name option.
name=The name file

Related mothur site
https://www.mothur.org/wiki/Cluster

11. Classify OTUs

The classify.otu command is used to get a consensus taxonomy for an OTU.

Command line

classify.otu(taxonomy=example.unique.good.filter.unique.precluster.pick..pick.taxonomy,list=example.unique.good.filter.unique.precluster.pick.pick.an.list,name=example.unique.good.filter.unique.precluster.pick.pick.names)

Option description
taxonomy=taxonomy file
list=The list file result from cluster
name=The name file

Related mothur site
https://www.mothur.org/wiki/Classify.otu
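
A consensus taxonomy can be thought of as a rank-by-rank majority vote among the sequences in one OTU. The Python sketch below stops at the first rank where no taxon reaches the required share of the OTU. The 51 percent threshold is an assumed value, each lineage is counted once rather than weighted by the name file, and the function name and sample lineages are hypothetical; this is not mothur's implementation.

```python
from collections import Counter


def consensus_taxonomy(lineages, cutoff=51):
    """Rank-by-rank majority taxonomy for the sequences in one OTU.

    lineages: list of lists, one list of rank names per sequence.
    cutoff:   minimum percent of the OTU that must agree at a rank.
    """
    consensus = []
    current = lineages
    depth = 0
    while current:
        at_rank = [l for l in current if len(l) > depth]
        if not at_rank:
            break
        taxon, votes = Counter(l[depth] for l in at_rank).most_common(1)[0]
        if votes * 100 < cutoff * len(lineages):
            break  # no taxon is supported by enough of the OTU
        consensus.append(taxon)
        # Only sequences that agree so far vote on deeper ranks.
        current = [l for l in at_rank if l[depth] == taxon]
        depth += 1
    return consensus


if __name__ == "__main__":
    otu = [
        ["Bacteria", "Firmicutes", "Bacilli"],
        ["Bacteria", "Firmicutes", "Bacilli"],
        ["Bacteria", "Firmicutes", "Clostridia"],
        ["Bacteria", "Proteobacteria", "Gammaproteobacteria"],
    ]
    print(";".join(consensus_taxonomy(otu)))
```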

Written by Jimmy Lim (Dec 2016); Edited by Mikael Hwang (Jan 2017)
