MOTHUR is a widely used, open-source bioinformatics pipeline for microbiome analysis (https://www.mothur.org/). In this document, we provide a guide for using EzBioCloud’s 16S database with MOTHUR’s pipeline (https://www.mothur.org/wiki/MiSeq_SOP).
Unlike other public databases, EzBioCloud’s 16S database can be used for species-level identification of OTUs and is freely available for academic, not-for-profit purposes. To request the database, please use our request form.
0. Preparation
To begin microbiome analysis using EzBioCloud’s 16S database with MOTHUR, a few preparatory steps are required:
- Installation of mothur
- A FASTA file
  - ensure that sequence barcodes are removed
  - merge paired-end reads beforehand; do not use unmerged paired FASTQ files
  - ensure that sequences have been filtered by quality, length, etc.
- EzBioCloud’s 16S database
  - aligned FASTA file
  - MAPPING file
Figure 1. Overview of mothur pipeline based on MiSeq SOP
1. Unique sequences
Before running the rest of mothur’s pipeline, remove duplicate sequences from the quality-controlled fasta file. You can do this with the unique.seqs command.
- Command line
unique.seqs(fasta=example.fasta)
- Option description
fasta=path to the input fasta file
- Related mothur site
https://www.mothur.org/wiki/Unique.seqs
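To make the idea of dereplication concrete, the following is a minimal Python sketch of what this step does conceptually. It is not mothur's implementation: the function name `dereplicate` and the in-memory record format are assumptions made for illustration. It keeps one representative per distinct sequence and records which reads it stands for, which is the role the name file plays in later steps.

```python
def dereplicate(records):
    """Collapse identical sequences.

    records: list of (read_name, sequence) tuples (hypothetical in-memory format).
    Returns (unique_records, names), where names maps each representative
    read to every read that shares its sequence.
    """
    groups = {}
    for name, seq in records:
        # Reads with the same sequence end up in the same group.
        groups.setdefault(seq, []).append(name)

    # The first read seen for a sequence acts as its representative.
    unique_records = [(members[0], seq) for seq, members in groups.items()]
    names = {members[0]: members for members in groups.values()}
    return unique_records, names


reads = [("r1", "ACGT"), ("r2", "ACGT"), ("r3", "ACGA")]
unique_records, names = dereplicate(reads)
```

Here `r1` represents both `r1` and `r2`, so only two sequences need to be aligned and compared in the following steps.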
2. Align sequences
The align.seqs command aligns the sequences in the fasta file against the aligned reference file of EzBioCloud’s 16S database. If the alignment runs slowly, try running it on multiple processors by adding the processors parameter.
- Command line
align.seqs(fasta=example.unique.fasta, reference=eztaxon_full.align)
- Option description
fasta=path to the input fasta file
reference=path to the reference fasta file
processors=Integer : number of processors to use
- Related mothur site
https://www.mothur.org/wiki/Align.seqs
3. Clean alignment
3.1. Screen sequences
The screen.seqs command keeps only the sequences that fulfill certain user-defined criteria.
- Command line
screen.seqs(fasta=example.unique.align,name=example.names,optimize=start-end-maxhomop,criteria=95)
- Option description
fasta=path to the input fasta file
name=The name file
optimize, criteria=The optimize and criteria parameters allow you to set the start, end, maxambig, maxhomop, minlength, maxlength, minoverlap, ostart, oend, mismatches, maxn, minscore, maxinsert and minsim parameters relative to your set of sequences.
- Related mothur site
https://www.mothur.org/wiki/Screen.seqs
3.2. Filter sequences
The filter.seqs command removes columns from alignments based on criteria defined by the user.
- Command line
filter.seqs(fasta=example.unique.good.align,vertical=T,trump=.)
- Option description
fasta=path to the input fasta file
vertical=any column that contains only gap characters (i.e. ‘-’ or ‘.’) is removed.
trump=The trump option will remove a column if the trump character is found at that position in any sequence of the alignment.
- Related mothur site
https://www.mothur.org/wiki/Filter.seqs
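The effect of the vertical and trump options can be illustrated with a short Python sketch. This is a simplified illustration rather than mothur's code: the function `filter_columns` and the toy alignment are hypothetical. A column is dropped if every sequence has a gap there (vertical), or if any sequence has the trump character there.

```python
def filter_columns(alignment, vertical=True, trump="."):
    """Drop alignment columns, loosely mimicking vertical=T and trump=.

    alignment: list of aligned sequences of equal length (hypothetical input).
    """
    keep = []
    for i in range(len(alignment[0])):
        column = [seq[i] for seq in alignment]
        # vertical: the column holds nothing but gap characters.
        if vertical and all(c in "-." for c in column):
            continue
        # trump: a single trump character is enough to drop the column.
        if trump is not None and trump in column:
            continue
        keep.append(i)
    return ["".join(seq[i] for i in keep) for seq in alignment]


aligned = ["..AC-GT.", ".AAC-GTT", "..AC-GT."]
filtered = filter_columns(aligned)
```

With both options enabled, the three toy sequences are trimmed to the columns that every sequence covers, so they become directly comparable.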
3.3. Unique filtered sequences
Sequences that differed before screen.seqs and filter.seqs may have become identical after these steps. This step finds and removes such duplicates.
- Command line
unique.seqs(fasta=example.unique.good.filter.fasta,name=example.good.names)
- Option description
fasta=path to the input fasta file
name=The name file is used to show the relationship between a representative sequence and the sequences it represents.
- Related mothur site
https://www.mothur.org/wiki/Unique.seqs
4. Pre-cluster sequences
The pre.cluster command implements a pseudo-single linkage algorithm with the goal of removing sequences that are likely due to pyrosequencing errors.
- Command line
pre.cluster(fasta=example.unique.good.filter.unique.fasta,name=example.unique.good.filter.names,diffs=2)
- Option description
fasta=path to the input fasta file
name=The name file
diffs=2 : the pre.cluster command will look for sequences that are within 2 mismatches of the sequence being considered.
- Related mothur site
https://www.mothur.org/wiki/Pre.cluster
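As a rough illustration of the pre-clustering idea, the sketch below ranks sequences by abundance and merges each rarer sequence into a more abundant one when they differ by at most `diffs` positions. It is a simplified sketch under stated assumptions, not mothur's algorithm: the function names, the dictionary input and the tie handling are all hypothetical.

```python
def mismatches(a, b):
    """Count positions at which two aligned sequences differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


def pre_cluster(seq_counts, diffs=2):
    """Merge rare sequences into abundant ones within `diffs` mismatches.

    seq_counts: dict mapping aligned sequence -> abundance (hypothetical input).
    Returns a dict of surviving sequence -> merged abundance.
    """
    # Most abundant sequences are considered first.
    ranked = sorted(seq_counts.items(), key=lambda item: -item[1])
    merged = {}
    absorbed = set()
    for i, (seq, count) in enumerate(ranked):
        if seq in absorbed:
            continue
        total = count
        for other, other_count in ranked[i + 1:]:
            if other not in absorbed and mismatches(seq, other) <= diffs:
                total += other_count
                absorbed.add(other)
        merged[seq] = total
    return merged


counts = {"ACGTACGT": 10, "ACGTACGA": 2, "ACGTTTTT": 3, "ACGTACCC": 1}
clustered = pre_cluster(counts, diffs=2)
```

In this toy example the two rare variants within 2 mismatches of the dominant sequence are absorbed into it, while the sequence with 3 mismatches is kept separately.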
5. Detect chimeric sequences
The chimera.uchime command reads a fasta and reference file, and outputs potentially chimeric sequences.
- Command line
chimera.uchime(fasta=example.unique.good.filter.unique.precluster.fasta,name=example.unique.good.filter.unique.precluster.names)
- Option description
fasta=path to the input fasta file
name=The name file
- Related mothur site
https://www.mothur.org/wiki/Chimera.uchime
6. Remove chimeric sequences
The remove.seqs command takes a list of sequence names and a fasta file to generate a new file that does not contain sequences on the list.
- Command line
remove.seqs(fasta=example.unique.good.filter.unique.precluster.fasta,name=example.unique.good.filter.unique.precluster.names,accnos=example.unique.good.filter.unique.precluster.uchime.accnos)
- Option description
fasta=path to the input fasta file
name=The name file
accnos=the file listing the names of the chimeric sequences to remove
- Related mothur site
https://www.mothur.org/wiki/Remove.seqs
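Conceptually, this step is a simple filter by name. The Python sketch below shows the idea on in-memory data; the function `remove_seqs` and the record format are assumptions for illustration and do not reproduce mothur's file handling.

```python
def remove_seqs(records, accnos):
    """Return the records whose names are not listed in accnos.

    records: list of (read_name, sequence) tuples (hypothetical format).
    accnos: iterable of read names flagged as chimeric.
    """
    flagged = set(accnos)
    return [(name, seq) for name, seq in records if name not in flagged]


records = [("r1", "ACGT"), ("r2", "ACGA"), ("r3", "TTTT")]
non_chimeric = remove_seqs(records, ["r2"])
```

The same filter has to be applied to the name file so that it stays consistent with the fasta file, which is why the command above takes both.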
7. Classify sequences
The classify.seqs command allows the user to use several different methods to assign their sequences to the taxonomy outline of their choice.
- Command line
classify.seqs(fasta=example.unique.good.filter.unique.precluster.pick.fasta,name=example.unique.good.filter.unique.precluster.pick.names,template=eztaxon_full.align,taxonomy=eztaxon_id_taxonomy.tax,method=knn,numwanted=1)
- Option description
fasta=path to the input fasta file
name=The name file
template=the database (reference) fasta file
taxonomy=The taxonomy id file
method=knn : k-Nearest Neighbor algorithm
numwanted=1 : the number of nearest neighbors (k) to use; here only the single nearest neighbor is used
- Related mothur site
https://www.mothur.org/wiki/Classify.seqs
8. Remove non-bacteria sequences
The remove.lineage command reads a taxonomy file and a list of taxa, and generates new files that exclude the sequences assigned to those taxa.
- Command line
remove.lineage(fasta=example.unique.good.filter.unique.precluster.pick.fasta,name=example.unique.good.filter.unique.precluster.pick.names,taxonomy=example.unique.good.filter.unique.precluster.pick..taxonomy,taxon=Mitochondria-Chloroplast-Archaea-Eukaryota-unknown)
- Option description
fasta=path to the input fasta file
name=The name file
taxonomy=The taxonomy file produced by classify.seqs
taxon=The taxon parameter allows you to select the taxa you would like to remove, and is required. Separate multiple taxa with dashes.
- Related mothur site
https://www.mothur.org/wiki/Remove.lineage
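The following Python sketch illustrates how a dash-separated taxon list can be used to drop unwanted lineages. It is a sketch under assumptions: the function `remove_lineage`, the dictionary input and the exact matching on whole rank names are hypothetical choices, and mothur's own matching rules may differ.

```python
import re


def remove_lineage(taxonomy, taxon):
    """Keep only sequences whose lineage contains none of the given taxa.

    taxonomy: dict of read name -> semicolon-separated lineage (hypothetical).
    taxon: dash-separated taxa, in the style of the taxon parameter.
    """
    unwanted = set(taxon.split("-"))
    kept = {}
    for name, lineage in taxonomy.items():
        # Strip optional confidence scores such as "Bacteria(100)".
        ranks = [re.sub(r"\(\d+\)$", "", rank)
                 for rank in lineage.strip(";").split(";")]
        if unwanted.isdisjoint(ranks):
            kept[name] = lineage
    return kept


assignments = {
    "r1": "Bacteria;Firmicutes;Bacilli;",
    "r2": "Bacteria;Cyanobacteria;Chloroplast;",
    "r3": "unknown;",
}
bacteria_only = remove_lineage(
    assignments, "Mitochondria-Chloroplast-Archaea-Eukaryota-unknown")
```

Only `r1` survives here, because `r2` carries Chloroplast in its lineage and `r3` is unclassified.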
9. Calculate uncorrected pairwise distances
The dist.seqs command will calculate uncorrected pairwise distances between aligned DNA sequences.
- Command line
dist.seqs(fasta=example.unique.good.filter.unique.precluster.pick.pick.fasta,cutoff=0.15)
- Option description
fasta=path to the input fasta file
cutoff=If you know that you are not going to form OTUs with distances larger than 0.15, you can tell mothur to not save any distances larger than 0.15.
- Related mothur site
https://www.mothur.org/wiki/Dist.seqs
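An uncorrected pairwise distance is the fraction of compared alignment positions at which two sequences differ. The Python sketch below illustrates this and the effect of the cutoff. It makes simplifying assumptions that are not taken from mothur: positions where both sequences have a gap are skipped, every other gap position counts as one difference, and the function names are hypothetical.

```python
def uncorrected_distance(a, b):
    """Fraction of compared positions at which two aligned sequences differ."""
    compared = 0
    differences = 0
    for x, y in zip(a, b):
        # Assumption: positions that are gaps in both sequences are ignored.
        if x in "-." and y in "-.":
            continue
        compared += 1
        if x != y:
            differences += 1
    return differences / compared if compared else 1.0


def pairwise_distances(records, cutoff=0.15):
    """Return (name_i, name_j, distance) for pairs at or below the cutoff."""
    kept = []
    for i in range(len(records)):
        for j in range(i):
            d = uncorrected_distance(records[i][1], records[j][1])
            if d <= cutoff:
                kept.append((records[i][0], records[j][0], d))
    return kept


records = [("a", "ACGTACGTAC"), ("b", "ACGTACGTAT"), ("c", "TTTTACGTAC")]
distances = pairwise_distances(records, cutoff=0.15)
```

Only the pair (b, a) is saved, with a distance of 0.1. The pairs involving c exceed the cutoff and are discarded, which keeps the distance file small.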
10. Assign sequences to OTUs
Once a distance matrix gets read into mothur, the cluster command can be used to assign sequences to OTUs.
- Command line
cluster(column=example.unique.good.filter.unique.precluster.pick.pick.dist,name=example.unique.good.filter.unique.precluster.pick.pick.names)
- Option description
column=path to the column-formatted distance matrix. To read in a column-formatted distance matrix, you must also provide a filename for the name option.
name=The name file
- Related mothur site
https://www.mothur.org/wiki/Cluster
11. Classify OTUs
The classify.otu command is used to get a consensus taxonomy for an OTU.
- Command line
classify.otu(taxonomy=example.unique.good.filter.unique.precluster.pick..pick.taxonomy,list=example.unique.good.filter.unique.precluster.pick.pick.an.list,name=example.unique.good.filter.unique.precluster.pick.pick.names)
- Option description
taxonomy=taxonomy file
list=The list file produced by cluster
name=The name file
- Related mothur site
https://www.mothur.org/wiki/Classify.otu
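One way to picture a consensus taxonomy is as a majority vote taken rank by rank over the sequences in an OTU. The Python sketch below illustrates that idea only. The function `consensus_taxonomy`, the list-based input and the 51% support threshold are assumptions made for this example, not a description of mothur's internals.

```python
from collections import Counter


def consensus_taxonomy(lineages, cutoff=51):
    """Majority-vote lineage for one OTU.

    lineages: list of rank lists, one per sequence in the OTU (hypothetical).
    cutoff: minimum percent of sequences that must agree at a rank (assumed).
    Returns a list of (taxon, percent_support) down to the last agreed rank.
    """
    consensus = []
    members = lineages
    depth = 0
    while members:
        names = [lin[depth] for lin in members if len(lin) > depth]
        if not names:
            break
        taxon, count = Counter(names).most_common(1)[0]
        # Support is measured against every sequence in the OTU.
        support = 100 * count / len(lineages)
        if support < cutoff:
            break
        consensus.append((taxon, round(support)))
        # Only sequences agreeing so far vote at the next rank.
        members = [lin for lin in members
                   if len(lin) > depth and lin[depth] == taxon]
        depth += 1
    return consensus


otu = [
    ["Bacteria", "Firmicutes", "Bacilli"],
    ["Bacteria", "Firmicutes", "Bacilli"],
    ["Bacteria", "Firmicutes", "Clostridia"],
    ["Bacteria", "Proteobacteria", "Gammaproteobacteria"],
]
result = consensus_taxonomy(otu)
```

For this toy OTU the consensus stops at Firmicutes (75% support), because only half of the sequences agree on Bacilli at the next rank.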
Written by Jimmy Lim (Dec 2016); Edited by Mikael Hwang (Jan 2017)