Quality-filtering for 16S Microbiome Taxonomic Profiling
For 16S microbiome taxonomic profiling, sequencing reads that meet any of the following criteria are filtered out as low quality:
- Sequences shorter than 100 bp or longer than 2,000 bp.
- Sequences with an average Q value below 25.
- Sequences not predicted to be a 16S gene by the hidden Markov model (HMM)-based search.
- Sequences that remain singletons after clustering. Sequences are first matched against the reference 16S database. All sequences that do not match any reference sequence with at least 97% similarity are clustered with the UCLUST method, also at a 97% cutoff. If a sequence is then found to be a singleton, we assume that it is erroneous and exclude it from subsequent analyses. This approach is widely used, especially for Illumina short-read sequencing [See QIIME manual’s step 5].
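
The length, Q-value, and singleton criteria above can be sketched in a few lines of Python. This is only an illustration, not the EzBioCloud pipeline itself: the function names are hypothetical, the average Q value is assumed to be the arithmetic mean of Phred+33 scores, and the cluster assignments are assumed to come from an external 97% clustering step (such as UCLUST) run on reads that did not match the reference database. The HMM-based 16S check is not shown.

```python
from collections import Counter

MIN_LEN = 100      # reads shorter than this are discarded
MAX_LEN = 2000     # reads longer than this are discarded
MIN_MEAN_Q = 25    # reads with an average Q value below this are discarded


def mean_q(quality, offset=33):
    """Arithmetic mean of the Phred scores in a FASTQ quality string.

    Assumes Phred+33 encoding; the source does not say how Q is averaged.
    """
    return sum(ord(ch) - offset for ch in quality) / len(quality)


def passes_length_and_quality(seq, quality):
    """Return True if a read survives the length and average-Q criteria."""
    # The length check runs first, so empty reads never reach mean_q().
    if len(seq) < MIN_LEN or len(seq) > MAX_LEN:
        return False
    return mean_q(quality) >= MIN_MEAN_Q


def drop_singletons(cluster_of):
    """Keep only reads whose de novo cluster holds at least two reads.

    cluster_of maps read ID -> cluster ID for reads that did NOT match
    the reference database (hypothetical input format).
    """
    sizes = Counter(cluster_of.values())
    return {read_id for read_id, cid in cluster_of.items() if sizes[cid] > 1}
```

For example, a 150 bp read whose quality string is all `I` (Q40) passes the first two checks, while the same read with all `5` (Q20) is rejected. Reads that did match the reference database are never passed to `drop_singletons`, so they are kept regardless of cluster size.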
The EzBioCloud team / Last edited on Feb. 19, 2018