Quality-filtering for 16S Microbiome Taxonomic Profiling

For 16S microbiome taxonomic profiling, a sequencing read is removed as low quality if it meets any of the following criteria:

  • Sequence length is <100 bp or >2,000 bp.
  • Average Q (Phred quality) value is <25.
  • The sequence is not predicted to be a 16S rRNA gene by a Hidden Markov Model (HMM)-based search.
  • Sequences are first assigned to the reference 16S database. All sequences that do not match any reference sequence at a similarity of at least 97% are then clustered with the UCLUST method using the same 97% cutoff. A sequence found to be a singleton is assumed to be erroneous and is excluded from subsequent analyses. This approach is widely used, especially for Illumina short-read sequencing [see step 5 of the QIIME manual].
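
The length and average-Q criteria are simple per-read checks. A minimal Python sketch follows, assuming reads come from FASTQ files with Phred+33 quality encoding. The function and constant names are hypothetical, and this is an illustration of the thresholds above, not the EzBioCloud implementation.

```python
from statistics import mean
from typing import List

MIN_LEN = 100    # bp; shorter reads are removed
MAX_LEN = 2000   # bp; longer reads are removed
MIN_AVG_Q = 25   # reads with a mean Phred score below this are removed


def phred_scores(quality: str, offset: int = 33) -> List[int]:
    """Decode a FASTQ quality string (assumes Phred+33 encoding)."""
    return [ord(ch) - offset for ch in quality]


def passes_length_and_quality(seq: str, quality: str) -> bool:
    """Return True if a read survives the length and average-Q criteria."""
    if len(seq) != len(quality):
        raise ValueError("sequence and quality strings differ in length")
    if len(seq) < MIN_LEN or len(seq) > MAX_LEN:
        return False
    return mean(phred_scores(quality)) >= MIN_AVG_Q
```

For example, a 150 bp read whose quality characters are all `I` (Q40) passes, while a 50 bp read is rejected on length regardless of its quality.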

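The reference-assignment and singleton-removal steps can be illustrated with a toy version of greedy centroid clustering. This sketch assumes reads are already aligned to a common length, so identity is simply the fraction of matching positions, and it processes reads in input order. UCLUST itself relies on k-mer-based heuristics and pairwise alignment, so treat this only as an illustration of the idea; all names here are hypothetical.

```python
from typing import List


def identity(a: str, b: str) -> float:
    """Fraction of identical positions between two pre-aligned reads.

    Simplification: assumes equal-length, already aligned sequences;
    real tools derive identity from a pairwise alignment.
    """
    if not a or len(a) != len(b):
        return 0.0
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / len(a)


def greedy_cluster(seqs: List[str], threshold: float = 0.97) -> List[List[int]]:
    """Greedy centroid clustering, loosely in the spirit of UCLUST.

    Each read joins the first existing centroid it matches at >= threshold
    identity; otherwise it founds a new cluster as that cluster's centroid.
    Returns a list of clusters, each a list of indices into seqs.
    """
    centroids: List[str] = []
    clusters: List[List[int]] = []
    for idx, seq in enumerate(seqs):
        for c_idx, centroid in enumerate(centroids):
            if identity(seq, centroid) >= threshold:
                clusters[c_idx].append(idx)
                break
        else:  # no centroid was close enough: start a new cluster
            centroids.append(seq)
            clusters.append([idx])
    return clusters


def drop_singletons(seqs: List[str], threshold: float = 0.97) -> List[str]:
    """Keep only reads whose cluster has at least two members."""
    clusters = greedy_cluster(seqs, threshold)
    keep = sorted(i for members in clusters if len(members) > 1 for i in members)
    return [seqs[i] for i in keep]


def filter_unassigned(reads: List[str], references: List[str],
                      threshold: float = 0.97) -> List[str]:
    """Assign reads to references first; cluster the rest and drop singletons."""
    assigned: List[str] = []
    unmatched: List[str] = []
    for read in reads:
        if any(identity(read, ref) >= threshold for ref in references):
            assigned.append(read)
        else:
            unmatched.append(read)
    return assigned + drop_singletons(unmatched, threshold)
```

With a single reference `"A" * 100`, the reads `"C" * 100` and `"C" * 99 + "G"` (99% identical) form a two-member cluster and are kept, while an isolated `"G" * 100` read is a singleton and is dropped.
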
The EzBioCloud team / Last edited on Feb. 19, 2018