Below you will find references for the underlying algorithms used throughout the EzBioCloud and TrueBac ID family of tools and applications. If you cannot find the particular algorithm or documentation you are searching for, please reach out to us and we will get back to you as soon as we can.
ACE
ACE is an indicator of species richness (total number of species in a sample) that is sensitive to rare OTUs (singletons and doubletons). Higher values indicate higher diversity.
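As a rough illustration, the estimator from Chao and Lee (1992) can be sketched as follows. The function name `ace`, the `rare_threshold` parameter, and the sample counts are assumptions made for this example; the cutoff of 10 reads for "rare" OTUs is the conventional choice and is not confirmed as the setting used by EzBioCloud.

```python
from collections import Counter


def ace(counts, rare_threshold=10):
    """Abundance-based Coverage Estimator (ACE) of species richness.

    counts: number of reads assigned to each OTU.
    rare_threshold: OTUs with at most this many reads are treated as rare
    (10 is the conventional value; assumed here, not taken from the tool).
    """
    counts = [c for c in counts if c > 0]
    rare = [c for c in counts if c <= rare_threshold]
    s_abund = len(counts) - len(rare)   # OTUs above the rare threshold
    s_rare = len(rare)                  # OTUs at or below the threshold
    if s_rare == 0:
        return float(s_abund)           # nothing rare, nothing to extrapolate
    n_rare = sum(rare)                  # reads belonging to rare OTUs
    freq = Counter(rare)                # freq[i] = OTUs seen exactly i times
    f1 = freq[1]                        # singletons
    if f1 == n_rare:
        # Sample coverage of the rare group is zero, so ACE is undefined.
        raise ValueError("ACE is undefined when every rare OTU is a singleton")
    c_ace = 1.0 - f1 / n_rare           # sample coverage of the rare group
    # Squared coefficient of variation of rare abundances, floored at zero.
    spread = sum(i * (i - 1) * f for i, f in freq.items())
    gamma2 = max((s_rare / c_ace) * spread / (n_rare * (n_rare - 1)) - 1.0, 0.0)
    return s_abund + s_rare / c_ace + (f1 / c_ace) * gamma2


otu_counts = [20, 15, 6, 3, 2, 1, 1, 1]  # made-up reads per OTU
print(round(ace(otu_counts), 2))
```

The estimate is never lower than the observed OTU count, and it grows as the share of singletons among the rare OTUs grows.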
Reference
Chao, A., and Lee, S.-M. “Estimating the number of classes via sample coverage.” Journal of the American Statistical Association 87.417 (1992): 210-217.
Chao1
Chao1 is an indicator of species richness (total number of species in a sample) that is sensitive to rare OTUs (singletons and doubletons). Higher values indicate higher diversity.
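A minimal sketch of the calculation, assuming per-OTU read counts as input. The function name `chao1`, the `bias_corrected` flag, and the sample counts are made up for illustration; which of the two commonly published forms EzBioCloud applies is not stated here.

```python
def chao1(counts, bias_corrected=True):
    """Chao1 richness estimate from per-OTU read counts.

    F1 = number of singletons, F2 = number of doubletons.
    bias_corrected=True uses S_obs + F1*(F1-1) / (2*(F2+1)),
    which stays defined even when there are no doubletons.
    """
    counts = [c for c in counts if c > 0]
    s_obs = len(counts)                          # observed OTUs
    f1 = sum(1 for c in counts if c == 1)
    f2 = sum(1 for c in counts if c == 2)
    if bias_corrected:
        return s_obs + f1 * (f1 - 1) / (2 * (f2 + 1))
    if f2 == 0:
        raise ValueError("classic Chao1 needs at least one doubleton")
    return s_obs + f1 ** 2 / (2 * f2)            # classic form


otu_counts = [10, 5, 3, 2, 2, 1, 1, 1]           # made-up reads per OTU
print(chao1(otu_counts))                         # 9.0
print(chao1(otu_counts, bias_corrected=False))   # 10.25
```

With 8 observed OTUs, 3 singletons and 2 doubletons, both forms estimate that at least one more OTU remains unseen.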
Reference
Chao, A. “Estimating the population size for capture-recapture data with unequal catchability.” Biometrics (1987): 783-791.
Clone
A clone is an individual sequence that was not included in contigs.
Contig
A contig is a set of overlapping, and sometimes identical, sequences that together represent a consensus region of DNA.
Diversity indices
Diversity indices are measures of species diversity, based on the number and pattern of OTUs observed in the sample. The indices include statistical estimates of species richness (ACE, Chao1, Jackknife), and estimates of species evenness (Shannon, Simpson, NPShannon).
Good coverage of library (%)
This is an index of the extent to which the number of sequencing reads used for analysis represents the actual species population of the sample. The value can range from 0 to 100%, with 100% indicating a complete sampling of species, meaning that additional sequencing is unlikely to find any more new species.
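For illustration, Good's estimator is commonly written as 1 - F1/N, where F1 is the number of singleton OTUs and N is the total number of reads. The sketch below assumes that form; the function name `goods_coverage` and the sample counts are made up for this example.

```python
def goods_coverage(counts):
    """Good's coverage in percent: 100 * (1 - F1 / N).

    counts: number of reads assigned to each OTU.
    """
    counts = [c for c in counts if c > 0]
    n = sum(counts)                              # total reads
    if n == 0:
        raise ValueError("no reads in sample")
    f1 = sum(1 for c in counts if c == 1)        # singleton OTUs
    return 100.0 * (n - f1) / n


otu_counts = [10, 5, 3, 2, 2, 1, 1, 1]           # made-up reads per OTU
print(goods_coverage(otu_counts))                # 88.0
```

Here 3 of 25 reads are singletons, so roughly 12% of further reads would be expected to belong to OTUs not yet seen.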
Reference
Good, I. J. “The population frequencies of species and the estimation of population parameters.” Biometrika (1953): 237-264.
Jackknife
Jackknife is an indicator of species richness (total number of species in a sample) that is sensitive to rare OTUs (singletons and doubletons) as well as to abundant OTUs (tripletons and more). Higher values indicate higher diversity.
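As a hedged sketch only: the first- and second-order jackknife estimators are often applied to abundance data in the form below, with n taken as the total number of reads. The order of jackknife that EzBioCloud actually uses is not stated here, and the function names and sample counts are hypothetical.

```python
def jackknife1(counts):
    """First-order jackknife: S_obs + F1 * (n - 1) / n."""
    counts = [c for c in counts if c > 0]
    n = sum(counts)
    if n == 0:
        raise ValueError("no reads in sample")
    f1 = sum(1 for c in counts if c == 1)        # singletons
    return len(counts) + f1 * (n - 1) / n


def jackknife2(counts):
    """Second-order jackknife:
    S_obs + F1 * (2n - 3) / n - F2 * (n - 2)**2 / (n * (n - 1)).
    """
    counts = [c for c in counts if c > 0]
    n = sum(counts)
    if n < 2:
        raise ValueError("need at least two reads")
    f1 = sum(1 for c in counts if c == 1)        # singletons
    f2 = sum(1 for c in counts if c == 2)        # doubletons
    return (len(counts)
            + f1 * (2 * n - 3) / n
            - f2 * (n - 2) ** 2 / (n * (n - 1)))


otu_counts = [10, 5, 3, 2, 2, 1, 1, 1]           # made-up reads per OTU
print(round(jackknife1(otu_counts), 2))
print(round(jackknife2(otu_counts), 2))
```

Higher-order jackknife estimators follow the same pattern and bring in tripletons and more abundant classes.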
Reference
Burnham, K. P. & Overton, W. S. (1979) Robust estimation of population size when capture probabilities vary among animals. Ecology, 60, 927-936.
No. of OTUs found in the sample
An Operational Taxonomic Unit (OTU) is a group of sequences clustered by sequence similarity. Because many bacterial species exhibit greater than 97% sequence similarity with other species, the OTU count does not necessarily equal the actual number of different species. This value represents the number of OTUs observed during experimentation, and may differ from the total number of OTUs (species richness) in the sample.
NPShannon
NPShannon is an indicator of species evenness (proportional distribution of the number of each species in a sample) that estimates diversity when there are unseen species and unknown abundance. Values are greater than 0, and higher values indicate higher diversity.
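One widely used non-parametric Shannon estimator is the coverage-adjusted form of Chao and Shen (2003). The sketch below assumes that form, which is not confirmed as the exact variant used by EzBioCloud; the function name `np_shannon` and the sample counts are made up for illustration.

```python
import math


def np_shannon(counts):
    """Coverage-adjusted Shannon estimate (Chao and Shen 2003 form, assumed).

    Each observed proportion is shrunk by the sample coverage C = 1 - F1/N,
    and each term is divided by the chance that the OTU was seen at all.
    """
    counts = [c for c in counts if c > 0]
    n = sum(counts)                              # total reads
    f1 = sum(1 for c in counts if c == 1)        # singletons
    if n == 0 or f1 == n:
        raise ValueError("undefined when the sample is empty or all singletons")
    coverage = 1.0 - f1 / n
    total = 0.0
    for c in counts:
        p = coverage * c / n                     # coverage-adjusted proportion
        seen = 1.0 - (1.0 - p) ** n              # probability of being observed
        total -= p * math.log(p) / seen
    return total


otu_counts = [10, 5, 3, 2, 2, 1, 1, 1]           # made-up reads per OTU
print(round(np_shannon(otu_counts), 2))
```

For this sample the adjusted value is higher than the plain Shannon index computed from the same counts, which reflects the contribution attributed to unseen species.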
Reference
Magurran, A. E. (2013). Measuring biological diversity. John Wiley & Sons.
OTU-Cutoff
This is the sequence similarity value used for OTU calculation, species-level identification against the reference database, and de novo clustering. 97% is commonly used for Bacteria.
OTU-picking Method
This section indicates which clustering method was used to form OTUs from sequenced reads.
CL_OPEN_REF_UCLUST_MC2: Each read is identified at the species level against the reference database with a given similarity cutoff. Reads that fall below this cutoff are pooled, and UCLUST is used to perform de novo clustering on them to generate additional OTUs. This strategy is called open-reference OTU picking. Finally, OTUs containing a single read (singletons) are omitted from further analysis.
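The three stages described above can be sketched with a toy example. This is not UCLUST: the `identity` function is a hypothetical position-by-position comparison standing in for real sequence alignment, and the names `open_reference_otus`, `cutoff`, and `min_size` are assumptions made for illustration.

```python
from collections import defaultdict


def identity(a, b):
    """Toy similarity: fraction of matching positions (hypothetical stand-in)."""
    length = max(len(a), len(b))
    if length == 0:
        return 1.0
    return sum(x == y for x, y in zip(a, b)) / length


def open_reference_otus(reads, reference, cutoff=0.97, min_size=2):
    """Toy open-reference OTU picking.

    reads: list of read sequences.
    reference: dict mapping reference name to reference sequence.
    """
    otus = defaultdict(list)
    unassigned = []
    # Stage 1: assign each read to its best reference hit at or above the cutoff.
    for read in reads:
        best_name, best_score = None, 0.0
        for name, ref_seq in reference.items():
            score = identity(read, ref_seq)
            if score > best_score:
                best_name, best_score = name, score
        if best_name is not None and best_score >= cutoff:
            otus[best_name].append(read)
        else:
            unassigned.append(read)
    # Stage 2: greedy de novo clustering of the reads that missed the reference.
    centroids = []
    for read in unassigned:
        for idx, centroid in enumerate(centroids):
            if identity(read, centroid) >= cutoff:
                otus[f"denovo{idx}"].append(read)
                break
        else:
            centroids.append(read)               # read seeds a new cluster
            otus[f"denovo{len(centroids) - 1}"].append(read)
    # Stage 3: drop OTUs smaller than min_size (singletons when min_size=2).
    return {name: members for name, members in otus.items()
            if len(members) >= min_size}


reference = {"sp_A": "AAAAAAAAAA"}               # made-up reference database
reads = ["AAAAAAAAAA", "AAAAAAAAAA", "CCCCCCCCCC", "CCCCCCCCCC", "GGGGGGGGGG"]
print(sorted(open_reference_otus(reads, reference, cutoff=0.9)))
```

In this toy run the two A reads match the reference, the two C reads form a de novo OTU, and the lone G read is discarded as a singleton.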
Reference
* uclust : http://drive5.com/usearch/manual/uclust_algo.html
* cdhit : http://www.bioinformatics.org/cd-hit/
Rank abundance curve
The rank abundance graph can be used to observe species evenness. The x-axis represents the rank of OTUs, and the y-axis represents the relative abundance of OTUs at each rank. The graph converges to 0, and the steeper the slope of the curve, the lower the species diversity.
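The data behind such a graph can be prepared in a few lines. In this sketch the function name `rank_abundance` and the sample counts are made up for illustration.

```python
def rank_abundance(counts):
    """Return (rank, relative abundance) pairs, most abundant OTU first."""
    counts = sorted((c for c in counts if c > 0), reverse=True)
    total = sum(counts)
    return [(rank, c / total) for rank, c in enumerate(counts, start=1)]


otu_counts = [3, 10, 2, 5]                       # made-up reads per OTU
for rank, abundance in rank_abundance(otu_counts):
    print(rank, abundance)
```

Plotting rank on the x-axis against relative abundance on the y-axis gives the curve; a community dominated by a few OTUs drops off sharply after the first ranks.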
Rarefaction curve
The rarefaction curve is a graph that expresses species diversity by plotting the relationship between the size of the sample data and the number of OTUs.
The x-axis represents the number of sampled reads, and the y-axis represents the number of OTUs discovered. In general, as the number of reads increases, the number of OTUs converges to the maximum value.
The steeper the slope of the curve, the higher the species diversity.
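The expected number of OTUs in a random subsample of m reads can be computed exactly, without repeated random draws, using the hypergeometric formula given by Heck et al. (1975). The sketch below assumes that approach; the function name `rarefy` and the sample counts are made up for illustration.

```python
from math import comb


def rarefy(counts, m):
    """Expected number of OTUs in a random subsample of m reads.

    For each OTU with c reads out of n, the chance that it is missed
    entirely is comb(n - c, m) / comb(n, m).
    """
    counts = [c for c in counts if c > 0]
    n = sum(counts)
    if not 0 <= m <= n:
        raise ValueError("m must be between 0 and the total number of reads")
    total = comb(n, m)
    return sum(1.0 - comb(n - c, m) / total for c in counts)


otu_counts = [10, 5, 3, 2, 2, 1, 1, 1]           # made-up reads per OTU
n_reads = sum(otu_counts)
# One point every 5 reads, from 0 up to the full sample.
curve = [(m, rarefy(otu_counts, m)) for m in range(0, n_reads + 1, 5)]
for m, expected in curve:
    print(m, round(expected, 2))
```

At m equal to the full sample size the curve reaches the observed OTU count; a curve that is still climbing at that point suggests that more sequencing would reveal additional OTUs.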
Reference
Heck, K. L., van Belle, G., & Simberloff, D. (1975). Explicit calculation of the rarefaction diversity measurement and the determination of sufficient sample size. Ecology, 56(6), 1459-1461.
Shannon
Shannon is an indicator of species evenness (proportional distribution of the number of each species in a sample) that exhibits values greater than 0. Higher values indicate higher diversity, and the maximum value is achieved when all species are present in equal numbers.
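A minimal sketch of the standard formula H = -Σ p_i ln(p_i), where p_i is the proportion of reads in OTU i. The natural logarithm is assumed here; the function name `shannon` and the sample counts are made up for illustration.

```python
import math


def shannon(counts):
    """Shannon index H = -sum(p * ln(p)) over OTU read proportions."""
    counts = [c for c in counts if c > 0]
    n = sum(counts)
    if n == 0:
        raise ValueError("no reads in sample")
    return -sum((c / n) * math.log(c / n) for c in counts)


even = [25, 25, 25, 25]                          # made-up, perfectly even sample
uneven = [85, 5, 5, 5]                           # made-up, one dominant OTU
print(round(shannon(even), 3))                   # equals ln(4) for 4 equal OTUs
print(round(shannon(uneven), 3))
```

For a fixed number of OTUs the index peaks at ln(S) when all S OTUs are equally abundant, which is why the even sample scores higher than the uneven one.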
Reference
Magurran, A. E. (2013). Measuring biological diversity. John Wiley & Sons.
Simpson
Simpson is an indicator of species evenness (proportional distribution of the number of each species in a sample) that displays the probability that two randomly selected sequences are of the same species. Values range from 0 to 1, and lower values indicate higher diversity.
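The probability described above can be sketched as follows, assuming the two sequences are drawn without replacement. Some tools use the with-replacement form Σ p_i² instead, and which form EzBioCloud reports is not stated here; the function name `simpson` and the sample counts are made up for illustration.

```python
def simpson(counts):
    """Probability that two reads drawn without replacement share an OTU.

    D = sum(c * (c - 1)) / (n * (n - 1)), where c is the read count of
    each OTU and n is the total number of reads.
    """
    counts = [c for c in counts if c > 0]
    n = sum(counts)
    if n < 2:
        raise ValueError("need at least two reads")
    return sum(c * (c - 1) for c in counts) / (n * (n - 1))


even = [25, 25, 25, 25]                          # made-up, perfectly even sample
uneven = [85, 5, 5, 5]                           # made-up, one dominant OTU
print(round(simpson(even), 3))
print(round(simpson(uneven), 3))
```

The sample dominated by one OTU gives the higher value, since two random reads are more likely to come from that OTU.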
Reference
Magurran, A. E. (2013). Measuring biological diversity. John Wiley & Sons.