In this tutorial, we will compare human respiratory samples from diseased and healthy subjects to familiarize you with EzBioCloud’s microbiome taxonomic profiling (MTP). The data used in this tutorial were published in 2014 by Hana Yi et al. Because the dataset is part of the EzBioCloud microbiome database, we will start by checking it out into your MTP account.
If you don’t have an account, please register first.
At EzBioCloud’s main page, click [Apps], then Microbiome Taxonomic Profiling (MTP).
EzBioCloud DB contains many datasets that you may incorporate into your analysis. The database is constantly growing and includes the Human Microbiome Project (HMP)’s 8,048 MTPs from 19 body sites.
Please check out the “[Tutorial] Human respiratory infection” dataset, which contains 65 MTPs.
Because you checked out a tutorial set, there should be 65 MTPs with metadata tags. Since we are going to handle a large number of microbiome samples, metadata tags will play a critical role in organizing data and discovering biomarkers or differentially present taxa. We will start in the MTP’s browsing page below.
Now, we have 65 MTPs of respiratory swab samples. Their bacterial community structure was elucidated by the amplification and sequencing of the 16S V1-V3 region using a Roche 454 platform. Because we used EzBioCloud’s 16S database, species-level identification of each sequencing read is possible. Each MTP contains the complete information of a sample’s bacterial community. Open the MTP named AD1 (MTP ID=CL123S1) by clicking [Open]. By doing this, a new web page/tab will be opened to explore the bacterial community of this sample. According to the tags, it is a respiratory swab sample of a male infant who has an adenovirus infection. In the browser for this MTP, you will find the following information about the AD1 sample under the “About MTP” tab:
Under the “Alpha diversity” tab, you should be able to obtain the species richness (the estimated number of species in a sample) and species evenness (or diversity index). Under the “Taxonomic Hierarchy” tab, 16 reads are displayed along with their taxonomic hierarchical structure.
Now, we have a fully expanded taxonomic hierarchy. Let’s say that we wanted to examine a species further. Using Actinomyces graevenitzii as an example, select “Actinomyces graevenitzii” which is under Actinobacteria (phylum); Actinobacteria_c (class); Actinomycetales (order); Actinomycetaceae (family); Actinomyces (genus).
Under the “Taxonomic composition” tab, the quantitative compositions at all taxonomic ranks (phylum to species) are given as tables and pie charts. You also have the option of downloading the data as an Excel file for your own analyses. In the AD1 sample, the most abundant species is “Streptococcus salivarius group” (14.6%), followed by “Streptococcus pneumoniae group” (10.8%) and Veillonella dispar (8.6%). A “taxonomic group” contains multiple species/subspecies that cannot be differentiated by 16S due to the low resolution of the gene. Click “Streptococcus salivarius group” to go to the web page describing the member species of this group.
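If you download the composition table and want to reproduce the percentages yourself, the calculation is a simple normalization of read counts. The following is a minimal sketch; the function names and the read counts are hypothetical and are not taken from the AD1 sample or from EzBioCloud's export format.

```python
def relative_abundance(read_counts):
    """Convert {taxon: read count} into {taxon: percent of total reads}."""
    total = sum(read_counts.values())
    if total == 0:
        raise ValueError("sample has no reads")
    return {taxon: 100.0 * n / total for taxon, n in read_counts.items()}


def top_taxa(read_counts, k=3):
    """Return the k most abundant taxa as (taxon, percent) pairs."""
    percents = relative_abundance(read_counts)
    return sorted(percents.items(), key=lambda kv: kv[1], reverse=True)[:k]


# Made-up read counts for illustration only.
counts = {"Taxon A": 500, "Taxon B": 300, "Taxon C": 150, "Taxon D": 50}
for taxon, percent in top_taxa(counts):
    print(f"{taxon}: {percent:.1f}%")
```

The same normalization applies at any rank (phylum to species), as long as the counts are first summed per taxon at that rank.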
Under the “Krona” tab, taxonomic compositional data are loaded onto the Krona tool, which is an open source visualization project available at //sourceforge.net/p/krona/home/krona/. Any MTP can be opened and browsed in the ways described above.
One of the major goals of microbiome studies is to understand the taxonomic profiles of a set of samples. An MTP set is defined as a set of MTPs in EzBioCloud. You can easily create sets manually, or semi-automatically using the (metadata) tags. In this tutorial set, we can create sets of different combinations of tags. Let’s create two sets that are defined as “Healthy” and “Diseased”, respectively.
In the same way, create an MTP set with the “Diseased” tag. This set should contain 28 MTPs (see below).
To browse the microbiome information of an MTP set, move the mouse cursor to the box representing the sets (a and b in the screenshot above).
An MTP set contains multiple MTPs that can be treated as a group in many statistical analyses. We have already created two MTP sets, “healthy” and “diseased”, in the previous exercise. Let’s open the “healthy” set first, which contains 37 16S-based taxonomic profiles of respiratory tract swabs from 37 healthy subjects. Bring up the list of MTP sets ((a) in the previous screenshot) and move the cursor to the “healthy” set. You should see the [Open] button, which will lead you to the “MTP set browser” (see screenshot below).
The image below is a screenshot of the MTP set browser showing the “healthy” set.
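Conceptually, building a set from tags is a filter over the sample metadata. The sketch below illustrates the idea under the assumption that each MTP is represented as a dictionary with a list of tags; the sample names, the tag values other than “Healthy” and “Diseased”, and the function name are hypothetical.

```python
def select_by_tags(mtps, required_tags):
    """Return the MTPs whose tags include every tag in required_tags."""
    required = set(required_tags)
    return [m for m in mtps if required <= set(m["tags"])]


# Hypothetical metadata, not the real tutorial records.
mtps = [
    {"name": "sample_1", "tags": ["Healthy", "Male"]},
    {"name": "sample_2", "tags": ["Diseased", "Female"]},
    {"name": "sample_3", "tags": ["Healthy", "Female"]},
]

healthy = select_by_tags(mtps, ["Healthy"])
diseased = select_by_tags(mtps, ["Diseased"])
print(len(healthy), len(diseased))
```

Passing more than one tag narrows the selection to samples that carry all of them, which mirrors creating a set from a combination of tags.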
There are four menu items in the main menu:
Select the “Composition” menu to view the “Stacked bar” chart.
In the healthy subjects, it looks like Firmicutes is most abundant at the phylum level. At the species level, “Streptococcus pneumoniae group” seems to be the most abundant. Let’s confirm this by viewing the “Double pie chart”. In this chart, the averaged taxonomic compositions of two taxonomic ranks of your choice are given.
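An averaged composition for a set can be computed in more than one way. The sketch below assumes each sample is first normalized to fractions and the fractions are then averaged, so that deeply sequenced samples do not dominate; whether EzBioCloud averages in exactly this way is an assumption, and the counts are made up.

```python
def mean_composition(samples):
    """Average the per-sample fraction of each taxon across samples.

    samples: list of {taxon: read count} dictionaries.
    A taxon absent from a sample contributes a fraction of 0 for that sample.
    """
    if not samples:
        raise ValueError("need at least one sample")
    means = {taxon: 0.0 for sample in samples for taxon in sample}
    for sample in samples:
        total = sum(sample.values())
        if total == 0:
            raise ValueError("sample has no reads")
        for taxon, count in sample.items():
            means[taxon] += count / total / len(samples)
    return means


# Made-up phylum-level counts for two samples.
samples = [
    {"Firmicutes": 3, "Proteobacteria": 1},
    {"Firmicutes": 1, "Bacteroidetes": 1},
]
averaged = mean_composition(samples)
print(max(averaged, key=averaged.get))
```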
Under the “alpha diversity” menu, various diversity indices are given for all MTPs in the set. For example, the “Good’s coverage of library” indices for all MTPs are close to 100%, indicating that the number of sequencing reads per sample was statistically sufficient (see the screenshot below).
Under the “alpha diversity” menu, ACE, Chao1, and Jackknife give the estimated number of species in each sample, called species richness. Under the “ACE” tab, the estimated number of species (=OTUs) in each MTP is shown as a bar chart.
Under the “beta diversity” menu, relationships among samples are explored using different statistical and visualization methods. The most popular distance metric between two MTPs is UniFrac. Distances are calculated for every pair of MTPs and then used to carry out hierarchical clustering or dimension reduction by principal coordinate analysis (PCoA). Below is the UPGMA clustering of MTPs in the “healthy” set using UniFrac distances:
In this dataset, we have three types of “healthy” subjects: Com-X-ND (community subjects, non-diseased), Hos-X-ND (hospital staff, non-diseased) and ICU-X-ND (intensive care unit staff, non-diseased). In the above UPGMA dendrogram, we do not see a clear separation among the three groups. Because hierarchical clustering sometimes produces a biased result, we can confirm this with an ordination method, PCoA. Please select the “PCoA (2D)” tab to view the 2-dimensional scatter plot of the 37 healthy subjects. By selecting a tag or multiple tags, we can highlight the MTPs with a certain combination of tags.
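Good's coverage is commonly defined as 1 - F1/N, where F1 is the number of OTUs seen exactly once (singletons) and N is the total number of reads. A minimal sketch of that formula follows; the counts are made up, and EzBioCloud's own implementation may differ in detail.

```python
def goods_coverage(otu_counts):
    """Good's coverage: 1 - (singleton OTUs / total reads)."""
    n_reads = sum(otu_counts)
    if n_reads == 0:
        raise ValueError("sample has no reads")
    singletons = sum(1 for c in otu_counts if c == 1)
    return 1.0 - singletons / n_reads


# Made-up OTU read counts: 2 singletons among 20 reads.
example = [10, 5, 1, 1, 3]
print(f"{100 * goods_coverage(example):.1f}%")
```

A value close to 100% means that few reads belong to OTUs observed only once, which suggests that additional sequencing would reveal few new OTUs.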
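To give a sense of how a richness estimator works, here is a sketch of Chao1 in its widely used bias-corrected form, S_obs + F1(F1 - 1) / (2(F2 + 1)), where F1 and F2 are the numbers of OTUs observed once and twice. ACE and Jackknife are not shown, the counts are made up, and the exact variant used by EzBioCloud is an assumption.

```python
def chao1(otu_counts):
    """Bias-corrected Chao1 estimate of species richness."""
    observed = sum(1 for c in otu_counts if c > 0)   # S_obs
    singletons = sum(1 for c in otu_counts if c == 1)  # F1
    doubletons = sum(1 for c in otu_counts if c == 2)  # F2
    unseen = singletons * (singletons - 1) / (2 * (doubletons + 1))
    return observed + unseen


# Made-up counts: 8 observed OTUs, 3 singletons, 2 doubletons.
example = [10, 5, 1, 1, 1, 2, 2, 3]
print(chao1(example))
```

The estimate is never lower than the observed number of OTUs; many singletons relative to doubletons push it higher, reflecting species that were probably missed.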
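The clustering step described above (pairwise distances followed by UPGMA) can be sketched in a few lines. Note the substitution: UniFrac requires a phylogenetic tree, so this illustration uses Bray-Curtis dissimilarity instead. The profiles and function names are hypothetical, and the tree is returned as nested tuples rather than drawn as a dendrogram.

```python
def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two {taxon: count} profiles."""
    taxa = set(a) | set(b)
    diff = sum(abs(a.get(t, 0) - b.get(t, 0)) for t in taxa)
    total = sum(a.get(t, 0) + b.get(t, 0) for t in taxa)
    return diff / total if total else 0.0


def upgma(names, dist):
    """Average-linkage (UPGMA) clustering.

    dist[i][j] is the distance between samples i and j.
    Returns the tree as nested tuples of sample names.
    """
    clusters = [(name, [i]) for i, name in enumerate(names)]

    def average(c1, c2):
        # Mean of all pairwise distances between members of the two clusters.
        pairs = [(i, j) for i in c1[1] for j in c2[1]]
        return sum(dist[i][j] for i, j in pairs) / len(pairs)

    while len(clusters) > 1:
        _, x, y = min(
            (average(clusters[x], clusters[y]), x, y)
            for x in range(len(clusters))
            for y in range(x + 1, len(clusters))
        )
        merged = ((clusters[x][0], clusters[y][0]), clusters[x][1] + clusters[y][1])
        clusters = [c for k, c in enumerate(clusters) if k not in (x, y)] + [merged]
    return clusters[0][0]


# Made-up profiles: A and B are similar, C is very different.
profiles = {
    "A": {"x": 10, "y": 0},
    "B": {"x": 9, "y": 1},
    "C": {"x": 0, "y": 10},
}
names = list(profiles)
dist = [[bray_curtis(profiles[p], profiles[q]) for q in names] for p in names]
tree = upgma(names, dist)
print(tree)
```

The same distance matrix could also be fed to a PCoA routine; that step is omitted here because it needs an eigendecomposition.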
Our ultimate goal is to discover biomarkers that differentiate microbiomes with different characteristics. In this example, we want to know the bacterial species that are differentially present in “diseased” subjects. To do this, we start with the two MTP sets that we’ve already created, labelled “healthy” and “diseased”. To begin the comparative analysis, follow the steps below:
The module for the “Comparative MTP Analyzer” is very similar to the “MTP Set Browser”; the former will compare the statistics of sets whereas the latter will focus on the individual MTPs in a set.
It is well known that the number of species and species diversity are reduced in diseased subjects. Let’s find out whether this holds for our data. Please select the “Alpha-diversity” menu to see the comparison below:
Okay, so now we know that there are more species in the healthy subjects. How about diversity or how evenly species are distributed in the swab samples? To check this, go to the “Diversity Index” tab.
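Diversity indices combine richness and evenness into one number. As an illustration, the sketch below computes the Shannon index, H = -Σ p·ln(p), and Pielou's evenness, H / ln(S); which indices the “Diversity Index” tab reports is not specified here, so treat the choice of Shannon as an assumption. The counts are made up.

```python
import math


def shannon_index(counts):
    """Shannon diversity H = -sum(p * ln p) over taxa with non-zero counts."""
    total = sum(counts)
    if total == 0:
        raise ValueError("sample has no reads")
    proportions = [c / total for c in counts if c > 0]
    return -sum(p * math.log(p) for p in proportions)


def pielou_evenness(counts):
    """Shannon index divided by its maximum, ln(number of observed taxa)."""
    observed = sum(1 for c in counts if c > 0)
    if observed < 2:
        return 0.0  # evenness is not meaningful for a single taxon
    return shannon_index(counts) / math.log(observed)


even = [5, 5, 5, 5]      # four equally abundant taxa
skewed = [17, 1, 1, 1]   # one dominant taxon
print(shannon_index(even) > shannon_index(skewed))
```

Both samples have four taxa, so their richness is the same; only the evenly distributed sample reaches the maximum possible index for that richness.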
If you open the “Beta-diversity” menu of the “Comparative MTP Analyzer”, two tabs will appear. The “UPGMA clustering” tab will give you a dendrogram containing all MTPs in both the “healthy” and “diseased” sets (see below).
Finally, we want to mine our data to find the major differences between the two conditions. EzBioCloud provides multiple methods for discovering biomarkers. Here we will use the “Kruskal–Wallis H test” to find which taxa are associated with respiratory diseases. Please select the “Biomarker Discovery” menu.
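For readers curious about what the test computes, the following sketch implements the Kruskal–Wallis H statistic for two groups, with the standard tie correction and a chi-square approximation (1 degree of freedom) for the p-value. It would be applied to one taxon at a time, using that taxon's relative abundance in each sample. The input values are made up, and EzBioCloud's implementation, including any multiple-testing correction, may differ.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def kruskal_wallis_two_groups(x, y):
    """Return (H, p) for two groups of observations."""
    pooled = list(x) + list(y)
    n = len(pooled)
    ranks = average_ranks(pooled)
    groups = [ranks[:len(x)], ranks[len(x):]]
    h = 12 / (n * (n + 1)) * sum(sum(g) ** 2 / len(g) for g in groups) - 3 * (n + 1)

    # Tie correction: divide H by 1 - sum(t^3 - t) / (n^3 - n).
    tie_sizes = {}
    for v in pooled:
        tie_sizes[v] = tie_sizes.get(v, 0) + 1
    correction = 1 - sum(t ** 3 - t for t in tie_sizes.values()) / (n ** 3 - n)
    if correction == 0:
        return 0.0, 1.0  # all values identical, no evidence of a difference
    h = max(h / correction, 0.0)

    # Survival function of chi-square with 1 degree of freedom.
    p = math.erfc(math.sqrt(h / 2))
    return h, p


# Made-up abundances of one taxon in two small groups.
h, p = kruskal_wallis_two_groups([1, 2, 3], [4, 5, 6])
print(f"H = {h:.3f}, p = {p:.4f}")
```

Because the test works on ranks rather than raw values, it does not assume that abundances are normally distributed, which is one reason it is popular for microbiome data.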
There are almost infinite ways of exploring microbiome data if you are equipped with proper bioinformatics tools and computational infrastructure. In EzBioCloud’s cloud environment, we try to provide instantly responding tools for comparative analysis, visualization, and data mining. We hope you enjoyed this tutorial and that you’ve familiarized yourself with EzBioCloud’s unique user interface.
Last updated Feb 5, 2019