使用ampvis2包对16S rRNA扩增子数据可视化分析

作者: 你猜我菜不菜 | 来源:发表于2019-05-12 17:14 被阅读2次

使用ampvis2包对16S rRNA扩增子数据可视化分析
扩增子分析：16s rRNA分析snakemake流程
[Metagenome-1] Metagenomic Data
微生物多样性（扩增子/16S rDNA测序）—功能预测分析方法描
R包：phyloseq: 扩增子统计分析利器
全长16S：细菌的“第二代身份证”
Qiime1-17.PICRUSt-16S预测宏基因组？
16s α多样性指数计算和可视化
扩增子分析
微生物组16S rRNA数据分析小结：raw data sequ

该教程来自ampvis2官方网站。

1. ampvis2包的工作流程

2. ampvis2包安装

我是使用的Windows10系统下的R 3.5.3和RStudio，同时需要安装RTools。打开Rstudio后运行以下代码进行安装：

> install.packages("remotes")
> remotes::install_github("MadsAlbertsen/ampvis2")

3. 载入数据

ampvis2能够接受常用的扩增子分析流程 QIIME, mothur以及 UPARSE产出的结果文件。 QIIME和mothur (通过 make.biom 代码) 都能以BIOM文件格式导出数据结果，然后通过ampvis2包中的amp_import_biom功能将数据导入到R中。UPARSE产出的数据可以通过ampvis2包中的 amp_import_uparse功能将数据导入到R中。示例数据为minimal example data。

#下载示例数据解压后有两个csv文件，设置一个ampvis2_test工作目录，将csv文件放到该目录下。
> setwd("C:/Users/Administrator/Desktop/R_work/ampvis2_test")
> list.files()
[1] "example_metadata.csv" "example_otutable.csv"
> myotutable <- read.csv("example_otutable.csv", check.names = FALSE)
> mymetadata <- read.csv("example_metadata.csv", check.names = FALSE)
> library(ampvis2)
> d <- amp_load(otutable = myotutable,
              metadata = mymetadata)

4. 数据过滤和子数据集构建

ampvis2包中包含有一个大的示例数据集MiDAS，这个数据集是在2006-2013年期间从污水处理植物后取样573个活性污泥的扩增子样品。

> data("MiDAS")
> MiDAS
ampvis2 object with 5 elements. 
Summary of OTU table:
     Samples         OTUs  Total#Reads    Min#Reads    Max#Reads Median#Reads    Avg#Reads 
         658        14969     20890850        10480        46264        31800     31749.01 

Assigned taxonomy:
      Kingdom        Phylum         Class         Order        Family         Genus       Species 
  14969(100%) 14477(96.71%) 12737(85.09%) 11470(76.63%)  9841(65.74%)   7380(49.3%)     28(0.19%) 

Metadata variables: 5 
 SampleID, Plant, Date, Year, Period

#查看数据集中的序列
> MiDAS$refseq
14969 DNA sequences in binary format stored in a list.

Mean sequence length: 472.922 
   Shortest sequence: 425 
    Longest sequence: 525 

Labels:
OTU_1
OTU_2
OTU_3
OTU_4
OTU_5
OTU_6
...

Base composition:
    a     c     g     t 
0.261 0.225 0.319 0.194 
(Total: 7.08 Mb)

#从MiDAS中选取plant变量中的Aalborg West和Aalborg East数据作为子数据集
> MiDASsubset <- amp_subset_samples(MiDAS, Plant %in% c("Aalborg West", "Aalborg East"))
590 samples and 5512 OTUs have been filtered 
Before: 658 samples and 14969 OTUs
After: 68 samples and 9457 OTUs

#或者选取更为复杂的子数据集
> MiDASsubset <- amp_subset_samples(MiDAS, Plant %in% c("Aalborg West", "Aalborg East") & !SampleID %in% c("16SAMP-749"), minreads = 10000)
591 samples and 5539 OTUs have been filtered 
Before: 658 samples and 14969 OTUs
After: 67 samples and 9430 OTUs

5. 热图绘制

简单热图，只展示了Aalborg West和Aalborg East两个不同实验组中相对丰度最高的10个门水平的微生物群落。

> amp_heatmap(MiDASsubset, group_by = "Plant")

复杂热图，展示了两个实验组中样品取样时间，相对丰度最高的25个属水平的微生物群落并且标明了对应的门分类水平，相对丰度高低的图例等信息。

> amp_heatmap(MiDASsubset,
             group_by = "Plant",
             facet_by = "Year",
             tax_aggregate = "Genus",
             tax_add = "Phylum",
             tax_show = 25,
             color_vector = c("white", "darkred"),
             plot_colorscale = "sqrt",
             plot_values = FALSE) +
     theme(axis.text.x = element_text(angle = 45, size=10, vjust = 1),
           axis.text.y = element_text(size=8),
           legend.position="right")

6. 箱线图绘制

简单箱线图，子数据集中所有样品的属水平微生物群落按照相对丰度均值高低排序。

> amp_boxplot(MiDASsubset)

复杂箱线图，根据不同的采用季节分组显示，添加属分类水平群落对应的门分类水平信息。

> amp_boxplot(MiDASsubset,
             group_by = "Period",
             tax_show = 5,
             tax_add = "Phylum")

7. 排序分析Ordination analysis

基于Bray-Curtis距离矩阵的主坐标分析PCoA

> amp_ordinate(MiDASsubset, 
              type = "pcoa",
              distmeasure = "bray",
              sample_color_by = "Plant",
              sample_colorframe = TRUE,
              sample_colorframe_label = "Plant") + theme(legend.position = "blank")

添加取样时间信息，点之间的连线标明取样时间轨迹。

> amp_ordinate(MiDASsubset, 
              type = "pcoa",
              distmeasure = "bray",
              sample_color_by = "Plant",
              sample_colorframe_label = "Plant", 
              sample_trajectory = "Date", 
              sample_trajectory_group = "Plant")

典型对应分析CCA，展示了样品不同季节的分布情况。

> ordinationresult <- amp_ordinate(MiDASsubset, 
                                  type = "CCA",
                                  constrain = "Period",
                                  transform = "Hellinger",
                                  sample_color_by = "Period",
                                  sample_shape_by = "Plant",
                                  sample_colorframe = TRUE,
                                  sample_colorframe_label = "Period",
                                  detailed_output = TRUE)
> ordinationresult$plot