美文网首页R语言绘图相关性R ggplot
R 数据可视化 —— 聚类热图 ComplexHeatmap(五

R 数据可视化 —— 聚类热图 ComplexHeatmap(五

作者: 名本无名 | 来源:发表于2021-05-18 20:36 被阅读0次

    前言

    oncoprint 是一种通过热图的方式来可视化多个基因组变异事件。ComplexHeatmap 包提供了 oncoPrint() 函数来绘制这种类型的图。

    默认的样式是 cBioPortal 格式,我们也可以根据需要不同类型的图形

    常规设置

    1. 输入数据格式

    输入数据可以有两种格式:矩阵和矩阵列表

    1.1 矩阵

    对于矩阵类型的数据,行代表的是基因,列表示的是样本,矩阵的值表示基因在样本中方式的变异类型。例如

    mat <- read.table(
      textConnection(
        "s1,s2,s3
        g1,snv;indel,snv,indel
        g2,,snv;indel,snv
        g3,snv,,indel;snv"
      ),
      row.names = 1,
      header = TRUE,
      sep = ",",
      stringsAsFactors = FALSE
    )
    mat <- as.matrix(mat)
    

    对于这种字符型矩阵,还需要定义相应的变异类型提取函数,例如

    > get_type_fun <- function(x) unlist(strsplit(x, ";"))
    > get_type_fun(mat[1,1])
    [1] "snv"   "indel"
    

    如果变异类型的编码方式为 snv|indel,只需把函数定义为按 | 分割就行

    get_type_fun <- function(x) unlist(strsplit(x, "|"))
    

    然后将该函数传递给 oncoPrint() 函数的 get_type 参数。

    对于常见的分隔符:;:,|oncoPrint 会自动解析,不需要指定解析函数

    alter_fun 参数可以自定义每种变异类型在热图单元格中的绘制函数,函数接受 4 个参数,其中 xy 用于标识格子位置,wh 用于标识格子的大小,并使用 col 来标注颜色

    col = c(snv = "#fb8072", indel = "#80b1d3")
    oncoPrint(
      mat, alter_fun = list(
        snv = function(x, y, w, h) 
          grid.rect(
            x, y, w*0.9, h*0.9,
            gp = gpar(fill = col["snv"],col = NA)),
        indel = function(x, y, w, h) 
          grid.rect(
            x, y, w*0.9, h*0.4,
            gp = gpar(fill = col["indel"], col = NA))
        ), 
      col = col
    )
    

    注意:如果 alter_fun 设置为列表,元素的顺序会影响图形绘制顺序,先定义的先绘制

    1.2 矩阵列表

    第二种格式是矩阵列表,列表中的每种突变类型对应一个矩阵,矩阵仅包含 0-1 值,用于标识基因在样本中是否发生了这种类型的突变,并且列表名称与变异类型对应

    > mat_list <- list(
    +   snv = matrix(c(1, 0, 1, 1, 1, 0, 0, 1, 1), nrow = 3),
    +   indel = matrix(c(1, 0, 0, 0, 1, 0, 1, 0, 0), nrow = 3))
    > 
    > rownames(mat_list$snv) <- rownames(mat_list$indel) <- c("g1", "g2", "g3")
    > colnames(mat_list$snv) <- colnames(mat_list$indel) <- c("s1", "s2", "s3")
    > mat_list
    $snv
       s1 s2 s3
    g1  1  1  0
    g2  0  1  1
    g3  1  0  1
    
    $indel
       s1 s2 s3
    g1  1  0  1
    g2  0  1  0
    g3  0  0  0
    

    需要保证所有矩阵具有相同的行名和列名

    col = c(snv = "#fb8072", indel = "#80b1d3")
    oncoPrint(
      mat_list, alter_fun = list(
        snv = function(x, y, w, h) 
          grid.rect(
            x, y, w*0.9, h*0.9,
            gp = gpar(fill = col["snv"],col = NA)),
        indel = function(x, y, w, h) 
          grid.rect(
            x, y, w*0.9, h*0.4,
            gp = gpar(fill = col["indel"], col = NA))
        ), 
      col = col
    )
    

    2. 定义 alter_fun

    alter_fun 不仅可以传递一个函数列表,还可以传递一个函数,该函数多了一个参数,用于传递一个逻辑值向量,用于标识当前基因在当前样本中是否发生了对应的变异

    oncoPrint(
      mat, alter_fun = function(x, y, w, h, v) {
        if (v["snv"])
          grid.rect(x, y, w * 0.9, h * 0.9, 
                    gp = gpar(fill = col["snv"], col = NA))
        if (v["indel"])
          grid.rect(x, y, w * 0.9, h * 0.4,
                    gp = gpar(fill = col["indel"], col = NA))
      }, 
      col = col
    )
    

    设置为单个函数,可以更灵活进行自定义

    oncoPrint(
      mat, alter_fun = function(x, y, w, h, v) {
        n = sum(v) # 发生变异的数量
        h = h * 0.9
        if (n)
          grid.rect(x,
                    y - h * 0.5 + 1:n / n * h,
                    w * 0.9,
                    1 / n * h,
                    gp = gpar(fill = col[names(which(v))], col = NA),
                    just = "top")
      }, col = col)
    

    设置为三角形填充

    oncoPrint(
      mat,
      alter_fun = list(
        # 控制背景的绘制,通常在放在第一个
        background = function(x, y, w, h) {
          grid.polygon(
            unit.c(x - 0.5 * w, x - 0.5 * w, x + 0.5 * w),
            unit.c(y - 0.5 * h, y + 0.5 * h, y - 0.5 * h),
            gp = gpar(fill = "grey", col = "white")
          )
          grid.polygon(
            unit.c(x + 0.5 * w, x + 0.5 * w, x - 0.5 * w),
            unit.c(y + 0.5 * h, y - 0.5 * h, y + 0.5 * h),
            gp = gpar(fill = "grey", col = "white")
          )
        },
        snv = function(x, y, w, h) {
          grid.polygon(
            unit.c(x - 0.5 * w, x - 0.5 * w, x + 0.5 * w),
            unit.c(y - 0.5 * h, y + 0.5 * h, y - 0.5 * h),
            gp = gpar(fill = col["snv"], col = "white")
          )
        },
        indel = function(x, y, w, h) {
          grid.polygon(
            unit.c(x + 0.5 * w, x + 0.5 * w, x - 0.5 * w),
            unit.c(y + 0.5 * h, y - 0.5 * h, y + 0.5 * h),
            gp = gpar(fill = col["indel"], col = "white")
          )
        }
      ),
      col = col
    )
    

    在上面的例子中,我们添加了一个背景设置 background。背景需要放置第一个,如果想要删除背景,可以设置

    background = function(...) NULL
    

    在某些情况下,我们可能需要设置的变异类型较多,为了确保我们 alter_fun 设置正确,可以使用 test_alter_fun() 函数来进行测试。例如

    alter_fun <- list(
      mut1 = function(x, y, w, h) 
        grid.rect(x, y, w, h, gp = gpar(fill = "red", col = NA)),
      mut2 = function(x, y, w, h) 
        grid.rect(x, y, w, h, gp = gpar(fill = "blue", col = NA)),
      mut3 = function(x, y, w, h) 
        grid.rect(x, y, w, h, gp = gpar(fill = "yellow", col = NA)),
      mut4 = function(x, y, w, h) 
        grid.rect(x, y, w, h, gp = gpar(fill = "purple", col = NA)),
      mut5 = function(x, y, w, h) 
        grid.rect(x, y, w, h, gp = gpar(fill = NA, lwd = 2)),
      mut6 = function(x, y, w, h) 
        grid.points(x, y, pch = 16),
      mut7 = function(x, y, w, h) 
        grid.segments(x - w*0.5, y - h*0.5, x + w*0.5, y + h*0.5, gp = gpar(lwd = 2))
    )
    test_alter_fun(alter_fun)
    

    3. 简化 alter_fun

    如果只要绘制简单图形,如 矩形和散点图,可以使用 alter_graphic() 函数

    oncoPrint(
      mat,
      alter_fun = list(
        snv = alter_graphic(
          "rect",
          width = 0.9,
          height = 0.9,
          fill = col["snv"]
        ),
        indel = alter_graphic(
          "rect",
          width = 0.9,
          height = 0.4,
          fill = col["indel"]
        )
      ),
      col = col
    )
    

    4. 复杂变异类型

    大多数时候,我们需要展示的变异类型并不是单单一两种,可能会有很多种,如果单单用颜色来区分的话比较困难。

    而且,有些变异类型是我们比较关注的,而其他的一些次要的变异类型没那么重要,就有一个主次关系。

    例如,snvindel 变异类型中又包含 intronic snvexonic snvintronic indelexonic indel。主分类应该是 snvindel,次分类是 intronicexonic

    所以,我们可以为主分类设置同样类型的图形,比如说,设置不同的颜色来区分;而次分类设置为不同的符号类型。

    对于下面的数据

    type <- c("snv;intronic", "snv;exonic", "indel;intronic", "indel;exonic", "")
    m <- matrix(
      sample(type, size = 100, replace = TRUE),
      nrow = 10, ncol = 10,
      dimnames = list(paste0("g", 1:10), paste0("s", 1:10))
      )
    

    定义 alter_fun

    alter_fun <- list(
      # 设置背景
      background = function(x, y, w, h) 
        grid.rect(x, y, w*0.9, h*0.9, gp = gpar(fill = "#CCCCCC", col = NA)),
      # SNV 颜色
      snv = function(x, y, w, h) 
        grid.rect(x, y, w*0.9, h*0.9, gp = gpar(fill = "#fb8072", col = NA)),
      # indel 颜色
      indel = function(x, y, w, h) 
        grid.rect(x, y, w*0.9, h*0.9, gp = gpar(fill = "#80b1d3", col = NA)),
      # 内含子设置为点
      intronic = function(x, y, w, h) 
        grid.points(x, y, pch = 16),
      # 外显子设置为 X
      exonic = function(x, y, w, h) {
        grid.segments(x - w*0.4, y - h*0.4, x + w*0.4, y + h*0.4, gp = gpar(lwd = 2))
        grid.segments(x + w*0.4, y - h*0.4, x - w*0.4, y + h*0.4, gp = gpar(lwd = 2))
      }
    )
    

    绘制

    oncoPrint(m, alter_fun = alter_fun, col = c(snv = "#fb8072", indel = "#80b1d3"))
    

    5. 其他参数设置

    oncoPrint 本质上也是热图,所以很多热图的参数都可以使用,例如,显示列名

    alter_fun <- list(
      snv = function(x, y, w, h)
        grid.rect(x, y, w * 0.9, h * 0.9,
                  gp = gpar(fill = col["snv"], col = NA)),
      indel = function(x, y, w, h)
        grid.rect(x, y, w * 0.9, h * 0.4,
                  gp = gpar(fill = col["indel"], col = NA))
    )
    
    oncoPrint(
      mat, alter_fun = alter_fun, 
      col = col, show_column_names = TRUE
    )
    

    行名和百分比文本的显示可以使用 show_pctshow_row_names,位置可以使用 pct_siderow_names_side 设置,百分比精确度可以使用 pct_digits

    oncoPrint(
      mat,
      alter_fun = alter_fun,
      col = col,
      row_names_side = "left",
      pct_side = "right",
      pct_digits = 2
    )
    

    使用 anno_oncoprint_barplot() 注释函数来控制条形图

    oncoPrint(
      mat,
      alter_fun = alter_fun,
      col = col,
      top_annotation = HeatmapAnnotation(cbar = anno_oncoprint_barplot(height = unit(1, "cm"))),
      right_annotation = rowAnnotation(rbar = anno_oncoprint_barplot(
        width = unit(4, "cm"),
        axis_param = list(
          at = c(0, 2, 4),
          labels = c("zero", "two", "four"),
          side = "top",
          labels_rot = 0
        )
      )),
    )
    

    或者,把右边的条形图往左边放放

    oncoPrint(
      mat,
      alter_fun = alter_fun,
      col = col,
      left_annotation =  rowAnnotation(rbar = anno_oncoprint_barplot(axis_param = list(direction = "reverse"))),
      right_annotation = NULL
    )
    

    应用实例

    我们使用 ComplexHeatmap 包中提供的数据,该数据来自于 cBioPortal 数据库

    mat <- read.table(
       system.file(
         "extdata",
         package = "ComplexHeatmap",
         "tcga_lung_adenocarcinoma_provisional_ras_raf_mek_jnk_signalling.txt"
       ),
       header = TRUE,
       stringsAsFactors = FALSE,
       sep = "\t"
     )
    mat[is.na(mat)] <- ""
    rownames(mat) <- mat[, 1]
    mat <- mat[,-1]
    mat <-  mat[,-ncol(mat)]
    mat <- t(as.matrix(mat))
    

    该数据包含 Ras-Raf-MEK-Erk/JNK signaling 通路中的 26 个基因在 172 个肺腺癌样本中的突变即 CNV 变异信息

    > mat[1:3,1:3]
         TCGA-05-4384-01 TCGA-05-4390-01 TCGA-05-4425-01
    KRAS "  "            "MUT;"          "  "           
    HRAS "  "            "  "            "  "           
    BRAF "  "            "  "            "  " 
    

    数据中包含 3 种变异:MUTAMPHOMDEL,现在,我们为每种变异类型定义图形

    col <- c("HOMDEL" = "#ff7f00", "AMP" = "#984ea3", "MUT" = "#4daf4a")
    alter_fun = list(
      background = alter_graphic("rect", fill = "#CCCCCC"),
      HOMDEL = alter_graphic("rect", fill = col["HOMDEL"]),
      AMP = alter_graphic("rect", fill = col["AMP"]),
      MUT = alter_graphic("rect", height = 0.33, fill = col["MUT"])
    )
    

    我们只是设置格子的颜色,所以可以使用 alter_graphic 来设置

    设置列标题和图例

    column_title <- "OncoPrint for TCGA Lung Adenocarcinoma, genes in Ras Raf MEK JNK signalling"
    heatmap_legend_param <-
      list(
        title = "Alternations",
        at = c("HOMDEL", "AMP", "MUT"),
        labels = c("Deep deletion", "Amplification", "Mutation")
      )
    

    绘制图片

    oncoPrint(
      mat, alter_fun = alter_fun, col = col,
      column_title = column_title,
      heatmap_legend_param = heatmap_legend_param
    )
    

    我们可以看到,有很多空白的行和列,删掉它们

    oncoPrint(
      mat, alter_fun = alter_fun, col = col,
      remove_empty_columns = TRUE, 
      remove_empty_rows = TRUE,
      column_title = column_title,
      heatmap_legend_param = heatmap_legend_param
    )
    

    row_ordercolumn_order 可以设置行、列的顺序

    oncoPrint(
      mat, alter_fun = alter_fun, col = col,
      column_title = column_title,
      row_order = 1:nrow(mat),
      remove_empty_columns = TRUE, 
      remove_empty_rows = TRUE,
      heatmap_legend_param = heatmap_legend_param
    )
    

    我们可以使用 anno_oncoprint_barplot() 来修改条形图注释,且条形图默认都是显示变异的数量,可以在设置 show_fraction = TRUE 来显示频率

    oncoPrint(
      mat,
      alter_fun = alter_fun,
      col = col,
      # 上方条形图只显示 MUT 的频率
      top_annotation = HeatmapAnnotation(
        column_barplot = anno_oncoprint_barplot(
          "MUT", border = TRUE,
          show_fraction = TRUE,
          height = unit(4, "cm")
      )),
      # 右侧条形图显示 AMP 和 HOMDEL
      right_annotation = rowAnnotation(
        row_barplot = anno_oncoprint_barplot(
          c("AMP", "HOMDEL"),
          border = TRUE,
          height = unit(4, "cm"),
          axis_param = list(side = "bottom", labels_rot = 90)
      )),
      remove_empty_columns = TRUE,
      remove_empty_rows = TRUE,
      column_title = column_title,
      heatmap_legend_param = heatmap_legend_param
    )
    

    类似于热图,我们可以使用 HeatmapAnnotation()rowAnnotation() 来添加行列注释

    oncoPrint(
      mat,
      alter_fun = alter_fun,
      col = col,
      remove_empty_columns = TRUE,
      remove_empty_rows = TRUE,
      top_annotation = HeatmapAnnotation(
        cbar = anno_oncoprint_barplot(),
        foo1 = 1:172,
        bar1 = anno_points(1:172)
      ),
      left_annotation = rowAnnotation(foo2 = 1:26),
      right_annotation = rowAnnotation(bar2 = anno_barplot(1:26)),
      column_title = column_title,
      heatmap_legend_param = heatmap_legend_param
    )
    

    起始 oncoPrint() 返回的是 Heatmap 对象,所以,我们可以在水平或竖直方向上添加热图或注释

    ht_list <- oncoPrint(
      mat,
      alter_fun = alter_fun,
      col = col,
      column_title = column_title,
      heatmap_legend_param = heatmap_legend_param
    ) +
      Heatmap(
        matrix(rnorm(nrow(mat) * 10), ncol = 10),
        name = "expr",
        col = colorRamp2(c(-3, 0, 3), c("#8c510a", "white", "#01665e")),
        width = unit(4, "cm")
      )
    draw(ht_list)
    

    相关文章

      网友评论

        本文标题:R 数据可视化 —— 聚类热图 ComplexHeatmap(五

        本文链接:https://www.haomeiwen.com/subject/jqgejltx.html