美文网首页下游数据分析
R语言分析4:一致性聚类分析(ConsensusClusterP

R语言分析4:一致性聚类分析(ConsensusClusterP

作者: 小程的学习笔记 | 来源:发表于2023-09-11 15:10 被阅读0次

    一致性聚类是一种提供定量证据的方法,用以确定数据集中可能的聚类的成员及数量,例如微阵列基因表达。这种方法在癌症基因组学中得到了广泛应用,以发现了新的疾病分子亚类。

    一致性聚类方法包括从一组项目(例如微阵列)中进行二次抽样,并确定聚类计数(k)的簇。然后,计算成对共识值,即两个项目在同一子样本中出现的次数中占据同一簇的比例,并将其存储在每个 k 的对称共识矩阵中。

    1. 准备输入数据

    # 安装并加载所需的R包
    # BiocManager::install("ConsensusClusterPlus")
    library(ConsensusClusterPlus)
    
    # 输入数据格式是一个矩阵,其中列是样本(项目),行是特征,单元格是数值(此处展示TCGA-STAD的FPKM数据)
    STAD_tumor[1:5,1:5]
    ##           TCGA-BR-A4J4-01A-12R-A251-31 TCGA-BR-A4IZ-01A-32R-A251-31 TCGA-RD-A7C1-01A-11R-A32D-31 TCGA-BR-6852-01A-11R-1884-13 TCGA-BR-7851-01A-11R-2203-13
    ## 5_8S_rRNA                       0.0866                       0.4991                       1.0460                       0.1388                       0.0965
    ## 5S_rRNA                         0.0000                       0.2676                       2.6896                       0.3327                       0.1393
    ## 7SK                             0.3533                       0.2546                       1.2531                       0.2813                       0.0000
    ## A1BG                            0.0197                       0.0358                       0.1067                       0.1153                       0.0406
    ## A1BG-AS1                        0.0546                       0.2324                       0.3320                       0.1367                       0.0525
    
    # 为了选择信息最丰富的基因进行类检测,将数据集减少到前 5,000 个可变性最大的基因(通过中值绝对偏差 - MAD来衡量)
    mads = apply(STAD_tumor, 1, mad) # MAD测度
    STAD_tumor = STAD_tumor[rev(order(mads))[1:5000], ] # 提取前5000个基因
    
    # 归一化操作,sweep是一个循环函数,首先用apply计算每行的中值,然后用每个基因在样本中的表达值减中值
    STAD_tumor = sweep(STAD_tumor, 1, apply(STAD_tumor, 1, median, na.rm = T))
    
    STAD_tumor[1:5,1:5]
    ##           TCGA-BR-A4J4-01A-12R-A251-31 TCGA-BR-A4IZ-01A-32R-A251-31 TCGA-RD-A7C1-01A-11R-A32D-31 TCGA-BR-6852-01A-11R-1884-13 TCGA-BR-7851-01A-11R-2203-13
    ## 5_8S_rRNA                     -0.04830                      0.36420                      0.91110                      0.00390                     -0.03840
    ## 5S_rRNA                       -0.36695                     -0.09935                      2.32265                     -0.03425                     -0.22765
    ## 7SK                            0.24415                      0.14545                      1.14395                      0.17215                     -0.10915
    ## A1BG                          -0.00930                      0.00680                      0.07770                      0.08630                      0.01160
    ## A1BG-AS1                      -0.05605                      0.12175                      0.22135                      0.02605                     -0.05815
    

    2. 运行ConsensusClusterPlus

    主要包含了以下几个参数:
    1)pItem, 样品的抽样比例,如 pItem=0.8 表示选择80%的样本进行重复抽样;
    2)pfeature, Feature的抽样比例;
    3)maxK, 聚类结果中分类的最大的K值,必须是整数;
    4)reps, 重复抽样的数目;
    5)clusterAlg, 使用的聚类算法,“hc”用于层次聚类,“pam”用于PAM(Partioning Around Medoids)算法,“km”用于K-Means算法,也可以自定义函数;
    6)distance, 计算距离的方法,有pearson、spearman、euclidean、binary、maximum、canberra、minkowski;
    7)title, 输出结果的文件夹名字,包含了输出的图片;
    8)seed, 随机种子,用于重复结果
    9)tmyPal,指定一致性矩阵使用的颜色,默认使用白-蓝色
    10)plot,不设置时图片结果仅输出到屏幕,也可以设置输出为'pdf', 'png', 'pngBMP'
    11)writeTable,若为TRUE,则将一致性矩阵、ICL、log输出到CSV文件

    title=tempdir()
    results = ConsensusClusterPlus(STAD_tumor, maxK = 20, reps = 1000, pItem = 0.8, pFeature = 1, title = "untitled_consensus_cluster", clusterAlg = "hc", 
                                   distance = "pearson", seed = 1262118388.71279, tmyPal=NULL, writeTable=FALSE, plot = "png")
    ## end fraction
    ## clustered
    ## clustered
    ## clustered
    ## clustered
    ## clustered
    ## clustered
    ## clustered
    ## clustered
    ## clustered
    ## clustered
    ## clustered
    ## clustered
    ## clustered
    ## clustered
    ## clustered
    ## clustered
    ## clustered
    ## clustered
    ## clustered
    
    # ConsensusClusterPlus 的输出是一个列表,其中列表的元素对应于第 k 个集群的结果,例如 results[[8]] 就是 k =8 的结果 result。
    str(results[[8]])
    ## List of 5
    ##  $ consensusMatrix: num [1:412, 1:412] 1 0.00156 0.1442 0 0.00158 ...
    ##  $ consensusTree  :List of 7
    ##   ..$ merge      : int [1:411, 1:2] -1 -2 -30 -42 -43 -49 -50 -51 -59 -63 ...
    ##   ..$ height     : num [1:411] 0 0 0 0 0 0 0 0 0 0 ...
    ##   ..$ order      : int [1:412] 410 258 251 330 22 175 191 241 252 382 ...
    ##   ..$ labels     : NULL
    ##   ..$ method     : chr "average"
    ##   ..$ call       : language hclust(d = as.dist(1 - fm), method = finalLinkage)
    ##   ..$ dist.method: NULL
    ##   ..- attr(*, "class")= chr "hclust"
    ##  $ consensusClass : Named int [1:412] 1 2 3 4 2 3 3 2 3 3 ...
    ##   ..- attr(*, "names")= chr [1:412] "TCGA-BR-A4J4-01A-12R-A251-31" "TCGA-BR-A4IZ-01A-32R-A251-31" "TCGA-RD-A7C1-01A-11R-A32D-31" ## "TCGA-BR-6852-01A-11R-1884-13" ...
    ##  $ ml             : num [1:412, 1:412] 1 0.00156 0.1442 0 0.00158 ...
    ##  $ clrs           :List of 3
    ##   ..$ : chr [1:412] "#B2DF8A" "#A6CEE3" "#1F78B4" "#FF7F00" ...
    ##   ..$ : num 8
    ##   ..$ : chr [1:8] "#B2DF8A" "#A6CEE3" "#1F78B4" "#FF7F00" ...
    
    results[[8]][["consensusMatrix"]][1:5,1:5]
    ##             [,1]        [,2]        [,3]        [,4]        [,5]
    ## [1,] 1.000000000 0.001557632 0.144200627 0.000000000 0.001577287
    ## [2,] 0.001557632 1.000000000 0.000000000 0.165094340 0.591263651
    ## [3,] 0.144200627 0.000000000 1.000000000 0.001567398 0.000000000
    ## [4,] 0.000000000 0.165094340 0.001567398 1.000000000 0.569105691
    ## [5,] 0.001577287 0.591263651 0.000000000 0.569105691 1.000000000
    
    # 样本的聚类树
    results[[8]][["consensusTree"]]
    ## 
    ## Call:
    ## hclust(d = as.dist(1 - fm), method = finalLinkage)
    ## 
    ## Cluster method   : average 
    ## Number of objects: 412 
    
    # consensusClass, 样本的聚类结果
    length(results[[8]][["consensusClass"]])
    ## [1] 412
    results[[8]][["consensusClass"]][1:5]
    ## TCGA-BR-A4J4-01A-12R-A251-31 TCGA-BR-A4IZ-01A-32R-A251-31 TCGA-RD-A7C1-01A-11R-A32D-31 TCGA-BR-6852-01A-11R-1884-13 TCGA-BR-7851-01A-11R-2203-13 
    ##                            1                            2                            3                            4                            2 
    
    # ml, 就是consensusMatrix
    results[[8]][["ml"]][1:5,1:5]
    ##             [,1]        [,2]        [,3]        [,4]        [,5]
    ## [1,] 1.000000000 0.001557632 0.144200627 0.000000000 0.001577287
    ## [2,] 0.001557632 1.000000000 0.000000000 0.165094340 0.591263651
    ## [3,] 0.144200627 0.000000000 1.000000000 0.001567398 0.000000000
    ## [4,] 0.000000000 0.165094340 0.001567398 1.000000000 0.569105691
    ## [5,] 0.001577287 0.591263651 0.000000000 0.569105691 1.000000000
    results[[8]][["consensusMatrix"]][1:5,1:5]
    ##             [,1]        [,2]        [,3]        [,4]        [,5]
    ## [1,] 1.000000000 0.001557632 0.144200627 0.000000000 0.001577287
    ## [2,] 0.001557632 1.000000000 0.000000000 0.165094340 0.591263651
    ## [3,] 0.144200627 0.000000000 1.000000000 0.001567398 0.000000000
    ## [4,] 0.000000000 0.165094340 0.001567398 1.000000000 0.569105691
    ## [5,] 0.001577287 0.591263651 0.000000000 0.569105691 1.000000000
    
    # clrs, 颜色
    results[[8]][["clrs"]]
    ## [[1]]
    ##   [1] "#B2DF8A" "#A6CEE3" "#1F78B4" "#FF7F00" "#A6CEE3" "#1F78B4" "#1F78B4" "#A6CEE3" "#1F78B4" "#1F78B4" "#1F78B4" "#1F78B4" "#FF7F00" "#FF7F00" "#FF7F00" "#1F78B4"
    ##  [17] "#1F78B4" "#1F78B4" "#1F78B4" "#A6CEE3" "#FF7F00" "#E31A1C" "#A6CEE3" "#A6CEE3" "#1F78B4" "#FF7F00" "#A6CEE3" "#1F78B4" "#1F78B4" "#A6CEE3" "#FF7F00" "#FF7F00"
    ##  [33] "#FF7F00" "#1F78B4" "#FF7F00" "#1F78B4" "#1F78B4" "#A6CEE3" "#A6CEE3" "#1F78B4" "#1F78B4" "#A6CEE3" "#A6CEE3" "#1F78B4" "#A6CEE3" "#33A02C" "#1F78B4" "#1F78B4"
    ##  [49] "#A6CEE3" "#A6CEE3" "#A6CEE3" "#FF7F00" "#FB9A99" "#A6CEE3" "#1F78B4" "#1F78B4" "#A6CEE3" "#1F78B4" "#A6CEE3" "#1F78B4" "#1F78B4" "#A6CEE3" "#A6CEE3" "#33A02C"
    ##  [65] "#1F78B4" "#1F78B4" "#A6CEE3" "#1F78B4" "#FF7F00" "#1F78B4" "#A6CEE3" "#1F78B4" "#FF7F00" "#1F78B4" "#FF7F00" "#1F78B4" "#1F78B4" "#A6CEE3" "#A6CEE3" "#FB9A99"
    ##  [81] "#1F78B4" "#FF7F00" "#A6CEE3" "#1F78B4" "#1F78B4" "#1F78B4" "#1F78B4" "#A6CEE3" "#FF7F00" "#A6CEE3" "#1F78B4" "#1F78B4" "#A6CEE3" "#1F78B4" "#1F78B4" "#1F78B4"
    ##  [97] "#A6CEE3" "#A6CEE3" "#1F78B4" "#A6CEE3" "#1F78B4" "#A6CEE3" "#FF7F00" "#FF7F00" "#1F78B4" "#A6CEE3" "#1F78B4" "#A6CEE3" "#A6CEE3" "#FF7F00" "#A6CEE3" "#A6CEE3"
    ## [113] "#1F78B4" "#FF7F00" "#1F78B4" "#1F78B4" "#FF7F00" "#1F78B4" "#1F78B4" "#A6CEE3" "#A6CEE3" "#1F78B4" "#1F78B4" "#A6CEE3" "#FF7F00" "#FF7F00" "#1F78B4" "#FB9A99"
    ## [129] "#1F78B4" "#FF7F00" "#FF7F00" "#A6CEE3" "#1F78B4" "#A6CEE3" "#1F78B4" "#A6CEE3" "#FF7F00" "#1F78B4" "#FF7F00" "#1F78B4" "#1F78B4" "#A6CEE3" "#1F78B4" "#A6CEE3"
    ## [145] "#FF7F00" "#1F78B4" "#A6CEE3" "#FF7F00" "#1F78B4" "#1F78B4" "#A6CEE3" "#FF7F00" "#FF7F00" "#1F78B4" "#1F78B4" "#A6CEE3" "#FF7F00" "#FF7F00" "#1F78B4" "#1F78B4"
    ## [161] "#A6CEE3" "#A6CEE3" "#1F78B4" "#FF7F00" "#1F78B4" "#1F78B4" "#1F78B4" "#1F78B4" "#B2DF8A" "#FF7F00" "#A6CEE3" "#A6CEE3" "#A6CEE3" "#1F78B4" "#E31A1C" "#1F78B4"
    ## [177] "#FF7F00" "#FF7F00" "#1F78B4" "#A6CEE3" "#1F78B4" "#1F78B4" "#FF7F00" "#1F78B4" "#FF7F00" "#1F78B4" "#1F78B4" "#1F78B4" "#A6CEE3" "#FF7F00" "#FDBF6F" "#1F78B4"
    ## [193] "#1F78B4" "#1F78B4" "#A6CEE3" "#1F78B4" "#1F78B4" "#A6CEE3" "#1F78B4" "#1F78B4" "#A6CEE3" "#A6CEE3" "#FB9A99" "#A6CEE3" "#A6CEE3" "#A6CEE3" "#1F78B4" "#A6CEE3"
    ## [209] "#FF7F00" "#A6CEE3" "#A6CEE3" "#A6CEE3" "#FF7F00" "#A6CEE3" "#1F78B4" "#A6CEE3" "#1F78B4" "#A6CEE3" "#1F78B4" "#A6CEE3" "#A6CEE3" "#FF7F00" "#1F78B4" "#B2DF8A"
    ## [225] "#A6CEE3" "#1F78B4" "#1F78B4" "#1F78B4" "#A6CEE3" "#1F78B4" "#A6CEE3" "#FF7F00" "#1F78B4" "#1F78B4" "#FF7F00" "#A6CEE3" "#1F78B4" "#1F78B4" "#A6CEE3" "#1F78B4"
    ## [241] "#FDBF6F" "#A6CEE3" "#A6CEE3" "#1F78B4" "#1F78B4" "#1F78B4" "#1F78B4" "#A6CEE3" "#A6CEE3" "#1F78B4" "#E31A1C" "#FDBF6F" "#1F78B4" "#1F78B4" "#B2DF8A" "#FF7F00"
    ## [257] "#FF7F00" "#E31A1C" "#1F78B4" "#A6CEE3" "#FF7F00" "#A6CEE3" "#33A02C" "#A6CEE3" "#A6CEE3" "#FF7F00" "#A6CEE3" "#A6CEE3" "#A6CEE3" "#A6CEE3" "#1F78B4" "#A6CEE3"
    ## [273] "#1F78B4" "#1F78B4" "#1F78B4" "#1F78B4" "#1F78B4" "#A6CEE3" "#1F78B4" "#1F78B4" "#A6CEE3" "#1F78B4" "#1F78B4" "#A6CEE3" "#A6CEE3" "#A6CEE3" "#FB9A99" "#1F78B4"
    ## [289] "#1F78B4" "#1F78B4" "#FF7F00" "#1F78B4" "#A6CEE3" "#A6CEE3" "#A6CEE3" "#1F78B4" "#1F78B4" "#1F78B4" "#33A02C" "#1F78B4" "#B2DF8A" "#FF7F00" "#FF7F00" "#FF7F00"
    ## [305] "#1F78B4" "#1F78B4" "#1F78B4" "#1F78B4" "#1F78B4" "#1F78B4" "#A6CEE3" "#1F78B4" "#1F78B4" "#A6CEE3" "#1F78B4" "#A6CEE3" "#FF7F00" "#1F78B4" "#A6CEE3" "#1F78B4"
    ## [321] "#FF7F00" "#1F78B4" "#1F78B4" "#A6CEE3" "#A6CEE3" "#1F78B4" "#A6CEE3" "#1F78B4" "#1F78B4" "#E31A1C" "#1F78B4" "#1F78B4" "#1F78B4" "#1F78B4" "#1F78B4" "#1F78B4"
    ## [337] "#1F78B4" "#A6CEE3" "#1F78B4" "#FF7F00" "#1F78B4" "#A6CEE3" "#A6CEE3" "#A6CEE3" "#B2DF8A" "#1F78B4" "#A6CEE3" "#1F78B4" "#1F78B4" "#1F78B4" "#A6CEE3" "#33A02C"
    ## [353] "#FF7F00" "#1F78B4" "#A6CEE3" "#1F78B4" "#FF7F00" "#B2DF8A" "#1F78B4" "#A6CEE3" "#B2DF8A" "#A6CEE3" "#FF7F00" "#A6CEE3" "#1F78B4" "#A6CEE3" "#1F78B4" "#FF7F00"
    ## [369] "#1F78B4" "#A6CEE3" "#1F78B4" "#1F78B4" "#33A02C" "#1F78B4" "#1F78B4" "#A6CEE3" "#1F78B4" "#1F78B4" "#A6CEE3" "#1F78B4" "#A6CEE3" "#FDBF6F" "#1F78B4" "#1F78B4"
    ## [385] "#FF7F00" "#FF7F00" "#1F78B4" "#1F78B4" "#A6CEE3" "#1F78B4" "#1F78B4" "#A6CEE3" "#A6CEE3" "#A6CEE3" "#A6CEE3" "#1F78B4" "#A6CEE3" "#1F78B4" "#FF7F00" "#A6CEE3"
    ## [401] "#1F78B4" "#A6CEE3" "#FF7F00" "#A6CEE3" "#1F78B4" "#B2DF8A" "#1F78B4" "#1F78B4" "#FF7F00" "#E31A1C" "#1F78B4" "#FF7F00"
    ## 
    ## [[2]]
    ## [1] 8
    ## 
    ## [[3]]
    ## [1] "#B2DF8A" "#A6CEE3" "#1F78B4" "#FF7F00" "#E31A1C" "#33A02C" "#FB9A99" "#FDBF6F"
    

    ★ 官网推荐,在实际运行中,推荐reps设置的更大,比如1000, maxK设置的更大,比如20

    3. 计算聚类一致性 (cluster-consensus) 和样品一致性 (item-consensus)

    主要包含了以下几个参数:
    1)title,设置生成的文件的路径
    2)plot,不设置时图片结果仅输出到屏幕,也可以设置输出为'pdf', 'png', 'pngBMP' 。
    3)writeTable,若为TRUE,则将一致性矩阵、ICL、log输出到CSV文件

    icl = calcICL(results, title = "consensus_cluster", plot = "png")
    
    dim(icl[["clusterConsensus"]])
    icl[["clusterConsensus"]][1:5,]
    ##       k cluster clusterConsensus
    ##  [1,] 2       1        0.8236517
    ##  [2,] 2       2        0.9204836
    ##  [3,] 3       1        0.6659016
    ##  [4,] 3       2        0.7503082
    ##  [5,] 3       3        0.8561029
    
    dim(icl[["itemConsensus"]])
    icl[["itemConsensus"]][1:5,]
    ##   k cluster                         item itemConsensus
    ##  1 2       1 TCGA-ZQ-A9CR-01A-11R-A39E-31     0.2776996
    ##  2 2       1 TCGA-BR-6563-01A-13R-2055-13     0.2786404
    ##  3 2       1 TCGA-IN-7808-01A-11R-2203-13     0.2941890
    ##  4 2       1 TCGA-VQ-A8P8-01A-11R-A39E-31     0.2783686
    ##  5 2       1 TCGA-HU-A4HB-01A-12R-A251-31     0.2714451
    

    4. 聚类结果展示

    4.1 矩阵热图

    ConsensusClusterPlus-1

    ꔷ 一致性矩阵的行和列表示的都是样本,一致性矩阵的值按从0(不可能聚类在一起)到1(总是聚类在一起)用白色到深蓝色表示
    ꔷ 一致性矩阵按照一致性聚类(热图上方的树状图)来排列。树状图和热图之间的彩色矩形即分出来的类别

    4.2 一致性累积分布函数(CDF)图

    ConsensusClusterPlus-2

    ꔷ 此图展示了k取不同数值时(用颜色表示)的一致性矩阵的累积分布函数,用于判断当k取何值时,CDF达到一个近似最大值,此时一致性和簇置信度在达到最大,聚类分析结果最可靠

    4.3 Delta Area Plot

    ConsensusClusterPlus-3

    ꔷ 此图展示了 k 和 k-1 相比CDF曲线下面积的相对变化。当k = 2时,因为没有k=1,所以第一个点表示的是k=2时CDF曲线下总面积,而当k = 10 时,曲线下面积趋近于平稳,所以10为最合适的k值。

    4.4 Tracking Plot

    ConsensusClusterPlus-4

    ꔷ 此图下方的黑色条纹表示样品,展示的是样品在k取不同的值时,归属的分类情况,不同颜色的色块代表不同的分类。取不同k值前后经常改变颜色分类的样品代表其分类不稳定。若分类不稳定的样本太多,则说明该k值下的分类不稳定。

    4.5 item-Consensus Plot(样本一致性图)

    ConsensusClusterPlus-5

    ꔷ 纵坐标代表Item-consensus values。k值不同时,每个样本都会有一个对应不同簇的item-consensus values。竖条代表每一个样本,竖条按从下到上递增的值进行排序,高度代表该样本的总item-consensus values。每个样本的顶部都有一个小矩形,小矩形的颜色代表该样本被分到了哪一簇。
    ꔷ 该图提供了给定k下所有其他集群的样本一致性图,可以查看样本是否是集群中非常“纯粹”的成员,或者它是否与多个集群共享高度一致性(多种颜色的列中的大矩形),表明它是不稳定或“不纯粹”的成员。 这些值可用于选择“核心”样本,它们高度代表集群。 此外,也可以帮助决策簇数量

    4.6 Cluster-Consensus Plot(聚类一致性图 )

    ConsensusClusterPlus-6

    ꔷ 此图展示的是不同k值下,每个分类的cluster-consensus value(该簇中成员pairwise consensus values的均值)。该值越高(低)代表稳定性越高(低)。可用于判断同一k值下以及不同k值之间cluster-consensus value的高低

    5. 最佳的K值的选择

    通过上面的结果解读我们可以选取出初步分类的亚组的个数,但是这种方法有时候带有主观性,也可以用PAC法确定最佳聚类数目

    #  面积最小值对应K为最佳K
    Kvec = 2:20
    x1 = 0.1; x2 = 0.9        # threshold defining the intermediate sub-interval
    PAC = rep(NA,length(Kvec)) 
    names(PAC) = paste("K=",Kvec,sep="")  # from 2 to maxK
    for(i in Kvec){
      M = results[[i]]$consensusMatrix
      Fn = ecdf(M[lower.tri(M)])          # M 为计算出共识矩阵
      PAC[i-1] = Fn(x2) - Fn(x1)
    } 
    
    optK = Kvec[which.min(PAC)]  # 理想的K值
    
    
    # 重新绘制热图(以k=7为例)
    ## 选取自己感兴趣的基因集
    intersection_geneExp[1:3,1:3]
    ##         TCGA-BR-A4J4-01A-12R-A251-31 TCGA-BR-A4IZ-01A-32R-A251-31 TCGA-RD-A7C1-01A-11R-A32D-31
    ## ALOX12B                       0.5224                       0.0031                       0.6993
    ## GOT1                         20.0073                       9.3511                      39.1501
    ## ALOX15B                       0.2185                       0.1652                       1.3996
    
    annCol <- data.frame(results = paste0("Cluster",
                                          results[[7]][['consensusClass']]),
                         row.names = colnames(intersection_geneExp))
                                      
    head(annCol)
    ##                               results
    ##  TCGA-BR-A4J4-01A-12R-A251-31 Cluster1
    ##  TCGA-BR-A4IZ-01A-32R-A251-31 Cluster1
    ##  TCGA-RD-A7C1-01A-11R-A32D-31 Cluster2
    ##  TCGA-BR-6852-01A-11R-1884-13 Cluster2
    ##  TCGA-BR-7851-01A-11R-2203-13 Cluster2
    ##  TCGA-BR-A4J1-01A-11R-A251-31 Cluster3
    
    library(RColorBrewer)
    library(pheatmap)
    
    mycol <- brewer.pal(7, "Set3")
    
    annColors <- list(results = c("Cluster1" = mycol[1],
                                  "Cluster2" = mycol[2],
                                  "Cluster3" = mycol[3], 
                                  "Cluster4" = mycol[4], 
                                  "Cluster5" = mycol[5], 
                                  "Cluster6" = mycol[6], 
                                  "Cluster7"= mycol[7]))
                        
    heatdata <- results[[7]][["consensusMatrix"]]
    dimnames(heatdata) <- list(colnames(intersection_geneExp), colnames(intersection_geneExp))
    heatdata[1:3,1:3]
    ##                              TCGA-BR-A4J4-01A-12R-A251-31 TCGA-BR-A4IZ-01A-32R-A251-31 TCGA-RD-A7C1-01A-11R-A32D-31
    ##  TCGA-BR-A4J4-01A-12R-A251-31                    1.0000000                    0.6542056                            0
    ##  TCGA-BR-A4IZ-01A-32R-A251-31                    0.6542056                    1.0000000                            0
    ##  TCGA-RD-A7C1-01A-11R-A32D-31                    0.0000000                    0.0000000                            1
    
    # 绘制热图
    result <- pheatmap(mat = heatdata,
             color = colorRampPalette((c("white","steelblue")))(100),
             border_color = NA,
             annotation_col = annCol,
             annotation_colors = annColors,
             show_colnames = F,
             show_rownames = F)
    
    ConsensusClusterPlus-7

    6. 提取分型结果

    library(tidyverse)
    
    Cluster_res <- annCol %>% as.data.frame %>% rownames_to_column("patient_ID") 
    table(Cluster_res$results) 
    ## 
    ## Cluster1 Cluster2 Cluster3 Cluster4 Cluster5 Cluster6 Cluster7 
    ##      132       94       37       57       38       48        6 
    
    #合并预后数据进行生存分析 
    STAD_clinical <- read_tsv("STAD_clinical.txt") 
    STAD_clinical$time <- ifelse(STAD_clinical$vital_status=='Alive', STAD_clinical$days_to_last_follow_up, STAD_clinical$days_to_death) # 如果患者已经死亡则选择days_to_death,如果患者处于活着状态,则选择days_to_last_follow_up
    STAD_clinical$status <- ifelse(STAD_clinical$vital_status=='Alive', 0, 1)
    
    Cluster_res <- Cluster_res %>% inner_join(STAD_clinical, "patient_ID") 
    head(Cluster_res) 
    ##                     patient_ID  results age gender tissue_or_organ_of_origin               primary_diagnosis vital_status ajcc_pathologic_stage year_of_birth days_to_last_follow_up year_of_death days_to_death time status
    ##  1 TCGA-BR-A4J4-01A-12R-A251-31 Cluster1  39   male            Gastric antrum             Adenocarcinoma, NOS        Alive            Stage IIIB          1973                     16            NA            NA   16      0
    ##  2 TCGA-BR-A4IZ-01A-32R-A251-31 Cluster1  45 female            Gastric antrum         Carcinoma, diffuse type         Dead            Stage IIIB          1967                     24            NA           273  273      1
    ##  3 TCGA-RD-A7C1-01A-11R-A32D-31 Cluster2  82   male         Fundus of stomach         Carcinoma, diffuse type         Dead              Stage IB          1923                     NA          2006           507  507      1
    ##  4 TCGA-BR-6852-01A-11R-1884-13 Cluster2  64 female               Cardia, NOS             Adenocarcinoma, NOS        Alive             Stage IIA          1947                   1367            NA            NA 1367      0
    ##  5 TCGA-BR-7851-01A-11R-2203-13 Cluster2  74   male           Body of stomach Adenocarcinoma, intestinal type         Dead             Stage IIB          1937                    574            NA            NA   NA      1
    ##  6 TCGA-BR-A4J1-01A-11R-A251-31 Cluster2  63   male            Gastric antrum             Adenocarcinoma, NOS         Dead            Stage IIIA          1949                     NA          2012            22   22      1
    
    #进行生存分析 
    library(survival)
    library(survminer)
    
    fit <- survfit(Surv(time, status) ~ results, data = Cluster_res) 
    
    ggsurvplot(fit, data = Cluster_res,  
               main = "Survival curve", # 添加标题
               surv.median.line = "hv", # 添加中位数生存时间线
               font.main = c(16, "bold", "darkblue"), # 设置标题字体大小、格式和颜色
               font.x = c(14, "bold.italic", "red"), # 设置x轴字体大小、格式和颜色
               font.y = c(14, "bold.italic", "darkred"), # 设置y轴字体大小、格式和颜色
               font.tickslab = c(12, "plain", "darkgreen"), # 设置坐标轴刻度字体大小、格式和颜色
               legend = c(0.9, 0.15), # 通过坐标指定图例位置
               legend.title = "Cluster", legend.labs = c("Cluster1","Cluster2","Cluster3","Cluster4","Cluster5","Cluster6","Cluster7"), # 更改图例标题和标签
               size = 1,  # 改变线条大小
               linetype = "strata", # 按组更改线型
               break.time.by = 250, # 将时间轴以250作为打断
               # palette = "Dark2" # 使用 brewer 调色板“Dark2”
               palette = c('#E5D2DD', '#53A85F', '#F1BB72', '#F3B1A0', '#D6E7A3', '#57C3F3', '#476D87'), # 自定义调色板
               ggtheme = theme_bw(), # 修改主题
               # conf.int = TRUE, # 添加置信区间
               pval = TRUE, # 添加p值
               risk.table = TRUE, risk.table.y.text.col = TRUE, # 添加风险表,并按层更改风险表y 文本颜色  
               tables.height = 0.2, # 设置风险表的高度
               tables.theme = theme_cleantable(), # 设置风险表的主题
               xlim = c(0, 1050) # 更改 x 轴范围
    )  
    
    ConsensusClusterPlus-8

    7. tSNE分析

    # 将数据框中字符型数据转化为数值型
    intersection_tumor <- as.data.frame(lapply(intersection_geneExp, as.numeric))
    row.names(intersection_tumor) <- row.names(intersection_geneExp)
    colnames(intersection_tumor) <- colnames(tumor_matrix)
    intersection_tumor = na.omit(intersection_tumor)
    intersection_tumor <- as.matrix(intersection_tumor)
    
    library(Rtsne)
    
    tSNE_res<- Rtsne(t(intersection_tumor), dims= 2, perplexity= 10, verbose= F, max_iter= 500, check_duplicates= F) 
    
    tsne<- data.frame(tSNE1 = tSNE_res[["Y"]][,1], tSNE2= tSNE_res[["Y"]][,2], cluster = annCol$results) 
    
    ggplot(tsne, aes(x= tSNE1, y= tSNE2, color= cluster)) + 
      geom_point(size= 4.5, alpha= 0.5) + 
      stat_ellipse(level= 0.85, show.legend = F) + 
      theme(legend.position= "top")
    
    ConsensusClusterPlus-9

    相关文章

      网友评论

        本文标题:R语言分析4:一致性聚类分析(ConsensusClusterP

        本文链接:https://www.haomeiwen.com/subject/mbgomdtx.html