变异数据处理(2)

作者: 白米饭睡不醒 | 来源:发表于2022-01-27 15:24 被阅读0次

signafiture

0.R包

rm(list=ls())
options(stringsAsFactors = F) 
project='TCGA_KIRC'
library(maftools) 
library(dplyr)
library(pheatmap)
library(ComplexHeatmap)
library(stringr)
library(deconstructSigs)
library(BSgenome)
library(BSgenome.Hsapiens.UCSC.hg38)

1.读取突变数据maf

是和上一篇一样的数据,从gdc下载的maf文件和临床信息

laml = read.maf(maf = "TCGA.KIRC.mutect.2a8f2c83-8b5e-4987-8dbf-01f7ee24dc26.DR-10.0.somatic.maf.gz")
#> -Reading
#> -Validating
#> -Silent variants: 8383 
#> -Summarizing
#> --Mutiple centers found
#> BI;BCM--Possible FLAGS among top ten genes:
#>   TTN
#>   MUC16
#>   HMCN1
#> -Processing clinical data
#> --Missing clinical data
#> -Finished in 3.345s elapsed (3.050s cpu)
laml 
#> An object of class  MAF 
#>                         ID summary   Mean Median
#>  1:             NCBI_Build  GRCh38     NA     NA
#>  2:                 Center  BI;BCM     NA     NA
#>  3:                Samples     336     NA     NA
#>  4:                 nGenes    9444     NA     NA
#>  5:        Frame_Shift_Del    1732  5.155      4
#>  6:        Frame_Shift_Ins    1201  3.574      1
#>  7:           In_Frame_Del     238  0.708      0
#>  8:           In_Frame_Ins     350  1.042      0
#>  9:      Missense_Mutation   12997 38.682     36
#> 10:      Nonsense_Mutation    1259  3.747      2
#> 11:       Nonstop_Mutation      18  0.054      0
#> 12:            Splice_Site     490  1.458      1
#> 13: Translation_Start_Site      25  0.074      0
#> 14:                  total   18310 54.494     47

mut=laml@data
mut[1:4,1:2]
#>    Hugo_Symbol Entrez_Gene_Id
#> 1:       CTPS1           1503
#> 2:      TRIM33          51592
#> 3:      NUP133          55746
#> 4:       GRHL1          29841
mut=mut[mut$Variant_Type=='SNP',]#只要单个位点的突变
#每行记录了一个突变,所以统计样本列的行数即为样本的突变数量
plot(table(mut[,16]),las = 2)
image.png

2.制作signatures的输入数据(96频谱)

关于什么是mutation

signature:http://www.bio-info-trainee.com/1619.html

关于96频谱:http://www.biotrainee.com/thread-832-1-1.html

a = mut[,c(16,5,6,12,13)]
colnames(a)=c( "Sample","chr", "pos","ref",  "alt")
a$Sample=as.character(a$Sample)

sigs.input <- mut.to.sigs.input(mut.ref = a, 
                                sample.id = "Sample", 
                                chr = "chr", 
                                pos = "pos", 
                                ref = "ref", 
                                alt = "alt",
                                bsg = BSgenome.Hsapiens.UCSC.hg38)
#> Warning in mut.to.sigs.input(mut.ref = a, sample.id = "Sample", chr = "chr", : Some samples have fewer than 50 mutations:
#>   TCGA-A3-3383-01A-01D-0966-08, TCGA-CJ-4908-01A-01D-1429-08, TCGA-CZ-5469-01A-01D-1501-10, TCGA-CZ-5986-01A-11D-1669-08, TCGA-A3-3365-01A-01D-0966-08, TCGA-CJ-4903-01A-01D-1429-08, TCGA-CJ-6032-01A-11D-1669-08, TCGA-B0-5116-01A-02D-1421-08, TCGA-G6-A8L7-01A-11D-A36X-10, TCGA-BP-5191-01A-01D-1429-08, TCGA-B8-5158-01A-01D-1421-08, TCGA-AK-3443-01A-01D-0966-08, TCGA-BP-4988-01A-01D-1462-08, TCGA-CZ-5984-01A-11D-1669-08, TCGA-DV-5566-01A-01D-1534-10, TCGA-BP-5202-01A-02D-1429-08, TCGA-DV-5568-01A-01D-1534-10, TCGA-CW-5589-01A-01D-1534-10, TCGA-B0-5400-01A-01D-1501-10, TCGA-BP-5184-01A-01D-1429-08, TCGA-BP-5181-01A-01D-1429-08, TCGA-DV-5575-01A-01D-1534-10, TCGA-BP-4965-01A-01D-1462-08, TCGA-BP-5009-01A-01D-1462-08, TCGA-B8-A54J-01A-11D-A33K-10, TCGA-BP-5187-01A-01D-1429-08, TCGA-CZ-5466-01A-01D-1501-10, TCGA-B8-5545-01A-01D-1669-08, TCGA-B8-5165-01A-01D-1421-08, TCGA-B8-A54K-01A-11D-A33K-10, TCGA-CW-5581-01A-02D-1534-10, TCGA-B0-5709-01A-11D-1534-10, TCGA-CW-6097-01A-11D-1669-08, TCGA-MM-A563-01A-11D-A25V-10, TCGA-T7-A92I-01A-11D-A36X-10, TCGA-B8-A8YJ-01A-13D-A38X-10, TCGA-CZ-5452-01A-01D-1501-10, TCGA-B8-4153-01B-11D-1669-08, TCGA-BP-5196-01A-01D-1429-08, TCGA-BP-5008-01A-01D-1462-08, TCGA-DV-5576-01A-01D-1534-10, TCGA-A3-3323-01A-01D-0966-08, TCGA-BP-5169-01A-01D-1429-08, TCGA-CZ-5470-01A-01D-1501-10, TCGA-BP-4968-01A-01D-1462-08, TCGA-B0-5081-01A-01D-1462-08, TCGA-BP-4801-01A-02D-1421-08, TCGA-DV-A4W0-01A-11D-A25V-10, TCGA-B8-A54I-01A-21D-A33K-10, TCGA-B2-5636-01A-02D-1534-10, TCGA-DV-5573-01A-01D-1534-10, TCGA-BP-5180-01A-01D-1429-08, TCGA-A3-3376-01A-02D-1421-08, TCGA-B0-5113-01A-01D-1421-08, TCGA-CJ-5676-01A-11D-1534-10, TCGA-B0-5110-01A-01D-1421-08, TCGA-BP-4991-01A-01D-1462-08, TCGA-BP-5186-01A-01D-1429-08, TCGA-CW-6087-01A-11D-1669-08, TCGA-B0-5711-01A-11D-1669-08, TCGA-B4-5832-01A-11D-1669-08, TCGA-CJ-4923-01A-01D-1429-08, TCGA-B4-5836-01A-11D-1669-08, TCGA-B2-4102-01A-02D-1386-10, TCGA-B0-5700-01A-11D-1534-10, TCGA-B0-5108-01A-01D-1421-08, TCGA-CJ-5671-01A-11D-1534-10, TCGA-BP-4967-01A-01D-1462-08, TCGA-B8-5549-01A-01D-1534-10, TCGA-CZ-5988-01A-11D-1669-08, TCGA-CJ-4904-01A-02D-1429-08, TCGA-CJ-4916-01A-01D-1429-08, TCGA-B0-4842-01A-02D-1421-08, TCGA-A3-3319-01A-01D-0966-08, TCGA-6D-AA2E-01A-11D-A36X-10, TCGA-CJ-4913-01A-01D-1429-08, TCGA-B8-A54G-01A-11D-A25V-10, TCGA-CJ-5675-01A-11D-1534-10, TCGA-B0-5693-01A-11D-1534-10, TCGA-CJ-4907-01A-01D-1429-08, TCGA-B8-4148-01A-02D-1386-10, TCGA-CW-5583-01A-02D-1534-10, TCGA-A3-3385-01A-02D-1421-08, TCGA-A3-A6NL-01A-11D-A33K-10, TCGA-AK-3440-01A-01D-0966-08, TCGA-BP-5175-01A-01D-1429-08, TCGA-BP-4975-01A-01D-1462-08, TCGA-BP-4987-01A-01D-1462-08, TCGA-B4-5834-01A-11D-1669-08, TCGA-AS-3778-01A-01D-0966-08, TCGA-B2-5633-01A-01D-A270-10, TCGA-DV-5565-01A-01D-1534-10, TCGA-DV-5569-01A-01D-1534-10, TCGA-B0-5084-01A-01D-1462-08, TCGA-B8-5159-01A-01D-1421-08, TCGA-CZ-5455-01A-01D-1501-10, TCGA-AK-3455-01A-01D-0966-08, TCGA-BP-5200-01A-01D-1429-08, TCGA-BP-5170-01A-01D-1429-08, TCGA-CJ-5678-01A-11D-1534-10, TCGA-EU-5907-01A-11D-1669-08, TCGA-A3-3380-01A-01D-0966-08, TCGA-BP-4998-01A-01D-1462-08, TCGA-CJ-6028-01A-11D-1669-08, TCGA-CJ-4899-01A-01D-1462-08, TCGA-CJ-4902-01A-01D-1429-08, TCGA-A3-A8OX-01A-11D-A36X-10, TCGA-B8-5553-01A-01D-1534-10, TCGA-BP-4177-01A-02D-1421-08, TCGA-EU-5905-01A-11D-1669-08, TCGA-CZ-5985-01A-11D-1669-08, TCGA-CW-5588-01A-01D-1534-10, TCGA-B0-5107-01A-01D-1421-08, TCGA-AK-3465-01A-01D-0966-08, TCGA-B0-5102-01A-01D-1421-08, TCGA-B0-5694-01A-11D-1534-10, TCGA-BP-4960-01A-01D-1462-08, TCGA-CZ-5982-01A-11D-1669-08, TCGA-A3-3374-01A-01D-0966-08, TCGA-CJ-4900-01A-01D-1462-08, TCGA-CZ-5987-01A-11D-1669-08, TCGA-B0-5696-01A-11D-1534-10, TCGA-B0-5083-01A-02D-1421-08, TCGA-BP-4981-01A-01D-1462-08, TCGA-B4-5844-01A-11D-1669-08, TCGA-BP-4961-01A-01D-1462-08, TCGA-B8-5162-01A-01D-1421-08, TCGA-B0-5104-01A-01D-1421-08, TCGA-BP-5201-01A-01D-1429-08, TCGA-B0-5120-01A-01D-1421-08, TCGA-B0-5402-01A-01D-1501-10, TCGA-B2-5635-01A-01D-A270-10, TCGA-B8-A7U6-01A-12D-A36X-10, TCGA-BP-5189-01A-02D-1429-08, TCGA-CZ-5460-01A-01D-1501-10, TCGA-B2-4101-01A-02D-1458-08, TCGA-B8-A54E-01A-11D-A25V-10, TCGA-AK-3453-01A-01D-0966-08, TCGA-BP-5010-01A-02D-1421-08, TCGA-DV-A4VZ-01A-11D-A25V-10, TCGA-B4-5843-01A-11D-1669-08, TCGA-B2-A4SR-01A-11D-A25V-10, TCGA-A3-A8CQ-01A-11D-A36X-10, TCGA-BP-5000-01A-01D-1462-08, TCGA-BP-4962-01A-01D-1462-08, TCGA-CZ-4863-01A-01D-1501-10, TCGA-BP-4982-01A-01D-1462-08, TCGA-B8-4151-01A-01D-1806-10, TCGA-CW-5591-01A-01D-1534-10, TCGA-BP-5006-01A-01D-1462-08, TCGA-A3-3317-01A-01D-0966-08, TCGA-B0-5077-01A-01D-1462-08, TCGA-CZ-5456-01A-01D-1501-10, TCGA-A3-3316-01A-01D-0966-08, TCGA-CZ-5989-01A-11D-1669-08, TCGA-CZ-5462-01A-01D-1501-10, TCGA-A3-3326-01A-01D-0966-08, TCGA-A3-3378-01A-01D-0966-08, TCGA-B0-4700-01A-02D-1534-10, TCGA-BP-5001-01A-01D-1462-08, TCGA-CJ-4905-01A-02D-1429-08, TCGA-BP-5004-01A-01D-1462-08, TCGA-A3-3373-01A-02D-1421-08, TCGA-B0-5691-01A-11D-1534-10, TCGA-A3-3367-01A-02D-1421-08, TCGA-B8-5163-01A-01D-1421-08, TCGA-BP-4971-01A-01D-1462-08, TCGA-BP-5178-01A-01D-1429-08, TCGA-AK-3427-01A-01D-0966-08, TCGA-BP-4992-01A-01D-1462-08, TCGA-BP-4986-01A-01D-1462-08, TCGA-BP-5194-01A-02D-1429-08, TCGA-BP-5192-01A-01D-1429-08, TCGA-BP-4970-01A-01D-1462-08, TCGA-CJ-6031-01A-11D-1669-08, TCGA-B0-5695-01A-11D-1534-10, TCGA-A3-3363-01A-01D-0966-08, TCGA-B8-5551-01A-01D-1534-10, TCGA-BP-4760-01A-02D-1421-08, TCGA-B0-4945-01A-01D-1421-08, TCGA-B0-5121-01A-02D-1421-08, TCGA-BP-4977-01A-01D-1462-08, TCGA-B0-5697-01A-11D-1534-10, TCGA-BP-5190-01A-01D-1429-08, TCGA-B0-5097-01A-01D-1421-08, TCGA-B8-A54F-01A-11D-A25V-10, TCGA-B0-5812-01A-11D-1669-08, TCGA-MW-A4EC-01A-11D-A25V-10, TCGA-B8-4146-01B-11D-1669-08, TCGA-B8-4622-01A-02D-1553-08, TCGA-B0-5707-01A-11D-1534-10, TCGA-B0-5699-01A-11D-1534-10, TCGA-CZ-5458-01A-01D-1501-10, TCGA-CJ-5683-01A-11D-1534-10, TCGA-BP-4972-01A-01D-1462-08, TCGA-CZ-5454-01A-01D-1501-10, TCGA-BP-4995-01A-01D-1462-08, TCGA-B0-5100-01A-01D-1421-08, TCGA-B8-A54D-01A-21D-A25V-10, TCGA-B2-5639-01A-01D-1534-10, TCGA-DV-5567-01A-01D-1534-10, TCGA-B0-5710-01A-11D-1669-08, TCGA-BP-4795-01A-02D-1421-08, TCGA-BP-4973-01A-01D-1462-08, TCGA-CW-5585-01A-01D-1534-10, TCGA-B8-5552-01B-11D-1669-08, TCGA-CJ-5684-01A-11D-1534-10, TCGA-A3-A8OW-01A-11D-A36X-10, TCGA-BP-5177-01A-01D-1429-08, TCGA-CZ-5467-01A-01D-1501-10, TCGA-B4-5377-01A-01D-1501-10, TCGA-A3-3322-01A-01D-0966-08, TCGA-AS-3777-01A-01D-0966-08, TCGA-A3-A6NI-01A-11D-A33K-10, TCGA-BP-4999-01A-01D-1462-08, TCGA-A3-3370-01A-02D-1421-08, TCGA-CZ-5463-01A-01D-1501-10, TCGA-B0-5109-01A-02D-1421-08, TCGA-EU-5906-01A-11D-1669-08, TCGA-BP-5007-01A-01D-1462-08, TCGA-B0-5399-01A-01D-1501-10, TCGA-B0-5692-01A-11D-1534-10, TCGA-CJ-4901-01A-01D-1429-08, TCGA-A3-3311-01A-01D-0966-08, TCGA-A3-A6NJ-01A-12D-A33K-10, TCGA-G6-A8L6-01A-11D-A36X-10, TCGA-G6-A5PC-01A-11D-A33K-10, TCGA-B8-5546-01A-01D-1534-10, TCGA-BP-5183-01A-01D-1429-08, TCGA-CJ-5680-01A-11D-1534-10, TCGA-DV-5574-01A-01D-1534-10, TCGA-EU-5904-01A-11D-1669-08, TCGA-B0-5706-01A-11D-1534-10, TCGA-BP-4974-01A-01D-1462-08, TCGA-B0-5115-01A-01D-1421-08, TCGA-CW-5587-01A-01D-1534-10, TCGA-BP-5174-01A-01D-1429-08, TCGA-CJ-5681-01A-11D-1534-10, TCGA-BP-4993-01A-02D-1421-08, TCGA-CJ-5686-01A-11D-1669-08, TCGA-B0-5117-01A-01D-1421-08
class(sigs.input)
#> [1] "data.frame"
#第一个样本的突变绘图
barplot(as.numeric(sigs.input[1,])~colnames(sigs.input))
image.png
sigs.input[1:4,1:4]
#>                              A[C>A]A A[C>A]C A[C>A]G A[C>A]T
#> TCGA-A3-3383-01A-01D-0966-08       2       0       0       0
#> TCGA-CJ-4920-01A-01D-1429-08       0       1       0       2
#> TCGA-CJ-4908-01A-01D-1429-08       0       2       0       0
#> TCGA-CZ-5469-01A-01D-1501-10       2       0       1       0

一顿操作猛如虎,生成signature与样本对应关系的矩阵

if(!file.exists(paste0(project,"w.Rdata"))){
  w=lapply(unique(a$Sample), function(i){
    ## signatures.cosmic signatures.nature2013
    sample_1 = whichSignatures(tumor.ref = sigs.input, 
                               signatures.ref = signatures.cosmic, 
                               sample.id =  i, 
                               contexts.needed = TRUE,
                               tri.counts.method = 'default')
    print(c(i,which(unique(a$Sample)==i)))
    return(sample_1$weights)
  })
  w=do.call(rbind,w)
  save(w,file = paste0(project,"w.Rdata"))
}
load(paste0(project,"w.Rdata"))
mat = t(w)

3.可视化

矩阵的可视化–热图

3.1 简单版本的热图

pheatmap::pheatmap(mat,
         cluster_rows = F,
         cluster_cols = T,
         show_colnames = F)
image.png

3.2 好看一点的热图

看到了这个图,氮素不知道用什么包画的,反正闲的没事干,不如自己模仿一波。如果你知道用的什么包,可以告诉我~感谢

image.png

热图上面放直方图,直方图内容是每个样本的突变数量。想到了ComplexHeatmap,可以将直方图作为注释添加上去。

难点是调整顺序,R语言神技能加持。

捋一捋:

直方图横纵坐标是样本和突变数量 热图输入数据mat横纵坐标是样本和signature 注释栏来源于clinical数据,与样本一一对应。 所以需要调整样本顺序,三者统一。 注释栏简化一下,用stage、生死和性别;先排一下序,再按照排好的顺序给mat和直方图输入数据排序。

load("KIRC_pd.Rdata")
head(pd)
#>             Tumor_Sample_Barcode stage_event gender vital_status
#> 482 TCGA-A3-3383-01A-01D-0966-08           I   MALE        Alive
#> 438 TCGA-CJ-4920-01A-01D-1429-08           I FEMALE         Dead
#> 142 TCGA-CJ-4908-01A-01D-1429-08           I   MALE        Alive
#> 311 TCGA-CZ-5469-01A-01D-1501-10          II   MALE         Dead
#> 234 TCGA-CZ-5986-01A-11D-1669-08           I   MALE        Alive
#> 202 TCGA-A3-3365-01A-01D-0966-08           I   MALE        Alive
#调整pd的行顺序
pd = arrange(pd,stage_event,gender,vital_status)
#mat与pd一一对应
mat = mat[,match(pd$Tumor_Sample_Barcode,colnames(mat))]
#只取一部分signiture来画
s = head(names(sort(rowSums(mat),decreasing = T)),8)
mat = mat[s,]

#顶部直方图
for_bar = table(mut$Tumor_Sample_Barcode)
for_bar = for_bar[match(colnames(mat),names(for_bar))]
for_bar = as.integer(for_bar)

annotation = HeatmapAnnotation(
  mut = anno_barplot(for_bar),
  df = pd[,-1])

ht_list = Heatmap(mat, name = "mat", 
        cluster_rows = F,
        cluster_columns = F,
        show_column_names = F,
        column_split = pd$stage_event,
        top_annotation = annotation)

draw(ht_list, heatmap_legend_side = "left", annotation_legend_side = "bottom")
image.png

*生信技能树课程笔记

相关文章

  • 毕业论文

    数据处理 BSA的变异,分析,找出不利变异,分析变异结构。 可变剪切的分析,参考https://www.jians...

  • 变异数据处理(2)

    signafiture 0.R包 1.读取突变数据maf 是和上一篇一样的数据,从gdc下载的maf文件和临床信息...

  • 变异数据处理(1)

    maftools 变异数据处理的总体思路如下: 思维导图 1.数据下载 TCGA的突变数据有4个软件得到的不同版本...

  • 变异数据处理(4)

    corralation 0.输入数据和R包 2.任意两个基因的相关性分析 2.1 简单绘图 使用ggpbur。 2...

  • 变异数据处理(3)

    任意基因的任意分组比较 0.输入数据 这里面的数据: exp是tumor-normal都有的表达矩阵,exprSe...

  • GISTIC2.0初使用

    我的操作系统:Ubuntu 16.04 LTS GISTIC2.0用以分析癌症拷贝数变异的一个数据处理模块。具体介...

  • 胚系变异(SNP/InDel)频率波动的影响因素

    背景   在二代数据处理中,用GATK HaplotypeCaller去call二倍体胚系变异应该属于比较常用的检...

  • 读达尔文的《物种起源》

    (2) 简述自然状态下的变异 家养状况下的生物变异,应用到自然界的生物状态中是否存在着变异? ​变异性--个体差...

  • 数据处理神器tidyverse(2)ggplot2

    数据处理神器tidyverse(1)dplyr 数据处理神器tidyverse(2)ggplot2 这样输出的是空...

  • 变异数据

    变异数据处理的总体思路如下: 1.数据下载 TCGA的突变数据有4个软件得到的不同版本: 这个可以在gdc的官网上...

网友评论

    本文标题:变异数据处理(2)

    本文链接:https://www.haomeiwen.com/subject/riyghrtx.html