TCGA Homologous Recombination Deficiency (HRD)

Author: 小洁忘了怎么分身 | Published: 2023-01-20 02:50

    0. Background

    I have recently been reading a new paper:

    Genomic, epigenomic, and transcriptomic signatures for telomerase complex components: a pan‐cancer analysis

    It states:

    Genomic instability included aneuploidy, somatic total mutation burden (TMB), somatic copy number alterations (SCNA), loss of heterozygosity (LOH), and homologous recombination deficiency (HRD).

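    Homologous recombination repair (HRR) is the preferred repair pathway for DNA double-strand breaks (DSBs).

    Homologous recombination deficiency (HRD) usually refers to a state of impaired HRR function at the cellular level. It can result from germline or somatic mutations in HRR-related genes, from epigenetic inactivation, and from other factors. HRD is found in many malignancies and is especially prominent in ovarian cancer, breast cancer, pancreatic ductal carcinoma, and prostate cancer.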

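    HRD is commonly scored from three genomic scar measures:

    Loss of heterozygosity (LOH): an LOH region longer than 15 Mb but shorter than the whole chromosome.

    Telomeric allelic imbalance (TAI): a chromosomal segment with allelic imbalance that extends to one of the subtelomeres, does not cross the centromere, and is larger than 11 Mb.

    Large-scale state transition (LST): a chromosomal breakpoint between two adjacent regions, each at least 10 Mb long and separated by less than 3 Mb. The total number of such breakpoints in a tumor genome can be used to describe its genomic instability.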

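    Source: Hereditary Tumor Markers Collaboration Group, Tumor Marker Committee of the China Anti-Cancer Association; Molecular Pathology Group, Chinese Society of Pathology, Chinese Medical Association. Expert consensus on clinical detection and application of homologous recombination repair deficiency (2021 edition) [J]. 中国癌症防治杂志 (Chinese Journal of Oncology Prevention and Treatment), 2021, 13(4): 329-338.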
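
To make the size thresholds concrete, here is a minimal sketch of how the HRD-LOH criterion (longer than 15 Mb, shorter than the whole chromosome) could be applied to a segment table. The `segs` data frame, its column names, the chromosome lengths, and the helper `count_hrd_loh` are all made up for illustration. Real pipelines start from allele-specific copy-number calls and handle more edge cases.

```r
# Toy segments; every value here is illustrative, not real data.
segs <- data.frame(
  chrom  = c("chr1", "chr1", "chr2", "chr3"),
  start  = c(1, 50e6, 1, 10e6),
  end    = c(20e6, 60e6, 100e6, 40e6),
  is_loh = c(TRUE, TRUE, TRUE, FALSE),
  stringsAsFactors = FALSE
)

# Hypothetical chromosome lengths in base pairs.
chrom_len <- c(chr1 = 200e6, chr2 = 100e6, chr3 = 150e6)

# Count LOH segments longer than min_bp that do not span a whole chromosome.
count_hrd_loh <- function(segs, chrom_len, min_bp = 15e6) {
  seg_len <- segs$end - segs$start + 1
  whole   <- seg_len >= chrom_len[segs$chrom]
  sum(segs$is_loh & seg_len > min_bp & !whole)
}

n_loh <- count_hrd_loh(segs, chrom_len)
n_loh
```

In this toy table only the first chr1 segment qualifies: the second is too short, the chr2 segment covers the whole chromosome, and the chr3 segment is not LOH.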

    After some searching, I found precomputed HRD scores at:

    https://gdc.cancer.gov/about-data/publications/PanCan-DDR-2018

    DNA damage repair (DDR)

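    The page provides a set of TSV files and one large Excel (.xlsx) workbook. Together they cover all of the DNA damage repair data and are very comprehensive.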

    dir("TCGA_DDR_Data_Resources/")
    
    ##  [1] "DDRscores.tsv"          "GeneAlterations.tsv"    "GeneDeletions.tsv"     
    ##  [4] "GeneMutations.tsv"      "Genes.tsv"              "GeneSilencing.tsv"     
    ##  [7] "PathwayAlterations.tsv" "PathwayDeletions.tsv"   "PathwayMembership.tsv" 
    ## [10] "PathwayMutations.tsv"   "Pathways.tsv"           "PathwaySilencing.tsv"  
    ## [13] "Samples.tsv"            "Scores.tsv"
    

    The files are described in a PDF that can be downloaded from the same page.

    DDRscores.tsv

    Data table listing the scores of the 43 DDR footprints and the RPPA-based DDR score across all samples (n=9,125).

    The order of the genes, pathways, samples and footprint scores in these TSV files is as given in Genes.tsv, Pathways.tsv, Samples.tsv and Scores.tsv.
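
Because the score matrix ships without row or column names, a mismatch between files would silently mislabel samples. Below is a small sketch of a loader that checks dimensions before attaching names. The function name `read_named_matrix` is hypothetical, and the demo writes tiny stand-in files to a temporary directory instead of reading the real download. It also trims whitespace from column names, since some names in Scores.tsv carry a trailing space.

```r
# Hypothetical helper: read a header-less matrix and label it from two side files.
read_named_matrix <- function(matrix_file, row_file, col_file) {
  m    <- read.delim(matrix_file, header = FALSE)
  rows <- read.delim(row_file, header = FALSE, stringsAsFactors = FALSE)
  cols <- read.delim(col_file, header = FALSE, stringsAsFactors = FALSE)
  # Fail early if the label files do not match the matrix shape.
  stopifnot(nrow(m) == nrow(rows), ncol(m) == nrow(cols))
  rownames(m) <- rows$V1
  colnames(m) <- trimws(cols$V1)
  m
}

# Demo with made-up stand-in files (not the real TCGA data).
tmp <- tempfile("ddr_demo_")
dir.create(tmp)
writeLines(c("3\t2", "4\t2"),        file.path(tmp, "m.tsv"))
writeLines(c("S1\tACC", "S2\tACC"),  file.path(tmp, "rows.tsv"))
writeLines(c("HRD_TAI", "HRD_LST "), file.path(tmp, "cols.tsv"))

demo <- read_named_matrix(
  file.path(tmp, "m.tsv"),
  file.path(tmp, "rows.tsv"),
  file.path(tmp, "cols.tsv")
)
demo
```

For the real data, the three arguments would be DDRscores.tsv, Samples.tsv and Scores.tsv.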

    1. Get the DDRscores table

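    # The score matrix has no header; row and column labels live in separate files
    sc = read.delim("TCGA_DDR_Data_Resources/DDRscores.tsv",header = FALSE)
    s = read.delim("TCGA_DDR_Data_Resources/Samples.tsv",header = FALSE)   # sample IDs and cancer types
    cl = read.delim("TCGA_DDR_Data_Resources/Scores.tsv",header = FALSE)   # footprint score names
    rownames(sc) = s$V1
    colnames(sc) = cl$V1
    colnames(sc)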
    
    ##  [1] "mutLoad_silent"         "mutLoad_nonsilent"      "mutSig1"               
    ##  [4] "mutSig2"                "mutSig3"                "mutSig4"               
    ##  [7] "mutSig5"                "mutSig6"                "mutSig7"               
    ## [10] "mutSig8"                "mutSig9"                "mutSig10"              
    ## [13] "mutSig11"               "mutSig12"               "mutSig13"              
    ## [16] "mutSig14"               "mutSig15"               "mutSig16"              
    ## [19] "mutSig17"               "mutSig18"               "mutSig19"              
    ## [22] "mutSig20"               "mutSig21"               "CNA_n_segs"            
    ## [25] "CNA_frac_altered "      "CNA_n_focal_amp_del"    "aneuploidy_score"      
    ## [28] "aneuploidy_score_prime" "LOH_n_seg"              "LOH_frac_altered "     
    ## [31] "purity"                 "ploidy"                 "genome_doublings"      
    ## [34] "subclonal_frac"         "HRD_TAI"                "HRD_LST"               
    ## [37] "HRD_LOH"                "HRD_Score"              "eCARD"                 
    ## [40] "PARPi7"                 "PARPi7_bin"             "RPS"                   
    ## [43] "tp53_score"             "rppa_ddr_score"
    
    # HRD scores (columns 35 to 38), selected by name
    scores = sc[,c("HRD_TAI","HRD_LST","HRD_LOH","HRD_Score")]
    head(scores)
    
    ##                 HRD_TAI HRD_LST HRD_LOH HRD_Score
    ## TCGA-OR-A5J1-01       3       2       2         7
    ## TCGA-OR-A5J2-01       4       2       3         9
    ## TCGA-OR-A5J3-01       0       0       0         0
    ## TCGA-OR-A5J5-01       2       2       4         8
    ## TCGA-OR-A5J6-01       3       1       1         5
    ## TCGA-OR-A5J7-01      10       8       3        21
    
    nrow(scores)
    
    ## [1] 9125
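
In the rows printed above, HRD_Score equals HRD_TAI + HRD_LST + HRD_LOH (for example, 3 + 2 + 2 = 7). Here is a quick sketch of that consistency check. The `toy` data frame copies the six rows shown, so the snippet runs without the download. On the real table one would presumably substitute `scores` for `toy`, and `na.rm = TRUE` is included because the full table contains missing values.

```r
# The six rows shown by head(scores), retyped as a self-contained data frame.
toy <- data.frame(
  HRD_TAI   = c(3, 4, 0, 2, 3, 10),
  HRD_LST   = c(2, 2, 0, 2, 1, 8),
  HRD_LOH   = c(2, 3, 0, 4, 1, 3),
  HRD_Score = c(7, 9, 0, 8, 5, 21)
)

# Sum the three components per sample and compare with the reported total.
component_sum <- rowSums(toy[, c("HRD_TAI", "HRD_LST", "HRD_LOH")])
sum_matches   <- component_sum == toy$HRD_Score
all(sum_matches, na.rm = TRUE)
```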
    

    2. Add cancer type and plot

    head(s)
    
    ##                V1  V2
    ## 1 TCGA-OR-A5J1-01 ACC
    ## 2 TCGA-OR-A5J2-01 ACC
    ## 3 TCGA-OR-A5J3-01 ACC
    ## 4 TCGA-OR-A5J5-01 ACC
    ## 5 TCGA-OR-A5J6-01 ACC
    ## 6 TCGA-OR-A5J7-01 ACC
    
    identical(s$V1,rownames(scores))
    
    ## [1] TRUE
    
    library(tidyverse)    # loads dplyr, tidyr and ggplot2
    library(RColorBrewer)

    # Attach sample ID and cancer type (project) to the HRD scores
    scores = cbind(s,scores)
    colnames(scores)[1:2] = c("Id","Project")
    # Drop samples without an HRD score
    dat = drop_na(scores,HRD_Score)
    # Order projects by median HRD score, highest first
    su = group_by(dat,Project) %>% 
      summarise(a = median(HRD_Score)) %>% 
      arrange(desc(a))
    dat$Project = factor(dat$Project,levels = su$Project)
    # Interpolate the Set1 palette to one color per project
    mypalette <- colorRampPalette(brewer.pal(8,"Set1"))
    ggplot(dat,aes(x = Project,y = HRD_Score,fill = Project))+
      geom_boxplot()+
      theme_bw()+
      theme(axis.text.x = element_text(vjust = 1,hjust = 1,angle = 45),legend.position = "bottom")+
      scale_fill_manual(values = mypalette(nlevels(dat$Project)))+
      guides(fill = guide_legend(nrow = 3, byrow = TRUE))  # lay the legend out in 3 rows
    

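    The dataset already covers samples from all cancer types. With projects ordered by median HRD score, ovarian cancer (OV) ranks highest.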
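
The ordering behind the plot can also be inspected as a table. Below is a minimal base-R sketch that computes the per-project median and sample count and sorts by median. The project labels are real TCGA codes, but the scores are invented for illustration and do not come from the dataset.

```r
# Made-up scores; only the project codes are real.
toy <- data.frame(
  Project   = c("OV", "OV", "OV", "ACC", "ACC", "THCA", "THCA"),
  HRD_Score = c(40, 55, 48, 7, 9, 1, NA),
  stringsAsFactors = FALSE
)

# Drop samples without a score, as drop_na() does in the post.
toy <- toy[!is.na(toy$HRD_Score), ]

# Per-project median and sample count.
med <- tapply(toy$HRD_Score, toy$Project, median)
n   <- table(toy$Project)

summary_tab <- data.frame(
  Project    = names(med),
  median_HRD = as.numeric(med),
  n          = as.integer(n[names(med)]),
  stringsAsFactors = FALSE
)

# Highest median first, matching the x-axis order of the boxplot.
summary_tab <- summary_tab[order(-summary_tab$median_HRD), ]
summary_tab
```

Showing the sample count next to the median helps judge how much weight a project's position in the ranking deserves.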

    3. Annotations for the table column names

    xs = rio::import_list("TCGA_DDR_Data_Resources.xlsx")
    names(xs)
    
    ##  [1] "DDR genes and pathways"          "DDR gene alterations ONCOPRINT" 
    ##  [3] "DDR gene mutations"              "DDR deep deletions"             
    ##  [5] "DDR epigenetic silencing"        "DDR gene alterations"           
    ##  [7] "DDR footprint summary"           "DDR footprints"                 
    ##  [9] "DDR pathway alterations ONCOPR " "DDR pathway mutations "         
    ## [11] "DDR pathway deletions"           "DDR pathway silencing"          
    ## [13] "DDR pathway alterations"         "ME CO analysis core pathways"   
    ## [15] "ME CO analysis inclusive pathw " "DDR Survival Univariate"        
    ## [17] "DDR Survival Multivariate"       "DDR gene fusions"               
    ## [19] "TP53 predictor"
    
    xs[[7]][14:17,1]
    ## [1] "TAI"       "LST"       "HRD LOH"   "HRD Score"
    xs[[7]][14:17,2]
    ## [1] "number of subchromosomal regions with allelic imbalance extending to the telomere"           
    ## [2] "number of chromosomal breaks between adjacent regions of at least 10Mb"                      
    ## [3] "the number of LOH regions of intermediate size (> 15MB but < whole chromosome in length)"    
    ## [4] "Homologous recombination deficiency score calculated from three scores (TAI + LST + HRD LOH)"
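
Row positions such as 14:17 break if the workbook layout changes. Below is a sketch of looking up descriptions by label instead. `footprint_summary` is a hand-built stand-in that copies the four rows printed above. The column names `label` and `description` and the helper `describe_footprint` are hypothetical, since the real sheet's headers are not shown here.

```r
# Stand-in for the first two columns of the "DDR footprint summary" sheet.
footprint_summary <- data.frame(
  label = c("TAI", "LST", "HRD LOH", "HRD Score"),
  description = c(
    "number of subchromosomal regions with allelic imbalance extending to the telomere",
    "number of chromosomal breaks between adjacent regions of at least 10Mb",
    "the number of LOH regions of intermediate size (> 15MB but < whole chromosome in length)",
    "Homologous recombination deficiency score calculated from three scores (TAI + LST + HRD LOH)"
  ),
  stringsAsFactors = FALSE
)

# Hypothetical helper: match labels against column 1, return column 2.
describe_footprint <- function(tab, labels) {
  idx <- match(labels, tab[[1]])
  stopifnot(!anyNA(idx))  # fail loudly on an unknown label
  setNames(tab[[2]][idx], labels)
}

desc <- describe_footprint(footprint_summary, c("HRD Score", "LST"))
desc
```

With the real workbook, the same call on `xs[[7]]` should work as long as the labels sit in the first column and the descriptions in the second, as the indexing above suggests.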
    
