Bioinformatics Notes 15 - TCGA Data Download and Survival Analysis

Author: 江湾青年 | Published 2023-03-28 10:18


Esophageal carcinoma (ESCA) is used as the example here.

First, download the ESCA data with the R package TCGAbiolinks:

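library(TCGAbiolinks)
# Query the STAR-Counts gene expression files of the TCGA-ESCA project
query.esca <- GDCquery(project = "TCGA-ESCA", 
                       data.category = "Transcriptome Profiling", 
                       data.type = "Gene Expression Quantification", 
                       workflow.type = "STAR - Counts")
# Download every file matched by the query
GDCdownload(query.esca)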

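When this code runs, it creates a folder named GDCdata in your current working directory, connects to the TCGA data portal (GDC) and downloads the data into that folder automatically.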

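# Merge all downloaded samples into a single SummarizedExperiment object
esca <- GDCprepare(query.esca)
# Define ESCC (squamous cell) and ESAD (adenocarcinoma) from the histological diagnosis
table(esca$primary_diagnosis)
# as.character() replaces the original "%>% as.character()", which fails unless magrittr/dplyr is attached
esca$tumor_type <- as.character(factor(esca$primary_diagnosis,
                          levels = c('Adenocarcinoma, NOS','Basaloid squamous cell carcinoma','Mucinous adenocarcinoma',
                          'Squamous cell carcinoma, keratinizing, NOS','Squamous cell carcinoma, NOS','Tubular adenocarcinoma'),
                          labels = c('ESAD','ESCC','ESAD','ESCC','ESCC','ESAD')))
table(esca$tumor_type)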
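The factor() call relies on a base-R rule (R 3.4.0 or later): duplicated values in labels merge several levels into one group. Any primary_diagnosis value not listed in levels becomes NA, and table() drops NA by default, so such samples disappear silently. A minimal sketch on a made-up vector of diagnoses. The vector is hypothetical, including the deliberately unlisted 'Carcinoma, NOS' entry:

```r
# Hypothetical primary_diagnosis values; 'Carcinoma, NOS' is intentionally absent from levels
dx <- c('Adenocarcinoma, NOS', 'Squamous cell carcinoma, NOS',
        'Tubular adenocarcinoma', 'Basaloid squamous cell carcinoma',
        'Carcinoma, NOS')

tumor_type <- as.character(factor(
  dx,
  levels = c('Adenocarcinoma, NOS', 'Basaloid squamous cell carcinoma',
             'Mucinous adenocarcinoma',
             'Squamous cell carcinoma, keratinizing, NOS',
             'Squamous cell carcinoma, NOS', 'Tubular adenocarcinoma'),
  # Duplicated labels collapse the six histologies into two groups
  labels = c('ESAD', 'ESCC', 'ESAD', 'ESCC', 'ESCC', 'ESAD')
))

print(tumor_type)
# [1] "ESAD" "ESCC" "ESAD" "ESCC" NA

# useNA = 'ifany' makes the unmapped sample visible in the counts
table(tumor_type, useNA = 'ifany')
```

On the real object, table(esca$tumor_type, useNA = 'ifany') is a quick way to check whether any diagnoses were left unmapped.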

Extract the TPM expression matrix of the ESCC samples, then run survival analysis with TCGAanalyze_survival(), a function that ships with TCGAbiolinks:

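library(SummarizedExperiment)  # provides assay(), rowData(), colData()
# Keep only the ESCC samples
escc <- esca[, which(esca$tumor_type == 'ESCC')]
# Extract the unstranded TPM matrix (genes x samples) and label rows with gene symbols
tpm <- assay(escc, 'tpm_unstrand')
dimnames(tpm) <- list(rowData(escc)$gene_name, colnames(escc))
# Survival by gender; note that this call uses all ESCA samples (esca), not only ESCC
# A distinct filename per call keeps each PDF from overwriting the previous one
TCGAanalyze_survival(colData(esca), clusterCol = 'gender',
                     filename = 'survival_gender.pdf')
# Survival by CREBBP expression: samples below the median TPM are 'low', the rest 'high'
escc$CREBBP_exp <- ifelse(tpm['CREBBP', ] < median(tpm['CREBBP', ]), 'CREBBP_low', 'CREBBP_high')
TCGAanalyze_survival(colData(escc), clusterCol = 'CREBBP_exp',
                     filename = 'survival_CREBBP.pdf')
# Same median split for SIRT7
escc$SIRT7_exp <- ifelse(tpm['SIRT7', ] < median(tpm['SIRT7', ]), 'SIRT7_low', 'SIRT7_high')
TCGAanalyze_survival(colData(escc), clusterCol = 'SIRT7_exp',
                     filename = 'survival_SIRT7.pdf')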
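TCGAanalyze_survival() hides the survival model itself. As far as I can tell from its documentation, it derives overall survival from the vital_status, days_to_death and days_to_last_follow_up columns of the clinical data and draws Kaplan-Meier curves with a log-rank p-value. The sketch below reproduces that idea with the survival package (included in standard R installations) on a small made-up data frame, together with the same median split used for CREBBP and SIRT7. All values, and the expr column, are hypothetical; this is not the function's actual source.

```r
library(survival)

# Hypothetical clinical table using the TCGA column names
clin <- data.frame(
  vital_status           = c('Dead', 'Alive', 'Dead', 'Dead', 'Alive', 'Alive', 'Dead', 'Alive'),
  days_to_death          = c(300, NA, 150, 400, NA, NA, 90, NA),
  days_to_last_follow_up = c(NA, 1200, NA, NA, 800, 950, NA, 1500),
  expr                   = c(5.1, 12.3, 3.2, 9.8, 7.7, 11.0, 2.5, 14.2)  # hypothetical TPM values
)

# Survival time: days to death for dead patients, otherwise censored at last follow-up
clin$time  <- ifelse(clin$vital_status == 'Dead', clin$days_to_death, clin$days_to_last_follow_up)
clin$event <- as.integer(clin$vital_status == 'Dead')

# Median split, same rule as above: below the median -> 'low', otherwise 'high'
clin$group <- ifelse(clin$expr < median(clin$expr), 'low', 'high')

# Kaplan-Meier estimate per group
fit <- survfit(Surv(time, event) ~ group, data = clin)
# Log-rank test; df = number of groups - 1
lr <- survdiff(Surv(time, event) ~ group, data = clin)
p_value <- pchisq(lr$chisq, df = length(lr$n) - 1, lower.tail = FALSE)

print(fit)
print(p_value)

# Strata are ordered alphabetically: group=high, then group=low
plot(fit, col = c('red', 'blue'), xlab = 'Days', ylab = 'Survival probability')
legend('topright', legend = names(fit$strata), col = c('red', 'blue'), lty = 1)
```

With the real data, the three time/event columns would come from colData(escc). The median is simply the cut-off the code above uses; a different cut-off would change group membership and therefore the p-value.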