This walkthrough uses esophageal carcinoma (ESCA) as the example.
First, download the ESCA data with the R package TCGAbiolinks.
library(TCGAbiolinks)
query.esca <- GDCquery(project = "TCGA-ESCA",
                       data.category = "Transcriptome Profiling",
                       data.type = "Gene Expression Quantification",
                       workflow.type = "STAR - Counts")
GDCdownload(query.esca)
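Running the code above creates a GDCdata folder in your current working directory and automatically downloads the data into it from the GDC (Genomic Data Commons), which hosts the TCGA data.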
# Merge all downloaded samples into a single object
esca <- GDCprepare(query.esca)
# Define ESCC and ESAD from the primary diagnosis
table(esca$primary_diagnosis)
# as.character(factor(...)) is used instead of the %>% pipe,
# which would require magrittr or dplyr to be attached
esca$tumor_type <- as.character(factor(
  esca$primary_diagnosis,
  levels = c('Adenocarcinoma, NOS',
             'Basaloid squamous cell carcinoma',
             'Mucinous adenocarcinoma',
             'Squamous cell carcinoma, keratinizing, NOS',
             'Squamous cell carcinoma, NOS',
             'Tubular adenocarcinoma'),
  labels = c('ESAD', 'ESCC', 'ESAD', 'ESCC', 'ESCC', 'ESAD')))
table(esca$tumor_type)
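The diagnosis-to-subtype mapping can also be written as a named lookup vector, which keeps each diagnosis next to its label. The following is a minimal sketch on toy data, not the method used above: `diagnosis` stands in for `esca$primary_diagnosis`, and `diagnosis_to_type` is a hypothetical name introduced only for illustration.

```r
# Toy stand-in for esca$primary_diagnosis (values are illustrative only)
diagnosis <- c("Adenocarcinoma, NOS",
               "Squamous cell carcinoma, NOS",
               "Basaloid squamous cell carcinoma",
               "Tubular adenocarcinoma",
               "Some unlisted diagnosis")

# Hypothetical lookup table: names are diagnoses, values are tumor types
diagnosis_to_type <- c(
  "Adenocarcinoma, NOS"                        = "ESAD",
  "Mucinous adenocarcinoma"                    = "ESAD",
  "Tubular adenocarcinoma"                     = "ESAD",
  "Basaloid squamous cell carcinoma"           = "ESCC",
  "Squamous cell carcinoma, keratinizing, NOS" = "ESCC",
  "Squamous cell carcinoma, NOS"               = "ESCC"
)

# Indexing by name yields NA for any diagnosis missing from the table
tumor_type <- unname(diagnosis_to_type[diagnosis])

# useNA = "ifany" makes unmapped diagnoses visible in the count
table(tumor_type, useNA = "ifany")
```

Counting with `useNA = "ifany"` is worth doing in either version, since a diagnosis that is not listed ends up as NA and would otherwise disappear from the table silently.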
Next, extract the TPM expression matrix for the ESCC samples and run survival analysis with TCGAanalyze_survival(), a function that ships with TCGAbiolinks.
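# Keep only the ESCC samples
escc <- esca[, which(esca$tumor_type == 'ESCC')]
# Extract the TPM expression matrix via the SummarizedExperiment accessors
tpm <- SummarizedExperiment::assay(escc, 'tpm_unstrand')
# Label rows with gene symbols and columns with sample barcodes
dimnames(tpm) <- list(SummarizedExperiment::rowData(escc)$gene_name, colnames(escc))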
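One caveat when using gene symbols as row names: symbols may not be unique in the annotation, and indexing a matrix by name such as `tpm['CREBBP', ]` returns only the first matching row. The sketch below shows how one might check for this on a toy matrix; the gene names and values are made up, and summing duplicated rows is only one possible way to handle them.

```r
# Toy matrix with a repeated gene symbol (values are made up)
toy <- matrix(c(1, 2,
                3, 4,
                5, 6),
              nrow = 3, byrow = TRUE,
              dimnames = list(c("GENE_A", "GENE_B", "GENE_A"), c("S1", "S2")))

# Which symbols occur more than once?
dup_symbols <- unique(rownames(toy)[duplicated(rownames(toy))])

# Indexing by name silently returns only the first matching row
first_match <- toy["GENE_A", ]

# One option: sum the rows that share a symbol so each symbol is unique
collapsed <- rowsum(toy, group = rownames(toy))
```

Whether summing is appropriate depends on the analysis; for a single gene of interest it may be enough to confirm that its symbol does not appear in `dup_symbols`.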
# Survival analysis by gender (note: this call uses all ESCA samples, not only ESCC)
TCGAanalyze_survival(esca@colData,clusterCol = 'gender')
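To make clearer what a grouped survival analysis like this computes, here is a minimal Kaplan-Meier sketch using the survival package on a toy clinical table. It illustrates the general technique and is not a copy of TCGAbiolinks' internals. The data are invented, and the column names `vital_status`, `days_to_death` and `days_to_last_follow_up` are assumed to match those carried by TCGA clinical data.

```r
library(survival)

# Toy clinical table (all values are made up for illustration)
clin <- data.frame(
  vital_status = c("Dead", "Alive", "Dead", "Alive", "Dead", "Alive"),
  days_to_death = c(300, NA, 450, NA, 120, NA),
  days_to_last_follow_up = c(NA, 800, NA, 650, NA, 900),
  gender = c("male", "male", "female", "female", "male", "female")
)

# Event indicator: 1 = death observed, 0 = censored at last follow-up
clin$event <- as.integer(clin$vital_status == "Dead")

# Follow-up time: days to death if dead, otherwise days to last follow-up
clin$time <- ifelse(clin$event == 1,
                    clin$days_to_death,
                    clin$days_to_last_follow_up)

# Kaplan-Meier estimate per group and a log-rank test between the groups
km_fit <- survfit(Surv(time, event) ~ gender, data = clin)
logrank <- survdiff(Surv(time, event) ~ gender, data = clin)

print(km_fit)
print(logrank)
```

Calling `plot(km_fit)` draws the two curves; with only six toy patients the log-rank result carries no meaning and is shown purely for the mechanics.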
# Survival analysis by CREBBP expression (high vs. low, split at the median)
escc$CREBBP_exp <- 'CREBBP_high'
escc$CREBBP_exp[which(tpm['CREBBP',] < median(tpm['CREBBP',]))] <- 'CREBBP_low'
TCGAanalyze_survival(escc@colData,clusterCol = 'CREBBP_exp')
# Same analysis for SIRT7 expression (high vs. low, split at the median)
escc$SIRT7_exp <- 'SIRT7_high'
escc$SIRT7_exp[which(tpm['SIRT7',] < median(tpm['SIRT7',]))] <- 'SIRT7_low'
TCGAanalyze_survival(escc@colData,clusterCol = 'SIRT7_exp')
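Since the CREBBP and SIRT7 steps repeat the same median split, it could be wrapped in a small helper. The sketch below runs on a toy matrix; `split_by_median` is a hypothetical name, not a TCGAbiolinks function, and the expression values are made up. As in the code above, samples exactly at the median are labelled high.

```r
# Hypothetical helper: label each sample "<gene>_high" or "<gene>_low"
# by comparing its expression with the median across samples
split_by_median <- function(expr_matrix, gene) {
  stopifnot(gene %in% rownames(expr_matrix))
  values <- expr_matrix[gene, ]
  ifelse(values < median(values),
         paste0(gene, "_low"),
         paste0(gene, "_high"))
}

# Toy TPM matrix: 2 genes x 5 samples (numbers are made up)
toy_tpm <- matrix(c(1, 5, 3, 9, 7,
                    20, 10, 40, 30, 50),
                  nrow = 2, byrow = TRUE,
                  dimnames = list(c("CREBBP", "SIRT7"), paste0("S", 1:5)))

crebbp_group <- split_by_median(toy_tpm, "CREBBP")
sirt7_group <- split_by_median(toy_tpm, "SIRT7")

table(crebbp_group)
table(sirt7_group)
```

With the real objects, something like `escc$CREBBP_exp <- unname(split_by_median(tpm, 'CREBBP'))` would stand in for the two assignment lines per gene.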
References
Further reading: https://www.jianshu.com/p/fd5e06ec260b
Manual data download: https://zhuanlan.zhihu.com/p/563936447