Paper
GDCRNATools: an R/Bioconductor package for integrative analysis of lncRNA, miRNA and mRNA data in GDC
Department of Botany and Plant Sciences, University of California, Riverside
Bioinformatics
GDC: The Genomic Data Commons
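Main features
- Data download
- ceRNA network analysis
- Differential expression analysis
- Functional enrichment analysis
- Survival analysis
- Data visualization
Volcano plots, heatmaps, GO enrichment results, KEGG enrichment results, and so on
Next, I reproduce the example from the package vignette.
Vignette link: http://bioconductor.org/packages/devel/bioc/vignettes/GDCRNATools/inst/doc/GDCRNATools.html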
library(GDCRNATools)
project<-'TCGA-CHOL'
rnadir<-paste(project,'RNAseq',sep='/')
mirdir<-paste(project,'miRNAs',sep="/")
gdcRNADownload(project.id = 'TCGA-CHOL',
data.type = 'RNAseq',
write.manifest = F,
method = 'gdc-client',
directory = rnadir)
When I reproduced this step on a Linux system, it failed with the following error:
ImportError: /lib64/libc.so.6: version `GLIBC_2.18' not found (required by /tmp/_MEIylVP0W/libstdc++
My fix was to replace the gdc-client_v1.3.0 that the package downloads by default with gdc-client_v1.5.0, which is available at https://gdc.cancer.gov/access-data/gdc-data-transfer-tool
gdcRNADownload(project.id = 'TCGA-CHOL',
data.type = 'miRNAs',
write.manifest = F,
method = 'gdc-client',
directory = mirdir)
clinicaldir<-paste(project,'Clinical',sep='/')
gdcClinicalDownload(project.id = 'TCGA-CHOL',
write.manifest = F,
method='gdc-client',
directory = clinicaldir)
metaMatrix.RNA<-gdcParseMetadata(project.id = 'TCGA-CHOL',
data.type = 'RNAseq',
write.meta = F)
metaMatrix.RNA<-gdcFilterDuplicate(metaMatrix.RNA)
metaMatrix.RNA<-gdcFilterSampleType(metaMatrix.RNA)
metaMatrix.MIR<-gdcParseMetadata(project.id = 'TCGA-CHOL',
data.type = 'miRNAs',
write.meta = F)
metaMatrix.MIR
metaMatrix.MIR<-gdcFilterDuplicate(metaMatrix.MIR)
metaMatrix.MIR<-gdcFilterSampleType(metaMatrix.MIR)
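To make the two filtering steps more concrete, here is a toy base-R sketch on a made-up metadata table. It only illustrates the idea (drop samples that appear more than once, then keep primary tumor and normal tissue). The column names, the values, and the choice to keep the first record of a duplicated sample are my assumptions for the example, not the package's internals.

```r
# Toy metadata; the columns 'sample' and 'sample_type' are assumed for illustration
meta <- data.frame(
  sample      = c("S1", "S1", "S2", "S3", "S4"),
  sample_type = c("PrimaryTumor", "PrimaryTumor", "SolidTissueNormal",
                  "PrimaryTumor", "Metastatic"),
  stringsAsFactors = FALSE
)

# Keep the first record of each sample ID (assumption: first occurrence wins)
meta_unique <- meta[!duplicated(meta$sample), ]

# Keep only the two sample types that are compared later
keep_types <- c("PrimaryTumor", "SolidTissueNormal")
meta_final <- meta_unique[meta_unique$sample_type %in% keep_types, ]

print(meta_final)
```

Running table(meta_final$sample_type) afterwards is a quick way to see how many tumor and normal samples remain.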
Build the expression matrices
rnaCounts<-gdcRNAMerge(metadata = metaMatrix.RNA,
path = rnadir,
organized = FALSE,
data.type = 'RNAseq')
mirCounts<-gdcRNAMerge(metadata = metaMatrix.MIR,
path = mirdir,
organized = FALSE,
data.type = 'miRNAs')
rnaCounts[1:5,1:5]
mirCounts[1:5,1:5]
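Before going further it may be worth checking that the columns of a count matrix line up with the metadata rows, since the group labels used later come from the metadata. A minimal sketch on made-up data, assuming the matrix columns are named by sample ID (the names here are hypothetical):

```r
# Toy count matrix: genes in rows, samples in columns (made-up values)
counts <- matrix(c(10, 0, 5, 3, 8, 1), nrow = 2,
                 dimnames = list(c("geneA", "geneB"), c("S1", "S2", "S3")))
meta <- data.frame(sample = c("S1", "S2", "S3"), stringsAsFactors = FALSE)

# Every metadata sample should have a column, in the same order
same_order <- identical(colnames(counts), meta$sample)
if (!same_order) {
  # Reorder the columns to follow the metadata
  counts <- counts[, meta$sample, drop = FALSE]
}

print(same_order)
print(dim(counts))
```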
Normalize the expression data
rnaExpr<-gdcVoomNormalization(counts=rnaCounts,filter=F)
mirExpr<-gdcVoomNormalization(counts=mirCounts,filter=F)
rnaExpr[1:5,1:5]
mirExpr[1:5,1:5]
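As far as I understand, the normalization step builds on limma's voom. The toy below reproduces only the log2 counts-per-million transform that voom starts from, log2((count + 0.5) / (library size + 1) * 1e6), on made-up counts. It leaves out library-size scaling factors, filtering, and the precision weights, so it is a sketch of the idea rather than a replacement for the package function.

```r
# Toy count matrix: genes in rows, samples in columns (made-up values)
counts <- matrix(c(10, 0, 90, 100), nrow = 2,
                 dimnames = list(c("geneA", "geneB"), c("S1", "S2")))

# Library size = total counts per sample
lib_size <- colSums(counts)

# Transpose so that each row is a sample, divide by that sample's library
# size, then transpose back to genes x samples
log_cpm <- t(log2((t(counts) + 0.5) / (lib_size + 1) * 1e6))

print(round(log_cpm, 2))
```

The 0.5 offset keeps zero counts finite after the log transform.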
Differential expression analysis
DEGAll<-gdcDEAnalysis(counts = rnaCounts,
group=metaMatrix.RNA$sample_type,
comparison = 'PrimaryTumor-SolidTissueNormal',
method='limma')
deALL<-gdcDEReport(deg=DEGAll,gene.type = 'all')
deLNC<-gdcDEReport(deg=DEGAll,gene.type='long_non_coding')
dePC<-gdcDEReport(deg=DEGAll,gene.type = 'protein_coding')
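The report step boils down to keeping genes that pass a fold-change and a significance cutoff. Here is a toy base-R version of that kind of filter. The table, its column names, and the cutoffs are made up for illustration and are not taken from the package.

```r
# Toy differential expression table (made-up values)
deg <- data.frame(
  logFC = c(2.5, -0.3, -1.8, 1.2),
  FDR   = c(0.001, 0.2, 0.004, 0.03),
  row.names = c("gene1", "gene2", "gene3", "gene4")
)

fc_cutoff  <- 2     # fold-change cutoff, chosen for this example
fdr_cutoff <- 0.01  # adjusted p-value cutoff, chosen for this example

# Keep genes that change by more than fc_cutoff in either direction
# and are significant after multiple-testing correction
sig <- deg[abs(deg$logFC) > log2(fc_cutoff) & deg$FDR < fdr_cutoff, ]

print(sig)
```

Here gene4 passes the fold-change cutoff but not the FDR cutoff, so only gene1 and gene3 remain.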
Next comes data visualization
Bar plot of the gene types among the differentially expressed genes
gdcBarPlot(deg=deALL,angle = 45,data.type = 'RNAseq')
What do TEC and IG stand for here?
Volcano plot of differentially expressed long non-coding RNAs
gdcVolcanoPlot(deLNC)
Heatmap
degName<-rownames(deLNC)
gdcHeatmap(deg.id = degName,metadata = metaMatrix.RNA,rna.expr = rnaExpr)
Enrichment analysis
enrichOutput<-gdcEnrichAnalysis(gene=rownames(deALL),
simplify=T)
gdcEnrichPlot(enrichOutput,type='bar',category = 'GO',num.terms = 10)
Plotting to the screen raised an error:
Error in .Call.graphics(C_palette2, .Call(C_palette2, NULL)) :
invalid graphics state
I don't know what causes it, but saving the plot to a local file works fine:
pdf(file="../goenrich.pdf",width = 15,height = 15)
gdcEnrichPlot(enrichOutput,type='bar',category = 'GO',num.terms = 10)
dev.off()
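The workaround follows the usual pattern of drawing into a file device and closing it afterwards. Below is a minimal sketch with a base-R barplot standing in for gdcEnrichPlot. The temporary file and the toy values are assumptions for the example. Closing all open devices with graphics.off() before plotting again is a commonly suggested remedy for "invalid graphics state", but I have not verified that it fixes this particular case.

```r
# Close any devices left over from earlier plotting attempts
graphics.off()

# Write the figure to a temporary PDF (hypothetical output location)
out_file <- tempfile(fileext = ".pdf")
pdf(file = out_file, width = 15, height = 15)
barplot(c(term_A = 5, term_B = 3), horiz = TRUE)  # stand-in for the real plot
dev.off()  # closing the device is what actually finishes the file

print(file.exists(out_file))
```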
ceRNA network
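ceOutput<-gdcCEAnalysis(lnc=rownames(deLNC),
pc=rownames(dePC),
lnc.targets = 'starBase',
pc.targets = 'starBase',
rna.expr = rnaExpr,
mir.expr = mirExpr)
# Keep the significant lncRNA-mRNA pairs; the thresholds follow the package vignette
ceOutput2<-ceOutput[ceOutput$hyperPValue<0.01 & ceOutput$corPValue<0.01 & ceOutput$regSim != 0,]
edges<-gdcExportNetwork(ceNetwork = ceOutput2,net='edges')
nodes<-gdcExportNetwork(ceNetwork = ceOutput2,net='nodes')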
write.table(edges,file='edges.txt',sep='\t',quote=F)
write.table(nodes,file="nodes.txt",sep="\t",quote=F)
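As I understand it from the package description, one ingredient of the ceRNA step above is a hypergeometric test of whether an lncRNA and an mRNA share more miRNAs than expected by chance. A toy calculation with made-up numbers shows the shape of that test. The counts below are hypothetical and the snippet is not the package's code.

```r
# Made-up numbers for one lncRNA-mRNA pair
N <- 100  # all miRNAs considered
K <- 10   # miRNAs targeting the lncRNA
n <- 8    # miRNAs targeting the mRNA
m <- 4    # miRNAs targeting both

# P(at least m shared miRNAs) under random overlap
p_shared <- phyper(m - 1, K, N - K, n, lower.tail = FALSE)

# The same probability written as an explicit sum over m..n shared miRNAs
p_check <- sum(dhyper(m:n, K, N - K, n))

print(p_shared)
```

A small p-value suggests that the overlap in miRNA targets is unlikely to be random, which is the kind of pair a ceRNA network is meant to keep.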
This produces two files at the end. I don't yet know how to visualize them in Cytoscape.
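One possible way forward, offered as a sketch rather than a tested recipe: Cytoscape can read a delimited text file as a network (File > Import > Network from File), and you pick the source and target columns in the import dialog. Writing the table without R row names keeps the header aligned with the data columns. The toy edge table and its column names below are made up for illustration and may differ from what gdcExportNetwork returns.

```r
# Toy edge table (hypothetical node names and column names)
edges_toy <- data.frame(
  fromNode = c("lncRNA_1", "lncRNA_1", "miRNA_2"),
  toNode   = c("miRNA_1", "miRNA_2", "gene_1"),
  stringsAsFactors = FALSE
)

# Tab-separated, no quotes, no row names, so the header has one name per column
edge_file <- tempfile(fileext = ".txt")
write.table(edges_toy, file = edge_file, sep = "\t",
            quote = FALSE, row.names = FALSE)

# Read the file back to confirm it parses as a clean two-column table
check <- read.delim(edge_file, stringsAsFactors = FALSE)
print(check)
```

The node table can be written the same way and loaded as node attributes with File > Import > Table from File.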
That's all for today.
Feel free to follow my WeChat official account:
小明的数据分析笔记本
Stay strong, China! Stay strong, Wuhan!