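# TCGAbiolinks is distributed through Bioconductor, not CRAN, so install it with BiocManager
if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")
BiocManager::install("TCGAbiolinks")  # install
library(TCGAbiolinks)                 # load
citation("TCGAbiolinks")              # reference to cite in publications
packageVersion("TCGAbiolinks")        # currently installed package version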
The data retrieval is handled by the three main TCGAbiolinks functions: GDCquery, GDCdownload and GDCprepare, and allows the user to interface with three main platforms: i) TCGA, ii) TARGET and iii) The Cancer Genome Characterization Initiative (CGCI) (https://ocg.cancer.gov/programs/cgci). TCGAbiolinks also allows the user to interface with different -omics data including genomics and transcriptomics, clinical and pathological data, information on drug treatments, and subtypes.
GDCprepare allows the user to prepare the gene expression data for downstream analyses. This step is done by restructuring the data into a SummarizedExperiment (SE) object [39] that is easily manageable and integrable with other R/Bioconductor packages, or just as a data frame for other forms of data manipulation, which the user can operate even decoupled from the TCGAbiolinks package.
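query <- GDCquery(project = "TCGA-BRCA",                         # tumor type (GDC project) to download
                  data.category = "Transcriptome Profiling",     # data category to download
                  data.type = "Gene Expression Quantification",  # data type to download
                  workflow.type = "HTSeq - Counts",              # RNA-seq counts files
                  legacy = FALSE)                                # FALSE uses hg38; TRUE uses hg19
# Note: newer GDC data releases may offer "STAR - Counts" instead of "HTSeq - Counts",
# and recent TCGAbiolinks versions may no longer accept the legacy argument.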
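GDCdownload(query, method = "api", files.per.chunk = 2)
# This step downloads the files. method can be "api" or "client". I prefer "api" with chunked downloads: even if a download fails partway, it can carry on from there. My network is very poor, so I set files.per.chunk to 2.
# directory: the folder the data are downloaded to. If it is not given, the default is "GDCdata".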
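Since the download can fail when the connection is unstable, one option is to wrap the call in a small retry loop. The following is only a sketch in base R: `retry_call` is a hypothetical helper, not a TCGAbiolinks function, and it assumes that a failed download surfaces as an R error (if the package only prints a message, the loop will not notice the failure).

```r
# Hypothetical helper (not part of TCGAbiolinks): call `fn` until it succeeds
# or `max_tries` attempts have been used, pausing `wait_sec` seconds between tries.
retry_call <- function(fn, max_tries = 5, wait_sec = 10) {
  for (i in seq_len(max_tries)) {
    result <- tryCatch(
      list(ok = TRUE, value = fn()),
      error = function(e) {
        message("Attempt ", i, " failed: ", conditionMessage(e))
        list(ok = FALSE, value = NULL)
      }
    )
    if (result$ok) return(invisible(result$value))
    if (i < max_tries) Sys.sleep(wait_sec)
  }
  stop("All ", max_tries, " attempts failed")
}

# Usage with the query defined above (needs network access, so left commented out):
# retry_call(function() GDCdownload(query, method = "api", files.per.chunk = 2))
```

The values of `max_tries` and `wait_sec` are arbitrary defaults; adjust them to how unreliable the connection is.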


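The query object itself also contains a lot of useful information.
library(SummarizedExperiment)  # provides assay()
data <- GDCprepare(query)      # read the downloaded files into a SummarizedExperiment object
TCGA_counts <- assay(data)     # count matrix (genes x samples), ready for downstream processing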
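For work outside TCGAbiolinks, the count matrix can be turned into an ordinary data frame and written to disk. The sketch below uses a made-up toy matrix as a stand-in for `assay(data)` so that it runs without a download; the gene and sample names, the zero-count filter, and the output file name are all illustrative assumptions.

```r
# Toy stand-in for assay(data): 3 genes x 2 samples (made-up values and names)
TCGA_counts <- matrix(c(10L, 0L, 5L, 20L, 0L, 7L), nrow = 3,
                      dimnames = list(c("geneA", "geneB", "geneC"),
                                      c("sample1", "sample2")))

# Convert the matrix to a data frame, keeping the gene IDs as a column.
# check.names = FALSE stops R from rewriting column names that contain
# hyphens, as real TCGA barcodes do.
counts_df <- data.frame(gene_id = rownames(TCGA_counts), TCGA_counts,
                        check.names = FALSE)

# Drop genes with zero counts in every sample (illustrative filter only)
counts_df <- counts_df[rowSums(TCGA_counts) > 0, ]

# Save the table so it can be used without TCGAbiolinks
out_file <- file.path(tempdir(), "TCGA_counts.csv")
write.csv(counts_df, out_file, row.names = FALSE)
```

With real data, skip the toy matrix and start from the `TCGA_counts` object created above.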

I do not know what is wrong with my home Wi-Fi, but lately it keeps reporting "GDC server down".