Downloading TCGA data with TCGAbiolinks: simple and convenient!

Author: 杏仁核 | Published 2020-02-24 18:12
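    if (!requireNamespace("BiocManager", quietly = TRUE))
        install.packages("BiocManager")
    BiocManager::install("TCGAbiolinks")  # install (TCGAbiolinks is distributed via Bioconductor, not CRAN)
    library(TCGAbiolinks)                 # load
    citation("TCGAbiolinks")              # references to cite when publishing
    packageVersion("TCGAbiolinks")        # currently installed package version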
    

    The data retrieval is handled by the three main TCGAbiolinks functions, GDCquery, GDCdownload and GDCprepare, which allow the user to interface with three main platforms: i) TCGA, ii) TARGET, and iii) the Cancer Genome Characterization Initiative (CGCI) (https://ocg.cancer.gov/programs/cgci). TCGAbiolinks also allows the user to interface with different -omics data, including genomics and transcriptomics, clinical and pathological data, information on drug treatments, and subtypes.
    GDCprepare allows the user to prepare the gene expression data for downstream analyses. This step restructures the data into a SummarizedExperiment (SE) object [39] that is easily manageable and integrable with other R/Bioconductor packages, or simply into a data frame for other forms of data manipulation, which the user can work with even decoupled from the TCGAbiolinks package.
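Working "decoupled from the TCGAbiolinks package" can be as simple as turning the genes-by-samples count matrix that `assay()` returns into an ordinary data frame. The sketch below is a minimal base-R illustration; the helper name `counts_to_df` and the toy numbers are hypothetical, not part of TCGAbiolinks or real TCGA data.

```r
# Hypothetical helper: turn a genes x samples count matrix (e.g. from assay())
# into a plain data.frame with the gene IDs as an explicit first column.
counts_to_df <- function(counts) {
  stopifnot(is.matrix(counts), !is.null(rownames(counts)))
  data.frame(gene_id = rownames(counts),
             counts,
             row.names = NULL,          # use 1..n row names; gene IDs live in gene_id
             check.names = FALSE,       # keep barcodes such as "TCGA-..." unchanged
             stringsAsFactors = FALSE)
}

# Toy matrix with made-up counts (not real TCGA data)
m <- matrix(c(10L, 0L, 5L, 7L), nrow = 2,
            dimnames = list(c("ENSG00000000003", "ENSG00000000005"),
                            c("TCGA-AA-0001-01A", "TCGA-AA-0001-11A")))
df <- counts_to_df(m)
```

`check.names = FALSE` matters here: without it, R would rewrite the hyphens in the barcode column names to dots.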

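    query <- GDCquery(project = "TCGA-BRCA",                         # project (tumor type); TCGA-BRCA = breast invasive carcinoma
                      data.category = "Transcriptome Profiling",     # data category to download
                      data.type = "Gene Expression Quantification",  # data type to download
                      workflow.type = "HTSeq - Counts",              # RNA-seq raw read counts (HTSeq)
                      legacy = FALSE)                                # FALSE: harmonized hg38 data; TRUE: legacy hg19 data
    # Note: later GDC data releases replaced "HTSeq - Counts" with "STAR - Counts", and newer
    # TCGAbiolinks versions may no longer accept `legacy`; check the documentation for your version.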
    
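    # method: "api" or "client" (GDC Data Transfer Tool); I prefer "api". Downloading in chunks means
    # one failed chunk does not stop the rest; my network is poor, so files.per.chunk = 2.
    GDCdownload(query, method = "api", files.per.chunk = 2)
    # directory: download folder; defaults to "GDCdata" if not given.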
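On an unreliable connection it can help to re-run the download automatically when it fails. The base-R sketch below wraps any call in a retry loop; `with_retries` is a hypothetical helper, not a TCGAbiolinks function, and it assumes that re-running `GDCdownload()` skips files already present in the download directory (recent versions report how many files already exist), so verify that behavior with your installed version.

```r
# Hypothetical helper (not part of TCGAbiolinks): call f() and, if it throws an
# error, wait and try again, up to max_tries attempts in total.
with_retries <- function(f, max_tries = 5, wait_seconds = 30) {
  for (attempt in seq_len(max_tries)) {
    result <- tryCatch(f(), error = function(e) e)
    if (!inherits(result, "error")) {
      return(invisible(result))          # success: hand back f()'s value
    }
    message(sprintf("Attempt %d/%d failed: %s",
                    attempt, max_tries, conditionMessage(result)))
    if (attempt < max_tries) Sys.sleep(wait_seconds)
  }
  stop("All ", max_tries, " attempts failed")
}

# Usage with the download step above (assumes `query` already exists):
# with_retries(function() GDCdownload(query, method = "api", files.per.chunk = 2))
```

Wrapping the call in an anonymous function keeps the helper generic: the same loop works for `GDCquery()` when the server is temporarily unreachable.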
    
    The query object also contains a lot of useful information.
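One way to look inside the query is `getResults(query)`, which returns the matched files as a data frame. The sketch below tabulates that table by sample type; the helper `count_by_sample_type` is hypothetical, and the column name `sample_type` is an assumption, so inspect `names(getResults(query))` in your version first.

```r
# Hypothetical helper: count how many files of each sample type a query matched.
# `results` is expected to be a data.frame such as TCGAbiolinks::getResults(query);
# the default column name "sample_type" is an assumption.
count_by_sample_type <- function(results, column = "sample_type") {
  stopifnot(is.data.frame(results), column %in% names(results))
  counts <- table(results[[column]], useNA = "ifany")
  sort(counts, decreasing = TRUE)   # most frequent sample type first
}

# Usage (requires the query built above):
# res <- getResults(query)
# head(res)
# count_by_sample_type(res)
```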

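    library(SummarizedExperiment)   # provides assay()
    data <- GDCprepare(query)       # read the downloaded files into a SummarizedExperiment object
    TCGA_counts <- assay(data)      # count matrix (genes in rows, samples in columns) for downstream analysis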
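A common next step is separating tumor from normal samples. Assuming the column names of `TCGA_counts` are TCGA aliquot barcodes, the two-digit sample-type code at characters 14-15 identifies the sample class (01-09 tumor, 10-19 normal, 20-29 control). The helper `sample_group` below is a hypothetical base-R sketch, and the example barcodes are made up in the standard layout.

```r
# Hypothetical helper: classify TCGA barcodes by the sample-type code at
# characters 14-15 (01-09 tumor, 10-19 normal, 20-29 control).
sample_group <- function(barcodes) {
  code <- as.integer(substr(barcodes, 14, 15))
  ifelse(code <= 9, "tumor",
         ifelse(code <= 19, "normal",
                ifelse(code <= 29, "control", "other")))
}

# Made-up barcodes in the standard TCGA layout
bc <- c("TCGA-AA-0001-01A-01R-0001-07",
        "TCGA-AA-0001-11A-01R-0001-07",
        "TCGA-AA-0002-01B-01R-0001-07",
        "TCGA-AA-0003-20A-01R-0001-07")
groups <- sample_group(bc)

# With the real data from the step above:
# tumor_counts  <- TCGA_counts[, sample_group(colnames(TCGA_counts)) == "tumor"]
# normal_counts <- TCGA_counts[, sample_group(colnames(TCGA_counts)) == "normal"]
```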
    
    Something is wrong with my home Wi-Fi: lately it keeps reporting "GDC server down".

    Article link: https://www.haomeiwen.com/subject/yplbqhtx.html