Downloading and Preprocessing TCGA Data

Author: 落寞的橙子 | Published 2019-05-19 23:55

    This post shows how to download a TCGA expression dataset and reshape it into a table whose row names are gene names.

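    # Data source: download from the link below, extract the archive, and save the data file as HNSC_RSEM_genes_normalized.txt
    # http://gdac.broadinstitute.org/runs/stddata__2016_01_28/data/HNSC/20160128/gdac.broadinstitute.org_HNSC.Merge_rnaseqv2__illuminahiseq_rnaseqv2__unc_edu__Level_3__RSEM_genes_normalized__data.Level_3.2016012800.0.0.tar.gz.md5
    # Note: the link as given ends in .md5, which looks like the checksum file; the archive itself is presumably the same URL without that suffix.

    # Read the tab-separated table. quote = "" and comment.char = "" keep stray quotes or "#" in IDs from breaking the parse.
    hnsc <- read.table("your_dir/HNSC_RSEM_genes_normalized.txt", header = TRUE, check.names = FALSE,
                       sep = "\t", quote = "", comment.char = "", stringsAsFactors = FALSE)
    # Drop the first data row, which in this file is a second header line (gene_id / normalized_count)
    hnsc <- hnsc[-1, ]
    # Gene IDs look like "SYMBOL|EntrezID"; keep only the symbol before the "|"
    row_name <- as.character(hnsc[, 1])
    row_name <- unlist(lapply(row_name, FUN = function(x) strsplit(x, split = "|", fixed = TRUE)[[1]][1]))
    hnsc[, 1] <- row_name
    # Keep the first occurrence of each symbol (this also collapses the unnamed "?" genes to one row)
    hnsc <- hnsc[!duplicated(hnsc[, 1]), ]
    row.names(hnsc) <- as.character(hnsc[, 1])
    hnsc <- hnsc[, -1]
    # The extra header row forced every column to be read as text; convert back to numeric
    hnsc[] <- lapply(hnsc, function(col) as.numeric(as.character(col)))
    # Shorten the sample barcodes to their first 16 characters (e.g. TCGA-XX-XXXX-01A)
    col_names <- colnames(hnsc)
    new_names <- substr(col_names, 1, 16)
    colnames(hnsc) <- new_names
    write.csv(hnsc, "your_dir/hnsc_clean_data.csv")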
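
To check the cleaning steps without downloading the full file, the same logic can be tried on a small in-memory table. This is only a sketch: the data frame `toy`, its sample barcode, and all values are made up for illustration and are not taken from the real HNSC file; `sub()` is used here as an equivalent way to keep the text before the `|`.

```r
# Toy stand-in for the downloaded table (hypothetical barcode and values)
toy <- data.frame(
  `Hybridization REF` = c("gene_id", "?|100130426", "?|100133144", "A1BG|1", "A2M|2"),
  `TCGA-XX-0001-01A-11R-A000-07` = c("normalized_count", "0", "12.5", "80.1", "9000.2"),
  check.names = FALSE, stringsAsFactors = FALSE
)

# Drop the second header line
toy <- toy[-1, ]

# Keep the gene symbol before "|" and the first row for each symbol
symbols <- sub("\\|.*$", "", toy[[1]])
keep <- !duplicated(symbols)
toy <- toy[keep, , drop = FALSE]
rownames(toy) <- symbols[keep]

# Remove the ID column; drop = FALSE keeps a data frame even with one sample
toy <- toy[, -1, drop = FALSE]

# Convert the text columns to numeric and shorten the barcodes to 16 characters
toy[] <- lapply(toy, as.numeric)
colnames(toy) <- substr(colnames(toy), 1, 16)

print(toy)
```

Note that the two `?` entries collapse into a single row, so only the first unnamed gene survives; if those rows matter, it may be safer to drop or relabel them before deduplicating.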
    
