美文网首页good code rna_seq走进转录组
RNA-seq下游分析(3)-rawcount转换为TPM及RP

RNA-seq下游分析(3)-rawcount转换为TPM及RP

作者: 灵活胖子的进步之路 | 来源:发表于2020-12-11 16:22 被阅读0次

    因为我用的GTF文件较新,为realease101版本,因此需要自己生成转换文件需要的TxDb文件,首先需要安装GenomicFeatures包,然后直接在线应用命令进行数据库建立TxDb文件,因为我的表达矩阵的行名为ensemble,因此我直接从官网得到数据库文件

    library(GenomicFeatures)#载入包
    txdb <- makeTxDbFromEnsembl(organism="Homo sapiens",#设定种族
                        release=101,设定版本
                        server="ensembldb.ensembl.org")#审定服务器地址
    saveDb(txdb, file='txdbensemble101.sqlite')#保存生存的TXDB文件
    

    后续转换的时候可以直接应用Github上别人写的包,我直接下载。下载的地址为https://codeload.github.com/mapawlak/normalizeGeneCounts/zip/master
    下载下来后解压,本地直接source加载后就可以了

    TxDb <- loadDb(file='txdbensemble101.sqlite')#载入参考数据集
    counts <- read.table("all.id.txt",#读入原始counts文件
                           header = T,#第一行为列名
                           row.names = 1,#第一列为行名
                           check.names = F)#保持原文名称
    RPKM <- normalizeGeneCounts(counts, TxDb, method = "RPKM")
    source("normalizeGeneCounts.R")#载入本地脚本
    
    #以下分别生成校准后文件
    RPKM <- normalizeGeneCounts(counts, TxDb, method = "RPKM")
    
    TPM <- normalizeGeneCounts(counts, TxDb, method = "TPM")
    
    CPM <- normalizeGeneCounts(counts, TxDb, method = "CPM")
    
    save(CPM,RPKM,TPM,file = "normal.Rdata")#保持到本地
    

    以下命令是normalizeGeneCounts作者的说明文件中的命令

    # normalizeGeneCounts
    
    ## Description
    Function takes raw counts of NGS data and normalize to [CPM, RPKM or TPM](https://www.rna-seqblog.com/rpkm-fpkm-and-tpm-clearly-explained/).
    
    ## Usage
    normalizeGeneCounts(counts, TxDb, method)
    
    ## Arguments
    
    *counts* A data frame with gene ID in rownames and sample ID in colnames
    
    *TxDb* `GenomicFeatures` object created from the same gtf as counts object
    
    *method* Should be "CPM", "RPKM" or "TPM"
    
    ## Example
    
    CPM <- normalizeCounts(counts, TxDb, method = "CPM")
    
    

    以下命令是作者R文件的原始代码

    normalizeGeneCounts <- function(counts, TxDb, method) {
      require("GenomicFeatures")
      length <- width(genes(TxDb))/10 ^ 3 #gene length per kb
      if (method == "CPM") {
        normalized <-
          as.data.frame(apply(counts,2, function(x) (x/sum(x)) * 10 ^ 6))
      } else if (method == "RPKM") {
        normalized <-
          as.data.frame(t(apply(counts, 1, "/", colSums(counts) / 10 ^ 6)) / length)
      } else if (method == "TPM") {
        normalized <-
          as.data.frame(t(apply(
            counts / length, 1, "/", colSums(counts / length) / 10 ^ 6
            )))
      } else {
        stop("method must be CPM, RPKM or TPM")
      }
        return(normalized)
    }
    

    相关文章

      网友评论

        本文标题:RNA-seq下游分析(3)-rawcount转换为TPM及RP

        本文链接:https://www.haomeiwen.com/subject/auwjgktx.html