DESeq2跑差异基因

作者: Aji | 来源:发表于2021-03-05 16:49 被阅读0次

    20210305 16:36
    最近在处理single cell 数据,老板要求用DESeq2跑差异基因,Seurat中自带的DESeq2跑出来的有很多是0,很奇怪,也不懂是啥原因,没有往下面细细研究,后来师姐用DESeq2直接跑,就是把每一个细胞当成一个样本来跑的,细胞数目少的时候跑的时间用的少,但是细胞数目多的时候,时间就会用的多。之前都是换一组数据,就得写个代码,很麻烦。今天给它包装了下,跑起来方便多了,很开心。最近的粉丝涨了三个,开心开心,虽然离100的粉丝量还差好多,但是是在进步的就好。

    1.函数
    #!/~/bin/Rscript
    ## 第一行是Linux上Rscript上所在位置
    
    # Function: This is a pipeline to find DEGs using DESeq2
    # Input: 1.rds odject which includes counts variable and label variable; 2. outputname to save the DEGs
    # Output: csv of DESeq2 DEGs
    # Version: 2021-03-05, Yiyi Zheng
    
    library(optparse)
    library(getopt)
    option_list <- list(
      make_option(c("-o", "--output"), type = "character", default = FALSE,
                  action = "store", help = "This is the output directory."
      ),
      make_option(c("--rds_name"), type = "character", default = FALSE,
                  action = "store", help = "This is the name of rds to input."
      ),
      make_option(c("--output_name"), type = "character", default = FALSE,
                  action = "store", help = "This is the name of ouput file."
      )
    )
    
    # -help 
    opt = parse_args(OptionParser(option_list = option_list, 
                                  usage = "This is a script to find DEGs using DESeq2"))
    print(opt)
    
    # set the output path
    library(DESeq2)
    DESeq2_Object <- readRDS(opt$rds_name)
    
    setwd(opt$output)
    timestart<-Sys.time() 
    print("The time begining:")
    print(timestart)
    
    ##  Main function 
    print("The level and number of input data:")
    # The levels of condition need to assign by yourself. DESeq2 will use the first level as the condtion.
    #condition<-factor(c(rep("Control",7104), rep("Obesity",6965)), levels = c("Control","Obesity"))
    
    condition <- factor(DESeq2_Object$label)
    print(levels(condition))
    print(table(condition))
    
    print("The contrl is")
    print(table(condition)[1])
    print("The treat is")
    print(table(condition)[2])
    
    count <- DESeq2_Object$counts
    count <- as.matrix(count) # Need to transform to matrix.
    
    coldata <- data.frame(row.names=colnames(count), condition)                    
    count_2 <- count[which(rowSums(count) > 0), ] # Exclude the gene which is 0 in all cells.
    count <-  count_2+1
    dds <- DESeqDataSetFromMatrix(count,coldata,design=~condition)
    dds <- DESeq(dds)
    res <- results(dds)
    outputname <- opt$output_name
    
    print("Finished to calculate DEGs and now writing the DEGs results.")
    write.csv(res, outputname)
    
    timeend<-Sys.time()
    print("The time ending:")
    print(timeend)
    runningtime<-timeend-timestart
    print("The time DESeq2 used is ")
    print(runningtime) 
    
    2.使用

    该函数的使用就是在Linux上直接输入参数就可以了,但是之前你需要构建一个rds object, 其中两个变量是counts和label,构建如下:

    rm(list = ls())
    list.files()
    library(Seurat)
    Female_exp  <- readRDS("20210305_Female_DESeq2_Exp.rds")
    all <- list()
    all$counts <- Female_exp$counts
    all$label <- Female_exp$sample_label
    print(table(all$label))
    ## DESeq2中会把第一个level 作为control,第二个level作为treat.这个需要确认好,是谁对谁找的差异基因
    saveRDS(all , "Female_allcells.rds")
    

    然后就可以挂在后台上跑了

    i=Female_allcells.rds
    nohup RunDESeq2_Linux.R -o . --rds_name $i --output_name $i.csv > $i.output.file 2>&1 &
    

    进一步有一步的欢喜!!
    今天是前进的一天

    相关文章

      网友评论

        本文标题:DESeq2跑差异基因

        本文链接:https://www.haomeiwen.com/subject/attdqltx.html