美文网首页转录组分析RNA-seq基因注释/富集分析与功能分类
RNASEQ分析入门笔记8-使用DESeq2进行表达差异分析

RNASEQ分析入门笔记8-使用DESeq2进行表达差异分析

作者: 75d3f95b058e | 来源:发表于2018-04-06 12:24 被阅读3042次

    基因的差异表达分析,通常使用R中的软件包,包括:DESeq2,edgeR,limma等,今天介绍DESeq2的分析流程:

    1、在R中安装DESeq2软件包

    source ("http://bioconductor.org/biocLite.R") # 调入安装工具Bioconductor
    biocLite("DESeq2") # 安装DESeq2
    library("DESeq2") # 测试是否安装成功
    

    如果安装出错,请移步前文:《MAC安装DESeq2报错及解决方案》,作为一个生信小白,我花了3天时间才把安装DESeq2的问题解决,一路全是坑啊!


    2、导入分析数据

    我们还是使用《RNASEQ分析入门笔记7-HTSeq定量基因表达水平》中得到的数据进行分析,首先回顾一下:

    上文中,我们得到了raw_count_filt2矩阵,格式如下(执行:tail (raw_count_filt2, n=5):

    control1 control2 rep1 rep2
    ENSMUSG00000110420 0 0 0 0
    ENSMUSG00000110421 0 0 0 1
    ENSMUSG00000110422 1 0 1 2
    ENSMUSG00000110423 0 0 0 0
    ENSMUSG00000110424 26 20 48 24

    可以看到,该矩阵实际只有4列,是由整数组成的,而最前面的一列是行名,可以直接从保存的.csv文件中导入:

    raw_count_filt2 <- read.csv ("raw_count_filt2.csv")  # 从保存的.csv文件导入数据,并赋值给raw_count_filt2
    

    3、加载DESeq2并设置样品信息

    library(DESeq2) # 加载DESeq2包
    countData <- raw_count_filt2 # 表达矩阵
    condition <- factor(c("control","control","KD","KD")) # 定义condition
    colData <- data.frame(row.names=colnames(countData), condition) # 样品信息矩阵
    

    condition:

    [1] control control KD KD
    Levels: KD control

    colData:

    condition
    control1 control
    control2 control
    rep1 KD
    rep2 KD

    4、构建dds矩阵

    dds <- DESeqDataSetFromMatrix(countData, DataFrame(condition), design= ~ condition ) # 构建dds矩阵
    head(dds) # 查看dds矩阵的前6行
    

    class: DESeqDataSet
    dim: 6 4
    metadata(1): version
    assays(1): counts
    rownames(6): ENSMUSG00000000001 ENSMUSG00000000003 ...
    ENSMUSG00000000037 ENSMUSG00000000049
    rowData names(0):
    colnames(4): control1 control2 rep1 rep2
    colData names(1): condition

    dds2 <- DESeq(dds) # 对dds进行Normalize
    

    运行成功会有如下提示:

    estimating size factors
    estimating dispersions
    gene-wise dispersion estimates
    mean-dispersion relationship
    final dispersion estimates
    fitting model and testing

    resultsNames(dds) # 查看结果的名称
    

    [1] "Intercept" "condition_control_vs_KD"

    res <- results(dds) # 使用results()函数获取结果,并赋值给res
    head (res, n=5) # 查看res矩阵的前5行
    

    输出的res矩阵有6列,分别是:baseMean,log2FoldChange,lfcSE,pvalue,padj:

    log2 fold change (MLE): condition control vs KD
    Wald test p-value: condition control vs KD
    DataFrame with 5 rows and 6 columns

    baseMean log2FoldChange lfcSE stat pvalue padj
    <numeric> <numeric> <numeric> <numeric> <numeric> <numeric>
    ENSMUSG00000000001 2359.75391 -0.08461265 0.2081281 -0.4065413 0.68434494 0.9040654
    ENSMUSG00000000003 0.00000 NA NA NA NA NA
    ENSMUSG00000000028 1027.82363 0.03638541 0.2036829 0.1786375 0.85822232 0.9618582
    ENSMUSG00000000031 65.37333 0.94588315 0.4428518 2.1358912 0.03268828 0.2511131
    ENSMUSG00000000037 69.75843 0.11587980 0.4400431 0.2633374 0.79229054 0.9422796
    mcols(res,use.names= TRUE) # 查看res矩阵每一列的含义
    

    DataFrame with 6 rows and 2 columns

    type description
    <character> <character>
    baseMean intermediate mean of normalized counts for all samples
    log2FoldChange results log2 fold change (MLE): condition control vs KD
    lfcSE results standard error: condition control vs KD
    stat results Wald statistic: condition control vs KD
    pvalue results Wald test p-value: condition control vs KD
    padj results BH adjusted p-values
    summary(res) # 对res矩阵进行总结
    

    out of 28335 with nonzero total read count
    adjusted p-value < 0.1
    LFC > 0 (up) : 445, 1.6%
    LFC < 0 (down) : 625, 2.2%
    outliers [1] : 0, 0%
    low counts [2] : 12683, 45%
    (mean count < 18)
    [1] see 'cooksCutoff' argument of ?results
    [2] see 'independentFiltering' argument of ?results

    其中,445个基因表达上调,625个基因表达下降,可靠程度是:p-value < 0.1


    5、提取差异分析结果

    table(res$padj<0.05) # 取padj小于0.05的数据,得到743行
    
    FALSE TRUE
    14909 743
    res <- res[order(res$padj),] # 按照padj的大小将res重新排列
    diff_gene_deseq2 <- subset(res,padj < 0.05 & (log2FoldChange >1 | log2FoldChange < -1)) # 获取padj小于0.05,表达倍数取以2为对数后绝对值大于1的差异表达基因,赋值给diff_gene_deseq2
    head (diff_gene_deseq2, n=5) # 查看diff_gene_deseq2矩阵的前5行
    

    log2 fold change (MLE): condition control vs KD
    Wald test p-value: condition control vs KD
    DataFrame with 5 rows and 6 columns

    baseMean log2FoldChange lfcSE stat pvalue padj
    <numeric> <numeric> <numeric> <numeric> <numeric> <numeric>
    ENSMUSG00000003309 548.1926 3.231612 0.2658125 12.157487 5.234431e-34 8.192931e-30
    ENSMUSG00000046323 404.1894 3.067051 0.2628220 11.669689 1.820880e-31 1.425021e-27
    ENSMUSG00000001123 341.8542 2.797485 0.2766499 10.112006 4.887336e-24 2.549886e-20
    ENSMUSG00000018569 485.4839 3.136032 0.3312999 9.465839 2.912140e-21 9.116163e-18
    ENSMUSG00000023906 951.9460 2.382308 0.2510718 9.488553 2.342631e-21 9.116163e-18
    diff_gene_deseq2 <- row.names(diff_gene_deseq2) # 提取diff_gene_deseq2的行名
    head (diff_gene_deseq2, n=5)
    

    [1] "ENSMUSG00000003309"
    [2] "ENSMUSG00000046323"
    [3] "ENSMUSG00000001123"
    [4] "ENSMUSG00000018569"
    [5] "ENSMUSG00000023906"

    resdata <- merge (as.data.frame(res),as.data.frame(counts(dds,normalize=TRUE)),by="row.names",sort=FALSE)
    head (resdata,n=5)
    
    Row.names baseMean log2FoldChange lfcSE stat pvalue padj control1 control2 rep1 rep2
    1 ENSMUSG00000003309 548.1926 3.231612 0.2658125 12.157487 5.234431e-34 8.192931e-30 1073.9977 908.7239 125.93083 84.11803
    2 ENSMUSG00000046323 404.1894 3.067051 0.2628220 11.669689 1.820880e-31 1.425021e-27 723.4828 721.5049 97.10329 74.66657
    3 ENSMUSG00000001123 341.8542 2.797485 0.2766499 10.112006 4.887336e-24 2.549886e-20 679.8243 515.6735 84.96538 86.95347
    4 ENSMUSG00000018569 485.4839 3.136032 0.3312999 9.465839 2.912140e-21 9.116163e-18 1113.9140 630.6325 127.44807 69.94083
    5 ENSMUSG00000023906 951.9460 2.382308 0.2510718 9.488553 2.342631e-21 9.116163e-18 1862.3445 1333.5250 351.99943 259.91526
    write.csv(resdata, file="control_vs_akap95.csv") # 将结果写入control_vs_akap95.csv文件
    

    相关文章

      网友评论

        本文标题:RNASEQ分析入门笔记8-使用DESeq2进行表达差异分析

        本文链接:https://www.haomeiwen.com/subject/iggthftx.html