ATAC-Seq 差异分析 DiffBind

作者: BeeBee生信 | 来源:发表于2020-12-01 10:34 被阅读0次

    本文是《ATAC-Seq 分析流程》拆分出来的 DiffBind 差异分析部分。DiffBind 有许多的 Normalization 方法,建议使用前先看文档了解清楚,这里只提供简单的分析步骤,只用默认参数。

    DiffBind 借助 RNA-seq 分析差异基因思路。软件首先将所有样本的峰合并(bedtools merge),得到总的峰区域信息,再计算每个峰区域 Reads 数,最后调用 DESeq2/edgeR 进行差异分析。

    library(DiffBind, quietly = TRUE)
    library(tidyverse, quietly = TRUE)
    

    读入数据,没有的信息就填 NA. 不会有影响。

    > samples <- read.csv("Samples.csv", header = TRUE)
    > samples
      SampleID Tissue Factor Condition Treatment Replicate
    1      KO1     NA     NA        NA        KO         1
    2      KO2     NA     NA        NA        KO         2
    3      WT1     NA     NA        NA        WT         1
    4      WT2     NA     NA        NA        WT         2
                                                                      bamReads
    1 /Example/KO1_ATAC_ShiftSort.bam
    2 /Example/KO2_ATAC_ShiftSort.bam
    3 /Example/WT1_ATAC_ShiftSort.bam
    4 /Example/WT2_ATAC_ShiftSort.bam
      ControlID bamControl
    1        NA         NA
    2        NA         NA
    3        NA         NA
    4        NA         NA
                                                                       Peaks
    1 /Example/KO1_ATAC_peaks.narrowPeak
    2 /Example/KO2_ATAC_peaks.narrowPeak
    3 /Example/WT1_ATAC_peaks.narrowPeak
    4 /Example/WT2_ATAC_peaks.narrowPeak
      PeakCaller
    1     narrow
    2     narrow
    3     narrow
    4     narrow
    

    构建 dba 对象。

    > sampleDba <- dba(sampleSheet = samples)
    KO1    KO 1 narrow
    KO2    KO 2 narrow
    WT1    WT 1 narrow
    WT2    WT 2 narrow
    

    计算 count, 然后会给出 FRiP 数值。

    sampleCount <- dba.count(sampleDba)
    > sampleCount
    4 Samples, 9127 sites in matrix:
       ID Treatment Replicate    Reads FRiP
    1 KO1        KO         1 12317324 0.01
    2 KO2        KO         2 11922986 0.01
    3 WT1        WT         1 30846990 0.02
    4 WT2        WT         2 29012696 0.02
    

    进行均一化,然后分析差异的 Peaks.

    sampleNor <- dba.normalize(sampleCount)
    sampleCon <- dba.contrast(sampleNor, design = ~ Treatment, contrast = c("Treatment", "WT", "KO"), reorderMeta = list(Treatment="WT"))
    sampleDiff <- dba.analyze(sampleCon, bBlacklist = FALSE, bGreylist = FALSE)
    
    > dba.show(sampleDiff, bContrasts = TRUE)
         Factor Group Samples Group2 Samples2 DB.DESeq2
    1 Treatment    WT       2     KO        2      6293
    

    导出文本的结果

    diffPeak <- dba.report(sampleDiff)
    > diffPeak
    GRanges object with 6293 ranges and 6 metadata columns:
           seqnames              ranges strand |      Conc   Conc_WT   Conc_KO
              <Rle>           <IRanges>  <Rle> | <numeric> <numeric> <numeric>
      1041    chr10   74826166-74826566      * |      6.69      7.44      5.00
      7544     chr6 155314400-155314800      * |      6.61      7.38      4.84
      8254     chr8   73972212-73972612      * |      7.19      7.82      6.02
      6555     chr5   91383243-91383643      * |      6.84      7.54      5.41
      7830     chr7   94657864-94658264      * |      6.47      7.24      4.70
       ...      ...                 ...    ... .       ...       ...       ...
      3936    chr19   35612533-35612933      * |      4.46      4.91      3.79
      4659     chr2 175181550-175181950      * |      4.04      4.55      3.25
      1275    chr11   13463100-13463500      * |      4.64      5.06      4.04
      8757     chr9   97983046-97983446      * |      5.10      5.47      4.60
      6553     chr5   91280440-91280840      * |      5.28      5.62      4.84
                Fold   p-value       FDR
           <numeric> <numeric> <numeric>
      1041      2.34  1.11e-15  1.02e-11
      7544      2.42  4.96e-15  2.26e-11
      8254      1.76  8.53e-15  2.60e-11
      6555      2.06  4.11e-14  9.38e-11
      7830      2.41  1.21e-13  2.20e-10
       ...       ...       ...       ...
      3936      0.99    0.0344    0.0499
      4659      1.10    0.0344    0.0499
      1275      0.91    0.0344    0.0500
      8757      0.81    0.0344    0.0500
      6553      0.73    0.0345    0.0500
      -------
      seqinfo: 23 sequences from an unspecified genome; no seqlengths
    

    还有些函数可以出一些简单的图。

    相关文章

      网友评论

        本文标题:ATAC-Seq 差异分析 DiffBind

        本文链接:https://www.haomeiwen.com/subject/cvnjwktx.html