本文是《ATAC-Seq 分析流程》拆分出来的 DiffBind 差异分析部分。DiffBind 有许多的 Normalization 方法,建议使用前先看文档了解清楚,这里只提供简单的分析步骤,只用默认参数。
DiffBind 借助 RNA-seq 分析差异基因思路。软件首先将所有样本的峰合并(bedtools merge
),得到总的峰区域信息,再计算每个峰区域 Reads 数,最后调用 DESeq2/edgeR 进行差异分析。
library(DiffBind, quietly = TRUE)
library(tidyverse, quietly = TRUE)
读入数据,没有的信息就填 NA
. 不会有影响。
> samples <- read.csv("Samples.csv", header = TRUE)
> samples
SampleID Tissue Factor Condition Treatment Replicate
1 KO1 NA NA NA KO 1
2 KO2 NA NA NA KO 2
3 WT1 NA NA NA WT 1
4 WT2 NA NA NA WT 2
bamReads
1 /Example/KO1_ATAC_ShiftSort.bam
2 /Example/KO2_ATAC_ShiftSort.bam
3 /Example/WT1_ATAC_ShiftSort.bam
4 /Example/WT2_ATAC_ShiftSort.bam
ControlID bamControl
1 NA NA
2 NA NA
3 NA NA
4 NA NA
Peaks
1 /Example/KO1_ATAC_peaks.narrowPeak
2 /Example/KO2_ATAC_peaks.narrowPeak
3 /Example/WT1_ATAC_peaks.narrowPeak
4 /Example/WT2_ATAC_peaks.narrowPeak
PeakCaller
1 narrow
2 narrow
3 narrow
4 narrow
构建 dba
对象。
> sampleDba <- dba(sampleSheet = samples)
KO1 KO 1 narrow
KO2 KO 2 narrow
WT1 WT 1 narrow
WT2 WT 2 narrow
计算 count, 然后会给出 FRiP 数值。
sampleCount <- dba.count(sampleDba)
> sampleCount
4 Samples, 9127 sites in matrix:
ID Treatment Replicate Reads FRiP
1 KO1 KO 1 12317324 0.01
2 KO2 KO 2 11922986 0.01
3 WT1 WT 1 30846990 0.02
4 WT2 WT 2 29012696 0.02
进行均一化,然后分析差异的 Peaks.
sampleNor <- dba.normalize(sampleCount)
sampleCon <- dba.contrast(sampleNor, design = ~ Treatment, contrast = c("Treatment", "WT", "KO"), reorderMeta = list(Treatment="WT"))
sampleDiff <- dba.analyze(sampleCon, bBlacklist = FALSE, bGreylist = FALSE)
> dba.show(sampleDiff, bContrasts = TRUE)
Factor Group Samples Group2 Samples2 DB.DESeq2
1 Treatment WT 2 KO 2 6293
导出文本的结果
diffPeak <- dba.report(sampleDiff)
> diffPeak
GRanges object with 6293 ranges and 6 metadata columns:
seqnames ranges strand | Conc Conc_WT Conc_KO
<Rle> <IRanges> <Rle> | <numeric> <numeric> <numeric>
1041 chr10 74826166-74826566 * | 6.69 7.44 5.00
7544 chr6 155314400-155314800 * | 6.61 7.38 4.84
8254 chr8 73972212-73972612 * | 7.19 7.82 6.02
6555 chr5 91383243-91383643 * | 6.84 7.54 5.41
7830 chr7 94657864-94658264 * | 6.47 7.24 4.70
... ... ... ... . ... ... ...
3936 chr19 35612533-35612933 * | 4.46 4.91 3.79
4659 chr2 175181550-175181950 * | 4.04 4.55 3.25
1275 chr11 13463100-13463500 * | 4.64 5.06 4.04
8757 chr9 97983046-97983446 * | 5.10 5.47 4.60
6553 chr5 91280440-91280840 * | 5.28 5.62 4.84
Fold p-value FDR
<numeric> <numeric> <numeric>
1041 2.34 1.11e-15 1.02e-11
7544 2.42 4.96e-15 2.26e-11
8254 1.76 8.53e-15 2.60e-11
6555 2.06 4.11e-14 9.38e-11
7830 2.41 1.21e-13 2.20e-10
... ... ... ...
3936 0.99 0.0344 0.0499
4659 1.10 0.0344 0.0499
1275 0.91 0.0344 0.0500
8757 0.81 0.0344 0.0500
6553 0.73 0.0345 0.0500
-------
seqinfo: 23 sequences from an unspecified genome; no seqlengths
还有些函数可以出一些简单的图。
网友评论