美文网首页
Overlapping peaks--Bedtools

Overlapping peaks--Bedtools

作者: 余绕 | 来源:发表于2021-11-01 16:36 被阅读0次

Finding overlapping peaks between replicates

The bedtools intersect command within bedtools is the one we want to use, since it is able to report back the peaks that are overlapping with respect to a given file (the file designated as “a”).

image.png

Note: If you are trying to intersect very large files and are having trouble with excessive memory usage, please presort your data by chromosome and then by start position (e.g., sort -k1,1 -k2,2n in.bed > in.sorted.bed for BED files) and then use the -sorted option. This invokes a memory-efficient algorithm designed for large files. This algorithm has been substantially improved in recent (>=2.18.0) releases

参数

image.png

By default, if an overlap is found, bedtools intersect reports the shared interval between the two overlapping features.

$ cat A.bed
chr1  10  20
chr1  30  40

$ cat B.bed
chr1  15   20
#shared invervals
$ bedtools intersect -a A.bed -b B.bed
chr1  15   20

The intersect tool evaluates A (file 1) and finds regions that overlap in B (file 2). We will add the -wo which indicates to write the original A (file 1) and B (file 2) entries plus the number of base pairs of overlap between the two features.

Let’s start with the Nanog replicates:

$ bedtools intersect \
-a macs2/Nanog-rep1_peaks.narrowPeak \
-b macs2/Nanog-rep2_peaks.narrowPeak \
-wo > bedtools/Nanog-overlaps.bed

We’ll do the same for the Pou5f1 replicates:

$ bedtools intersect \
-a macs2/Pou5f1-rep1_peaks.narrowPeak \
-b macs2/Pou5f1-rep2_peaks.narrowPeak \
-wo > bedtools/Pou5f1-overlaps.bed

Historical Note: “A simpler heuristic for establishing reproducibility was previously used as a standard for depositing ENCODE data and was in effect when much of the currently available data was submitted. According to this standard, either 80% of the top 40% of the peaks identified from one replicate using an acceptable scoring method should overlap the list of peaks from the other replicate, OR peak lists scored using all available reads from each replicate should share more than 75% of regions in common. As with the current standards, this was developed based on experience with accumulated ENCODE ChIP-seq data, albeit with a much smaller sample size.” [ENCODE Guidelines, Landt et al, 2012]

Ref: https://hbctraining.github.io/Intro-to-ChIPseq/lessons/handling-replicates-bedtools.html

相关文章

网友评论

      本文标题:Overlapping peaks--Bedtools

      本文链接:https://www.haomeiwen.com/subject/fnizaltx.html