This article is a supplement to "ATAC-Seq Analysis Workflow". It explains why you should not convert the bedgraph-ish file yourself by hand into bigwig.
Example of the file format produced by the Genrich -k parameter:
# experimental file: /Example/WT1_ATAC.bam; control file: NA
chr start end experimental control -log(p)
chr1 0 9944 0.000000 2.238139 0.000000
chr1 9944 9946 1.000000 2.238139 0.188209
chr1 9946 9947 2.200000 2.238139 0.488245
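To work with this format programmatically, here is a minimal parsing sketch. It assumes the six whitespace-separated columns shown above and that the only non-data lines are the "#" comment line and the column header. The names PileupRecord and parse_pileup are hypothetical and not part of Genrich.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class PileupRecord:
    chrom: str
    start: int
    end: int
    experimental: float  # pileup of the experimental sample
    control: float       # net control (background) pileup
    neg_log_p: float     # the "-log(p)" column


def parse_pileup(lines: Iterable[str]) -> List[PileupRecord]:
    """Parse Genrich -k style lines, skipping '#' comments and the column header."""
    records = []
    for line in lines:
        fields = line.split()  # assumption: tab- or space-separated columns
        if not fields or fields[0].startswith("#"):
            continue
        if fields[0] == "chr" and fields[1] == "start":
            continue  # column header line
        chrom, start, end, exp, ctrl, logp = fields[:6]
        records.append(PileupRecord(chrom, int(start), int(end),
                                    float(exp), float(ctrl), float(logp)))
    return records


example = """\
# experimental file: /Example/WT1_ATAC.bam; control file: NA
chr start end experimental control -log(p)
chr1 0 9944 0.000000 2.238139 0.000000
chr1 9944 9946 1.000000 2.238139 0.188209
chr1 9946 9947 2.200000 2.238139 0.488245
"""

for rec in parse_pileup(example.splitlines()):
    print(rec.chrom, rec.start, rec.end, rec.experimental, rec.control, rec.neg_log_p)
```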
The values in the background "control" column are calculated as follows:
The background pileup value is calculated by dividing the total sequence information (sum of read/fragment/interval lengths) in the experimental sample by the calculated genome length. The net control pileup value at a particular genomic position is the maximum of the background pileup value and the pileup of the control sample at that position (if a control sample is specified). Note that control pileups are scaled to match the experimental, based on the total sequence information in each.
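In short, the background is computed from the sequencing depth (library size): the background pileup is proportional to the overall sequencing depth of the sample.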
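The quoted rule can be sketched in a few lines. This is an illustration of the description above, not Genrich's source code. The function names and the genome length and read counts below are hypothetical. It shows why a library twice as deep gets twice the background value.

```python
from typing import Optional, Sequence


def background_pileup(interval_lengths: Sequence[int], genome_length: float) -> float:
    """Total sequence information (sum of read/fragment/interval lengths)
    divided by the calculated genome length, per the quoted description."""
    return sum(interval_lengths) / genome_length


def net_control(background: float,
                control_pileup: Optional[float] = None,
                exp_total: Optional[float] = None,
                ctrl_total: Optional[float] = None) -> float:
    """Net control value at one position: the maximum of the background and
    the control pileup scaled to the experimental sample's total sequence."""
    if control_pileup is None:
        return background  # no control sample ("control file: NA")
    scaled = control_pileup * (exp_total / ctrl_total)
    return max(background, scaled)


# Hypothetical numbers: same genome, one library twice as deep as the other.
genome = 1_000_000
shallow = background_pileup([200] * 2_000, genome)  # 400,000 bp of sequence
deep = background_pileup([200] * 4_000, genome)     # 800,000 bp of sequence
print(shallow, deep)  # 0.4 0.8
```

With no control sample, every position gets the same background value, which matches the single nonzero value per file in the examples.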
Distinct values in the "control" column of the Genrich -k output files:
$ awk '{print$5}' KO_pileup_p.bed | sort | uniq
0.000000
0.866656
0.900041
control
$ awk '{print$5}' WT_pileup_p.bed | sort | uniq
0.000000
2.114438
2.238139
control
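The word "control" in the awk output comes from the non-data lines. In both the "#" comment line and the column header, the fifth whitespace-separated field is the word control. The sketch below does the same job as awk | sort | uniq but skips those lines and compares the values numerically. The function name is hypothetical, and the sample rows are copied from this article's WT examples.

```python
def distinct_control_values(lines):
    """Sorted distinct values of column 5 ("control"),
    skipping '#' comment lines and the column header."""
    values = set()
    for line in lines:
        fields = line.split()
        if not fields or fields[0].startswith("#") or fields[0] == "chr":
            continue  # comment, header, or blank line
        values.add(float(fields[4]))
    return sorted(values)


sample = [
    "# experimental file: /Example/WT1_ATAC.bam; control file: NA",
    "chr start end experimental control -log(p)",
    "chr1 0 9944 0.000000 2.238139 0.000000",
    "chr1 9944 9946 1.000000 2.238139 0.188209",
    "chr3 170386225 170386228 9.400000 2.114438 1.697015",
]
print(distinct_control_values(sample))  # [2.114438, 2.238139]

# On a real file:
# with open("WT_pileup_p.bed") as fh:
#     print(distinct_control_values(fh))
```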
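Here the WT sample has a higher sequencing depth, so its background value is larger, and the KO value is smaller. The problem comes from using the cut command to keep the first 4 columns and then converting the result to bigwig. Some regions are called as peaks in KO but not in WT. However, in the resulting bigwig files the "experimental" signal of the WT sample is stronger than that of KO. When you visualize these regions, the peak tracks therefore contradict the software's conclusions.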
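The sketch below shows what cut -f 1-4 keeps. It mimics that step in Python, and also skips comment and header lines so the output is valid bedGraph. The function name is hypothetical. The two rows are the KO and WT summit rows from the chr3 example below. After the cut, only chrom, start, end and the raw "experimental" value remain. The "control" and "-log(p)" columns that drive Genrich's peak call are dropped.

```python
def cut_first_four(lines):
    """Keep only columns 1-4 (chrom, start, end, experimental), as
    `cut -f 1-4` would, skipping '#' comments and the column header."""
    bedgraph = []
    for line in lines:
        fields = line.split()
        if not fields or fields[0].startswith("#") or fields[0] == "chr":
            continue
        bedgraph.append("\t".join(fields[:4]))
    return bedgraph


rows = [
    "chr3\t170386359\t170386367\t9.500000\t0.866656\t2.880330",   # KO
    "chr3\t170386311\t170386318\t12.366667\t2.114438\t2.018532",  # WT
]
for line in cut_first_four(rows):
    print(line)
```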
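Take the region chr3:170386225-170386419 as an example. Genrich calls a peak here in the KO group but not in the WT group. For this region, the pileup files produced by -k contain the following: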
# KO group sample
chr start end experimental control -log(p)
chr3 170386225 170386227 4.700000 0.866656 1.926705
chr3 170386227 170386267 3.700000 0.866656 1.652342
chr3 170386267 170386268 4.700000 0.866656 1.926705
chr3 170386268 170386279 4.200000 0.866656 1.794608
chr3 170386279 170386286 3.200000 0.866656 1.497904
chr3 170386286 170386294 4.200000 0.866656 1.794608
chr3 170386294 170386310 5.200000 0.866656 2.050165
chr3 170386310 170386311 6.200000 0.866656 2.275715
chr3 170386311 170386312 6.533333 0.866656 2.345520
chr3 170386312 170386319 7.533333 0.866656 2.541555
chr3 170386319 170386320 8.533333 0.866656 2.720584
chr3 170386320 170386325 8.700000 0.866656 2.748994
chr3 170386325 170386359 8.500000 0.866656 2.714856
chr3 170386359 170386367 9.500000 0.866656 2.880330
chr3 170386367 170386386 8.500000 0.866656 2.714856
chr3 170386386 170386394 7.500000 0.866656 2.535314
chr3 170386394 170386399 6.500000 0.866656 2.338648
chr3 170386399 170386410 5.500000 0.866656 2.120604
chr3 170386410 170386411 4.500000 0.866656 1.874980
chr3 170386411 170386412 4.166667 0.866656 1.785459
chr3 170386412 170386419 3.166667 0.866656 1.487114
chr3 170386419 170386420 2.166667 0.866656 1.127307
# WT group sample
chr start end experimental control -log(p)
chr3 170386225 170386228 9.400000 2.114438 1.697015
chr3 170386228 170386231 9.733333 2.114438 1.736043
chr3 170386231 170386234 10.733333 2.114438 1.848394
chr3 170386234 170386235 9.733333 2.114438 1.736043
chr3 170386235 170386256 8.733333 2.114438 1.616348
chr3 170386256 170386259 8.983333 2.114438 1.647022
chr3 170386259 170386261 7.483334 2.114438 1.454529
chr3 170386261 170386268 6.483334 2.114438 1.313350
chr3 170386268 170386273 5.483334 2.114438 1.159408
chr3 170386273 170386282 5.033333 2.114438 1.085210
chr3 170386282 170386285 8.533333 2.114438 1.591427
chr3 170386285 170386292 9.033333 2.114438 1.653094
chr3 170386292 170386299 10.033333 2.114438 1.770472
chr3 170386299 170386305 11.033333 2.114438 1.880816
chr3 170386305 170386311 12.033333 2.114438 1.985027
chr3 170386311 170386318 12.366667 2.114438 2.018532
chr3 170386318 170386319 12.166667 2.114438 1.998500
chr3 170386319 170386320 12.366667 2.114438 2.018532
chr3 170386320 170386321 11.366667 2.114438 1.916196
chr3 170386321 170386328 10.366667 2.114438 1.807986
chr3 170386328 170386329 10.033333 2.114438 1.770472
chr3 170386329 170386331 10.366667 2.114438 1.807986
chr3 170386331 170386337 9.366667 2.114438 1.693066
chr3 170386337 170386339 9.700000 2.114438 1.732178
chr3 170386339 170386356 10.700000 2.114438 1.844757
chr3 170386356 170386357 10.200000 2.114438 1.789324
chr3 170386357 170386379 11.200000 2.114438 1.898589
chr3 170386379 170386382 12.200000 2.114438 2.001853
chr3 170386382 170386385 8.700000 2.114438 1.612219
chr3 170386385 170386392 8.200000 2.114438 1.549103
chr3 170386392 170386399 7.200000 2.114438 1.415684
chr3 170386399 170386405 6.200000 2.114438 1.271142
chr3 170386405 170386411 5.200000 2.114438 1.113082
chr3 170386411 170386419 4.866666 2.114438 1.056857
chr3 170386419 170386425 4.666667 2.114438 1.022170
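When each sample is compared against its own "control", the KO peak is the more significant one, yet the absolute "experimental" values are higher in WT. So if you use cut to take the "experimental" column and build bigwig files for visualization, the WT peak signal will look stronger, which contradicts the software's conclusion.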
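This comparison can be checked on the summit rows (highest "experimental" value) copied from the two tables above. The experimental/control ratio is only an illustration of "signal relative to each sample's own background". It is not how Genrich computes its p-values. The helper name is hypothetical.

```python
def summarize(row):
    """Return the raw signal, background, signal/background ratio and -log(p)."""
    chrom, start, end, exp, ctrl, logp = row
    return {"experimental": exp, "control": ctrl,
            "ratio": exp / ctrl, "neg_log_p": logp}


# Summit rows from the chr3:170386225-170386419 tables above.
ko = ("chr3", 170386359, 170386367, 9.500000, 0.866656, 2.880330)
wt = ("chr3", 170386311, 170386318, 12.366667, 2.114438, 2.018532)

for name, row in (("KO", ko), ("WT", wt)):
    s = summarize(row)
    print(f"{name}: experimental={s['experimental']:.2f} "
          f"control={s['control']:.2f} ratio={s['ratio']:.2f} "
          f"-log(p)={s['neg_log_p']:.2f}")
```

WT has the larger raw "experimental" value, which is what a cut-based bigwig shows. KO has both the larger ratio to its own background and the larger -log(p).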