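3.2 Method 2: differential analysis with MAnorm, a tool for finding differential peaks between two ChIP-Seq samples
Software documentation: https://manorm.readthedocs.io/en/latest/usage.html#id7
Usage reference: https://www.jianshu.com/p/a1a17c42946f
MAnorm uses the read-density differences at the common peaks of two samples to normalize the unique peaks as well. The reasoning is that, if the true binding strength at common peaks is the same in both samples, any fold difference in read counts within those peaks reflects a difference in sequencing depth or signal density, and can therefore serve as the normalization reference. Comparing the peaks after normalization avoids the problem of the two samples having different signal-to-noise ratios. The algorithm rests on one assumption: at peaks (binding sites) present in both samples, the protein binds through the same mechanism, so the binding intensity should be the same.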
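To make the idea concrete, here is a minimal Python sketch of M-A normalization. It is an illustration under stated assumptions, not MAnorm's actual code: the function names (`ma_values`, `fit_line`, `normalize_m`) and the toy read densities are invented, and an ordinary least-squares fit stands in for whatever regression the tool performs internally. M is the log2 ratio of the two read densities and A is their average log2 signal; the line fitted on the common peaks is then subtracted from the M value of every peak.

```python
import math


def ma_values(d1, d2):
    """Return (M, A) for one peak from two positive read densities."""
    l1, l2 = math.log2(d1), math.log2(d2)
    return l1 - l2, 0.5 * (l1 + l2)


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept.

    Assumption: a plain least-squares fit is used here only to keep
    the sketch short; it is not claimed to be MAnorm's fitting method.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def normalize_m(m, a, slope, intercept):
    """Subtract the trend learned on common peaks from a raw M value."""
    return m - (slope * a + intercept)


# Toy data (hypothetical): sample 2 was sequenced about twice as deep,
# so every common peak has a raw M close to -1.
common = [(10.0, 20.0), (20.0, 40.0), (40.0, 80.0)]
common_ma = [ma_values(d1, d2) for d1, d2 in common]
slope, intercept = fit_line([a for _, a in common_ma],
                            [m for m, _ in common_ma])

# A peak unique to sample 1: the raw ratio understates the difference.
raw_m, a = ma_values(30.0, 15.0)
norm_m = normalize_m(raw_m, a, slope, intercept)
print(f"raw M = {raw_m:.2f}, normalized M = {norm_m:.2f}")
```

In this toy case the common peaks reveal a two-fold depth difference, so the unique peak's M value moves from about 1 to about 2 once that offset is removed.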
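3.2.1 Installing MAnorm
git clone https://github.com/shao-lab/MAnorm.git #install the latest version
cd MAnorm
pip install . #note: do not leave out the trailing "."
manorm --version ##check that the installation succeeded; mine shows 1.3.0
#Note: I installed it inside my conda environment chipseq, so it has to be run from within that environment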
3.2.2 Using MAnorm
manorm --p1 sample1_peaks.xls --p2 sample2_peaks.xls --pf macs2 --r1 reads_file1.bam --r2 reads_file2.bam \
--rf bam --n1 name1 --n2 name2 -o output_dir
#--p1 Peak file of sample 1 (MACS3 output can be used, e.g. sample1_peaks.xls)
#--p2 Peak file of sample 2
#--pf, --peak-format Format of the peak files. Default: bed. Our peaks come from MACS, so a MACS format is given here. The documented choices do not include macs3; macs2 is the one that matches MACS2/MACS3 _peaks.xls files
#--r1 Read file of sample 1.
#--r2 Read file of sample 2.
#--rf Format of the read files. Default: bed
#--n1 Name of sample 1.
#--n2 Name of sample 2.
#-o output_dir #output directory
#Peak and read file formats: https://manorm.readthedocs.io/en/latest/usage.html#peak-file-formats
#Additional parameters:
#--s1, --shiftsize1 Single-end reads shiftsize of sample 1. Default: 100
#--s2, --shiftsize2 Single-end reads shiftsize of sample 2. Default: 100
#--pe, --paired-end Paired-end mode.
#-w, --window-size Window size to count reads and calculate read densities. Default: 2000
#--summit-dis Summit-to-summit distance cutoff for common peaks. Default: -w/4
#--n-random Number of simulations to test the enrichment of peak overlap between two samples.
#-m, --m-cutoff Absolute M value (log2-ratio) cutoff to define biased (differential binding) peaks.
#-p, --p-cutoff P value cutoff to define biased peaks.
#--wa, --write-all Output additional files which contain the results of original (unmerged) peaks.
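When the same comparison has to be repeated for many sample pairs, it can help to assemble the command programmatically. The sketch below is only one possible approach: `build_manorm_command` is a hypothetical helper, not part of MAnorm, and it uses only the flags listed above. The file names are placeholders.

```python
import shlex


def build_manorm_command(peaks1, peaks2, reads1, reads2, name1, name2,
                         out_dir, peak_format="bed", read_format="bed",
                         paired_end=False, window_size=None):
    """Assemble a manorm argument list (hypothetical helper)."""
    cmd = ["manorm",
           "--p1", peaks1, "--p2", peaks2, "--pf", peak_format,
           "--r1", reads1, "--r2", reads2, "--rf", read_format,
           "--n1", name1, "--n2", name2, "-o", out_dir]
    if paired_end:
        cmd.append("--pe")  # paired-end mode
    if window_size is not None:
        cmd += ["-w", str(window_size)]  # override the window size
    return cmd


# Placeholder inputs: BED peaks with BAM reads, paired-end, 1000 bp window
cmd = build_manorm_command("peaks_file1.bed", "peaks_file2.bed",
                           "reads_file1.bam", "reads_file2.bam",
                           "name1", "name2", "output_dir",
                           peak_format="bed", read_format="bam",
                           paired_end=True, window_size=1000)
print(shlex.join(cmd))

# To actually run it (requires manorm on PATH in the active environment):
# import subprocess
# subprocess.run(cmd, check=True)
```

Passing the arguments as a list rather than a single string avoids shell-quoting problems with unusual file names.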
3.2.3 MAnorm output files
1. <name1>_vs_<name2>_all_MAvalues.xls #main result file
This is the main output of MAnorm. It contains the M-A values and normalized read densities of each peak; common peaks from the two samples are merged together.
chr: chromosome name
start: start position of the peak
end: end position of the peak
summit: summit position of the peak (absolute position)
m_value: M value (log2 fold change) of normalized read densities under comparison
a_value: A value (average signal strength) of normalized read densities under comparison
p_value
peak_group: indicates which sample the peak comes from and whether it is a common peak
normalized_read_density_in_<name1>
normalized_read_density_in_<name2>
Coordinates in the .xls file use a 1-based coordinate system.
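The main table can also be filtered by hand, for example to try cutoffs other than the ones used in the run. The following is a minimal sketch that assumes the .xls file is tab-delimited text with a header row carrying the column names listed above. `split_peaks`, its default cutoffs, and the toy rows are illustrative, and the rule applied here (P value at or below the cutoff and |M| at or above the cutoff) is an assumption about how biased peaks are defined. Start positions are shifted by one because BED uses 0-based, half-open intervals.

```python
import csv
import io


def split_peaks(handle, m_cutoff=1.0, p_cutoff=0.01):
    """Split peaks into (M above, M below, unbiased) lists of BED tuples."""
    above, below, unbiased = [], [], []
    for row in csv.DictReader(handle, delimiter="\t"):
        m = float(row["m_value"])
        p = float(row["p_value"])
        # Convert the 1-based start to a 0-based BED start
        bed = (row["chr"], int(row["start"]) - 1, int(row["end"]))
        if p <= p_cutoff and m >= m_cutoff:
            above.append(bed)
        elif p <= p_cutoff and m <= -m_cutoff:
            below.append(bed)
        else:
            unbiased.append(bed)
    return above, below, unbiased


# Toy table (hypothetical values; only the columns used here are included)
toy = io.StringIO(
    "chr\tstart\tend\tm_value\tp_value\n"
    "chr1\t101\t500\t2.5\t0.001\n"
    "chr1\t1001\t1500\t-1.8\t0.0005\n"
    "chr2\t201\t700\t0.3\t0.4\n"
    "chr2\t901\t1300\t1.6\t0.2\n"
)
above, below, unbiased = split_peaks(toy)
print(len(above), len(below), len(unbiased))
```

With a real result file, replace the `io.StringIO` object with `open(path, newline="")`. The last toy row shows why both conditions matter: its M value is large, but the P value is not significant, so it ends up in the unbiased group.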
2. output_filters/
This folder contains the filtered biased/unbiased peaks in BED format.
<name1>_vs_<name2>_M_above_<m_cutoff>_biased_peaks.bed
<name1>_vs_<name2>_M_below_-<m_cutoff>_biased_peaks.bed
<name1>_vs_<name2>_unbiased_peaks.bed
3. output_tracks/
These files are genome track files for M values, A values and P values in wiggle format. You can load them into a genome browser for visualization.
<name1>_vs_<name2>_M_values.wig
<name1>_vs_<name2>_A_values.wig
<name1>_vs_<name2>_P_values.wig
4. output_figures/
This folder contains M-A plots before/after normalization and a scatter plot which shows the scaling relationship between two samples.
<name1>_vs_<name2>_read_density_on_common_peaks.pdf
<name1>_vs_<name2>_MA_plot_before_normalization.pdf
<name1>_vs_<name2>_MA_plot_after_normalization.pdf
<name1>_vs_<name2>_MA_plot_with_P_value.pdf