MACS2 parameters

作者: 余绕 | 来源:发表于2021-12-04 15:15 被阅读0次

MACS2 parameters
MACS2 参数详细学习
第3篇：用MACS2软件call peaks
表达式语法
关于一些前端的面试题
iOS打印AFNetworking中Error错误信息
swift3.0 之闭包(Closure)
如何使用MACS进行peak calling
安卓相机参数，翻转90度问题
Peak annotation

MACS2 parameters

There are seven major functions available in MACS2 serving as sub-commands. We will only cover callpeak in this lesson, but you can use macs2 COMMAND -h to find out more, if you are interested.

callpeak is the main function in MACS2 and can be invoked by typing macs2 callpeak. If you type this command without parameters, you will see a full description of commandline options. Here is a shorter list of the commonly used ones:

Input file options

-t: The IP data file (this is the only REQUIRED parameter for MACS)
-c: The control or mock data file
-f: format of input file; Default is “AUTO” which will allow MACS to decide the format automatically.
-g: mappable genome size which is defined as the genome size which can be sequenced; some precompiled values provided.

Output arguments

--outdir: MACS2 will save all output files into speficied folder for this option
-n: The prefix string for output files
-B/--bdg: store the fragment pileup, control lambda, -log10pvalue and -log10qvalue scores in bedGraph files

Shifting model arguments

-s: size of sequencing tags. Default, MACS will use the first 10 sequences from your input treatment file to determine it
--bw: The bandwidth which is used to scan the genome ONLY for model building. Can be set to the expected sonication fragment size.
--mfold: upper and lower limit for model building

Peak calling arguments

-q: q-value (minimum FDR) cutoff
-p: p-value cutoff (instead of q-value cutoff)
--nolambda: do not consider the local bias/lambda at peak candidate regions
--broad: broad peak calling

NOTE: Relaxing the q-value does not behave as expected in this case since it is partially tied to peak widths. Ideally, if you relaxed the thresholds, you would simply get more peaks but with MACS2 relaxing thresholds also results in wider peaks.

Now that we have a feel for the different ways we can tweak our command, let’s set up the command for our run on Nanog-rep1:

$ macs2 callpeak -t bowtie2/H1hesc_Nanog_Rep1_aln.bam \
    -c bowtie2/H1hesc_Input_Rep1_aln.bam \
    -f BAM -g 1.3e+8 \
    -n Nanog-rep1 \
    --outdir macs2

The tool is quite verbose so you should see lines of text being printed to the terminal, describing each step that is being carried out. If that runs successfully, go ahead and re-run the same command but this time let’s capture that information into a log file using 2> to re-direct the stadard error to file:

$ macs2 callpeak -t bowtie2/H1hesc_Nanog_Rep1_aln.bam \
    -c bowtie2/H1hesc_Input_Rep1_aln.bam \
    -f BAM -g 1.3e+8 \
    -n Nanog-rep1 \
    --outdir macs2 2> macs2/Nanog-rep1-macs2.log

Ok, now let’s do the same peak calling for the rest of our samples:

macs2 callpeak -t bowtie2/H1hesc_Nanog_Rep2_aln.bam -c bowtie2/H1hesc_Input_Rep2_aln.bam -f BAM -g 1.3e+8 --outdir macs2 -n Nanog-rep2 2> macs2/Nanog-rep2-macs2.log
 
macs2 callpeak -t bowtie2/H1hesc_Pou5f1_Rep1_aln.bam -c bowtie2/H1hesc_Input_Rep1_aln.bam -f BAM -g 1.3e+8 --outdir macs2 -n Pou5f1-rep1 2> macs2/Pou5f1-rep1-macs2.log
     
macs2 callpeak -t bowtie2/H1hesc_Pou5f1_Rep2_aln.bam -c bowtie2/H1hesc_Input_Rep2_aln.bam -f BAM -g 1.3e+8 --outdir macs2 -n Pou5f1-rep2 2> macs2/Pou5f1-rep2-macs2.log

MACS2 Output files

Now, there should be 6 files output to the results directory for each of the 4 samples, so a total of 24 files:

* `_peaks.narrowPeak`: BED6+4 format file which contains the peak locations together with peak summit, pvalue and qvalue

* `_peaks.xls`: a tabular file which contains information about called peaks. Additional information includes pileup and fold enrichment

* `_summits.bed`: peak summits locations for every peak. To find the motifs at the binding sites, this file is recommended

_model.R: an R script which you can use to produce a PDF image about the model based on your data and cross-correlation plot
_control_lambda.bdg: bedGraph format for input sample
_treat_pileup.bdg: bedGraph format for treatment sample

Let’s first obtain a summary of how many peaks were called in each sample. We can do this by counting the lines in the .narrowPeak files:

$ wc -l *.narrowPeak

We can also generate plots using the R script file that was output by MACS2. There is a _model.R script in the directory. Let’s load the R module and run the R script in the command line using the Rscript command as demonstrated below:

$ module load gcc/6.2.0 R/3.4.1

$ Rscript Nanog-rep1_model.r

NOTE: We need to load the gcc/6.2.0 before loading R. You can find out which modules need to be loaded first by using module spider R/3.4.1`

Now you should see a pdf file in your current directory by the same name. Create the plots for each of the samples and move them over to your laptop using Filezilla.

Open up the pdf file for Nanog-rep1. The first plot illustrates the distance between the modes from which the shift size was determined.

image

The second plot is the cross-correlation plot. This is a graphical representation of the Pearson correlation of positive- and negative- strand tag densities, shifting the strands relative to each other by increasing distance. We will talk about this in more detail in the next lesson.

NOTE: SPP is another very commonly used tool for narrow peak calling. While we will not be going through the steps for this peak caller in this workshop, we do have a lesson on SPP that we encourage you to browse through if you are interested in learning more.

Data source： Peak calling with MACS2 | Introduction to ChIP-Seq using high-performance computing (hbctraining.github.io)

网友评论

本文标题：MACS2 parameters

本文链接：https://www.haomeiwen.com/subject/apuosltx.html

延伸阅读

深度阅读

您也可以注册成为美文阅读网的作者，发表您的原创作品、分享您的心情！

MACS2 parameters