美文网首页
MACS2 parameters

MACS2 parameters

作者: 余绕 | 来源:发表于2021-12-04 15:15 被阅读0次

    MACS2 parameters

    There are seven major functions available in MACS2 serving as sub-commands. We will only cover callpeak in this lesson, but you can use macs2 COMMAND -h to find out more, if you are interested.

    callpeak is the main function in MACS2 and can be invoked by typing macs2 callpeak. If you type this command without parameters, you will see a full description of commandline options. Here is a shorter list of the commonly used ones:

    Input file options

    • -t: The IP data file (this is the only REQUIRED parameter for MACS)
    • -c: The control or mock data file
    • -f: format of input file; Default is “AUTO” which will allow MACS to decide the format automatically.
    • -g: mappable genome size which is defined as the genome size which can be sequenced; some precompiled values provided.

    Output arguments

    • --outdir: MACS2 will save all output files into speficied folder for this option
    • -n: The prefix string for output files
    • -B/--bdg: store the fragment pileup, control lambda, -log10pvalue and -log10qvalue scores in bedGraph files

    Shifting model arguments

    • -s: size of sequencing tags. Default, MACS will use the first 10 sequences from your input treatment file to determine it
    • --bw: The bandwidth which is used to scan the genome ONLY for model building. Can be set to the expected sonication fragment size.
    • --mfold: upper and lower limit for model building

    Peak calling arguments

    • -q: q-value (minimum FDR) cutoff
    • -p: p-value cutoff (instead of q-value cutoff)
    • --nolambda: do not consider the local bias/lambda at peak candidate regions
    • --broad: broad peak calling

    NOTE: Relaxing the q-value does not behave as expected in this case since it is partially tied to peak widths. Ideally, if you relaxed the thresholds, you would simply get more peaks but with MACS2 relaxing thresholds also results in wider peaks.

    Now that we have a feel for the different ways we can tweak our command, let’s set up the command for our run on Nanog-rep1:

    $ macs2 callpeak -t bowtie2/H1hesc_Nanog_Rep1_aln.bam \
        -c bowtie2/H1hesc_Input_Rep1_aln.bam \
        -f BAM -g 1.3e+8 \
        -n Nanog-rep1 \
        --outdir macs2
    

    The tool is quite verbose so you should see lines of text being printed to the terminal, describing each step that is being carried out. If that runs successfully, go ahead and re-run the same command but this time let’s capture that information into a log file using 2> to re-direct the stadard error to file:

    $ macs2 callpeak -t bowtie2/H1hesc_Nanog_Rep1_aln.bam \
        -c bowtie2/H1hesc_Input_Rep1_aln.bam \
        -f BAM -g 1.3e+8 \
        -n Nanog-rep1 \
        --outdir macs2 2> macs2/Nanog-rep1-macs2.log
    

    Ok, now let’s do the same peak calling for the rest of our samples:

    macs2 callpeak -t bowtie2/H1hesc_Nanog_Rep2_aln.bam -c bowtie2/H1hesc_Input_Rep2_aln.bam -f BAM -g 1.3e+8 --outdir macs2 -n Nanog-rep2 2> macs2/Nanog-rep2-macs2.log
     
    macs2 callpeak -t bowtie2/H1hesc_Pou5f1_Rep1_aln.bam -c bowtie2/H1hesc_Input_Rep1_aln.bam -f BAM -g 1.3e+8 --outdir macs2 -n Pou5f1-rep1 2> macs2/Pou5f1-rep1-macs2.log
         
    macs2 callpeak -t bowtie2/H1hesc_Pou5f1_Rep2_aln.bam -c bowtie2/H1hesc_Input_Rep2_aln.bam -f BAM -g 1.3e+8 --outdir macs2 -n Pou5f1-rep2 2> macs2/Pou5f1-rep2-macs2.log
    

    MACS2 Output files

    Now, there should be 6 files output to the results directory for each of the 4 samples, so a total of 24 files:

    * _peaks.narrowPeak: BED6+4 format file which contains the peak locations together with peak summit, pvalue and qvalue
    * _peaks.xls: a tabular file which contains information about called peaks. Additional information includes pileup and fold enrichment
    * _summits.bed: peak summits locations for every peak. To find the motifs at the binding sites, this file is recommended
    • _model.R: an R script which you can use to produce a PDF image about the model based on your data and cross-correlation plot
    • _control_lambda.bdg: bedGraph format for input sample
    • _treat_pileup.bdg: bedGraph format for treatment sample

    Let’s first obtain a summary of how many peaks were called in each sample. We can do this by counting the lines in the .narrowPeak files:

    $ wc -l *.narrowPeak
    
    

    We can also generate plots using the R script file that was output by MACS2. There is a _model.R script in the directory. Let’s load the R module and run the R script in the command line using the Rscript command as demonstrated below:

    $ module load gcc/6.2.0 R/3.4.1
    
    $ Rscript Nanog-rep1_model.r
    
    

    NOTE: We need to load the gcc/6.2.0 before loading R. You can find out which modules need to be loaded first by using module spider R/3.4.1`

    Now you should see a pdf file in your current directory by the same name. Create the plots for each of the samples and move them over to your laptop using Filezilla.

    Open up the pdf file for Nanog-rep1. The first plot illustrates the distance between the modes from which the shift size was determined.

    image

    The second plot is the cross-correlation plot. This is a graphical representation of the Pearson correlation of positive- and negative- strand tag densities, shifting the strands relative to each other by increasing distance. We will talk about this in more detail in the next lesson.

    NOTE: SPP is another very commonly used tool for narrow peak calling. While we will not be going through the steps for this peak caller in this workshop, we do have a lesson on SPP that we encourage you to browse through if you are interested in learning more.

    Data source: Peak calling with MACS2 | Introduction to ChIP-Seq using high-performance computing (hbctraining.github.io)

    相关文章

      网友评论

          本文标题:MACS2 parameters

          本文链接:https://www.haomeiwen.com/subject/apuosltx.html