hbctraining-Introduction to ChIP

Author: horsefish | Published: 2018-12-20 20:49

Alignment and Filtering


Part 1. Alignment

1. Alignment to Genome

    Once we have assessed the quality of the cleaned sequence data, we are ready to align the reads to the reference genome. Bowtie2 is a fast and accurate aligner that indexes the genome with an FM index, based on the Burrows-Wheeler transform, to keep memory requirements low during alignment. Bowtie2 supports gapped, local, and paired-end alignment modes and works best for reads that are at least 50 bp long (shorter reads, such as those from smRNA-Seq, should be aligned with Bowtie1). By default, Bowtie2 performs a global end-to-end read alignment, which is best suited to quality-trimmed reads. It also has a local alignment mode, which soft-clips poor-quality bases or adapters from the ends of untrimmed reads.

2. Bowtie2 Usage

* Creating a Bowtie2 index

        A genome index, analogous to the index in the back of a book, is required to perform the Bowtie2 alignment. We can generate the genome index with the following command:

        bowtie2-build <path_to_reference_genome.fa> <prefix_to_name_indexes>

* Commonly used Bowtie2 parameters

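     -p: number of processors/cores

     -q: reads are in FASTQ format

     --local: local alignment mode, which performs soft-clipping

     -x: /path/to/genome_index_prefix (the prefix given to bowtie2-build, without the .bt2 suffixes)

     -S: /path/to/output/SAM_file

     -U: file of unpaired reads (single-end data)

     -1/-2: files of mate 1 and mate 2 reads (paired-end data)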
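Putting these parameters together, a single-end alignment might look like the sketch below. The index prefix, FASTQ name, output name, and core count are hypothetical placeholders, and the command is only printed (a dry run) so that it can be reviewed before being run on a machine where Bowtie2 is installed.

```shell
#!/bin/sh
# Hypothetical inputs; replace with your own paths.
index_prefix="reference/genome_index"   # prefix that was given to bowtie2-build
fastq="sample_R1.trimmed.fastq"         # single-end reads
sam_out="sample_aln_unsorted.sam"       # alignment output
cores=2                                 # number of cores to use

# Assemble the Bowtie2 call from the parameters listed above.
align_cmd="bowtie2 -p ${cores} -q --local -x ${index_prefix} -U ${fastq} -S ${sam_out}"

# Dry run: print the command for review instead of executing it.
echo "${align_cmd}"
```

For paired-end data, the -U argument would be replaced by -1 and -2 with the two mate files.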

3. Alignment file format: SAM/BAM

to be continued

Part 2. Filtering

An important issue with ChIP-Seq data concerns the inclusion of multi-mapped reads (reads that map to multiple loci on the reference genome). Allowing multi-mapped reads increases the number of usable reads and the sensitivity of peak detection; however, the number of false positives may also increase[1]. We therefore filter the alignment files so that they contain only uniquely mapping reads, which increases confidence in site discovery and improves reproducibility. Since Bowtie2 has no parameter for keeping only uniquely mapping reads, we perform the following steps to generate alignment files that contain only the uniquely mapping reads:

1. Convert the alignment file from SAM to BAM with samtools view

Parameters used in this step:

-h: include header in output

-S: input is in SAM format (recent samtools versions detect the input format automatically and ignore this flag)

-b: output BAM format

-o: /path/to/output/file
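As a sketch of this conversion step, the snippet below derives the BAM name from a hypothetical SAM file name and prints the samtools view call built from the flags above rather than running it. The file name is an assumption for illustration only.

```shell
#!/bin/sh
# Hypothetical input file; substitute your own alignment output.
sam_in="sample_aln_unsorted.sam"

# Swap the .sam suffix for .bam to name the output file.
bam_out="${sam_in%.sam}.bam"

# -h keep the header, -S input is SAM, -b write BAM, -o output file.
convert_cmd="samtools view -h -S -b -o ${bam_out} ${sam_in}"

# Dry run: print the command for review instead of executing it.
echo "${convert_cmd}"
```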

2. Sort the BAM file by read coordinates (sambamba sort or samtools sort)

The advantage of using sambamba is that an index file is generated along with the newly sorted file. With samtools this would be a two-step process (sort, then index).
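The difference between the two tools can be sketched as follows. The helper function sort_commands and the file names are hypothetical, and the commands are printed rather than executed so the one-step and two-step variants can be compared side by side.

```shell
#!/bin/sh
# Hypothetical file names; substitute your own.
bam_in="sample_aln_unsorted.bam"
bam_sorted="${bam_in%_unsorted.bam}_sorted.bam"

# Print the command(s) needed to get a sorted, indexed BAM with each tool.
sort_commands() {
    case "$1" in
        sambamba)
            # One step: sambamba sort also writes the index file.
            echo "sambamba sort -t 2 -o ${bam_sorted} ${bam_in}"
            ;;
        samtools)
            # Two steps: sort first, then index the sorted file.
            echo "samtools sort -o ${bam_sorted} ${bam_in}"
            echo "samtools index ${bam_sorted}"
            ;;
        *)
            echo "unknown tool: $1" >&2
            return 1
            ;;
    esac
}

sort_commands sambamba
sort_commands samtools
```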

3. Filter to keep only uniquely mapping reads (this will also remove any unmapped reads and duplicates)

We filter out multimappers using the XS tag, which the Bowtie2 manual describes as follows:

XS:i:<N> Alignment score for the best-scoring alignment found other than the alignment reported. Can be negative. Can be greater than 0 in --local mode (but not in --end-to-end mode). Only present if the SAM record is for an aligned read and more than one alignment was found for the read. Note that, when the read is part of a concordantly-aligned pair, this score could be greater than AS:i

Alternatively, we can filter by MAPQ.
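To make the two criteria concrete, the sketch below applies them to a toy SAM stream with awk. The records, their MAPQ values, and their tags are invented for illustration, and the helper names toy_sam, filter_by_xs, and filter_by_mapq are hypothetical; on real BAM files the filtering would be done with sambamba or samtools.

```shell
#!/bin/sh
# Toy SAM records (invented): one header line and three alignments.
# Column 5 is MAPQ; optional tags start at column 12.
toy_sam() {
    printf '@SQ\tSN:chr1\tLN:1000\n'
    printf 'read1\t0\tchr1\t100\t42\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII\tAS:i:0\n'
    printf 'read2\t0\tchr1\t200\t1\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII\tAS:i:0\tXS:i:0\n'
    printf 'read3\t0\tchr1\t300\t23\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII\tAS:i:-5\tXS:i:-12\n'
}

# Keep header lines and alignments that carry no XS tag.
filter_by_xs() {
    awk -F '\t' '
        /^@/ { print; next }
        {
            has_xs = 0
            for (i = 12; i <= NF; i++) if ($i ~ /^XS:i:/) has_xs = 1
            if (!has_xs) print
        }'
}

# Keep header lines and alignments whose MAPQ is at least the given minimum.
filter_by_mapq() {
    awk -F '\t' -v min="$1" '/^@/ { print; next } $5 >= min'
}

toy_sam | filter_by_xs       # keeps the header and read1
toy_sam | filter_by_mapq 20  # keeps the header, read1 and read3
```

The two criteria need not select the same reads: in this toy data read3 has a second-best alignment (an XS tag) but still passes a MAPQ threshold of 20.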

* for sambamba

-t: number of threads / cores

-h: print SAM header before reads

-f: format of output file (default is SAM)

-F: set custom filter - we will be using the filter to remove duplicates, multimappers and unmapped reads.

sambamba view -h -t 2 -f bam -F "[XS] == null and not unmapped and not duplicate" sorted.bam > sort.filter.bam

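* for samtools

samtools view -Shub -f 2 -q 30 "$sam" | samtools sort -T "$path/$sample" -o "$filter_bam" -

Here -f 2 keeps only reads mapped in a proper pair (so this form applies to paired-end data), and -q 30 keeps alignments with a MAPQ of at least 30.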

TO BE CONTINUED

   
