hbctraining-Introduction to ChIP

Author: horsefish | Published: 2018-12-20 20:49


Alignment and Filtering

Part 1. Alignment

1. Alignment to Genome

After assessing the quality of the cleaned sequence data, we are ready to align the reads to the reference genome. Bowtie2 is a fast and accurate alignment tool that indexes the genome with an FM index, based on the Burrows-Wheeler transform, to keep memory requirements low during alignment. Bowtie2 supports gapped, local and paired-end alignment modes and works best for reads that are at least 50 bp long (for shorter reads, such as smRNA-Seq data, use Bowtie1). By default, Bowtie2 performs a global end-to-end read alignment, which is best for quality-trimmed reads. It also has a local alignment mode, which soft-clips poor-quality bases or adapters from untrimmed reads.

2. Bowtie2 Usage

* Creating a Bowtie2 index

A genome index, analogous to the index in the back of a book, is required to perform the Bowtie2 alignment. We can generate the genome index with the following command:

bowtie2-build <path_to_reference_genome.fa> <prefix_to_name_indexes>
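As a concrete illustration, the sketch below fills in the two placeholders with made-up paths (reference/genome.fa and reference/genome are assumptions, not names from the original workflow). It prints the command first and only runs it when bowtie2-build and the FASTA file are actually present.

```shell
# Hypothetical paths: replace with your own reference FASTA and index prefix.
genome_fa="reference/genome.fa"
index_prefix="reference/genome"

# Hold the command in the positional parameters so it can be shown, then run.
set -- bowtie2-build "$genome_fa" "$index_prefix"
build_cmd="$*"
echo "$build_cmd"

# Run only when the tool and the input exist. The index is written as a
# set of files whose names all begin with $index_prefix.
if command -v bowtie2-build >/dev/null 2>&1 && [ -f "$genome_fa" ]; then
  "$@"
fi
```

The prefix, not a directory, is what is later passed to bowtie2 with -x.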

* Commonly used Bowtie2 parameters

-p: number of processors/cores

-q: reads that are in FASTQ format

--local: local alignment feature to perform soft-clipping

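-x: /path/to/genome_index_prefix (the prefix given to bowtie2-build, without the file suffixes)

-S: /path/to/output/SAM_file

-U: file(s) with single-end (unpaired) reads

-1/-2: files with mate 1 and mate 2 of paired-end reads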
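The parameters above can be combined as in the following sketch. The wrapper name align_reads and its argument order are hypothetical, chosen only for illustration. It switches between -U and -1/-2 depending on whether one or two FASTQ files are given.

```shell
# Hypothetical wrapper around bowtie2 (name and argument order are assumptions).
# Usage: align_reads THREADS INDEX_PREFIX OUT_SAM READS1 [READS2]
align_reads() {
  threads=$1; index_prefix=$2; out_sam=$3; reads1=$4; reads2=${5:-}
  if [ -n "$reads2" ]; then
    # Two FASTQ files: paired-end mode with -1/-2
    bowtie2 -p "$threads" -q --local -x "$index_prefix" \
      -1 "$reads1" -2 "$reads2" -S "$out_sam"
  else
    # One FASTQ file: single-end mode with -U
    bowtie2 -p "$threads" -q --local -x "$index_prefix" \
      -U "$reads1" -S "$out_sam"
  fi
}
```

For example, align_reads 2 reference/genome sample.sam sample.fastq would align a single-end sample on two cores (the file names are again made up).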

3. Alignment file format: SAM/BAM

to be continued

Part 2. Filtering

An important issue with ChIP-Seq data is whether to include multi-mapped reads (reads that map to multiple loci on the reference genome). Allowing multi-mapped reads increases the number of usable reads and the sensitivity of peak detection; however, the number of false positives may also increase[1]. We therefore filter the alignment files so that they contain only uniquely mapping reads, which increases confidence in site discovery and improves reproducibility. Since Bowtie2 has no parameter to keep only uniquely mapping reads, we need to perform the following steps to generate alignment files that contain only uniquely mapping reads:

1. Convert the alignment file from SAM to BAM with samtools view

Parameters used in this step:

-h: include header in output

-S: input is in SAM format (samtools 1.0 and later detect the input format automatically and ignore this flag)

-b: output BAM format

-o: /path/to/output/file
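Put together, the conversion step might look like the sketch below. The function name sam_to_bam is hypothetical, and -S is left out on the assumption that a samtools 1.x release is installed.

```shell
# Hypothetical helper: convert a SAM file to BAM with samtools view.
# Usage: sam_to_bam IN_SAM OUT_BAM
sam_to_bam() {
  # -h: keep the header, -b: write BAM, -o: output file
  samtools view -h -b -o "$2" "$1"
}
```

Calling sam_to_bam sample.sam sample.bam would write sample.bam next to the input (both file names are placeholders).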

2. Sort the BAM file by read coordinate (sambamba sort or samtools sort)

The advantage of using sambamba is that an index file is generated along with the newly sorted file. With samtools this would be a two-step process.
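For comparison, the two-step samtools route can be sketched as follows. The function name sort_and_index is an assumption made for this example.

```shell
# Hypothetical helper showing the two samtools steps.
# Usage: sort_and_index IN_BAM SORTED_BAM
sort_and_index() {
  # Step 1: sort by coordinate and write the sorted BAM.
  # Step 2: index the sorted BAM (runs only if the sort succeeded).
  samtools sort -o "$2" "$1" && samtools index "$2"
}
```

With sambamba, a single sambamba sort call covers both steps, which is the convenience described above.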

3. Filter to keep only uniquely mapping reads (this will also remove any unmapped reads and duplicates)

We can filter out multimappers using the XS tag, which the Bowtie2 manual describes as follows:

XS:i:<N> Alignment score for the best-scoring alignment found other than the alignment reported. Can be negative. Can be greater than 0 in --local mode (but not in --end-to-end mode). Only present if the SAM record is for an aligned read and more than one alignment was found for the read. Note that, when the read is part of a concordantly-aligned pair, this score could be greater than AS:i.

Alternatively, we can filter by MAPQ (mapping quality).

* For sambamba

-t: number of threads / cores

-h: print SAM header before reads

-f: format of output file (default is SAM)

-F: set custom filter - we will use it to remove duplicates, multimappers and unmapped reads. Note that "not duplicate" only drops reads that are already flagged as duplicates in the BAM file.

sambamba view -h -t 2 -f bam -F "[XS] == null and not unmapped and not duplicate" sorted.bam > sort.filter.bam
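To make the filter expression concrete, here is a rough plain-text equivalent that works on SAM records with awk. It is only an illustration of the logic, not how sambamba is implemented, and the four toy records are invented for this example.

```shell
# Keep header lines and records that are mapped, not flagged as
# duplicates, and carry no XS tag (no second-best alignment).
keep_unique_mapped() {
  awk -F '\t' '
    /^@/ { print; next }                    # header lines pass through
    int($2 / 4) % 2 == 1    { next }        # FLAG bit 0x4: unmapped
    int($2 / 1024) % 2 == 1 { next }        # FLAG bit 0x400: duplicate
    {
      for (i = 12; i <= NF; i++)            # optional tags start at column 12
        if ($i ~ /^XS:i:/) next             # XS present: multimapper
      print
    }'
}

# Toy records: r1 unique, r2 multimapper, r3 unmapped, r4 duplicate.
{
  printf 'r1\t0\tchr1\t100\t42\t4M\t*\t0\t0\tACGT\tIIII\tAS:i:0\n'
  printf 'r2\t0\tchr1\t200\t1\t4M\t*\t0\t0\tACGT\tIIII\tAS:i:0\tXS:i:0\n'
  printf 'r3\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII\tYT:Z:UU\n'
  printf 'r4\t1024\tchr1\t300\t42\t4M\t*\t0\t0\tACGT\tIIII\tAS:i:0\n'
} | keep_unique_mapped | cut -f 1
```

Only the name of the first record should be printed, since the other three each fail one of the three conditions.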

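* For samtools

Here -f 2 keeps only properly paired reads (so it applies to paired-end data), -q 30 keeps reads with a MAPQ of at least 30, and -u writes uncompressed BAM to the pipe. Unlike the sambamba filter above, this command does not remove duplicates.

samtools view -hub -f 2 -q 30 "$sam" | samtools sort -T "$path/$sample" -o "$filter_bam" -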
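The effect of -f 2 and -q 30 on a single read can be sketched with shell arithmetic. The function name passes_f2_q30 is hypothetical, and the FLAG and MAPQ values below are made-up examples.

```shell
# Return success if a read with the given FLAG and MAPQ would pass
# "samtools view -f 2 -q 30".
# Usage: passes_f2_q30 FLAG MAPQ
passes_f2_q30() {
  flag=$1; mapq=$2
  # -f 2: FLAG bit 0x2 (properly paired) must be set
  # -q 30: MAPQ must be at least 30
  [ $(( $flag & 2 )) -ne 0 ] && [ "$mapq" -ge 30 ]
}

passes_f2_q30 99 42 && echo "FLAG 99, MAPQ 42: kept"
passes_f2_q30 65 42 || echo "FLAG 65, MAPQ 42: dropped (not properly paired)"
passes_f2_q30 99 10 || echo "FLAG 99, MAPQ 10: dropped (MAPQ below 30)"
```

FLAG 99 is the sum 1 + 2 + 32 + 64, so bit 0x2 is set; FLAG 65 is 1 + 64, so it is not.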

TO BE CONTINUED
