I. Sequencing data processing
1. bcl2fastq: converts the per-cycle base calls (stored in BCL files) into FASTQ format. Use v1.8.4 for older sequencers and v2.18.x for newer sequencers; it can be obtained from Illumina's portal (iCom).
2. Quality assessment and visualization: FastQC
fastqc illumina.fq
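FastQC's per-base quality plot summarises the Phred scores at each read position. As a rough illustration of what such a plot aggregates (not FastQC's own implementation), the sketch below computes the mean Phred score per position with awk. It assumes Phred+33 encoding and strict four-line FASTQ records; the function name mean_quality_per_position is made up for this example.

```shell
# Hypothetical helper: mean Phred score at each read position.
# Assumes Phred+33 quality encoding and four-line FASTQ records.
mean_quality_per_position() {
    awk '
    BEGIN { for (i = 33; i <= 126; i++) phred[sprintf("%c", i)] = i - 33 }
    NR % 4 == 0 {                       # quality line of each record
        n = length($0)
        if (n > maxlen) maxlen = n
        for (i = 1; i <= n; i++) {
            sum[i] += phred[substr($0, i, 1)]
            cnt[i]++
        }
    }
    END {
        # one line per position: position <TAB> mean quality
        for (i = 1; i <= maxlen; i++)
            printf "%d\t%.1f\n", i, sum[i] / cnt[i]
    }' "$@"
}

# Usage: mean_quality_per_position illumina.fq
```

Positions where the mean drops sharply toward the 3' end are the ones that quality trimming in the next step is meant to remove.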
3. Quality control (trimming) tools: Trimmomatic, BBDuk, flexbar and cutadapt
Trimmomatic single-end (SE): trimmomatic SE SRR1553607_2.fastq trimmed_2.fq SLIDINGWINDOW:4:30
BBDuk single-end (SE): bbduk.sh in=SRR1553607_2.fastq out=bbduk.fq qtrim=r trimq=30 overwrite=true
Trimmomatic paired-end (PE): trimmomatic PE SRR1553607_1.fastq SRR1553607_2.fastq trimmed_1.fq unpaired_1.fq trimmed_2.fq unpaired_2.fq SLIDINGWINDOW:4:30
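BBDuk paired-end (PE): bbduk.sh in1=SRR1553607_1.fastq in2=SRR1553607_2.fastq out1=bbduk_1.fq outm1=unpaired_bb_1.fq out2=bbduk_2.fq outm2=unpaired_bb_2.fq qtrim=r trimq=30 overwrite=true (out1/out2 receive the reads that pass trimming and filtering; outm1/outm2 receive the reads that fail)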
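To make the SLIDINGWINDOW:4:30 setting above more concrete, here is a simplified sketch of sliding-window quality trimming in awk. It scans each read from the 5' end and cuts at the start of the first window whose mean Phred score falls below the threshold. This is only an approximation of the idea and not Trimmomatic's exact algorithm; it assumes Phred+33 encoding, and the function name sliding_window_trim is hypothetical.

```shell
# Hypothetical helper: simplified sliding-window trimming (not Trimmomatic's exact logic).
# $1 = window size, $2 = minimum mean Phred score; FASTQ is read from stdin.
sliding_window_trim() {
    awk -v w="$1" -v q="$2" '
    BEGIN { for (i = 33; i <= 126; i++) phred[sprintf("%c", i)] = i - 33 }
    NR % 4 == 1 { header = $0 }
    NR % 4 == 2 { seq = $0 }
    NR % 4 == 0 {
        n = length($0)
        keep = n                                  # keep everything unless a window fails
        for (s = 1; s + w - 1 <= n; s++) {
            sum = 0
            for (j = s; j < s + w; j++) sum += phred[substr($0, j, 1)]
            if (sum < q * w) { keep = s - 1; break }   # cut before the failing window
        }
        if (keep > 0) {                           # drop reads trimmed to nothing
            print header
            print substr(seq, 1, keep)
            print "+"
            print substr($0, 1, keep)
        }
    }'
}

# Usage: sliding_window_trim 4 30 < SRR1553607_2.fastq > sketch_trimmed.fq
```

For real data the dedicated tools remain the better choice, since they also handle adapters, pairing and compressed input.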
4. Trim adapters
Illumina Universal Adapter:AGATCGGAAGAG
Illumina Small RNA 3' Adapter:TGGAATTCTCGG
Illumina Small RNA 5' Adapter:GATCGTCGGACT
Nextera Transposase Sequence:CTGTCTCTTATA
SOLID Small RNA Adapter:CGCCTTGGCCGT
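Before trimming, it can help to check which of the adapter sequences listed above actually occurs in the data. The sketch below counts the reads whose sequence line contains a given adapter string; it is a plain substring search, not what any particular trimming tool does internally, and count_adapter is a hypothetical name.

```shell
# Hypothetical helper: count reads containing an adapter sequence.
# $1 = adapter sequence; remaining arguments = FASTQ file (stdin if none).
count_adapter() {
    adapter=$1
    shift
    awk -v a="$adapter" '
    NR % 4 == 2 && index($0, a) > 0 { n++ }   # sequence line contains the adapter
    END { print n + 0 }' "$@"
}

# Usage: count_adapter AGATCGGAAGAG SRR1553607_1.fastq
```

Running it once per adapter gives a quick indication of which sequence to pass to the trimming tool.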
5. Sequence duplication: picard MarkDuplicates
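Picard MarkDuplicates works on aligned reads (SAM/BAM). For a rough pre-alignment estimate, exact duplicate sequences can be counted directly in the FASTQ file. The sketch below does only that: it treats two reads as duplicates when their sequences are identical, which is a much cruder criterion than the one MarkDuplicates applies. The function name sequence_duplication is hypothetical.

```shell
# Hypothetical helper: count reads whose exact sequence was already seen.
# Reads FASTQ from the file argument, or from stdin if none is given.
sequence_duplication() {
    awk '
    NR % 4 == 2 {
        total++
        if (seen[$0]++) dup++      # second and later copies count as duplicates
    }
    END { printf "total=%d duplicates=%d\n", total, dup }' "$@"
}

# Usage: sequence_duplication SRR1553607_1.fastq
```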
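6. MultiQC: combines the QC reports of multiple samples
After installing multiqc with conda, upgrade it: pip install --upgrade multiqc
multiqc illumina_fastqc iontorrent_fastqc
If multiqc runs into problems, try installing an older version of networkx: source activate multiqc; conda install networkx=1.11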
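7. Merge paired-end reads
Normally the combined length of the two paired reads is expected to be shorter than the sequenced fragment. FLASH (Fast Length Adjustment of SHort reads) can merge pairs whose combined read length exceeds the fragment length, that is, pairs whose reads overlap.
Merge with both flash and bbtools and compare the results.
Install flash: conda install flash -y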
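Whether a pair can be merged depends on simple arithmetic: the two reads overlap when their combined length exceeds the fragment length. A minimal sketch of that calculation follows; the function name expected_overlap and the example lengths are assumptions for illustration.

```shell
# Hypothetical helper: expected overlap between the two reads of a pair.
# $1 = read 1 length, $2 = read 2 length, $3 = fragment length (all in bp).
expected_overlap() {
    echo $(( $1 + $2 - $3 ))
}

# Two 150 bp reads from a 250 bp fragment overlap by 50 bp.
expected_overlap 150 150 250
```

A positive result means the reads overlap and the pair is a candidate for merging; a negative result is the size of the unsequenced gap between the reads.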
8. AfterQC: must be installed in a Python 2.7 environment
9. Error correction
Sequencing errors can be corrected without knowing the reference genome by computing the so-called k-mer density of the data. The bbmap package includes the tadpole.sh error corrector:
tadpole.sh in=SRR519926_1.fastq out=tadpole.fq mode=correct overwrite=true
Alternatively, use bfc: bfc SRR519926_1.fastq > bfc.fq
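The intuition behind k-mer based correction is that k-mers from the true sequence appear many times in the data, while k-mers containing a sequencing error appear rarely. The sketch below only counts k-mers so that the rare ones can be inspected; it does not correct anything and does not reflect how tadpole.sh or bfc are implemented. The function name kmer_counts is hypothetical.

```shell
# Hypothetical helper: count every k-mer in the sequence lines of a FASTQ file.
# $1 = k; remaining arguments = FASTQ file (stdin if none).
kmer_counts() {
    k=$1
    shift
    awk -v k="$k" '
    NR % 4 == 2 {
        # slide a window of length k along the read
        for (i = 1; i + k - 1 <= length($0); i++)
            count[substr($0, i, k)]++
    }
    END { for (kmer in count) print kmer, count[kmer] }' "$@" | LC_ALL=C sort
}

# Usage: kmer_counts 21 SRR519926_1.fastq | awk '$2 == 1' | head
```

K-mers seen only once in a deeply sequenced sample are the usual suspects for sequencing errors.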