charseq质控

作者: 苏牧传媒 | 来源:发表于2018-11-13 21:43 被阅读4次

    genome:

    ChAR-seq1_R1.fq

    ChAR-seq1_R2.fq

    1.双端进行trimmomatic -> fq

    trimmomatic PE -threads 20 -phred33 ChAR-seq1_R1.fq ChAR-seq1_R2.fq sample.R1.clean.fq sample.R1.unpaired.fq sample.R2.clean.fq sample.R2.unpaired.fq ILLUMINACLIP:/home/pc/miniconda3/share/trimmomatic-0.38-0/adapters/TruSeq3-PE.fa:2:30:10:8:true LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 AVGQUAL:20 MINLEN:50

    [R2端转为R1]

    seqkit seq sample.R2.clean.fq -r -p > sample.R2.fine.fq

    2.单端进行split -> DNA and RNA fq

    gzip sample.R1.clean.fq &&

    gzip sample.R2.fine.fq &&

    python /home/pc/biosoft/flypipe-master/char_bridge_trackall.py --FASTQGZ sample.R1.clean.fq.gz   --NAME sample.R1 --minRNA 18 --minDNA 18 &&

    python /home/pc/biosoft/flypipe-master/char_bridge_trackall.py --FASTQGZ sample.R2.fine.fq.gz --NAME sample.R1 --minRNA 18 --minDNA 18

    3.DNA and RNA 分别mapping -> bam 

    gndx=/media/pc/disk1/sun/refdata/gencode_GRCh37/03_bowtie2_index/GRCh37

    bowtie2 -p 20 -D 20 -R 3 -N 1 -L 16 -i S,1,0.50 -x $gndx -U sample.R1.dna.bridgePE.fastq.gz | samtools view -bS -o sample.R1.dna.bam &&

    bowtie2 -p 20 -D 20 -R 3 -N 1 -L 16 -i S,1,0.50 -x $gndx -U sample.R1.rna.bridgePE.fastq.gz | samtools view -bS -o sample.R1.rna.bam &&

    bowtie2 -p 20 -D 20 -R 3 -N 1 -L 16 -i S,1,0.50 -x $gndx -U sample.R2.dna.bridgePE.fastq.gz | samtools view -bS -o sample.R2.dna.bam &&

    bowtie2 -p 20 -D 20 -R 3 -N 1 -L 16 -i S,1,0.50 -x $gndx -U sample.R2.rna.bridgePE.fastq.gz | samtools view -bS -o sample.R2.rna.bam

    4.过滤掉MAPQ<20 -> filtered.bam

    samtools view -b -q 20 sample.R1.dna.bam > sample.R1.dna.filtered.bam &&

    samtools view -b -q 20 sample.R1.rna.bam > sample.R1.rna.filtered.bam &&

    samtools view -b -q 20 sample.R2.dna.bam > sample.R2.dna.filtered.bam &&

    samtools view -b -q 20 sample.R2.rna.bam > sample.R2.rna.filtered.bam

    5.去重 -> clean.bam

    java -jar /home/pc/biosoft/picard.jar MarkDuplicates REMOVE_DUPLICATES=true I=sample.R1.dna.filtered.bam O=sample.R1.dna.clean.bam M=picard.sample.R1.dna.txt &&

    java -jar /home/pc/biosoft/picard.jar MarkDuplicates REMOVE_DUPLICATES=true I=sample.R1.rna.filtered.bam O=sample.R1.rna.clean.bam M=picard.sample.R1.rna.txt &&

    java -jar /home/pc/biosoft/picard.jar MarkDuplicates REMOVE_DUPLICATES=true I=sample.R2.dna.filtered.bam O=sample.R2.dna.clean.bam M=picard.sample.R2.dna.txt &&

    java -jar /home/pc/biosoft/picard.jar MarkDuplicates REMOVE_DUPLICATES=true I=sample.R2.rna.filtered.bam O=sample.R2.rna.clean.bam M=picard.sample.R2.rna.txt 

    6.提取mapped的sam -> mapped.sam

    samtools view -F 4 sample.R1.dna.clean.bam > sample.R1.dna.clean.sam &&

    samtools view -F 4 sample.R1.rna.clean.bam > sample.R1.rna.clean.sam &&

    samtools view -F 4 sample.R2.dna.clean.bam > sample.R2.dna.clean.sam &&

    samtools view -F 4 sample.R2.rna.clean.bam > sample.R2.rna.clean.sam

    7.转为simple txt -> txt

    awk '{if($2 == 0) sign="+"; else if ($2 == 16) sign="-"; else sign="."; print $1,$3,$4,$5,$6,length($10),sign}' sample.R1.dna.clean.sam > sample.R1.dna.txt &&

    awk '{if($2 == 0) sign="+"; else if ($2 == 16) sign="-"; else sign="."; print $1,$3,$4,$5,$6,length($10),sign}' sample.R1.rna.clean.sam > sample.R1.rna.txt &&

    awk '{if($2 == 0) sign="+"; else if ($2 == 16) sign="-"; else sign="."; print $1,$3,$4,$5,$6,length($10),sign}' sample.R2.dna.clean.sam > sample.R2.dna.txt &&

    awk '{if($2 == 0) sign="+"; else if ($2 == 16) sign="-"; else sign="."; print $1,$3,$4,$5,$6,length($10),sign}' sample.R2.rna.clean.sam > sample.R2.rna.txt

    transcript:

    1.RNA fq mapping -> bam

    2.过滤掉MAPQ<20 -> filtered.bam

    3.去重 -> clean.bam

    4.提取mapped的sam -> sam

    5.转为simple txt -> txt

    相关文章

      网友评论

        本文标题:charseq质控

        本文链接:https://www.haomeiwen.com/subject/pqdrfqtx.html