genome:
ChAR-seq1_R1.fq
ChAR-seq1_R2.fq
1.双端进行trimmomatic -> fq
trimmomatic PE -threads 20 -phred33 ChAR-seq1_R1.fq ChAR-seq1_R2.fq sample.R1.clean.fq sample.R1.unpaired.fq sample.R2.clean.fq sample.R2.unpaired.fq ILLUMINACLIP:/home/pc/miniconda3/share/trimmomatic-0.38-0/adapters/TruSeq3-PE.fa:2:30:10:8:true LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 AVGQUAL:20 MINLEN:50
[R2端转为R1]
seqkit seq sample.R2.clean.fq -r -p > sample.R2.fine.fq
2.单端进行split -> DNA and RNA fq
gzip sample.R1.clean.fq &&
gzip sample.R2.fine.fq &&
python /home/pc/biosoft/flypipe-master/char_bridge_trackall.py --FASTQGZ sample.R1.clean.fq.gz --NAME sample.R1 --minRNA 18 --minDNA 18 &&
python /home/pc/biosoft/flypipe-master/char_bridge_trackall.py --FASTQGZ sample.R2.fine.fq.gz --NAME sample.R1 --minRNA 18 --minDNA 18
3.DNA and RNA 分别mapping -> bam
gndx=/media/pc/disk1/sun/refdata/gencode_GRCh37/03_bowtie2_index/GRCh37
bowtie2 -p 20 -D 20 -R 3 -N 1 -L 16 -i S,1,0.50 -x $gndx -U sample.R1.dna.bridgePE.fastq.gz | samtools view -bS -o sample.R1.dna.bam &&
bowtie2 -p 20 -D 20 -R 3 -N 1 -L 16 -i S,1,0.50 -x $gndx -U sample.R1.rna.bridgePE.fastq.gz | samtools view -bS -o sample.R1.rna.bam &&
bowtie2 -p 20 -D 20 -R 3 -N 1 -L 16 -i S,1,0.50 -x $gndx -U sample.R2.dna.bridgePE.fastq.gz | samtools view -bS -o sample.R2.dna.bam &&
bowtie2 -p 20 -D 20 -R 3 -N 1 -L 16 -i S,1,0.50 -x $gndx -U sample.R2.rna.bridgePE.fastq.gz | samtools view -bS -o sample.R2.rna.bam
4.过滤掉MAPQ<20 -> filtered.bam
samtools view -b -q 20 sample.R1.dna.bam > sample.R1.dna.filtered.bam &&
samtools view -b -q 20 sample.R1.rna.bam > sample.R1.rna.filtered.bam &&
samtools view -b -q 20 sample.R2.dna.bam > sample.R2.dna.filtered.bam &&
samtools view -b -q 20 sample.R2.rna.bam > sample.R2.rna.filtered.bam
5.去重 -> clean.bam
java -jar /home/pc/biosoft/picard.jar MarkDuplicates REMOVE_DUPLICATES=true I=sample.R1.dna.filtered.bam O=sample.R1.dna.clean.bam M=picard.sample.R1.dna.txt &&
java -jar /home/pc/biosoft/picard.jar MarkDuplicates REMOVE_DUPLICATES=true I=sample.R1.rna.filtered.bam O=sample.R1.rna.clean.bam M=picard.sample.R1.rna.txt &&
java -jar /home/pc/biosoft/picard.jar MarkDuplicates REMOVE_DUPLICATES=true I=sample.R2.dna.filtered.bam O=sample.R2.dna.clean.bam M=picard.sample.R2.dna.txt &&
java -jar /home/pc/biosoft/picard.jar MarkDuplicates REMOVE_DUPLICATES=true I=sample.R2.rna.filtered.bam O=sample.R2.rna.clean.bam M=picard.sample.R2.rna.txt
6.提取mapped的sam -> mapped.sam
samtools view -F 4 sample.R1.dna.clean.bam > sample.R1.dna.clean.sam &&
samtools view -F 4 sample.R1.rna.clean.bam > sample.R1.rna.clean.sam &&
samtools view -F 4 sample.R2.dna.clean.bam > sample.R2.dna.clean.sam &&
samtools view -F 4 sample.R2.rna.clean.bam > sample.R2.rna.clean.sam
7.转为simple txt -> txt
awk '{if($2 == 0) sign="+"; else if ($2 == 16) sign="-"; else sign="."; print $1,$3,$4,$5,$6,length($10),sign}' sample.R1.dna.clean.sam > sample.R1.dna.txt &&
awk '{if($2 == 0) sign="+"; else if ($2 == 16) sign="-"; else sign="."; print $1,$3,$4,$5,$6,length($10),sign}' sample.R1.rna.clean.sam > sample.R1.rna.txt &&
awk '{if($2 == 0) sign="+"; else if ($2 == 16) sign="-"; else sign="."; print $1,$3,$4,$5,$6,length($10),sign}' sample.R2.dna.clean.sam > sample.R2.dna.txt &&
awk '{if($2 == 0) sign="+"; else if ($2 == 16) sign="-"; else sign="."; print $1,$3,$4,$5,$6,length($10),sign}' sample.R2.rna.clean.sam > sample.R2.rna.txt
transcript:
1.RNA fq mapping -> bam
2.过滤掉MAPQ<20 -> filtered.bam
3.去重 -> clean.bam
4.提取mapped的sam -> sam
5.转为simple txt -> txt
网友评论