该软件来自picbio官网
准备文件
mkdir -p /home/train/06.reads_aligment/blasr
cd /home/train/06.reads_aligment/blasr
ln -s ~/00.incipient_data/data_for_genome_assembling/assemblies_of_Malassezia_sympodialis/Malassezia_sympodialis.genome_V01.fasta genome.fasta
ln -s ~/04.genome_assembling/Canu/Malassezia_sympodialis/subreads.fasta ./
PATH=/opt/biosoft/miniconda3_for_pbbioconda/bin/:$PATH
开始运行
blasr subreads.fasta genome.fasta --header -m 5 --out blasr.out5 --minPctAccuracy 70 --nproc 8 --stride 10
[train@MiWiFi-R3P-srv blasr]$ head -n 2 blasr.out5
qName qLength qStart qEnd qStrand tName tLength tStart tEnd tStrand score numMatch numMismatch numIns numDel mapQV qAlignedSeq matchPattern tAlignedSeq
m54086_170204_081430/5112079/3551_4400/0_849 849 321 623 + MS01Contig08 440595 301851 302159 + -878 252 37 13 19 254 TACCCGTTCGAGAA-GG-CCGCTGAGCCCTCGCTTCCGTGGGGAGCAATGCGCTGGCGCCGGTACCC-ATCCGG-GAGGAGCGTTGCATTGCCTGCAAGCT-CGCG-GGCCATCTGCCC--CGCCCAGGCCATCACCATCGAG-GCTGAG-CCCAAAGAGCTGATGGCAGC-CGCCGAACCACCCGCTAATGACATCGACATGAGACAAGTGCATCTACTGCGGCTTCTTCCA-G-GTC-TGTCCCGTGGATGCCATCGTCAG-GCCCCA-ACG-TTGAGTTCTCCACGGAGACCCATGA-GAGC-GCGGTACAACAAGGA ||||||||||||||*||*|||||||||||*||||||||*||*||||*|||||||*||||||*|||||*|*|*||*||||||||*|||||*||*||||||||**|||*||||||||||||**|||*|**||||||||*||||||*||*|||*|||***|*||*||*|||*||*||*||*||*|||||||*|*||*||||||||||**|||||||||||||||*|||*|*|*|||*|*|*|*||*|||||||||||*||||||*|*|*|*||*|||*|*||||||*|*||*|||||||**||*||||**|*|||||||||||| TACCCGTTCGAGAAGGGCCCGCTGAGCCCCCGCTTCCGCGGCGAGC-ATGCGCT-GCGCCGCTACCCGA-CGGGCGAGGAGCGCTGCATCGCGTGCAAGCTGTGCGAGGCCATCTGCCCGGCGCTC--GCCATCACGATCGAGAGC-GAGACCC---GCGCGGACGGC-GCGCGGCGCACGACCCGCT-ACGATATCGACATGA-CCAAGTGCATCTACTGTGGCATGTGCCAGGAGGCGTGCCCCGTGGATGCGATCGTC-GAGACGCAGACGCTCGAGTTCGCTACCGAGACCCGCGAGGAGCTTCTGTACAACAAGGA
[train@MiWiFi-R3P-srv blasr]$
统计有多少条序列能比对上去
[train@MiWiFi-R3P-srv blasr]$ cut -d " " -f 1 blasr.out5 | uniq | wc -l
5629
mapping rate = 5629/6000
统计每行的错误率
需补充
统计整体的错误率
[train@MiWiFi-R3P-srv blasr]$ ls
blasr.out5 genome.fasta subreads.fasta
[train@MiWiFi-R3P-srv blasr]$ perl -e '<>; while (<>) { @_ = split /\s+/; $length = $_[3] - $_[2]; $error_ratio = 1 - $_[11] / $length; push @er,$error_ratio if $length >= 1000; } @er = sort {$a <=> $b} @er; print "$er[@er/2]\n";' blasr.out5
0.0902698593690612
小技巧:截取结果文件的碱基序列
需补充
小技巧:截取subreads的1000条序列
[train@MiWiFi-R3P-srv blasr]$ perl -e 'while (<>) { if (m/^>/) { $num ++; last if $num > 1000; } print }' subreads.fasta | grep ">" | wc -l
1000
网友评论