Nanopore sequence alignment analysis (using E. coli as an example)


Author: 莫讠 | Published 2022-02-28 14:17

    Sample: E. coli BL21(DE3), a Gram-negative bacterium; genome size about 4.5 Mb

    NCBI: https://www.ncbi.nlm.nih.gov/nuccore/CP001665.1

    DNA extraction kit: Easy Pure Bacterial Genomic DNA kit, Code# EE161-01

    Sequencing run time: 1.5 h

    Experimental details are given in Lingfang’s report. When sequencing finished, we obtained the output data from the Nanopore MinION Mk1C as fast5 files.

    1. Basecalling

    The raw data generated by the Nanopore sequencing device are stored as fast5 files, a format that records the electrical signal. We therefore need to convert them into FASTQ files, the common text format that stores each read's base sequence (A, C, G, T) together with a quality score for every base. This conversion is called basecalling.
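    To make the FASTQ format concrete: each record spans four lines (a header starting with @, the bases, a + separator, and a quality string), and each quality character encodes a Phred score as its ASCII code minus 33 (the standard Phred+33 encoding that Guppy writes). The minimal sketch below parses records and decodes the qualities; `parse_fastq` is a hypothetical helper and the two reads are made up, not data from this run.

```python
import io
from typing import Iterator, List, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    name: str
    sequence: str
    qualities: List[int]  # Phred score for each base


def parse_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield records from a FASTQ stream with 4 lines per record (Phred+33)."""
    while True:
        header = handle.readline()
        if not header:
            return  # end of input
        sequence = handle.readline().rstrip("\n")
        separator = handle.readline()
        quality = handle.readline().rstrip("\n")
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        if len(sequence) != len(quality):
            raise ValueError("sequence and quality strings differ in length")
        # Each quality character is chr(Q + 33), so Q = ord(char) - 33.
        scores = [ord(ch) - 33 for ch in quality]
        yield FastqRecord(header[1:].split()[0], sequence, scores)


# Two made-up reads (not data from this run).
example = "@read1 runid=abc\nACGT\n+\n++5?\n@read2\nGGC\n+\n$$$\n"
for record in parse_fastq(io.StringIO(example)):
    print(record.name, record.sequence, record.qualities)
# read1 ACGT [10, 10, 20, 30]
# read2 GGC [3, 3, 3]
```

    The sketch assumes the simple four-line layout that basecallers emit; it does not handle wrapped multi-line FASTQ.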

    • Basecalling tool

    Guppy, version 5.0.16+b9fcd7b5b

    • Usage:
    guppy_basecaller -i /home/qianwj/project/ONT/lab_data -c /opt/ont/guppy/data/dna_r9.4.1_450bps_sup.cfg -s /home/qianwj/project/ONT/basecalling_gpu_sup/ -x "cuda:0" > guppy_4_gpu_sup.log
    
    

    Output:

    [Screenshot: Guppy output]

    2. Quality control

    Usage:

    NanoPlot --fastq ~/project/ONT/basecalling_gpu_sup/pass/ -o ~/project/ONT/nanoplot/ -t8 --plots hex dot
    
    

    Output:

    [Screenshot: NanoPlot output]

    Quality control report:

    [Screenshot: NanoPlot summary statistics]

    Mean read length:   6,541.5 bases
    Mean read quality:  14.5 (generally > 13)
    Read length N50:    11,141.0 bases (generally > 10 kb)
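    The read length N50 is the length L such that reads of length L or longer together contain at least half of all sequenced bases. The sketch below computes mean length, N50 and a per-read mean quality. For the quality it averages per-base error probabilities and converts back to Phred, a common convention that we believe NanoPlot also follows (treat that as an assumption); this gives a lower value than averaging Phred scores directly, because a few low-quality bases dominate. The helper names and toy values are hypothetical.

```python
import math
from statistics import mean
from typing import List, Sequence


def read_mean_quality(phred_scores: Sequence[int]) -> float:
    """Average the per-base error probabilities, then convert back to Phred."""
    error_probs = [10 ** (-q / 10) for q in phred_scores]
    return -10 * math.log10(sum(error_probs) / len(error_probs))


def n50(lengths: Sequence[int]) -> int:
    """Length L such that reads of length >= L hold at least half of all bases."""
    half = sum(lengths) / 2
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half:
            return length
    raise ValueError("empty input")


# Toy read lengths (not from this run).
lengths: List[int] = [2000, 3000, 4000, 5000, 6000]
print(mean(lengths))                              # mean read length
print(n50(lengths))                               # 5000
print(round(read_mean_quality([10, 20, 30]), 1))  # 14.3, not the arithmetic mean 20
```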

    • Distribution of read length vs. average read quality

    [Figure: read length vs. average read quality]
    • Accuracy

    According to the report, 100% of reads are above Q10, meaning every read has an average base accuracy above 90%, and 86.2% of reads are above Q12, meaning about 86.2% of reads have an average base accuracy above 93.69%.
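    These percentages follow from the Phred definition Q = -10 log10(P_error), so accuracy = 1 - 10^(-Q/10). A quick check of the two thresholds quoted above (`phred_to_accuracy` is a hypothetical helper name):

```python
def phred_to_accuracy(q: float) -> float:
    """Base-call accuracy implied by a Phred score: 1 - 10^(-Q/10)."""
    return 1 - 10 ** (-q / 10)


for q in (10, 12):
    print(f">Q{q}: accuracy above {phred_to_accuracy(q):.2%}")
# >Q10: accuracy above 90.00%
# >Q12: accuracy above 93.69%
```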

    3. Alignment

    Download the reference genome from NCBI:
    https://www.ncbi.nlm.nih.gov/nuccore/CP001665.1?report=genbank&to=4570938

    • Using minimap2 to build the index
    minimap2 -d  ~/project/ONT/reference/BL21DE3_genome.mmi  ~/project/ONT/reference/BL21DE3_genome.fasta
    
    
    • Performing the alignment process
    minimap2 -ax map-ont ~/project/ONT/reference/BL21DE3_genome.mmi  ~/project/ONT/basecalling_gpu_sup/pass/BL21.fastq.gz > alignment.sam
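    The alignment command reads BL21.fastq.gz from the pass/ directory, but the post does not show how that file was produced. Guppy writes the passed reads as several FASTQ batch files in pass/, so one plausible way (an assumption, not the author's documented step) is to concatenate them into a single gzip file. A minimal Python sketch; `merge_fastq` and the paths are hypothetical:

```python
import gzip
import shutil
from pathlib import Path


def merge_fastq(pass_dir: Path, out_path: Path) -> int:
    """Concatenate every *.fastq file in pass_dir into one gzip-compressed file.

    Returns the number of files merged. FASTQ records are self-contained,
    so plain concatenation yields a valid FASTQ file.
    """
    inputs = sorted(pass_dir.glob("*.fastq"))
    with gzip.open(out_path, "wb") as out:
        for path in inputs:
            with path.open("rb") as src:
                shutil.copyfileobj(src, out)
    return len(inputs)


# Example call with hypothetical paths mirroring this post's layout:
# pass_dir = Path.home() / "project/ONT/basecalling_gpu_sup/pass"
# merge_fastq(pass_dir, pass_dir / "BL21.fastq.gz")
```

    Writing the merged file into pass/ itself is safe with this sketch, because the glob only matches uncompressed `*.fastq` files and so never picks up its own output on a re-run.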
    
    
    • After the alignment finishes, use samtools to sort the SAM file into a BAM file and build the index
    conda activate base
    samtools sort -@ 4 -O bam -o alignment.sorted.bam alignment.sam
    
    samtools index alignment.sorted.bam
    
    
    • Checking the alignment rate

    According to the result, our alignment rate is 88.79%. For a pure bacterial genomic DNA sample, the alignment rate is usually above 90%, so the lower value may be due to slight contamination of the sample. Lingfang and I will check this later.

    [Screenshot: alignment rate result]
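    The post does not say which tool reported 88.79%; `samtools flagstat alignment.sorted.bam` is one common choice. To make the number concrete, the sketch below computes the fraction of reads that are mapped directly from the SAM FLAG bits 0x4 (unmapped), 0x100 (secondary) and 0x800 (supplementary), counting each read once through its primary record. flagstat's headline "mapped" line also counts secondary and supplementary records, so the two figures can differ slightly. `primary_mapping_rate` is a hypothetical helper and the records are made up.

```python
from typing import Iterable, Tuple

UNMAPPED = 0x4
SECONDARY = 0x100
SUPPLEMENTARY = 0x800


def primary_mapping_rate(sam_lines: Iterable[str]) -> Tuple[int, int, float]:
    """Return (mapped, total, rate) over primary records of a SAM text stream."""
    mapped = total = 0
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not an alignment record
        flag = int(fields[1])
        if flag & (SECONDARY | SUPPLEMENTARY):
            continue  # count each read once, via its primary record
        total += 1
        if not flag & UNMAPPED:
            mapped += 1
    rate = mapped / total if total else 0.0
    return mapped, total, rate


# Made-up records: r1 mapped (plus a supplementary piece), r2 unmapped.
example = [
    "@SQ\tSN:CP001665.1\tLN:4570938",
    "r1\t0\tCP001665.1\t100\t60\t4M\t*\t0\t0\tACGT\t*",
    "r1\t2048\tCP001665.1\t900\t60\t4M\t*\t0\t0\tACGT\t*",
    "r2\t4\t*\t0\t0\t*\t*\t0\t0\tGGCC\t*",
]
print(primary_mapping_rate(example))  # (1, 2, 0.5)
```

    On real data the function can be fed an open file, e.g. `with open("alignment.sam") as fh: print(primary_mapping_rate(fh))`.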
    • Visualizing the alignment in IGV
    [Screenshot: IGV view of the alignment]
    • Coverage

    Mean coverage = 50.8624X

    [Screenshots: coverage across the reference]
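    Mean coverage is the average number of aligned bases covering each reference position, i.e. roughly aligned bases divided by genome length. For scale, if the reference is the 4,570,938 bp indicated by the NCBI link above, 50.8624X corresponds to about 50.8624 × 4,570,938 ≈ 2.32 × 10^8 aligned bases. The post does not show the command used; one common route is `samtools depth -a alignment.sorted.bam`, which prints reference name, position and depth for every position. The sketch below averages that depth column; `mean_depth` is a hypothetical helper and the records are made up.

```python
from typing import Iterable


def mean_depth(depth_lines: Iterable[str]) -> float:
    """Average the depth column of `samtools depth -a` output (chrom, pos, depth)."""
    positions = 0
    total_depth = 0
    for line in depth_lines:
        if not line.strip():
            continue  # skip blank lines
        _chrom, _pos, depth = line.split("\t")[:3]
        total_depth += int(depth)
        positions += 1
    if positions == 0:
        raise ValueError("no depth records")
    return total_depth / positions


# Made-up depth records for a 4-bp stretch of the reference.
example = ["CP001665.1\t1\t48\n", "CP001665.1\t2\t50\n",
           "CP001665.1\t3\t52\n", "CP001665.1\t4\t0\n"]
print(mean_depth(example))  # 37.5
```

    The `-a` flag matters here: without it, zero-depth positions are omitted from the output and the average would be inflated.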
    • Mapping quality across the reference

    Mapping quality (MAPQ) is a Phred-scaled measure of confidence that a read has been placed at the correct genomic coordinates.

    Mean mapping quality is 55.31 (a value above 30 is generally considered acceptable).

    [Screenshot: mapping quality across the reference]
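    MAPQ uses the same Phred scale as base qualities: MAPQ = -10 log10 P(the mapping position is wrong). A MAPQ of 30 therefore means about a 1-in-1,000 chance of a wrong placement, and 55.31 about 3 in a million; minimap2 reports values from 0 to 60. The sketch below converts MAPQ to an error probability and computes a mean MAPQ from SAM text. Averaging only primary, mapped records is an assumption, since the post does not say how 55.31 was obtained; the helper names and records are hypothetical.

```python
from typing import Iterable

UNMAPPED, SECONDARY, SUPPLEMENTARY = 0x4, 0x100, 0x800


def mapq_error_probability(mapq: float) -> float:
    """Probability that the reported mapping position is wrong."""
    return 10 ** (-mapq / 10)


def mean_mapq(sam_lines: Iterable[str]) -> float:
    """Mean MAPQ over primary, mapped records of a SAM text stream."""
    values = []
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not an alignment record
        flag = int(fields[1])
        if flag & (UNMAPPED | SECONDARY | SUPPLEMENTARY):
            continue
        mapq = int(fields[4])
        if mapq == 255:
            continue  # 255 means "MAPQ not available" in the SAM spec
        values.append(mapq)
    return sum(values) / len(values) if values else float("nan")


print(mapq_error_probability(30))     # ~0.001 (1 in 1,000)
print(mapq_error_probability(55.31))  # about 2.9e-6
example = [
    "r1\t0\tCP001665.1\t100\t60\t4M\t*\t0\t0\tACGT\t*",
    "r2\t16\tCP001665.1\t500\t40\t4M\t*\t0\t0\tACGT\t*",
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\tGGCC\t*",
]
print(mean_mapq(example))  # 50.0 (r3 is unmapped and skipped)
```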


          Article link: https://www.haomeiwen.com/subject/abelrrtx.html