Nanopore Sequence Alignment Analysis (Using E. coli Sequences as an Example)

Author: 莫讠 | Published: 2022-02-28 14:17

Sample: E. coli BL21(DE3), a Gram-negative bacterium; the genome size is about 4.5 Mb.

NCBI: https://www.ncbi.nlm.nih.gov/nuccore/CP001665.1

DNA extraction kit: Easy Pure Bacterial Genomic DNA Kit (Code# EE161-01)

Sequencing run time: 1.5 h

Experimental details are given in Lingfang’s report. When the sequencing step finished, we obtained the output data as fast5 files from the Nanopore Mk1C.

1、Basecalling

Because the raw data generated by the Nanopore sequencing device is stored as fast5, a format that records the electrical signal, we need to convert these files to fastq, the format commonly used to record base calls (A, T, C, G) together with their quality scores.

  • Bioinformatics Tool for basecalling

Guppy software: version 5.0.16+b9fcd7b5b

  • Usage:
guppy_basecaller -i /home/qianwj/project/ONT/lab_data -c /opt/ont/guppy/data/dna_r9.4.1_450bps_sup.cfg -s /home/qianwj/project/ONT/basecalling_gpu_sup/ -x "cuda:0" > guppy_4_gpu_sup.log

output:

image
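
The alignment step later reads a single file, pass/BL21.fastq.gz, while guppy_basecaller typically writes several FASTQ files into pass/. The post does not show how that file was made. A minimal sketch of one way to merge them, assuming the basecaller wrote uncompressed .fastq files; the helper name merge_fastq is hypothetical, and the output name is taken from the minimap2 command in this post:

```shell
# Hypothetical helper: concatenate every *.fastq in a directory into one gzip file.
# Assumption: the input files are plain (uncompressed) FASTQ.
merge_fastq() {
    in_dir=$1
    out_file=$2
    cat "$in_dir"/*.fastq | gzip > "$out_file"
}

# Example call (paths follow the commands used in this post):
# merge_fastq ~/project/ONT/basecalling_gpu_sup/pass ~/project/ONT/basecalling_gpu_sup/pass/BL21.fastq.gz
```

Because FASTQ records are independent of each other, plain concatenation keeps the file valid.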

2、Quality control

usage:

NanoPlot --fastq ~/project/ONT/basecalling_gpu_sup/pass/ -o ~/project/ONT/nanoplot/ -t8 --plots hex dot

output:

image

Quality control reports:

image

Mean read length: 6,541.5
Mean read quality: 14.5 (generally > 13)
Read length N50: 11,141.0 (generally > 10k)
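
For readers unfamiliar with N50: it is the read length L such that reads of length L or longer contain at least half of all sequenced bases. NanoPlot reports it directly; the sketch below only illustrates the definition. The helper name n50 is hypothetical, and the example call assumes four-line FASTQ records:

```shell
# Hypothetical helper: print the N50 of read lengths (one length per line on stdin).
n50() {
    sort -rn | awk '
        { len[NR] = $1; total += $1 }          # lengths arrive longest first
        END {
            for (i = 1; i <= NR; i++) {
                cum += len[i]                   # running sum of bases
                if (cum >= total / 2) { print len[i]; exit }
            }
        }'
}

# Example: extract read lengths from a FASTQ file (line 2 of every 4-line record).
# gzip -dc BL21.fastq.gz | awk 'NR % 4 == 2 { print length($0) }' | n50
```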

  • Distribution of read length and average read quality

newplot _2_.png
  • Accuracy

According to the report above, 100% of reads are above Q10, meaning every read has a base accuracy higher than 90%, and 86.2% of reads are above Q12, meaning about 86.2% of reads have an accuracy higher than 93.69%.
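
The percentages above follow from the Phred definition, accuracy = 1 - 10^(-Q/10). A small sketch of that conversion; the helper name phred_to_accuracy is hypothetical:

```shell
# Hypothetical helper: convert a Phred score Q to percent accuracy,
# using accuracy = (1 - 10^(-Q/10)) * 100.
phred_to_accuracy() {
    awk -v q="$1" 'BEGIN { printf "%.2f\n", (1 - 10 ^ (-q / 10)) * 100 }'
}

phred_to_accuracy 10   # prints 90.00
phred_to_accuracy 12   # prints 93.69
```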

3、Alignment

Download the reference genome from NCBI:
https://www.ncbi.nlm.nih.gov/nuccore/CP001665.1?report=genbank&to=4570938

image
  • Using minimap2 to build index
minimap2 -d  ~/project/ONT/reference/BL21DE3_genome.mmi  ~/project/ONT/reference/BL21DE3_genome.fasta

  • Performing the alignment process
minimap2 -ax map-ont ~/project/ONT/reference/BL21DE3_genome.mmi  ~/project/ONT/basecalling_gpu_sup/pass/BL21.fastq.gz > alignment.sam

  • After the alignment finishes, use samtools to convert the SAM file to a sorted BAM file and build its index
conda activate base
samtools sort -@ 4 -O bam -o alignment.sorted.bam alignment.sam

samtools index alignment.sorted.bam

  • Checking the alignment rate

According to the result, our alignment rate is 88.79%. For DNA from a pure microbial culture the alignment rate is generally expected to be above 90%, so the lower value may be due to slight contamination of the sample. Lingfang and I will check this later.

Screenshot 2021-11-30 141221.png
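
The post does not show which command produced the alignment rate. One common way to get a comparable number is samtools flagstat; the sketch below parses its text output. The helper name mapping_rate is hypothetical, and the parsing assumes the usual flagstat line layout ("N + M in total ...", "N + M mapped (...)"):

```shell
# Hypothetical helper: read `samtools flagstat` text on stdin and
# print mapped / total as a percentage.
mapping_rate() {
    awk '
        / in total /               { total = $1 }    # all alignment records
        / mapped \(/ && !/primary/ { mapped = $1 }   # the plain "mapped" line only
        END { if (total > 0) printf "%.2f\n", mapped / total * 100 }'
}

# Example call on the sorted BAM file from the previous step:
# samtools flagstat alignment.sorted.bam | mapping_rate
```

Note that flagstat counts alignment records, including secondary and supplementary alignments, so the value can differ slightly from a per-read rate.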
  • Visualization with IGV
Screenshot 2021-11-30 150230.png
  • Coverage

Mean coverage = 50.8624X

Screenshot 2021-11-30 153219.png Screenshot 2021-11-30 155734.png
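
The post does not name the tool behind the mean coverage figure. As a rough cross-check, samtools depth -a prints reference name, position, and depth for every position, and the third column can be averaged. A minimal sketch; the helper name mean_depth is hypothetical, and the result may differ slightly from the reported value because tools filter reads differently:

```shell
# Hypothetical helper: average the depth column (3rd field) of
# `samtools depth -a` output read from stdin.
mean_depth() {
    awk '{ sum += $3 } END { if (NR > 0) printf "%.4f\n", sum / NR }'
}

# Example call; -a also reports positions with zero depth, so they count in the mean.
# samtools depth -a alignment.sorted.bam | mean_depth
```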
  • Mapping Quality Across Reference

Mapping quality (MAPQ) is the confidence that a read is correctly mapped to its genomic coordinates.

The mean mapping quality is 55.31 (values above 30 are generally considered acceptable).

Screenshot 2021-11-30 155959.png
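
MAPQ is stored in column 5 of each SAM record, so a mean can also be computed directly from the BAM file. A minimal sketch, assuming unmapped reads (flag bit 4) should be excluded; the helper name mean_mapq is hypothetical, and the result need not match the reported value exactly, since the tool used in the post is not stated:

```shell
# Hypothetical helper: mean MAPQ (column 5) over mapped SAM records on stdin.
mean_mapq() {
    awk -F '\t' '
        /^@/                 { next }   # skip SAM header lines
        int($2 / 4) % 2 == 1 { next }   # skip records with the "unmapped" flag bit (4)
        { sum += $5; n++ }
        END { if (n > 0) printf "%.2f\n", sum / n }'
}

# Example call on the sorted BAM file:
# samtools view alignment.sorted.bam | mean_mapq
```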


Article link: https://www.haomeiwen.com/subject/abelrrtx.html