Polishing an assembled mitochondrial genome

Author: 多啦A梦的时光机_648d | Published: 2023-01-12 20:49

1. Long-read polishing with minimap2 + racon (three iterations)

## step 1: long-read polishing
conda install -c bioconda racon

# long-read remapping, iteration 1 (minimap2)
## 127kb.fasta: the draft assembly
## ../output.1061_FPAC22H004594_1A--1061_FPAC22H004594_1A.fastq.fastq.gz: the long-read (third-generation) sequencing data
minimap2 127kb.fasta  ../output.1061_FPAC22H004594_1A--1061_FPAC22H004594_1A.fastq.fastq.gz >127kb.paf
gzip 127kb.paf

# long-read consensus call, iteration 1 (racon)
racon -t 20 ../output.1061_FPAC22H004594_1A--1061_FPAC22H004594_1A.fastq.fastq.gz 127kb.paf.gz 127kb.fasta >127kb1.fasta

# long-read remapping, iteration 2 (minimap2)
minimap2 127kb1.fasta  ../output.1061_FPAC22H004594_1A--1061_FPAC22H004594_1A.fastq.fastq.gz >127kb1.paf
gzip 127kb1.paf

# long-read consensus call, iteration 2 (racon)
racon -t 20 ../output.1061_FPAC22H004594_1A--1061_FPAC22H004594_1A.fastq.fastq.gz 127kb1.paf.gz 127kb1.fasta >127kb2.fasta

# long-read remapping, iteration 3 (minimap2)
minimap2 127kb2.fasta  ../output.1061_FPAC22H004594_1A--1061_FPAC22H004594_1A.fastq.fastq.gz >127kb2.paf
gzip 127kb2.paf

# long-read consensus call, iteration 3 (racon)
racon -t 30 ../output.1061_FPAC22H004594_1A--1061_FPAC22H004594_1A.fastq.fastq.gz 127kb2.paf.gz 127kb2.fasta >127kb3.fasta
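
The three rounds above differ only in their input and output file names, so they can be written as a loop. The following is a minimal sketch, not the author's script: the function name polish_rounds and the DRY_RUN switch are made up for illustration, the PAF file is left uncompressed (racon reads plain .paf as well as .paf.gz), and the assembly name is assumed to end in .fasta.

```shell
# polish_rounds ASSEMBLY READS [ROUNDS] [THREADS]
# Runs minimap2 + racon ROUNDS times; each round polishes the previous consensus.
# The body is a subshell "( ... )" so its variables do not leak into the caller.
# Set DRY_RUN=1 to print the commands instead of executing them.
polish_rounds() (
    asm=$1; reads=$2; rounds=${3:-3}; threads=${4:-20}
    prefix=${asm%.fasta}                 # 127kb.fasta -> 127kb
    current=$asm
    i=1
    while [ "$i" -le "$rounds" ]; do
        paf="${current%.fasta}.paf"      # 127kb.paf, 127kb1.paf, ...
        next="${prefix}${i}.fasta"       # 127kb1.fasta, 127kb2.fasta, ...
        if [ "${DRY_RUN:-0}" = 1 ]; then
            echo "minimap2 $current $reads > $paf"
            echo "racon -t $threads $reads $paf $current > $next"
        else
            minimap2 "$current" "$reads" > "$paf" || exit 1
            racon -t "$threads" "$reads" "$paf" "$current" > "$next" || exit 1
        fi
        current=$next                    # next round starts from this consensus
        i=$((i + 1))
    done
    echo "polished assembly: $current"
)

# Example (left commented out so that sourcing this file runs nothing):
# polish_rounds 127kb.fasta ../output.1061_FPAC22H004594_1A--1061_FPAC22H004594_1A.fastq.fastq.gz 3 20
```

Running it once with DRY_RUN=1 is a cheap way to check the file names before committing CPU time.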

2. Short-read polishing with bwa + pilon (one round)

## step 2: short-read polishing
conda install -c bioconda pilon

# BWA genome indexing and short-read remapping
bwa index 127kb3.fasta
## samtools 1.6 was used here; the samtools version must be recent enough, otherwise this command line fails
## note: -m is the memory limit per sorting thread, so -m 10G with -@ 20 can request far more than 10G in total
bwa mem -t 30 127kb3.fasta ../WGS/Cassytha_filiformis_BDSW210000018-1A_1.clean.fq.gz ../WGS/Cassytha_filiformis_BDSW210000018-1A_2.clean.fq.gz |/home/lx_sky6/software/miniconda3/envs/yt/bin/samtools sort -m 10G -@ 20 >127kb3.bam
samtools index 127kb3.bam

# short-read consensus call, iteration 1 (pilon)
/home/lx_sky6/software/miniconda3/bin/pilon --genome 127kb3.fasta --frags 127kb3.bam  --fix all  --output 127kb4
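
The index, map, sort, and index sequence is needed again in section 3 against the polished assembly, so it may be worth wrapping in a function. This is only a sketch under assumptions: map_short_reads and DRY_RUN are hypothetical names, bwa and samtools (1.3 or newer, for the "sort -o OUT -" syntax) are assumed to be on PATH, and samtools sort is left at its default per-thread memory.

```shell
# map_short_reads REF R1 R2 OUT_BAM [THREADS]
# Indexes REF, maps the paired reads with bwa mem, and writes a sorted, indexed BAM.
# Set DRY_RUN=1 to print the commands instead of executing them.
map_short_reads() (
    ref=$1; r1=$2; r2=$3; bam=$4; threads=${5:-30}
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "bwa index $ref"
        echo "bwa mem -t $threads $ref $r1 $r2 | samtools sort -@ $threads -o $bam -"
        echo "samtools index $bam"
    else
        bwa index "$ref" || exit 1
        # A pipeline reports the exit status of its last command only, so a
        # bwa failure is not caught here unless bash's pipefail option is on.
        bwa mem -t "$threads" "$ref" "$r1" "$r2" \
            | samtools sort -@ "$threads" -o "$bam" - || exit 1
        samtools index "$bam"
    fi
)

# Example (commented out so that sourcing this file runs nothing):
# map_short_reads 127kb3.fasta \
#     ../WGS/Cassytha_filiformis_BDSW210000018-1A_1.clean.fq.gz \
#     ../WGS/Cassytha_filiformis_BDSW210000018-1A_2.clean.fq.gz \
#     127kb3.bam 30
```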

Error: Pilon runs out of memory

Pilon version 1.24 Thu Jan 28 13:00:45 2021 -0500
Genome: 127kb3.fasta
Fixing snps, indels, gaps, local
Input genome size: 124420
Scanning BAMs
127kb3.bam: 158192970 reads, 0 filtered, 1947329 mapped, 1787406 proper, 11840 stray, FR 100% 371+/-82, max 615
Processing ctg000000:1-124420
frags 127kb3.bam: coverage 1760
Total Reads: 2087914, Coverage: 1760, minDepth: 176
Confirmed 102590 of 124420 bases (82.45%)
Corrected 822 snps; 1 ambiguous bases; corrected 176 small insertions totaling 452 bases, 233 small deletions totaling 962 bases
# Attempting to fix local continuity breaks
Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded
    at org.broadinstitute.pilon.PileUp.<init>(PileUp.scala:26)
    at org.broadinstitute.pilon.Assembler.$anonfun$addToPileups$1(Assembler.scala:83)
    at org.broadinstitute.pilon.Assembler$$Lambda$98/843467284.apply$mcVI$sp(Unknown Source)
    at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:190)
    at org.broadinstitute.pilon.Assembler.addToPileups(Assembler.scala:80)
    at org.broadinstitute.pilon.Assembler.addRead(Assembler.scala:65)
    at org.broadinstitute.pilon.Assembler.$anonfun$addReads$1(Assembler.scala:47)
    at org.broadinstitute.pilon.Assembler.$anonfun$addReads$1$adapted(Assembler.scala:47)
    at org.broadinstitute.pilon.Assembler$$Lambda$97/1176735295.apply(Unknown Source)
    at scala.collection.immutable.List.foreach(List.scala:333)
    at org.broadinstitute.pilon.Assembler.addReads(Assembler.scala:47)
    at org.broadinstitute.pilon.GapFiller.assembleIntoBreak(GapFiller.scala:127)
    at org.broadinstitute.pilon.GapFiller.assembleAcrossBreak(GapFiller.scala:55)
    at org.broadinstitute.pilon.GapFiller.fixBreak(GapFiller.scala:46)
    at org.broadinstitute.pilon.GenomeRegion.$anonfun$identifyAndFixIssues$6(GenomeRegion.scala:401)
    at org.broadinstitute.pilon.GenomeRegion.$anonfun$identifyAndFixIssues$6$adapted(GenomeRegion.scala:399)
    at org.broadinstitute.pilon.GenomeRegion$$Lambda$88/233021551.apply(Unknown Source)
    at scala.collection.immutable.List.foreach(List.scala:333)
    at org.broadinstitute.pilon.GenomeRegion.identifyAndFixIssues(GenomeRegion.scala:399)
    at org.broadinstitute.pilon.GenomeFile.$anonfun$processRegions$4(GenomeFile.scala:113)
    at org.broadinstitute.pilon.GenomeFile.$anonfun$processRegions$4$adapted(GenomeFile.scala:102)
    at org.broadinstitute.pilon.GenomeFile$$Lambda$43/1863932867.apply(Unknown Source)
    at scala.collection.immutable.List.foreach(List.scala:333)
    at org.broadinstitute.pilon.GenomeFile.processRegions(GenomeFile.scala:102)
    at org.broadinstitute.pilon.Pilon$.main(Pilon.scala:111)
    at org.broadinstitute.pilon.Pilon.main(Pilon.scala)

Or:

Pilon version 1.24 Thu Jan 28 13:00:45 2021 -0500
Genome: YC.asm.hic.p_ctg.fasta
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
    at org.broadinstitute.pilon.GenomeRegion.<init>(GenomeRegion.scala:54)
    at org.broadinstitute.pilon.GenomeFile.$anonfun$contigRegions$1(GenomeFile.scala:73)
    at org.broadinstitute.pilon.GenomeFile.$anonfun$contigRegions$1$adapted(GenomeFile.scala:73)
    at org.broadinstitute.pilon.GenomeFile$$Lambda$25/731260860.apply(Unknown Source)
    at scala.collection.immutable.Range.map(Range.scala:59)
    at org.broadinstitute.pilon.GenomeFile.contigRegions(GenomeFile.scala:73)
    at org.broadinstitute.pilon.GenomeFile.$anonfun$regions$1(GenomeFile.scala:53)
    at org.broadinstitute.pilon.GenomeFile$$Lambda$24/1795799895.apply(Unknown Source)
    at scala.collection.immutable.List.map(List.scala:250)
    at org.broadinstitute.pilon.GenomeFile.<init>(GenomeFile.scala:53)
    at org.broadinstitute.pilon.Pilon$.main(Pilon.scala:108)
    at org.broadinstitute.pilon.Pilon.main(Pilon.scala)

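Fix: raise the memory limit set in the Pilon launcher script

# find the path of the pilon launcher
which pilon
# edit the launcher and change the maximum Java heap size from 1g to 10g
# (in the conda-installed launcher this is usually the -Xmx1g entry)
vim ~/software/miniconda3/bin/pilon
# before: 1g
# after:  10g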
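
The same edit can be scripted instead of done by hand in vim. The sketch below assumes the launcher stores the limit as a JVM option of the form -Xmx1g, which is what conda-installed launchers usually contain but is not confirmed by the post; raise_pilon_heap is a made-up helper name. It writes back through the original path, so a symlinked launcher stays a symlink, and keeps a .bak copy.

```shell
# raise_pilon_heap LAUNCHER [NEW_SIZE]
# Rewrites every "-Xmx<size>" token in LAUNCHER to "-Xmx<NEW_SIZE>" (default 10g).
# The original content is kept next to it as LAUNCHER.bak.
raise_pilon_heap() (
    launcher=$1
    new_size=${2:-10g}
    # Refuse to touch a file that has no -Xmx setting at all.
    if ! grep -q -e '-Xmx[0-9][0-9]*[kKmMgG]' "$launcher"; then
        echo "no -Xmx setting found in $launcher" >&2
        exit 1
    fi
    cp -p "$launcher" "$launcher.bak" || exit 1
    # Redirecting into the existing file keeps its permissions and any symlink.
    sed "s/-Xmx[0-9][0-9]*[kKmMgG]/-Xmx${new_size}/g" "$launcher.bak" > "$launcher"
)

# Example (commented out; assumes pilon is on PATH):
# raise_pilon_heap "$(which pilon)" 10g
```

When Pilon is started from its jar directly rather than through a launcher, the limit goes on the java command line instead, for example java -Xmx10g -jar pilon.jar followed by the usual Pilon arguments.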

3. Inspection, validation, and assembly quality assessment

# BWA genome indexing and short-read remapping
bwa index 127kb4.fasta
bwa mem -t 30 127kb4.fasta ../WGS/Cassytha_filiformis_BDSW210000018-1A_1.clean.fq.gz ../WGS/Cassytha_filiformis_BDSW210000018-1A_2.clean.fq.gz |/home/lx_sky6/software/miniconda3/envs/yt/bin/samtools sort -m 10G -@ 20 >127kb4.bam
(If the assembly has not been circularized: circlator mapreads 127kb4.fasta ../WGS/Cassytha_filiformis_BDSW210000018-1A_1.clean.fq.gz 127kb4.bam)

samtools index 127kb4.bam
samtools tview 127kb4.bam 127kb4.fasta

# sequencing depth
samtools depth 127kb4.bam >127kb4_depth.txt
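
The depth file has one line per position (contig, position, depth), which is hard to judge by eye. A small awk summary along these lines may help; depth_summary and its default threshold of 10 are illustrative choices, not part of the original workflow. Note that samtools depth omits zero-coverage positions unless it is run with -a, so the position count can be smaller than the genome length.

```shell
# depth_summary DEPTH_FILE [MIN_DEPTH]
# Reads the three-column output of `samtools depth` (contig, position, depth)
# and prints the number of reported positions, the mean/min/max depth, and
# how many positions fall below MIN_DEPTH (default 10).
depth_summary() {
    awk -v min="${2:-10}" '
        {
            d = $3 + 0                      # force numeric comparison
            if (NR == 1) { lo = d; hi = d }
            sum += d
            if (d < lo) lo = d
            if (d > hi) hi = d
            if (d < min + 0) low++
        }
        END {
            if (NR == 0) { print "no positions reported"; exit 1 }
            printf "positions=%d mean=%.1f min=%d max=%d below_%d=%d\n",
                   NR, sum / NR, lo, hi, min, low
        }
    ' "$1"
}

# Example (commented out so that sourcing this file runs nothing):
# depth_summary 127kb4_depth.txt 10
```

Positions that fall far below the mean are the natural places to look at first in samtools tview.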
