Assembling Plant Chloroplast Genomes


Author: 多啦A梦的时光机_648d | Published: 2023-05-18 19:50

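1. Using second-generation (short-read) data

1.1 Subsetting the short-read data

If your dataset is very large, you can work with just a portion of it; otherwise, use all of it.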

$ seqkit split2 -p 4 dh_AQ054-01R0001_good_1.fq.gz  ## split into 4 parts, roughly 10 GB each
$ seqkit split2 -p 4 dh_AQ054-01R0001_good_2.fq.gz
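
seqkit documents -p as splitting into N parts with a round-robin distribution. That is why splitting the forward and reverse files separately with the same -p value keeps mates in matching parts, as long as both files list the reads in the same order. A minimal Python sketch of that distribution follows; it is an illustration only, not seqkit's implementation, and `round_robin_parts` and the toy read IDs are hypothetical.

```python
def round_robin_parts(records, n_parts):
    """Distribute records over n_parts lists in round-robin order."""
    parts = [[] for _ in range(n_parts)]
    for index, record in enumerate(records):
        parts[index % n_parts].append(record)
    return parts


# Toy read IDs standing in for the records of the two paired FASTQ files.
forward = [f"read{i}/1" for i in range(1, 9)]
reverse = [f"read{i}/2" for i in range(1, 9)]

fwd_parts = round_robin_parts(forward, 4)
rev_parts = round_robin_parts(reverse, 4)

# Part 1 of each file holds the two mates of the same reads.
print(fwd_parts[0])  # ['read1/1', 'read5/1']
print(rev_parts[0])  # ['read1/2', 'read5/2']
```

If the two files were not in the same read order, the parts would no longer pair up, so part_001 of file 1 should only be combined with part_001 of file 2.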

1.2 Assembling with GetOrganelle

$ get_organelle_from_reads.py -1 ./dh_AQ054-01R0001_good_1.fq.gz.split/dh_AQ054-01R0001_good_1.part_001.fq.gz -2 dh_AQ054-01R0001_good_2.fq.gz.split/dh_AQ054-01R0001_good_2.part_001.fq.gz -F embplant_mt -o output-mitor -R 10 -t 40 -k 21,45,65,85,105
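
The command above is long, so it can help to see each option on its own line. The following Python sketch only rebuilds the same argument list with a comment per flag; `getorganelle_command` and the placeholder file names are hypothetical, and the flag descriptions reflect my reading of the GetOrganelle help text.

```python
import shlex


def getorganelle_command(fq1, fq2, out_dir, organelle="embplant_mt",
                         max_rounds=10, threads=40,
                         kmers=(21, 45, 65, 85, 105)):
    """Build the get_organelle_from_reads.py argument list used in the post."""
    return [
        "get_organelle_from_reads.py",
        "-1", fq1,                # forward reads
        "-2", fq2,                # reverse reads
        "-F", organelle,          # target type: embplant_mt = plant mitochondrion
        "-o", out_dir,            # output directory
        "-R", str(max_rounds),    # maximum number of extension rounds
        "-t", str(threads),       # number of threads
        "-k", ",".join(str(k) for k in kmers),  # k-mer sizes for the assembly step
    ]


# Placeholder file names; substitute the split parts produced in step 1.1.
cmd = getorganelle_command("reads_1.part_001.fq.gz",
                           "reads_2.part_001.fq.gz",
                           "output-mitor")
print(shlex.join(cmd))
```

To launch it from Python instead of pasting the printed line into a shell, `subprocess.run(cmd, check=True)` would work with the same list.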

1.3 Assembling with NOVOPlasty

First, set up the config.txt file:

Project:
-----------------------
Project name          = Test
Type                  = mito
Genome Range          = 200000-300000 
K-mer                 = 28
Max memory            =
Extended log          = 0
Save assembled reads  = no
Seed Input            = /yt/918_Meconopsis/Mitor/nad1.fa  ## any single gene from the mitochondrial genome
Extend seed directly  = no
Reference sequence    = /yt/918_Meconopsis/Mitor/NC_023103_mito_ref.fa ## optional: a reference genome; fill it in if you have one
Variance detection    =
Chloroplast sequence  = /yt/918_Meconopsis/Mitor/2_NOVOPlasty/NC_041671.1_chlor.fa

Dataset 1:
-----------------------
Read Length           = 151
Insert size           = 300
Platform              = illumina
Single/Paired         = PE
Combined reads        =
Forward reads         = /yt/918_Meconopsis/Mitor/dh_AQ054-01R0001_good_1.fq.gz.split/dh_AQ054-01R0001_good_1.part_001.fq.gz  
Reverse reads         = /yt/918_Meconopsis/Mitor/dh_AQ054-01R0001_good_2.fq.gz.split/dh_AQ054-01R0001_good_2.part_001.fq.gz
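
Then run NOVOPlasty:

perl ~/soft/NOVOPlasty-master/NOVOPlasty4.3.1.pl -c config.txt

The resulting FASTA shows that second-generation data alone is not enough to circularize the mitochondrial genome.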
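
The "##" notes in the config listing above are explanations for the reader. I have not verified how NOVOPlasty treats text after the value, so it seems safest to strip such notes before running. A minimal Python sketch under that assumption follows; `strip_annotations`, `parse_config` and the example path are hypothetical helpers for illustration, not part of NOVOPlasty.

```python
def strip_annotations(text, marker="##"):
    """Return the config text with trailing '## ...' notes and spaces removed."""
    cleaned = [line.split(marker, 1)[0].rstrip() for line in text.splitlines()]
    return "\n".join(cleaned) + "\n"


def parse_config(text):
    """Collect 'Key = value' lines into a dict, ignoring titles and separators."""
    settings = {}
    for line in strip_annotations(text).splitlines():
        if "=" not in line:
            continue  # section titles, dashed separators, blank lines
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


# A shortened example in the same layout as the config above.
example = """\
Type                  = mito
Genome Range          = 200000-300000
Seed Input            = /path/to/seed.fa  ## any mitochondrial gene
Max memory            =
"""

settings = parse_config(example)
print(settings["Seed Input"])  # /path/to/seed.fa
print(strip_annotations(example), end="")
```

Parsing into a dict also makes it easy to check, before a long run, that the seed and read paths you typed actually exist.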



2. Using second- and third-generation data together

/software/miniconda3/envs/yt/bin/unicycler -t 60 --racon_path /software/miniconda3/envs/yt/bin/racon  -1 /yt/918_Meconopsis/Mitor/Papaver_somniferum/ERR5554575/ERR5554575_1.fastq.gz -2 /yt/918_Meconopsis/Mitor/Papaver_somniferum/ERR5554575/ERR5554575_2.fastq.gz  -l /yt/918_Meconopsis/Mitor/Papaver_somniferum/SRR10271167/SRR10271167.fastq.gz -o ./unicycler.out
## -1 and -2: paired-end second-generation (short) reads
## -l: third-generation (long) reads

The result still did not circularize.
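
Assemblers usually report circularity themselves, but a quick independent check can be useful. Some tools leave a circular molecule as a linear contig whose start is repeated at its end, so one rough heuristic is to look for a prefix that also appears as the suffix. A minimal Python sketch of that heuristic follows; `terminal_overlap` and its thresholds are hypothetical, and a result of 0 does not prove the molecule is linear.

```python
def terminal_overlap(seq, min_len=20, max_len=1000):
    """Return the length of the longest prefix that is also the suffix.

    Only overlaps between min_len and max_len bases are considered;
    0 means no such exact overlap was found.
    """
    seq = seq.upper()
    longest = min(max_len, len(seq) // 2)
    for length in range(longest, min_len - 1, -1):
        if seq[:length] == seq[-length:]:
            return length
    return 0


# Toy contig whose first 8 bases are repeated at the end.
contig = "ACGTACGTTTGGCCAAACGTACGT"
print(terminal_overlap(contig, min_len=4))  # 8

# Toy contig with unrelated ends.
print(terminal_overlap("AAAACCCCGGGGTTTT", min_len=4))  # 0
```

The check is exact-match only, so sequencing errors in the overlapping ends would hide a real overlap; aligning the two ends against each other is more robust.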

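3. Assembling third-generation data with NextDenovo

First, set up the run.cfg file. The main change is to set genome_size to the expected mitochondrial genome size, which should be close to that of a closely related species. Adjust the other parameters to suit your data and machine.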

[General]
job_type = local # local, slurm, sge, pbs, lsf
job_prefix = nextDenovo
task = assemble # all, correct, assemble
rewrite = yes # yes/no
deltmp = yes
parallel_jobs = 10 # number of tasks used to run in parallel
input_type = corrected # raw, corrected
read_type = clr # clr, ont, hifi
input_fofn = input.fofn
workdir = 9_18nextdenovo

[correct_option]
read_cutoff = 1k
genome_size = 4m # estimated genome size
seed_cutoff = 5k
sort_options = -m 150g -t 32
minimap2_options_raw = -x ava-pb -t 32
pa_correction = 5 # number of corrected tasks used to run in parallel, each corrected task requires ~TOTAL_INPUT_BASES/4 bytes of memory usage.
correction_options = -p 15

[assemble_option]
minimap2_options_cns = -x ava-pb -t 30 -k17 -w17
nextgraph_options = -a 1

# see https://nextdenovo.readthedocs.io/en/latest/OPTION.html for a detailed introduction about all the parameters
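
Before launching a long run, it can be worth reading the settings back to confirm what was actually written. The sketch below uses Python's standard configparser on a shortened copy of the file above. This is Python's parser, not NextDenovo's, so it is only a sanity check under the assumption that both read the "key = value # comment" layout the same way; `load_run_cfg` is a hypothetical helper.

```python
import configparser


def load_run_cfg(text):
    """Parse run.cfg-style text, treating ' # ...' as an inline comment."""
    parser = configparser.ConfigParser(inline_comment_prefixes=("#",))
    parser.read_string(text)
    return parser


# Shortened copy of the run.cfg shown above.
example = """\
[General]
task = assemble # all, correct, assemble
input_type = corrected # raw, corrected
read_type = clr # clr, ont, hifi
input_fofn = input.fofn

[correct_option]
genome_size = 4m # estimated genome size
"""

cfg = load_run_cfg(example)
for section, key in [("General", "task"),
                     ("General", "input_type"),
                     ("General", "read_type"),
                     ("General", "input_fofn"),
                     ("correct_option", "genome_size")]:
    print(f"{section}.{key} = {cfg[section][key]}")
```

For a real file, `parser.read("run.cfg")` can replace `read_string`; the settings most worth double-checking are genome_size, read_type and input_fofn.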

Second, create input.fofn. This file simply lists the path(s) to your third-generation read files, one per line.
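
The file list can be written by hand, or generated. A minimal Python sketch that writes one absolute path per line follows; `write_fofn` and the file names are hypothetical, and the helper does not check that the read files exist.

```python
from pathlib import Path


def write_fofn(read_files, fofn_path="input.fofn"):
    """Write one absolute read-file path per line; return the lines written."""
    lines = [str(Path(name).resolve()) for name in read_files]
    Path(fofn_path).write_text("\n".join(lines) + "\n")
    return lines


# Placeholder names; replace them with your own long-read files.
written = write_fofn(["long_reads_1.fastq.gz", "long_reads_2.fastq.gz"],
                     "example_input.fofn")
print(Path("example_input.fofn").read_text(), end="")
```

Absolute paths are used here so that the list does not depend on the directory the assembler is started from.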

Third, run the assembly:

/software/NextDenovo/nextDenovo ./run.cfg
