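1. Using second-generation (short-read) data
1.1 Subsampling the short-read data
If your dataset is very large, you can work with only part of it; if it is not, use all of it.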
$ seqkit split2 -p 4 dh_AQ054-01R0001_good_1.fq.gz ## split into 4 parts of roughly 10 GB each
$ seqkit split2 -p 4 dh_AQ054-01R0001_good_2.fq.gz
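Because the two mate files are split separately, it can be worth confirming that matching parts still hold the same number of reads before assembling. The following is a minimal sketch, assuming standard 4-line FASTQ records; `count_fastq_reads` and `check_pair_counts` are hypothetical helper names, not seqkit commands.

```shell
# Hypothetical helper: count reads in a gzipped FASTQ file.
# Assumes every record is exactly 4 lines long.
count_fastq_reads() {
    gzip -dc "$1" | awk 'END { print NR / 4 }'
}

# Hypothetical helper: succeed only if both mate files hold the same number of reads.
check_pair_counts() {
    local n1 n2
    n1=$(count_fastq_reads "$1")
    n2=$(count_fastq_reads "$2")
    echo "R1=$n1 R2=$n2"
    [ "$n1" -eq "$n2" ]
}
```

It could be called with the two `part_001` files produced by the split commands above, for example `check_pair_counts R1.part_001.fq.gz R2.part_001.fq.gz` with the real paths substituted.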
1.2 Assembly with GetOrganelle
$ get_organelle_from_reads.py -1 ./dh_AQ054-01R0001_good_1.fq.gz.split/dh_AQ054-01R0001_good_1.part_001.fq.gz -2 ./dh_AQ054-01R0001_good_2.fq.gz.split/dh_AQ054-01R0001_good_2.part_001.fq.gz -F embplant_mt -o output-mitor -R 10 -t 40 -k 21,45,65,85,105
1.3 Assembly with NOVOPlasty
First, set up the configuration file (config.txt):
Project:
-----------------------
Project name = Test
Type = mito
Genome Range = 200000-300000
K-mer = 28
Max memory =
Extended log = 0
Save assembled reads = no
Seed Input = /yt/918_Meconopsis/Mitor/nad1.fa ## any gene from the mitochondrial genome
Extend seed directly = no
Reference sequence = /yt/918_Meconopsis/Mitor/NC_023103_mito_ref.fa ## optional: a reference genome; fill it in if you have one
Variance detection =
Chloroplast sequence = /yt/918_Meconopsis/Mitor/2_NOVOPlasty/NC_041671.1_chlor.fa
Dataset 1:
-----------------------
Read Length = 151
Insert size = 300
Platform = illumina
Single/Paired = PE
Combined reads =
Forward reads = /yt/918_Meconopsis/Mitor/dh_AQ054-01R0001_good_1.fq.gz.split/dh_AQ054-01R0001_good_1.part_001.fq.gz
Reverse reads = /yt/918_Meconopsis/Mitor/dh_AQ054-01R0001_good_2.fq.gz.split/dh_AQ054-01R0001_good_2.part_001.fq.gz
perl ~/soft/NOVOPlasty-master/NOVOPlasty4.3.1.pl -c config.txt
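The configuration points at several input files, and a mistyped path is an easy way to lose a run. Below is a minimal pre-flight sketch that reads the path-valued keys from the config and reports any file that does not exist. `config_value` and `check_config_paths` are hypothetical helpers, not part of NOVOPlasty, and the sketch assumes the `Key = value` layout shown above, ignoring any trailing `## ...` annotation.

```shell
# Hypothetical helper: print the value of one "Key = value" line,
# dropping a trailing "## ..." annotation and surrounding blanks.
config_value() {
    # $1 = config file, $2 = key text before the "="
    awk -v key="$2" -F'=' '
        { k = $1; sub(/[ \t]+$/, "", k) }
        k == key {
            v = substr($0, index($0, "=") + 1)
            sub(/[ \t]*##.*$/, "", v)
            sub(/^[ \t]+/, "", v)
            sub(/[ \t]+$/, "", v)
            print v
            exit
        }
    ' "$1"
}

# Hypothetical helper: report each non-empty path-valued key whose file is missing.
check_config_paths() {
    local cfg=$1 key path status=0
    for key in "Seed Input" "Reference sequence" "Chloroplast sequence" \
               "Forward reads" "Reverse reads"; do
        path=$(config_value "$cfg" "$key")
        if [ -n "$path" ] && [ ! -e "$path" ]; then
            echo "missing: $key = $path"
            status=1
        fi
    done
    return $status
}
```

Running `check_config_paths config.txt` before the `perl` command prints nothing and returns success when every listed file is present.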
![](https://img.haomeiwen.com/i14744215/3b6665cfaf34c2ef.png)
As the result shows, short-read data alone is not enough to circularize the mitochondrial genome.
![](https://img.haomeiwen.com/i14744215/23ee9e48e6fe4e89.png)
2. Using both short-read and long-read data
/software/miniconda3/envs/yt/bin/unicycler -t 60 --racon_path /software/miniconda3/envs/yt/bin/racon -1 /yt/918_Meconopsis/Mitor/Papaver_somniferum/ERR5554575/ERR5554575_1.fastq.gz -2 /yt/918_Meconopsis/Mitor/Papaver_somniferum/ERR5554575/ERR5554575_2.fastq.gz -l /yt/918_Meconopsis/Mitor/Papaver_somniferum/SRR10271167/SRR10271167.fastq.gz -o ./unicycler.out
## -1 and -2: paired-end short reads
## -l: long reads
![](https://img.haomeiwen.com/i14744215/08aef7e378ad9753.png)
The result still does not circularize.
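Besides inspecting the assembly graph, circularity can also be read from the FASTA headers that Unicycler writes to assembly.fasta in its output directory, where completed circular contigs carry a `circular=true` tag. The following is a minimal sketch under that assumption; `contig_topology` is a hypothetical helper name.

```shell
# Hypothetical helper: list each contig name with "circular" or "linear",
# based on whether its FASTA header contains the tag circular=true.
contig_topology() {
    awk '/^>/ {
        status = ($0 ~ /circular=true/) ? "circular" : "linear"
        print substr($1, 2) "\t" status
    }' "$1"
}
```

For the run above it would be called as `contig_topology ./unicycler.out/assembly.fasta`.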
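3. Assembly from long reads with NextDenovo
First, set up the run.cfg file. The main change is to set genome_size to the expected size of the mitochondrial genome, which should be roughly that of a closely related species; adjust the other parameters to suit your data.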
[General]
job_type = local # local, slurm, sge, pbs, lsf
job_prefix = nextDenovo
task = assemble # all, correct, assemble
rewrite = yes # yes/no
deltmp = yes
parallel_jobs = 10 # number of tasks used to run in parallel
input_type = corrected # raw, corrected
read_type = clr # clr, ont, hifi
input_fofn = input.fofn
workdir = 9_18nextdenovo
[correct_option]
read_cutoff = 1k
genome_size = 4m # estimated genome size
seed_cutoff = 5k
sort_options = -m 150g -t 32
minimap2_options_raw = -x ava-pb -t 32
pa_correction = 5 # number of corrected tasks used to run in parallel, each corrected task requires ~TOTAL_INPUT_BASES/4 bytes of memory usage.
correction_options = -p 15
[assemble_option]
minimap2_options_cns = -x ava-pb -t 30 -k17 -w17
nextgraph_options = -a 1
# see https://nextdenovo.readthedocs.io/en/latest/OPTION.html for a detailed introduction about all the parameters
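The genome_size value in run.cfg is meant to approximate the mitochondrial genome size of a close relative. If a reference FASTA for that relative is available, its total length can be measured directly. This is a minimal sketch; `fasta_total_bases` is a hypothetical helper, not part of NextDenovo.

```shell
# Hypothetical helper: print the total number of sequence characters in a FASTA file,
# ignoring header lines and any blanks or carriage returns.
fasta_total_bases() {
    awk '!/^>/ { gsub(/[ \t\r]/, ""); n += length($0) } END { print n + 0 }' "$1"
}
```

For example, `fasta_total_bases /yt/918_Meconopsis/Mitor/NC_023103_mito_ref.fa` would report the length of the reference used in section 1.3, which can then be rounded for genome_size.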
Second, prepare input.fofn, a file that lists the paths to your long-read data files.
Third, run NextDenovo:
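One way to build input.fofn is sketched below, assuming all long-read files sit in a single directory and end in .fastq.gz, .fq.gz, .fasta or .fa. `make_fofn` is a hypothetical helper; it writes one absolute path per line.

```shell
# Hypothetical helper: write the absolute paths of all read files in a directory
# to a file-of-filenames, one path per line, sorted.
make_fofn() {
    # $1 = directory holding the long reads, $2 = output file (e.g. input.fofn)
    local dir
    dir=$(cd "$1" && pwd) || return 1
    find "$dir" -maxdepth 1 -type f \
        \( -name '*.fastq.gz' -o -name '*.fq.gz' -o -name '*.fasta' -o -name '*.fa' \) \
        | sort > "$2"
}
```

With the data used in section 2, this could be run as `make_fofn /yt/918_Meconopsis/Mitor/Papaver_somniferum/SRR10271167 input.fofn`.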
/software/NextDenovo/nextDenovo ./run.cfg