[Notes] Saving memory with minimap2

Author: Silver_42ac | Published: 2020-01-17 11:16

I run minimap2 on a genome of more than 10G with 100G of reads.

With default parameters, minimap2 typically uses 20-40 GB of memory, and 80 GB while writing results to a file.

On reflection, the -I option lets large genomes trade run time for lower memory use:

-I NUM  Load at most NUM target bases into RAM for indexing [4G]. If there are more than NUM bases in target.fa, minimap2 needs to read query.fa multiple times to map it against each batch of target sequences. NUM may be ending with k/K/m/M/g/G. NB: mapping quality is incorrect given a multi-part index.
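To get a feel for how many batches a given -I value implies, here is a rough sketch in plain sh and awk. The helper names (parse_size, count_bases, estimate_parts) are my own and not part of minimap2. It assumes the k/m/g suffixes are decimal multipliers and simply divides the total base count by NUM; the real number of index parts can differ, because I am assuming minimap2 only breaks batches between sequences.

```shell
# Convert a minimap2-style size such as 4G, 500k or 1000 into a plain number.
# Assumption: k/m/g are decimal multipliers (1e3, 1e6, 1e9).
parse_size() {
  awk -v s="$1" 'BEGIN {
    n = s + 0                                # numeric prefix, e.g. 4 from "4G"
    u = tolower(substr(s, length(s), 1))     # last character, lower-cased
    if (u == "k") n *= 1e3
    else if (u == "m") n *= 1e6
    else if (u == "g") n *= 1e9
    printf "%.0f\n", n
  }'
}

# Total number of sequence characters in a FASTA file (header lines skipped).
count_bases() {
  awk '!/^>/ { gsub(/[ \t\r]/, ""); total += length($0) }
       END   { printf "%.0f\n", total }' "$1"
}

# Rough estimate of the number of index parts: ceil(total bases / NUM).
# Usage: estimate_parts ref.fasta 3G
estimate_parts() {
  total=$(count_bases "$1")
  batch=$(parse_size "$2")
  awk -v t="$total" -v b="$batch" 'BEGIN {
    parts = int(t / b)
    if (parts * b < t) parts++               # round up for a partial last batch
    if (parts < 1) parts = 1
    print parts
  }'
}
```

An estimate above 1 means a multi-part index, so query.fa gets read once per part.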

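Note: if the genome is larger than the -I value, minimap2 uses a multi-part index, which has two side effects:
(1) Mapping quality becomes inaccurate; weigh this against what you need.
(2) With -a (SAM output), the @SQ header lines, such as the one below, are missing: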

@SQ SN:C14E LN:145181
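If you do need the @SQ lines, they can be rebuilt from the reference FASTA. Below is a minimal sketch; fasta_to_sq is a hypothetical helper name of mine, not a minimap2 or samtools command. It assumes the sequence name is the first whitespace-delimited word of each FASTA header and emits the tab-separated SN and LN fields only.

```shell
# Print one SAM @SQ header line per FASTA record:
#   @SQ<TAB>SN:<name><TAB>LN:<length>
# Assumption: the sequence name is the first word after ">" in the header.
fasta_to_sq() {
  awk '
    function flush_record() {
      if (name != "") printf "@SQ\tSN:%s\tLN:%.0f\n", name, len
    }
    { sub(/\r$/, "") }                       # tolerate Windows line endings
    /^>/ { flush_record(); name = substr($1, 2); len = 0; next }
    { gsub(/[ \t]/, ""); len += length($0) } # accumulate sequence length
    END { flush_record() }
  ' "$1"
}
```

One possible use, assuming the headerless alignments are in out.sam (a hypothetical file name): `fasta_to_sq ref.fasta | cat - out.sam > out.withsq.sam`. I have not checked this against every minimap2 version, so inspect the first lines of the result before feeding it to downstream tools.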

If you are still using SAM, consider switching to PAF; the sequence lengths are carried in every PAF record.
PAF: a Pairwise mApping Format

Col  Type    Description
1    string  Query sequence name
2    int     Query sequence length
3    int     Query start (0-based; BED-like; closed)
4    int     Query end (0-based; BED-like; open)
5    char    Relative strand: "+" or "-"
6    string  Target sequence name
7    int     Target sequence length
8    int     Target start on original strand (0-based)
9    int     Target end on original strand (0-based)
10   int     Number of residue matches
11   int     Alignment block length
12   int     Mapping quality (0-255; 255 for missing)
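Because every record carries both lengths, a PAF file can be summarised without any header. Here is a small sketch; paf_summary is a hypothetical helper name, and the match fraction (column 10 divided by column 11) is only a crude indicator that I compute for illustration, not an official identity measure.

```shell
# For each PAF record print, tab-separated:
#   query name, query length, target name, target length,
#   match fraction (col 10 / col 11), mapping quality (col 12)
# Records with fewer than the 12 mandatory columns are skipped.
paf_summary() {
  awk -F '\t' -v OFS='\t' 'NF >= 12 {
    frac = ($11 > 0) ? $10 / $11 : 0
    print $1, $2, $6, $7, sprintf("%.4f", frac), $12
  }' "$1"
}
```

Keep in mind that the mapping quality in the last column is the value the manual flags as incorrect when a multi-part index was used.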

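The default -I is 4G: a genome larger than that is split into several batches, which are loaded into memory and aligned one at a time.
To trade alignment time for lower memory use, change -I when building the index: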

minimap2 -I 3G -d  ref.mmi  ref.fasta


Article link: https://www.haomeiwen.com/subject/szdczctx.html
