Sambamba: a tool for removing duplicate reads


Author: 上校的猫 | Published: 2019-04-20 11:41 | Views: 54

    Preface

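    Why use this tool?
    Because I heard it is fast, and because I had been burned by samtools markdup and picard. samtools markdup told me I had to run fixmate first on input sorted by read name, but I had already sorted with the default order (by coordinate), emmm. After removing duplicates with gatk picard, the output was larger than the original file. What on earth did it add?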

    Link to the tool's documentation:

    http://lomereiter.github.io/sambamba/docs/sambamba-markdup.html

    Getting started

    gzip -d sambamba-0.6.8.gz    # decompress the downloaded release binary
    chmod a+x sambamba-0.6.8     # make it executable
    
    ./sambamba-0.6.8             # run it to confirm that it works
    

    Download it, decompress it, and put it on your PATH. It is that simple; no installation is needed.
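The "put it on your PATH" step can be sketched as a small helper. This is only an illustration: the function name `install_sambamba` and the `$HOME/bin` destination are assumptions, not part of the original post, and any directory already on your PATH works just as well.

```shell
# Hypothetical helper: copy a decompressed sambamba binary into a directory
# under the plain name "sambamba" and make it executable.
install_sambamba() {
    src="$1"                      # e.g. ./sambamba-0.6.8
    dest_dir="${2:-$HOME/bin}"    # assumed destination; pick any directory you like
    mkdir -p "$dest_dir" || return 1
    cp "$src" "$dest_dir/sambamba" || return 1
    chmod a+x "$dest_dir/sambamba"
}

# Usage (not executed here):
#   install_sambamba ./sambamba-0.6.8 "$HOME/bin"
#   export PATH="$HOME/bin:$PATH"   # add this line to your shell profile to keep it
#   sambamba                        # should now start from any directory
```

Copying under a version-free name means later commands do not have to change when the binary is upgraded.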

    NAME

    sambamba-markdup - finding duplicate reads in BAM file

    SYNOPSIS

    sambamba markdup OPTIONS <input.bam> <output.bam>

    DESCRIPTION

    Marks (by default) or removes duplicate reads. For determining whether a read is a duplicate or not, the same criteria as in Picard are used.

    OPTIONS

    -r, --remove-duplicates
    remove duplicates instead of just marking them

    -t, --nthreads=NTHREADS
    number of threads to use

    -l, --compression-level=N
    specify compression level of the resulting file (from 0 to 9)

    -p, --show-progress
    show progressbar in STDERR

    --tmpdir=TMPDIR
    specify directory for temporary files; default is /tmp

    --hash-table-size=HASHTABLESIZE
    size of hash table for finding read pairs (default is 262144 reads); will be rounded down to the nearest power of two; should be > (average coverage) * (insert size) for good performance

    --overflow-list-size=OVERFLOWLISTSIZE
    size of the overflow list where reads, thrown away from the hash table, get a second chance to meet their pairs (default is 200000 reads); increasing the size reduces the number of temporary files created

    --io-buffer-size=BUFFERSIZE
    controls sizes of two buffers of BUFFERSIZE megabytes each, used for reading and writing BAM during the second pass (default is 128)
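As a rough sketch of how these options fit together, the helpers below pick a `--hash-table-size` and print (without running) a `sambamba markdup` command line. The function names, the file names `in.bam` and `dedup.bam`, and the coverage and insert-size figures are all hypothetical. Because the documented value is rounded down to a power of two and should exceed (average coverage) * (insert size), the sketch chooses the smallest power of two strictly greater than that product.

```shell
# Hypothetical helper: smallest power of two strictly greater than
# coverage * insert_size, so that rounding down leaves it unchanged.
hash_table_size() {
    target=$(( $1 * $2 ))         # $1 = average coverage, $2 = insert size
    size=1
    while [ "$size" -le "$target" ]; do
        size=$(( size * 2 ))
    done
    echo "$size"
}

# Hypothetical helper: print a markdup command that removes duplicates (-r),
# shows progress (-p), and sets the thread count and hash table size.
markdup_cmd() {
    echo "sambamba markdup -r -p --nthreads=$3 --hash-table-size=$4 $1 $2"
}

# Dry run with made-up numbers: 1000x coverage, 500 bp inserts, 4 threads.
markdup_cmd in.bam dedup.bam 4 "$(hash_table_size 1000 500)"
```

Printing the command first makes it easy to check the flags before running it on a large BAM file. Setting the hash table size by hand only matters when the computed value is above the default of 262144 reads.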

    Test

    Duplicate removal is very fast: a 3 GB BAM file took only about 1 minute.
