wtdbg2

Author: tobebettergirl | Published 2019-07-03 10:43

    Project page
    https://github.com/ruanjue/wtdbg2

    Wtdbg2 is a de novo sequence assembler for long noisy reads produced by PacBio or Oxford Nanopore Technologies (ONT).

    It is used to assemble long reads from third-generation sequencing data.

    It assembles raw reads without error correction and then builds the consensus from intermediate assembly output.

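    Reads are not error-corrected before assembly; the consensus is built afterwards from the intermediate assembly.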

    Wtdbg2 is able to assemble the human and even the 32Gb Axolotl genome at a speed tens of times faster than CANU and FALCON while producing contigs of comparable base accuracy.

    Compared with CANU and FALCON, it is much faster.

    Installation commands
    git clone https://github.com/ruanjue/wtdbg2
    cd wtdbg2
    make
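
A quick way to confirm the build worked is to check that the two executables used later in this post are present. This is a minimal sketch: `check_build` is a hypothetical helper of my own, not part of wtdbg2, and it only tests that the files exist and are executable.

```shell
# check_build DIR: succeed only if DIR contains executable wtdbg2 and
# wtpoa-cns files. Hypothetical helper, not shipped with wtdbg2.
check_build() {
    dir="${1:-.}"
    missing=0
    for exe in wtdbg2 wtpoa-cns; do
        if [ ! -x "$dir/$exe" ]; then
            echo "missing: $dir/$exe" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

Running `check_build .` inside the cloned directory after `make` should return success if both programs were built.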
    
    Data download
    wget -t 200 http://www.cbcb.umd.edu/software/PBcR/data/selfSampleData.tar.gz
    
    Further study
    git clone https://github.com/ruanjue/wtdbg2
    cd wtdbg2 && make
    # quick start with wtdbg2.pl
    ./wtdbg2.pl -t 16 -x rs -g 4.6m -o dbg reads.fa.gz
    # step-by-step command lines
    # assemble long reads
    ./wtdbg2 -x rs -g 4.6m -i reads.fa.gz -t 16 -fo dbg
    
    # derive consensus
    ./wtpoa-cns -t 16 -i dbg.ctg.lay.gz -fo dbg.raw.fa
    
    # polish consensus, not necessary if you want to polish the assemblies using other tools
    minimap2 -t16 -ax map-pb -r2k dbg.raw.fa reads.fa.gz | samtools sort -@4 >dbg.bam
    samtools view -F0x900 dbg.bam | ./wtpoa-cns -t 16 -d dbg.raw.fa -i - -fo dbg.cns.fa
    
    # additional polishing using short reads (bwa mem needs an index of the reference first)
    bwa index dbg.cns.fa
    bwa mem -t 16 dbg.cns.fa sr.1.fa sr.2.fa | samtools sort -O SAM | ./wtpoa-cns -t 16 -x sam-sr -d dbg.cns.fa -i - -fo dbg.srp.fa
    
    

    -g is the estimated genome size;
    -x specifies the sequencing technology:

    • "rs" for PacBio RSII,
    • "sq" for PacBio Sequel,
    • "ccs" for PacBio CCS reads,
    • "ont" for Oxford Nanopore.
      The -x option sets multiple parameters and should be applied before other parameters.
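
The preset list above can be wrapped in a small lookup so that scripts do not hard-code the two-letter codes. This is only a sketch: `preset_for` and the platform labels it accepts (`rsii`, `sequel`, `nanopore`) are my own hypothetical names; only the output values come from the list above.

```shell
# preset_for LABEL: print the -x value for a platform label.
# Hypothetical helper; the labels on the left are illustrative choices.
preset_for() {
    case "$1" in
        rsii|rs)      echo rs  ;;
        sequel|sq)    echo sq  ;;
        ccs)          echo ccs ;;
        nanopore|ont) echo ont ;;
        *) echo "unknown platform: $1" >&2; return 1 ;;
    esac
}
```

It could then be used as `./wtdbg2 -x "$(preset_for nanopore)" -g 4.6m -i reads.fa.gz -t 16 -fo dbg`, keeping `-x` ahead of the other parameters as noted above.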
    Paper: https://www.biorxiv.org/content/biorxiv/early/2019/01/26/530972.full.pdf
    Limitations

    For Nanopore data, wtdbg2 may produce an assembly smaller than the true genome.

    The assembled result may be smaller than the actual genome.
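
One way to spot an undersized assembly is to compare the total length of the output FASTA with the genome size given to `-g`. The sketch below assumes the suffixes k, m and g mean 1e3, 1e6 and 1e9 bases; the three function names are hypothetical helpers, not wtdbg2 tools.

```shell
# fasta_total_bp FILE: sum the lengths of all sequence lines (non-'>' lines).
fasta_total_bp() {
    awk '!/^>/ { gsub(/[ \t\r]/, ""); n += length($0) } END { print n + 0 }' "$1"
}

# size_to_bp SIZE: convert a size such as 4.6m, 500k or 3g to base pairs.
# Assumes k/m/g stand for 1e3/1e6/1e9; a plain number is returned as-is.
size_to_bp() {
    awk -v s="$1" 'BEGIN {
        unit = tolower(substr(s, length(s), 1))
        mult = 1
        if (unit == "k") mult = 1e3
        else if (unit == "m") mult = 1e6
        else if (unit == "g") mult = 1e9
        if (mult > 1) s = substr(s, 1, length(s) - 1)
        printf "%.0f\n", s * mult
    }'
}

# assembly_fraction FASTA SIZE: assembly length as a percentage of SIZE.
assembly_fraction() {
    total=$(fasta_total_bp "$1")
    expected=$(size_to_bp "$2")
    awk -v t="$total" -v e="$expected" 'BEGIN { printf "%.1f\n", 100 * t / e }'
}
```

For example, `assembly_fraction dbg.cns.fa 4.6m` prints the percentage of the expected 4.6 Mb covered by the polished contigs; a value well below 100 would be consistent with the limitation described above.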

    When inputting multiple files in both FASTA and FASTQ format, put the FASTQ files first, then the FASTA files. Otherwise, the program cannot find '>' in the FASTQ data and appends all FASTQ records to one read.

    So pay attention to the format and order of the input files.
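
The ordering rule can be applied automatically before building the command line. This is a rough sketch: `order_inputs` is a hypothetical helper that guesses the format from the file name alone (`.fq` or `.fastq`, optionally followed by `.gz`) and treats everything else as FASTA.

```shell
# order_inputs FILE...: print the file names one per line,
# FASTQ files first, then all remaining files (assumed to be FASTA).
# Hypothetical helper; the format is inferred from the extension only.
order_inputs() {
    for f in "$@"; do
        case "$f" in
            *.fq|*.fastq|*.fq.gz|*.fastq.gz) printf '%s\n' "$f" ;;
        esac
    done
    for f in "$@"; do
        case "$f" in
            *.fq|*.fastq|*.fq.gz|*.fastq.gz) ;;
            *) printf '%s\n' "$f" ;;
        esac
    done
}
```

The printed order is the order in which the files should then be handed to wtdbg2. Files whose names do not reflect their real format would need to be checked by content instead.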
