ABySS (2019.3) - Short-Read Assembly


Author: 吴十三和小可爱的札记 | Published: 2019-11-05 19:54

    1. Introduction

    ABySS is a de novo assembler for paired-end short reads and for larger genomes. Test data are available from the official website.

    2. Installation

    Install on Ubuntu:

    sudo apt-get install abyss
    

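    3. Parameters

    • ABYSS: assembles short reads into contigs

    • abyss-pe: assembles short reads into contigs and scaffolds

    • --chastity: discard reads that failed the Illumina chastity filter (default)

    • --no-chastity: keep reads that failed the chastity filter

    • --trim-masked: trim masked bases from the ends of reads (default)

    • --no-trim-masked: do not trim masked bases from the ends of reads

    • -q | --trim-quality=N: trim bases with quality below N from the ends of reads

    • --standard-quality: base qualities are Phred+33 encoded (default)

    • --illumina-quality: base qualities are Phred+64 encoded

    • -o | --out=FILE: file name for the output contigs

    • -k | --kmer=N: k-mer length (required)

    • -t | --trim-length=N: maximum length of dangling edges to trim

    • -c | --coverage=FLOAT: remove contigs with mean k-mer coverage below this value

    • -b | --bubbles=N: pop bubbles shorter than N bp (default: 3*k)

    • -b0 | --no-bubbles: do not pop bubbles

    • name: prefix for the output files of abyss-pe. A run produces many files; with name=test, the main ones are test-contigs.fa and test-scaffolds.fa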
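
    To show how the ABYSS options above combine on one command line, here is a minimal sketch. The helper abyss_cmd is hypothetical, the threshold --trim-quality=3 and the file names are illustrative placeholders, and the command is only printed, not run.

```shell
# Hypothetical helper: print (not run) an ABYSS command line.
# Only options from the parameter list are used; values are illustrative.
abyss_cmd() {
  local k=$1 out=$2
  shift 2
  printf 'ABYSS --kmer=%s --out=%s --trim-quality=3 --standard-quality' "$k" "$out"
  # Append every remaining argument as an input read file
  printf ' %s' "$@"
  printf '\n'
}

abyss_cmd 25 test-contigs.fa reads1.fastq reads2.fastq
```

    Printing the command first makes it easy to check the options before starting a long assembly.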

    4. Examples

    Assemble one paired-end library of short reads into scaffolds with abyss-pe:

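    # Download and unpack the ABySS test data
    wget http://www.bcgsc.ca/platform/bioinfo/software/abyss/releases/1.3.4/test-data.tar.gz
    tar xzvf test-data.tar.gz
    # abyss-pe takes make-style name=value parameters; --out is an ABYSS option, not an abyss-pe one
    abyss-pe \
     k=25 name=test \
     in='test-data/reads1.fastq test-data/reads2.fastq'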
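
    After the run, a quick sanity check is to count the sequences in the two main output files. This is a minimal sketch: count_seqs is a hypothetical helper, and the file names assume the name=test prefix used in this example.

```shell
# Hypothetical helper: count the sequences in a FASTA file
# by counting its header lines (lines starting with '>').
count_seqs() {
  grep -c '^>' "$1"
}

# File names follow the name=test prefix; adjust them for another prefix.
for f in test-contigs.fa test-scaffolds.fa; do
  if [ -s "$f" ]; then
    printf '%s\t%s\n' "$f" "$(count_seqs "$f")"
  else
    printf '%s\tmissing or empty\n' "$f"
  fi
done
```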
    

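    Assembling multiple libraries (two paired-end libraries, pea and peb, plus single-end reads):

    # lib names the paired-end libraries; each name is then assigned its read files
    abyss-pe \
     k=25 name=ecoli \
     lib='pea peb' \
     pea='pea_1.fa pea_2.fa' \
     peb='peb_1.fa peb_2.fa' \
     se='se1.fa se2.fa'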
    

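    Assemble the genome with several values of k, then look for the best one:

    # Exporting k passes it to abyss-pe through the environment
    export k
    for k in {20..40}; do
     mkdir k$k
     # -C runs the assembly inside k$k, so the reads path is relative to that directory
     abyss-pe -C k$k name=ecoli in=../reads.fa
    done
    # Compare contiguity statistics across all runs
    abyss-fac k*/ecoli-contigs.fa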
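
    One statistic commonly used to compare assemblies is N50: the length of the contig at which the running total, taken from longest to shortest, first reaches half of the total assembly length. The sketch below computes it with awk and sort. The helper n50 is hypothetical and counts every sequence in the file, so its numbers may differ from what abyss-fac reports if abyss-fac applies a minimum-length cutoff.

```shell
# Hypothetical helper: compute N50 of a FASTA file.
n50() {
  # Step 1: print one sequence length per line (sequences may span several lines)
  awk '/^>/ { if (len) print len; len = 0; next }
       { len += length($0) }
       END { if (len) print len }' "$1" |
  # Step 2: sort lengths from longest to shortest
  sort -rn |
  # Step 3: accumulate until half of the total length is reached
  awk '{ l[NR] = $1; total += $1 }
       END { half = total / 2
             for (i = 1; i <= NR; i++) {
               sum += l[i]
               if (sum >= half) { print l[i]; exit }
             } }'
}

# Print N50 for each k; directory and file names follow the loop above.
for f in k*/ecoli-contigs.fa; do
  [ -s "$f" ] && printf '%s\t%s\n' "$f" "$(n50 "$f")"
done
```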
    

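    Assemble over a coarser k-mer gradient with a for loop (k from 50 to 90 in steps of 8):

    for k in $(seq 50 8 90); do
     mkdir k$k
     # -C changes into k$k first, so reads.fa must be referenced as ../reads.fa
     abyss-pe -C k$k name=test k=$k in=../reads.fa
    done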
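
    Long sweeps are sometimes interrupted. The sketch below lists the k values whose contigs file is still missing or empty, so that only those need to be rerun. The helper pending_k is hypothetical; it assumes the k$k directory layout and the name-contigs.fa file naming used in this article.

```shell
# Hypothetical helper: print the k values whose contigs file is missing or empty.
# Arguments: first k, step, last k, and the name= prefix.
pending_k() {
  local first=$1 step=$2 last=$3 name=$4 k
  for k in $(seq "$first" "$step" "$last"); do
    [ -s "k$k/$name-contigs.fa" ] || printf '%s\n' "$k"
  done
}

# List the unfinished k values for the sweep above
pending_k 50 8 90 test
```

    The output can be fed back into the loop, for example with for k in $(pending_k 50 8 90 test).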
    

    Other topics

    Scaffolding

    Scaffolding with linked reads

    Rescaffolding with long sequences

    Assembling using a Bloom filter de Bruijn graph

    Assembling using a paired de Bruijn graph

    Assembling a strand-specific RNA-Seq library

    Parallel processing

    Running ABySS on a cluster

    Using the DIDA alignment framework

    Tips

    abyss-pe is a driver script implemented as a Makefile, so any make option can be used with abyss-pe. The ABySS package includes the following programs:

    • ABYSS: de Bruijn graph assembler

    • ABYSS-P: parallel (MPI) de Bruijn graph assembler

    • AdjList: find overlapping sequences

    • DistanceEst: estimate the distance between sequences

    • MergeContigs: merge sequences

    • MergePaths: merge overlapping paths

    • Overlap: find overlapping sequences using paired-end reads

    • PathConsensus: find a consensus sequence of ambiguous paths

    • PathOverlap: find overlapping paths

    • PopBubbles: remove bubbles from the sequence overlap graph

    • SimpleGraph: find paths through the overlap graph

    • abyss-fac: calculate assembly contiguity statistics

    • abyss-filtergraph: remove shim contigs from the overlap graph

    • abyss-fixmate: fill the paired-end fields of SAM alignments

    • abyss-map: map reads to a reference sequence

    • abyss-scaffold: scaffold contigs using distance estimates

    • abyss-todot: convert graph formats and merge graphs

    These programs must be on the PATH environment variable. For example, to configure ABYSS:

    # Add these two lines to ~/.bashrc
    export abyss_home=~/soft/abyss-...
    export PATH=$abyss_home/ABYSS:$PATH

    # Then reload the shell configuration
    source ~/.bashrc
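
    To confirm that the configuration took effect, one option is to check which programs the shell can actually find. This is a minimal sketch: missing_tools is a hypothetical helper, and the three program names passed to it are examples taken from the list above.

```shell
# Hypothetical helper: print each given program that is not found on PATH.
missing_tools() {
  local tool
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# No output means every listed program was found.
missing_tools ABYSS abyss-pe abyss-fac
```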
    
