Pan-genome analysis with PEPPAN


Author: 胡童远 | Published: 2021-10-08 09:24

    Paper: Accurate reconstruction of bacterial pan- and core genomes with PEPPAN. Genome Res. 2020
    Citations: 5 (at the time of writing)
    GITHUB: https://github.com/zheminzhou/PEPPAN

    Installation with conda and pip3

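    # pin Python so that pip3 installs into this environment (3.7 matches the site-packages path noted below)
    conda create -n peppan python=3.7
    conda activate peppan
    # dependencies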
    conda config --add channels defaults
    conda config --add channels conda-forge
    conda config --add channels bioconda
    conda install mmseqs2
    conda install blast diamond rapidnj fasttree
    
    # main program
    pip3 install peppan
    # installed under miniconda3/envs/peppan/lib/python3.7/site-packages
    PEPPAN --help
    PEPPAN_parser --help
    

    Alternatively, download PEPPAN from GitHub and run it with the conda-installed dependencies above

    # download master.zip (done on Windows here): https://github.com/zheminzhou/PEPPAN/archive/refs/heads/master.zip
    ./PEPPAN --help
    

    Testing PEPPAN

    # peppan
    cd /hutongyuan/software/PEPPAN-master/
    PEPPAN -p examples/ST131 \
    -P examples/GCF_000010485.combined.gff.gz examples/*.gff.gz
    

    PEPPAN parameters:
    p: [Default: PEPPAN] prefix for the outputs
    t: [Default: 8] Number of threads

    Testing PEPPAN_parser (takes the PEPPAN.gff file produced by PEPPAN as input)

    # peppan_parser
    PEPPAN_parser -g examples/ST131.PEPPAN.gff \
    -s examples/PEPPAN_out \
    -t -c
    

    PEPPAN_parser parameters:
    g: [REQUIRED] generated PEPPA.gff file from PEPPA.py
    s: [optional] A folder for splitted GFF files
    t: [Default: False] Flag to generate the gene present/absent tree
    c: [Default: False] Flag to generate a rarefraction curve
    a: [Default: -1] Set to an integer between 0 and 100, % of presence for a gene to be included in a Core Gene Allelic Variation tree
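
The `-a` option above is described as a percentage-of-presence cutoff. The sketch below only illustrates what such a cutoff means on a presence/absence matrix; it is not PEPPAN's implementation. The function name `genes_above_presence`, the toy matrix, and the use of `>=` at the boundary are all assumptions made for illustration.

```python
def genes_above_presence(pav, min_percent):
    """Return genes present in at least `min_percent` % of genomes.

    `pav` maps a gene name to a list of per-genome values, where any
    value above 0 counts as "present". Hypothetical helper, not part
    of PEPPAN.
    """
    selected = []
    for gene, row in pav.items():
        if not row:
            continue  # skip genes with no genome columns
        present = sum(1 for value in row if value > 0)
        if 100.0 * present / len(row) >= min_percent:
            selected.append(gene)
    return selected


# Toy matrix: three genes across four genomes (made-up data).
toy_pav = {
    "g1": [1, 1, 1, 1],  # present in 100 % of genomes
    "g2": [1, 1, 1, 0],  # present in 75 %
    "g3": [1, 0, 0, 0],  # present in 25 %
}

strict_core = genes_above_presence(toy_pav, 100)
soft_core = genes_above_presence(toy_pav, 75)
print(strict_core, soft_core)
```

A cutoff of 100 keeps only genes found in every genome, while lower values admit genes that are missing from a few genomes.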

    Test run log

    # peppan
    Run MMSeqs linclust to get exemplar sequences
    Iterative clustering. 5995 exemplars left with identity = 0.9
    Run BLASTn
    Run diamond
    Obtained 5987 exemplar gene sequences from examples/ST131.clust.exemplar
    ...
    
    # peppan_parser
    GFF files are saved under folder examples/PEPPAN_out
    Summary of the pan-genome is saved in examples/ST131.PEPPAN.gene_content.summary_statistics.txt
    Gene content matrix is saved in examples/ST131.PEPPAN.gene_content.csv
    Gene presence matrix is saved in examples/ST131.PEPPAN.gene_content.Rtab
    Gene content tree is saved in examples/ST131.PEPPAN.gene_content.nwk
    Curves for all genes are saved in examples/ST131.PEPPAN.gene_content.curve
    

    Test results (files highlighted in red are produced by the parser)

    Running PEPPAN: a worked example

    PEPPAN -t 8 \
    -p result_peppan/peppan \
    ./gff/*.gff
    
    PEPPAN_parser -g result_peppan/peppan.PEPPAN.gff \
    -s result_peppan/PEPPAN_out \
    -t -c
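
For scripted runs, the two commands above could be wrapped in a small Python driver. This is a minimal sketch that uses only the flags shown in this post (`-t`, `-p`, `-g`, `-s`, `-c`) and assumes, as the examples here suggest, that PEPPAN writes `<prefix>.PEPPAN.gff`. The function names and the file names `gff/a.gff` and `gff/b.gff` are hypothetical, and creating the output folder in advance is a precaution rather than a documented requirement.

```python
import glob
import os
import shutil
import subprocess


def build_peppan_cmd(prefix, gff_files, threads=8):
    """Assemble the PEPPAN call: -t threads, -p output prefix, then the GFF inputs."""
    return ["PEPPAN", "-t", str(threads), "-p", prefix] + list(gff_files)


def build_parser_cmd(prefix, split_dir):
    """Assemble the PEPPAN_parser call with the tree (-t) and curve (-c) flags."""
    return ["PEPPAN_parser", "-g", prefix + ".PEPPAN.gff", "-s", split_dir, "-t", "-c"]


def run_pipeline(prefix, gff_glob, split_dir, threads=8):
    """Run PEPPAN and then PEPPAN_parser, stopping at the first failure."""
    gff_files = sorted(glob.glob(gff_glob))
    if not gff_files:
        raise FileNotFoundError("no GFF files match " + gff_glob)
    if shutil.which("PEPPAN") is None:
        raise RuntimeError("PEPPAN is not on PATH; activate the conda environment first")
    out_dir = os.path.dirname(prefix)
    if out_dir:
        # Precaution: it is not confirmed whether PEPPAN creates this folder itself.
        os.makedirs(out_dir, exist_ok=True)
    subprocess.run(build_peppan_cmd(prefix, gff_files, threads), check=True)
    subprocess.run(build_parser_cmd(prefix, split_dir), check=True)


# Show the commands without executing anything (file names are hypothetical).
peppan_cmd = build_peppan_cmd("result_peppan/peppan", ["gff/a.gff", "gff/b.gff"])
parser_cmd = build_parser_cmd("result_peppan/peppan", "result_peppan/PEPPAN_out")
print(" ".join(peppan_cmd))
print(" ".join(parser_cmd))
```

Calling `run_pipeline("result_peppan/peppan", "gff/*.gff", "result_peppan/PEPPAN_out")` would then reproduce the shell example, with `check=True` aborting if either tool exits with an error.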
    

    PEPPAN results

    peppan.PEPPAN.gene_content.Rtab is the gene presence/absence variation (PAV) matrix

    Some entries in the Rtab table indicate multiple copies of a gene
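
Because multi-copy entries appear in the table, a strict 0/1 PAV matrix needs one extra step. The sketch below assumes the Rtab file is tab-delimited with a header row of genome names, one row per gene, and integer values; that layout is an assumption and should be checked against the real file. The helper names and the toy data are hypothetical.

```python
import csv
import io


def read_rtab(handle):
    """Parse a tab-delimited matrix: header of genome names, then one row per gene."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    genomes = header[1:]
    counts = {}
    for row in reader:
        if not row:
            continue  # ignore blank lines
        counts[row[0]] = [int(value) for value in row[1:]]
    return genomes, counts


def to_presence_absence(counts):
    """Collapse every value above 0 to 1 so the matrix is strictly binary."""
    return {gene: [1 if v > 0 else 0 for v in values] for gene, values in counts.items()}


def multi_copy_genes(counts):
    """List genes whose value exceeds 1 in at least one genome."""
    return [gene for gene, values in counts.items() if any(v > 1 for v in values)]


# Toy input standing in for a real Rtab file (made-up data).
demo = "Gene\tgenomeA\tgenomeB\tgenomeC\ng1\t1\t1\t1\ng2\t2\t0\t1\ng3\t0\t0\t1\n"
genomes, counts = read_rtab(io.StringIO(demo))
pav = to_presence_absence(counts)
print(genomes)
print(multi_copy_genes(counts))
print(pav["g2"])
```

For a real run, `read_rtab(open("result_peppan/peppan.PEPPAN.gene_content.Rtab"))` would replace the toy input.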

    peppan.PEPPAN.gene_content.curve
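
To give a sense of what a rarefaction curve summarizes, here is a generic permutation-based sketch computed from a presence/absence matrix. It makes no assumption about the layout of the `.curve` file and is not PEPPAN's own algorithm. The function name `pan_core_curve` and the toy matrix are hypothetical.

```python
import random


def pan_core_curve(pav, n_permutations=100, seed=0):
    """Average pan- and core-genome sizes as genomes are added in random order.

    `pav` maps gene name -> list of per-genome values (above 0 = present).
    Returns two lists of length n_genomes: mean pan size and mean core size
    after 1, 2, ..., n genomes.
    """
    genes = list(pav)
    n_genomes = len(next(iter(pav.values())))
    rng = random.Random(seed)  # fixed seed keeps the result reproducible
    pan_totals = [0] * n_genomes
    core_totals = [0] * n_genomes
    for _ in range(n_permutations):
        order = list(range(n_genomes))
        rng.shuffle(order)
        seen = set()    # genes found in any genome so far (pan)
        shared = None   # genes found in every genome so far (core)
        for step, genome_index in enumerate(order):
            present = {g for g in genes if pav[g][genome_index] > 0}
            seen |= present
            shared = present if shared is None else shared & present
            pan_totals[step] += len(seen)
            core_totals[step] += len(shared)
    pan = [total / n_permutations for total in pan_totals]
    core = [total / n_permutations for total in core_totals]
    return pan, core


# Toy matrix: three genes across three genomes (made-up data).
toy_pav = {"g1": [1, 1, 1], "g2": [1, 1, 0], "g3": [0, 0, 1]}
pan, core = pan_core_curve(toy_pav, n_permutations=50)
print(pan)
print(core)
```

The pan curve can only rise and the core curve can only fall as genomes are added, which is the shape one would expect when plotting such data.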
