Paper: Accurate reconstruction of bacterial pan- and core genomes with PEPPAN. Genome Res. 2020
Citations: 5
GitHub: https://github.com/zheminzhou/PEPPAN
Installation via conda and pip3
conda create -n peppan
conda activate peppan
# dependencies
conda config --add channels defaults
conda config --add channels conda-forge
conda config --add channels bioconda
conda install mmseqs2
conda install blast diamond rapidnj fasttree
# PEPPAN itself
pip3 install peppan
# installed under miniconda3/envs/peppan/lib/python3.7/site-packages
PEPPAN --help
PEPPAN_parser --help
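The conda step above installs several external programs that PEPPAN calls. A quick sanity check is to confirm that their executables are on PATH inside the activated environment. The sketch below uses `check_tools`, a hypothetical helper that is not part of PEPPAN. It prints every name it cannot find and returns non-zero if any are missing.

```bash
# Hypothetical helper (not part of PEPPAN): report commands missing from PATH.
# Returns 0 if every name given is found, 1 if at least one is missing.
check_tools() {
  local t missing=0
  for t in "$@"; do
    if ! command -v "$t" >/dev/null 2>&1; then
      echo "not found on PATH: $t"
      missing=1
    fi
  done
  return "$missing"
}
```

Usage, inside the activated peppan environment: check_tools mmseqs blastn makeblastdb diamond rapidnj FastTree
The executable names in this line are assumptions about what the conda packages provide. Adjust them if your installation names them differently.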
Alternatively, download PEPPAN from GitHub and run it with the dependencies installed via conda above:
# on Windows, download the master.zip file: https://github.com/zheminzhou/PEPPAN/archive/refs/heads/master.zip
./PEPPAN --help
Test PEPPAN
# peppan
cd /hutongyuan/software/PEPPAN-master/
PEPPAN -p examples/ST131 \
-P examples/GCF_000010485.combined.gff.gz examples/*.gff.gz
PEPPAN parameters:
-p: [Default: PEPPAN] Prefix for the outputs
-t: [Default: 8] Number of threads
Test PEPPAN_parser (using the PEPPAN.gff written by PEPPAN as input)
# peppan_parser
PEPPAN_parser -g examples/ST131.PEPPAN.gff \
-s examples/PEPPAN_out \
-t -c
PEPPAN_parser parameters:
-g: [REQUIRED] The PEPPAN.gff file generated by PEPPAN
-s: [Optional] A folder for the split GFF files
-t: [Default: False] Flag to generate the gene presence/absence tree
-c: [Default: False] Flag to generate a rarefaction curve
-a: [Default: -1] An integer between 0 and 100 giving the % presence required for a gene to be included in a Core Gene Allelic Variation tree
Test run log
# peppan
Run MMSeqs linclust to get exemplar sequences
Iterative clustering. 5995 exemplars left with identity = 0.9
Run BLASTn
Run diamond
Obtained 5987 exemplar gene sequences from examples/ST131.clust.exemplar
...
# peppan_parser
GFF files are saved under folder examples/PEPPAN_out
Summary of the pan-genome is saved in examples/ST131.PEPPAN.gene_content.summary_statistics.txt
Gene content matrix is saved in examples/ST131.PEPPAN.gene_content.csv
Gene presence matrix is saved in examples/ST131.PEPPAN.gene_content.Rtab
Gene content tree is saved in examples/ST131.PEPPAN.gene_content.nwk
Curves for all genes are saved in examples/ST131.PEPPAN.gene_content.curve
[Screenshot: test results; files marked in red are PEPPAN_parser outputs]
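The logs above show a consistent naming pattern. With -p PREFIX, PEPPAN writes PREFIX.PEPPAN.gff. PEPPAN_parser then writes PREFIX.PEPPAN.gene_content.* next to that file. The sketch below uses `check_peppan_outputs`, a hypothetical helper that is not part of PEPPAN. It checks that these files exist and are non-empty, and it assumes exactly the file names seen in this log. The .nwk and .curve files are only expected when PEPPAN_parser was run with -t and -c.

```bash
# Hypothetical helper (not part of PEPPAN): verify the outputs of a PEPPAN + PEPPAN_parser run.
# $1 = the prefix passed to PEPPAN -p (e.g. examples/ST131).
# File names follow the log above; .nwk needs -t and .curve needs -c.
check_peppan_outputs() {
  local base="$1.PEPPAN" f missing=0
  for f in "$base.gff" \
           "$base.gene_content.summary_statistics.txt" \
           "$base.gene_content.csv" \
           "$base.gene_content.Rtab" \
           "$base.gene_content.nwk" \
           "$base.gene_content.curve"; do
    # -s: the file exists and is larger than zero bytes
    [ -s "$f" ] || { echo "missing or empty: $f"; missing=1; }
  done
  return "$missing"
}
```

Usage: check_peppan_outputs examples/ST131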
Running PEPPAN on your own data: an example
PEPPAN -t 8 \
-p result_peppan/peppan \
./gff/*.gff
PEPPAN_parser -g result_peppan/peppan.PEPPAN.gff \
-s result_peppan/PEPPAN_out \
-t -c
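Before pointing PEPPAN at your own ./gff/*.gff files, it may be worth checking the inputs. This sketch assumes that each GFF3 must carry the genome sequence after a ##FASTA line, as Prokka output does and as Roary requires. Confirm that requirement in the PEPPAN README before relying on it. The helper `check_gff_fasta` is hypothetical and not part of PEPPAN. It also reads gzip-compressed .gff.gz files like those in examples/.

```bash
# Hypothetical helper (not part of PEPPAN): report GFF3 files without a "##FASTA" section.
# Assumption: PEPPAN needs the genome sequence embedded after "##FASTA", as in Prokka output.
# Handles plain .gff and gzip-compressed .gff.gz; returns 1 if any file lacks the section.
check_gff_fasta() {
  local f n missing=0
  for f in "$@"; do
    # grep -c reads the whole stream, so the upstream cat/gzip is never cut off early
    n=$(if [[ $f == *.gz ]]; then gzip -cd -- "$f"; else cat -- "$f"; fi | grep -c '^##FASTA')
    if [ "${n:-0}" -eq 0 ]; then
      echo "no ##FASTA section: $f"
      missing=1
    fi
  done
  return "$missing"
}
```

Usage: check_gff_fasta ./gff/*.gff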
PEPPAN results
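peppan.PEPPAN.gene_content.Rtab is the gene presence/absence variation (PAV) matrix.
Note that the Rtab table records multi-copy genes: a cell can hold a copy count greater than 1, not just 0/1.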
peppan.PEPPAN.gene_content.curve: the rarefaction curve data (written when PEPPAN_parser is run with -c)
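The Rtab holds copy numbers rather than strict 0/1 values. Downstream tools that expect a binary PAV matrix may therefore need it converted first. Below is a minimal awk sketch, assuming a tab-separated file with one header row (a gene column, then one column per genome) and integer counts. The helper names are hypothetical. `count_core_genes` takes an optional presence fraction, which defaults to 1 (genes present in every genome). Its result can be compared with the summary_statistics.txt file.

```bash
# Hypothetical helpers; assume a tab-separated Rtab with one header row
# (gene column, then one column per genome) and integer copy counts.

# Print the table with every count > 0 turned into 1 and everything else into 0.
rtab_to_binary() {
  awk 'BEGIN { FS = OFS = "\t" }
       NR == 1 { print; next }
       { for (i = 2; i <= NF; i++) $i = ($i + 0 > 0) ? 1 : 0; print }' "$1"
}

# Count genes present (count > 0) in at least a fraction MIN of the genomes.
# MIN ($2) defaults to 1, i.e. genes present in every genome.
count_core_genes() {
  awk -v min="${2:-1}" 'BEGIN { FS = "\t" }
       NR == 1 { n = NF - 1; next }
       { p = 0
         for (i = 2; i <= NF; i++) if ($i + 0 > 0) p++
         if (p >= min * n) core++ }
       END { print core + 0 }' "$1"
}

# Count genes that have more than one copy in at least one genome.
count_multicopy_genes() {
  awk 'BEGIN { FS = "\t" }
       NR > 1 { for (i = 2; i <= NF; i++) if ($i + 0 > 1) { m++; break } }
       END { print m + 0 }' "$1"
}
```

Usage: rtab_to_binary result_peppan/peppan.PEPPAN.gene_content.Rtab > result_peppan/peppan.PAV.binary.Rtab
count_core_genes result_peppan/peppan.PEPPAN.gene_content.Rtab 0.95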