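Overview
BPGA stands for Bacterial Pan Genome Analysis tool. It was published in 2016 and last updated in 2017. It ships with built-in KEGG and COG data (now outdated) and depends on usearch (the 32-bit build is free to use). It is very fast and otherwise average, runs on both Windows and Linux, and is worth keeping as a reference.
Paper: BPGA - an ultra-fast pan-genome analysis pipeline. Sci Rep 2016
Citations: 293
1 Download, extract, get the dependencies (usearch, gnuplot), configure, launch
Official site: https://iicb.res.in/bpga/index.html. Download and extract BPGA.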
data:image/s3,"s3://crabby-images/50216/50216a998dcc0ad3ab516329d645fcf449e53a08" alt=""
data:image/s3,"s3://crabby-images/16893/16893121c1cf7a3121fb740a0c0899d3aea5ddc2" alt=""
usearch official site: http://www.drive5.com/usearch/download.html
Download it, extract it, rename it to usearch.exe, and move it into the BPGA bin folder.
data:image/s3,"s3://crabby-images/0bb4f/0bb4f9dea18c544de71b13609784e7a572cea277" alt=""
Download and install gnuplot as described in the BPGA User Guide.
Launch BPGA so that it initializes; it should start normally.
data:image/s3,"s3://crabby-images/c5eca/c5ecad6f4e4b3aa4b15144510d67766edaa77094" alt=""
2 Pan-genome analysis -- default
Input preparation [1] > protein files [4] > select files > default analysis [2] > usearch clustering > 50% identity > wait...
data:image/s3,"s3://crabby-images/14b19/14b19c755020d54c261557cde6073c3aba86e785" alt=""
This produces a large number of result files, including:
data:image/s3,"s3://crabby-images/70200/70200fe76fa5a7cfc42e5e723671f539916d745f" alt=""
exclusively absent genes/proteins:
orthologous families that contain genes from all genomes except one specific genome
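To make the definition concrete, here is a small sketch that finds exclusively absent families in a toy presence/absence table. The table layout (one row per family, one 0/1 column per genome) and the names demo_matrix.txt, fam1..fam4 and gA..gC are made up for illustration; this is not BPGA's actual output format.

```shell
# Toy presence/absence table: family name, then 1 (present) or 0 (absent) per genome.
# Layout and names are hypothetical, not BPGA output.
tmp=$(mktemp -d)
cat > "$tmp/demo_matrix.txt" <<'EOF'
family gA gB gC
fam1 1 1 1
fam2 1 1 0
fam3 1 0 0
fam4 0 1 1
EOF

# A family is exclusively absent from a genome when every other genome has it,
# i.e. exactly one column in its row is 0.
exclusively_absent=$(awk '
    NR == 1 { for (i = 2; i <= NF; i++) name[i] = $i; next }
    {
        zeros = 0
        for (i = 2; i <= NF; i++) if ($i == 0) { zeros++; missing = name[i] }
        if (zeros == 1) print $1, missing
    }' "$tmp/demo_matrix.txt")

printf '%s\n' "$exclusively_absent"
rm -rf "$tmp"
```

In this toy table fam2 is exclusively absent from gC and fam4 from gA; fam3 is missing from two genomes, so it does not qualify.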
data:image/s3,"s3://crabby-images/e1c65/e1c6549b106aadb8bf1dfb24ef0fc459564a8553" alt=""
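This table lists the gene categories for each genome separately, so the sum over all genomes is far higher than the number of genes in the pan-genome. Supporting_files/pan_default.txt gives the pan-genome gene count, shown below. The representative sequences in Sequences also add up to the pan-genome gene count.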
data:image/s3,"s3://crabby-images/1a491/1a4914c5703db6b69aae83e057464ad6da1a3a49" alt=""
data:image/s3,"s3://crabby-images/07f8f/07f8facc584d05dfc7324546312e056c02972393" alt=""
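The gap between the two totals is easy to reproduce on a toy example. The sketch below uses a made-up presence/absence table (hypothetical layout and names, not a BPGA file): the pan-genome size is the number of families, whereas adding up the per-genome counts counts every shared family once per genome that carries it.

```shell
tmp=$(mktemp -d)
# Hypothetical toy table: family name, then presence (1) or absence (0) in three genomes.
cat > "$tmp/demo_matrix.txt" <<'EOF'
fam1 1 1 1
fam2 1 1 0
fam3 1 0 0
fam4 0 1 1
EOF

# Pan-genome size: each family is counted once, however many genomes carry it.
pan_size=$(awk 'END { print NR }' "$tmp/demo_matrix.txt")

# Sum of per-genome family counts: a family shared by k genomes is counted k times.
per_genome_sum=$(awk '{ for (i = 2; i <= NF; i++) total += $i } END { print total }' "$tmp/demo_matrix.txt")

echo "pan-genome families: $pan_size"
echo "sum over genomes:    $per_genome_sum"
rm -rf "$tmp"
```

With four families the pan-genome size is 4, while the per-genome counts (3, 3 and 2) add up to 8.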
Pan-genome and core-genome growth trends:
data:image/s3,"s3://crabby-images/ee4da/ee4da2f033d9dd70f1506e16e0c6604162b34bd7" alt=""
Number of gene families per genome:
data:image/s3,"s3://crabby-images/a10c7/a10c79d7065caca4d4a1073be7f6a124557d06dc" alt=""
Number of new genes (relative to one particular genome? unclear):
data:image/s3,"s3://crabby-images/1466e/1466ea0bcd7559aad7d576858ee7bc2b7ded9c31" alt=""
3 Advanced analysis
data:image/s3,"s3://crabby-images/2bc5d/2bc5d0ed5ff327563a48ade74d9a247d16d3455a" alt=""
When it finishes there is another large set of results:
Pan-genome and core genome, once again (?):
data:image/s3,"s3://crabby-images/10148/10148c9d832e7c0db1bf46bb89f193bd1327fbe6" alt=""
Phylogenetic trees -- pan-genome & core genome:
data:image/s3,"s3://crabby-images/af9eb/af9eb580238bddb7e1730ebdb4f8025ed3b50a7b" alt=""
KEGG annotation categories:
data:image/s3,"s3://crabby-images/239fa/239fa63ace096fe6083eaeadd22fb1a9ce7cdb94" alt=""
COG annotation categories:
data:image/s3,"s3://crabby-images/c6fe1/c6fe1187803cee6e04e2941861485f4f470f48cd" alt=""
Hands-on: using BPGA on Linux
Get the Linux version of BPGA, and put the Linux version of usearch in the BPGA bin folder
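On Linux this setup comes down to copying the usearch binary into BPGA's bin folder and making it executable. The sketch below runs against a throwaway directory with an empty stand-in file, so it is safe to try as-is. The paths, the download name usearch_download and the target name usearch are all assumptions; check the BPGA User Guide for the file name the Linux build actually expects.

```shell
# Stand-in layout for illustration only; replace these with your real paths.
work=$(mktemp -d)
bpga_dir="$work/BPGA"                 # hypothetical BPGA install directory
download="$work/usearch_download"     # hypothetical name of the downloaded usearch binary
mkdir -p "$bpga_dir/bin"
touch "$download"                     # empty placeholder standing in for the real binary

# Copy the binary into BPGA's bin folder under the assumed name and make it executable.
cp "$download" "$bpga_dir/bin/usearch"
chmod +x "$bpga_dir/bin/usearch"

# Confirm that the file is in place and executable.
if [ -x "$bpga_dir/bin/usearch" ]; then usearch_ready=yes; else usearch_ready=no; fi
echo "usearch ready: $usearch_ready"
rm -rf "$work"
```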
Launch
./BPGA-Version-1.3
Basic pangenome analysis:
1 INPUT PREPARATION FOR CLUSTERING
2 Use any Protein Fasta files
3 Enter the full path to the directory containing the *.fasta files
4 DEFAULT PAN GENOME ANALYSIS
5 Use USEARCH Clustering Algorithm (Ultra-fast)
6 Choose Sequence Identity Cut-off for Clustering: 0.8
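Before starting the menu it can help to check what the input directory contains, since step 3 asks for its full path. A minimal sketch, using a temporary directory with two made-up protein FASTA files (genomeA.fasta and genomeB.fasta are hypothetical names): it prints the absolute path to give to BPGA and the number of sequences in each file.

```shell
# Temporary demo input; point indir at your own directory of protein FASTA files instead.
indir=$(mktemp -d)
printf '>p1\nMKT\n>p2\nMLA\n' > "$indir/genomeA.fasta"
printf '>p1\nMKV\n'           > "$indir/genomeB.fasta"

# Absolute path of the input directory (what step 3 asks for).
full_path=$(cd "$indir" && pwd)
echo "input directory: $full_path"

# Number of protein sequences per file = number of FASTA header lines.
seq_counts=$(for f in "$full_path"/*.fasta; do
    printf '%s\t%s\n' "$(basename "$f")" "$(grep -c '^>' "$f")"
done)
printf '%s\n' "$seq_counts"
rm -rf "$indir"
```

A file that reports 0 sequences, or a directory with no *.fasta files at all, is worth fixing before clustering.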
data:image/s3,"s3://crabby-images/46031/46031ce83225605cb43c009400031e8772f2b1af" alt=""
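The rest of the process is the same as in the Windows version; as far as I can tell, the only difference is in how the input files are specified.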
data:image/s3,"s3://crabby-images/adcdc/adcdc966dd7f55306b5ece685796f22068f02d32" alt=""
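The node has 132 GB of memory; 4 GB was enough for this run, though larger datasets will need more.
Advanced analysis - phylogenetic analysis: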
1 Neighbour Joining Tree (NJ):pan phylogeny
2 MLST based core phylogeny
3 Neighbour Joining Tree (NJ): core gene phylogeny
data:image/s3,"s3://crabby-images/ca9e2/ca9e2b0de667a0e865c84146ee0c29cc00b9d87a" alt=""
By default only the pan phylogenetic nwk is produced; building the tree here also gives the core phylogenetic nwk.
Organizing the results
out="result"
mkdir -p "$out"
mv gi_name "$out"
mv INPUT_all.seq "$out"
mv list "$out"
mv Results "$out"
mv Sequences "$out"
mv Supporting_files "$out"