背景
一行命令构建系统进化树。其实这个想法去年的时就开始构思了。当时在给师兄师姐帮忙构建几个基因家族的系统进化树,还是用着以前本科时候就学会的几个Windows下的建树软件MEGA/fasttree/RaxML。起先时一切都算顺利,给一个家族的基因序列,倒腾下就能出来结果,可以帮助到别人也是挺开心。但之后需要我反复的去给他们建树的时候就突然觉得我这是在做重复性的机器运动,而这构建系统进化树已经是一个流程化的操作了。既然如此,何不将其封装成一个pipeline,直接一键化操作呢?
这一年多的时间学Perl学shell学各种生信知识,终于在前些天对perl语言进阶之后觉得自己也可以实现一年前的这一个想法了。参考了下陈连福写perl程序的一些技巧,花了些空余时间终于完成此1.0版本的 phylogeny系统树的一行命令构建pipeline。
程序说明
- 使用环境:此perl程序需在Linux操作环境下。需提前安装三个依赖软件:MAFFT/Gblocks/IQtree,具体安装可利用conda进行一键化安装。转录组学习一(软件安装)
- 配置完毕之后即可将以下Perl脚本在服务器上,利用vim编辑器里插入模式i再F4进入粘贴模式,复制粘贴保存即可。
-
程序使用
perl getTree.pl
即可显示帮助信息,perl getTree.pl 输入文件.fasta 输出文件名称
,例如perl getTree.pl AP3.fasta ap3
- 输入原始多基因的fasta文件,会调用程序自动进行多序列比对,信息位点的筛选,核酸模型的选择,ML树的构建结果会生成以treefile结尾的树文件,可进一步利用figtree/Itol/ggtree进行可视化的操作。
tmp
结尾中间文件夹,包括各个程序中间文件及运行记录。 -
后续可以学习下如何在此pipeline中添加各个参数。比如输入序列类型,线程数等。
程序基本信息
运行过程
生成结果文件
#!/usr/bin/env perl
use strict;
use File::Copy;
my $usage = <<USAGE;
Usage:
perl $0 MultiSequences.fasta OutPrefix
For example:
perl $0 AP3.fasta ap3_tree
This Pipeline depends on the following software that can be run directly in terminal:
1. mafft (v7.310)
2. Gblocks (0.91b)
3. iqtree (1.55)
by Wang Tianpeng wangtp@ibcas.ac.cn
Version 1.0
USAGE
if (@ARGV==0){die $usage}
my($fasta,$out_prefix)=@ARGV;
my $pwd0=`pwd`;
chomp $pwd0;
# step0 Detecting the dependency softwares
## mafft
print STDERR "\nDetecting the dependency softwares\n\n";
my $software=`mafft --help 2>&1`;
if($software=~/MAFFT/){
print STDERR "MAFFT:\tOK\n";
}else{
die "Mafft\tfailed\n";
}
## Gblocks
my $sofware1=`Gblocks -a -b >111`;
open IN,"111" or die "$!";
<IN>;my $software_info=<IN>;
if($software_info=~/^\*/){
print STDERR "Gblocks:\tOK\n";
}else{
die "Gblocks\tfailed\n";
}
close IN;system("rm 111");
## iqtree
$software=`iqtree 2>&1`;
if ($software=~/IQ-TREE/){
print STDERR "IQtree:\tOK\n";
}else{
die "iqtree\tfailed\n";
}
# step1 create temporary directory;
#print "\n======================================\n";
mkdir "$out_prefix.tmp" unless -e "$out_prefix.tmp";
chdir "$out_prefix.tmp";
my $pwd1 = `pwd`;chomp($pwd1);# print STDERR "PWD:$pwd";
#open OUT,">","genome.fasta" or die "cant create file genome.fasta,$!";
# step2 MultiSequences Alignment with mafft;
print "\n=====================================================\n";
print "STEP1 MultiSequences Alignment with mafft\n\n";
mkdir "1.Mafft" unless -e "1.Mafft";
unless (-e "1.Mafft.ok"){
chdir "1.Mafft";
my $pwd=`pwd`;print STDERR "PWD:$pwd";
my $command="mafft --auto $pwd0\/$fasta >$out_prefix.aln.fa 2>mafft.log";
print STDERR (localtime).": $command\n";
system($command)==0 or die "can not execute :$command\n";
my $pwd_mafft=`pwd`;chomp($pwd_mafft);
chdir "..";
open OUT,">1.Mafft.ok" or die "$!";close OUT;
}else{
print STDERR "CMD(skipped): mafft \n";
}
my $pwd_mafft="$pwd1\/1.Mafft";#print "$pwd_mafft\n";
# step3 informative alignment points filter with Gblocks
#my $pwd =`pwd`;print "$pwd\n";
print "\n===================================================\n";
print "STEP2 selection of informative blocks with Gblocks\n\n";
mkdir "2.Gblocks" unless -e "2.Gblocks";
unless (-e "2.Gblocks.ok"){
chdir "2.Gblocks";
my $pwd=`pwd`;print STDERR "PWD: $pwd\n";
my $command="Gblocks $pwd_mafft\/$out_prefix.aln.fa -t=d -b4=5 -b5=a >Gblocks.log";
print STDERR (localtime).": $command\n";
system($command) or die "can not execute : $command\n";
my $gb_files="$pwd_mafft"."\/$out_prefix.aln.fa-gb\*"; #my $command_move="mv $test .";print "$command_move\n";
system("mv $gb_files \.")==0 or die "cant move file\n";
chdir "..";
open OUT,">2.Gblocks.ok" or die "$!";close OUT;
}else{
print STDERR "CMD(skipped):mafft \n";
}
my $pwd_gblock="$pwd1\/2.Gblocks";
# step4 Construct phylogeny tree with IQtree
print "\n====================================================\n";
print "STEP3 Construction of phylogeny tree with iqtree\n\n";
mkdir "3.IQtree" unless -e "3.IQtree";
unless (-e "3.IQtree.ok"){
chdir "3.IQtree";
my $pwd_iqtree=`pwd`;print STDERR "PWD: $pwd_iqtree\n";
my $command="iqtree -s $pwd_gblock\/$out_prefix.aln.fa-gb -bb 10000 -pre $out_prefix -nt 6 >iqtree.log";
print STDERR (localtime).": $command\n";
system($command)==0 or die "can not execute: $command\n";
system("cp $out_prefix.treefile ../..")==0 or die "can not copy file\n";
}
print STDERR "ALL commands complete! :-)\n";
网友评论