Building the phylogenetic tree required the domain sequences of the proteins rather than their full-length sequences, so I wrote the following Perl script to extract them.
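#!/usr/bin/perl
use strict;
use warnings;

# Usage: perl Domain_seq_extrac.pl <fasta_file> <domain_table>
my ($fasta_file, $domain_file) = @ARGV;
die "Usage: $0 <fasta_file> <domain_table>\n" unless defined $domain_file;

# Read the FASTA file one record at a time, using ">" as the record separator.
my %hash;
open my $fa, '<', $fasta_file or die "Cannot open $fasta_file: $!";
{
    local $/ = ">";
    <$fa>;    # discard the empty chunk before the first ">"
    while (<$fa>) {
        chomp;    # strip the trailing ">"
        my ($id, $seq) = split /\n/, $_, 2;    # header line, then the sequence lines
        next unless defined $seq;
        $seq =~ s/\s+//g;    # remove line breaks and any other whitespace
        $hash{$id} = $seq;
    }
}
close $fa;

# Domain table: tab-separated sequence ID, start, end (1-based, inclusive).
open my $in, '<', $domain_file or die "Cannot open $domain_file: $!";
while (<$in>) {
    chomp;
    my @temp = split /\t/, $_;
    next unless @temp >= 3 && exists $hash{$temp[0]};
    my $length = $temp[2] - $temp[1] + 1;
    my $sequence = substr($hash{$temp[0]}, $temp[1] - 1, $length);    # substr offsets are 0-based
    print ">$temp[0]\n$sequence\n";
}
close $in;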
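The script treats the start and end columns as 1-based, inclusive positions, whereas Perl's substr takes a 0-based offset and a length. Here is a minimal sketch of that conversion; the toy sequence and coordinates are made up for illustration and are not from the real data:

```perl
use strict;
use warnings;

# Hypothetical toy sequence and domain coordinates (illustration only).
my $protein = "MKTAYIAKQR";
my ($start, $end) = (3, 6);    # 1-based, inclusive

my $length = $end - $start + 1;                         # 6 - 3 + 1 = 4 residues
my $domain = substr($protein, $start - 1, $length);     # offset 2, length 4

print "$domain\n";    # prints TAYI
```

Residues 3 to 6 of the toy sequence are T, A, Y, I, so the extracted domain is four residues long.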
Input file 1 (FASTA file of the protein sequences):
image.png
Input file 2 (tab-separated table with the sequence ID, domain start, and domain end):
image.png
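To show the two formats the script appears to expect, here is a small self-contained sketch that runs the same extraction on in-memory data. The IDs, sequences, and coordinates (protA, protB, protC) are invented for illustration. Note that the ID in the table has to match the whole FASTA header line after ">", and that rows whose ID is missing from the FASTA file are silently skipped.

```perl
use strict;
use warnings;

# Made-up inputs for illustration only (not the real data files).
my $fasta = ">protA\nMKTAYIAK\nQRQISFVK\n>protB\nGSHMLEDP\n";
my $table = "protA\t3\t10\nprotB\t2\t5\nprotC\t1\t4\n";

# Read the FASTA records from an in-memory filehandle.
my %seq_of;
{
    local $/ = ">";
    open my $fa, '<', \$fasta or die "Cannot open in-memory FASTA: $!";
    <$fa>;    # discard the empty chunk before the first ">"
    while (my $record = <$fa>) {
        chomp $record;
        my ($id, $seq) = split /\n/, $record, 2;
        $seq =~ s/\s+//g;    # join wrapped sequence lines
        $seq_of{$id} = $seq;
    }
    close $fa;
}

# Extract each domain listed in the table.
my @extracted;
for my $line (split /\n/, $table) {
    my ($id, $start, $end) = split /\t/, $line;
    next unless exists $seq_of{$id};    # protC has no sequence, so it is skipped
    push @extracted, [$id, substr($seq_of{$id}, $start - 1, $end - $start + 1)];
}

print ">$_->[0]\n$_->[1]\n" for @extracted;
# prints:
# >protA
# TAYIAKQR
# >protB
# SHML
```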
Running the code:
perl .\Domain_seq_extrac.pl .\ALl_combined_1.txt .\Domain_for_perl.txt
Results:
image.png