perl /02_Cluster_stat_v1.1/bin/step3_Cluster_stat_family.pl category.txt all.cds cluster.stat-info --cluster_file all_orthomcl.out --type orthomcl --step 134 -q x.q
perl /07_orthomcl_pipeline_v1.0/bin/obtain_4d_phase1.pl all.philip
## vi step3_Cluster_stat_family.pl
This script collects statistics from the results of orthomcl or treefam.
1. Collect cluster information from the cluster file.
File require: cluster_file category.txt.new cluster_stat_out;
Output: cluster_stat_out;
2. Summarize the cluster families from cluster_stat_out and draw veen_svg (a Venn diagram).
File require: category.txt.new all.cds cluster_stat_out;
Output: 4spec_veen.input;
3. Summarize the gene family information, such as of_gene, unique_family, single_gene.
File require: category.txt.new all.cds cluster_stat_out;
Output: family.stat.table;
4. Filter the single-copy families from orthomcl.out and put the corresponding cds together into the genefamily category,
then translate them to pep and run muscle,
and extract all.philip from the single-copy gene families.
File require: cluster_file all.cds category.txt.new;
Output: ./singlecopy_genefamily/ ;
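The extracted single-copy orthologous gene families are used only for building the tree. all.philip and every other philip file produced by the pipeline come from the single-copy orthologous gene families, and the later tree building is based on these files, so everything that goes into the tree is single-copy.
The gene family cluster file cluster.stat-info contains the copy numbers of all families; each id is one gene family.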
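To make the single-copy criterion concrete, here is a minimal Python sketch. It assumes the usual definition, that a family is single-copy when every species contributes exactly one gene. The exact rule applied inside step3_Cluster_stat_family.pl is not shown here, and the function names and toy data below are hypothetical.

```python
def is_single_copy(copies_per_species):
    """Return True when every species has exactly one gene in the family.

    Assumption: this is the common definition of a single-copy ortholog
    family; the Perl script's own rule may differ.
    """
    return len(copies_per_species) > 0 and all(n == 1 for n in copies_per_species)


def single_copy_families(families):
    """Return the sorted ids of the families that pass is_single_copy."""
    return sorted(fid for fid, counts in families.items() if is_single_copy(counts))


# Toy data (made up for illustration): family id -> gene copies per species.
example = {
    "1": [1, 1, 1, 1],  # one gene in each species: single-copy
    "2": [2, 0, 2, 0],  # duplicated in some species, absent in others
    "3": [1, 1, 0, 1],  # missing from one species
}

print(single_copy_families(example))
```

Only family "1" passes in this toy input. Families with a duplication or a missing species are excluded, which is why alignments built from this set can be concatenated per species.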
$ tail -n 1 cluster.stat-info
26006 2 0 2 0 0 0 0 0 0 0 0 0 0 0 1
In this example, the number of gene families obtained is 26006.
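Reading the family count off the last id only works if the ids run from 1 upward with no gaps. The following Python sketch cross-checks that by counting rows as well. It assumes the first whitespace-separated field of each line is the family id and that the file has no header line. The function name and the toy rows are hypothetical.

```python
def count_families(lines):
    """Return (number of non-empty rows, id on the last row).

    Assumption: the first whitespace-separated field is the family id
    and there is no header line.
    """
    ids = [line.split()[0] for line in lines if line.strip()]
    return len(ids), (ids[-1] if ids else None)


# Toy rows (made up for illustration), laid out like cluster.stat-info.
toy_rows = [
    "1 4 1 1 1 1\n",
    "2 3 1 0 1 1\n",
    "3 2 0 2 0 0\n",
]

n_rows, last_id = count_families(toy_rows)
print(n_rows, last_id)
```

An open file handle can be passed in place of the list, since Python file objects iterate line by line. If the row count and the last id disagree, the ids are not contiguous and the row count is the safer number.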