Finding Orthologous Genes and Building Trees with OrthoFinder (2)

Author: 多啦A梦的时光机_648d | Published: 2019-09-23 19:40

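In the previous step we obtained the set of orthologous genes for tree building; here we use those genes to build the phylogenetic tree.
The single-copy genes we need, and the details of each orthogroup, are in the files SingleCopyOrthogroups.txt and Orthogroups.tsv (called Orthogroups.csv in the help text and commands below).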

[Figure: OrthoFinder results]

I. Building the species phylogenetic tree with the EasySpeciesTree script

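The script depends on four programs, MAFFT, trimAl, RAxML and ASTRAL, which you need to install beforehand.
Edit the script so that each dependency points to the absolute path of your own installation: vim EasySpeciesTree.py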

############### MODIFY THE FOLLWINGS PATHS FOR ALL DEPENDENT PROGRAMS ###############
MAFFT = '/data1/spider/ytbiosoft/miniconda3/envs/python2/bin/mafft'
RAxML = '/data1/spider/ytbiosoft/miniconda3/envs/python2/bin/raxmlHPC'
ASTRAL = '/data1/spider/ytbiosoft/soft/ASTRAL-master/Astral/astral.5.6.3.jar'
TRIMAL = '/data1/spider/ytbiosoft/miniconda3/envs/python2/bin/trimal'
#####################################################################################
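Before running the pipeline it can help to confirm that the four configured paths actually exist. The following is a minimal sketch in Python 3, run separately from the script itself; the function name find_missing and the placeholder paths are assumptions made for illustration, not part of EasySpeciesTree.

```python
import os

# Hypothetical placeholder paths: replace them with the values you set in the script.
DEPENDENCIES = {
    "MAFFT": "/path/to/mafft",
    "RAxML": "/path/to/raxmlHPC",
    "ASTRAL": "/path/to/astral.jar",
    "TRIMAL": "/path/to/trimal",
}


def find_missing(deps):
    """Return the names of dependencies whose configured path is unusable.

    Binaries must be existing, executable files; a .jar file only needs
    to exist and be readable, because it is launched through java.
    """
    missing = []
    for name, path in deps.items():
        mode = os.R_OK if path.endswith(".jar") else os.X_OK
        if not (os.path.isfile(path) and os.access(path, mode)):
            missing.append(name)
    return missing


if __name__ == "__main__":
    for name in find_missing(DEPENDENCIES):
        print("%s: path not found or not usable" % name)
```

An empty result means every configured path points to a usable file; it does not check that the program versions are compatible with the script.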

Check the help message before running:

$ python /data1/spider/ytbiosoft/soft/EasySpeciesTree-master/revised.py -h
usage: EasySpeciesTree [-h] -in1 INPUT1 -in2 INPUT2 -in3 INPUT3 -in4 INPUT4
                       [-t THREAD] [-nb BOOTSTRAP] [-m MODEL]


-------------------------------------------------------------------------------------------------------
EasySpeciesTree <SpeciesID prefix> <SingleCopyOrtho> <Orthogroups> <protein file> [thread] [bootstrap] [model]
Author: Wei Dong <1369852697@qq.com>, FAFU
Version: v1.0
Easily construct the ML species tree with all single-copy gene's protein sequences
-------------------------------------------------------------------------------------------------------


optional arguments:
  -h, --help            show this help message and exit
  -in1 INPUT1, --input1 INPUT1
                        offer the prefix of all abbreviated species id
  -in2 INPUT2, --input2 INPUT2
                        offer the Single-copy Orthogroups file, SingleCopyOrthogroups.txt
  -in3 INPUT3, --input3 INPUT3
                        offer the all Orthogroups file, Orthogroups.csv
  -in4 INPUT4, --input4 INPUT4
                        offer all species protein sequences
  -t THREAD, --thread THREAD
                        set the number of thread, default=10
  -nb BOOTSTRAP, --bootstrap BOOTSTRAP
                        set the number of bootstrap, default=100
  -m MODEL, --model MODEL
                        set the model of amino acid substitution, default=PROTGAMMAJTT

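As the help message shows, the script needs four input files: a file listing the abbreviated species ID prefixes used in the gene names, the single-copy orthogroup file SingleCopyOrthogroups.txt, the orthogroup file for all species Orthogroups.csv, and the merged protein sequences of all species, all-pep.fas.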

1. input1 file (species_id.txt)

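[Figure: contents of species_id.txt]

Note

Each species_id must match both the name of the corresponding fas file and the start of the sequence names inside that fas file, for example:

[Figures: species_id; fas file name; start of the fas file]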
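The consistency requirement above can be checked before a long run. This is a minimal sketch under two assumptions that the original figures would have confirmed: species_id.txt holds one prefix per line, and every FASTA header starts with one of those prefixes right after the ">" character. The function name is hypothetical.

```python
def headers_without_known_prefix(prefix_lines, fasta_lines):
    """Return FASTA headers that do not start with any listed species prefix.

    prefix_lines: iterable of lines from the species ID file (assumed one
                  prefix per line).
    fasta_lines:  iterable of lines from a protein FASTA file.
    """
    prefixes = tuple(p.strip() for p in prefix_lines if p.strip())
    bad = []
    for line in fasta_lines:
        if line.startswith(">"):
            header = line[1:].strip()
            # str.startswith accepts a tuple and tests each prefix in turn.
            if not header.startswith(prefixes):
                bad.append(header)
    return bad


# Example usage with the file names from this tutorial:
# with open("species_id.txt") as ids, open("all-pep.fas") as fasta:
#     for header in headers_without_known_prefix(ids, fasta):
#         print("unexpected header:", header)
```

Any header reported here would need renaming, or its prefix would need adding to the species ID file, before the script is run.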

2. input4 file (all-pep.fas)

 $ cat *.fa > all-pep.fas

3. input2 file (SingleCopyOrthogroups.txt)

4. input3 file (Orthogroups.csv)
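To see how the two orthogroup files relate, the sketch below looks up each single-copy orthogroup in the full table and returns its one gene per species. It assumes the usual OrthoFinder layout: SingleCopyOrthogroups.txt holds one orthogroup ID per line, and Orthogroups.csv is tab-delimited with a header row of species names and comma-separated gene lists in each cell. The function name is hypothetical, and this is not how EasySpeciesTree itself is implemented.

```python
import csv


def single_copy_members(single_copy_lines, orthogroups_lines):
    """Map each single-copy orthogroup ID to {species: gene}.

    Both arguments are iterables of text lines (for example open files).
    """
    wanted = {line.strip() for line in single_copy_lines if line.strip()}
    reader = csv.reader(orthogroups_lines, delimiter="\t")
    species = next(reader)[1:]  # first header cell is the ID column
    result = {}
    for row in reader:
        if not row or row[0] not in wanted:
            continue
        genes = {}
        for name, cell in zip(species, row[1:]):
            members = [g.strip() for g in cell.split(",") if g.strip()]
            if len(members) != 1:
                raise ValueError("%s is not single-copy in %s" % (row[0], name))
            genes[name] = members[0]
        result[row[0]] = genes
    return result


# Example usage with the file names from this tutorial:
# with open("SingleCopyOrthogroups.txt") as sc, open("Orthogroups.csv") as og:
#     table = single_copy_members(sc, og)
#     print(len(table), "single-copy orthogroups")
```

The ValueError acts as a guard: if an ID listed as single-copy has zero or several genes for some species, the two files probably come from different OrthoFinder runs.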

II. Running the script to build the species phylogenetic tree

nohup python /data1/spider/ytbiosoft/soft/EasySpeciesTree-master/revised.py -in1 species_id.txt -in2 SingleCopyOrthogroups.txt -in3 Orthogroups.csv -in4 all-pep.fas -nb 10 -t 16 &

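To get results quickly for this demonstration, -nb sets the number of bootstrap replicates to 10 and -t sets the number of threads to 16; the defaults are 100 bootstrap replicates and 10 threads. The amino acid substitution model is left at its default, PROTGAMMAJTT.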
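For repeated runs with different settings, the command line can be assembled programmatically. The sketch below only maps arguments onto the flags listed in the help output (-in1 to -in4, -t, -nb, -m) with the defaults stated there; the function name build_command and its parameter names are hypothetical.

```python
def build_command(script, species_ids, single_copy, orthogroups, proteins,
                  threads=10, bootstrap=100, model="PROTGAMMAJTT"):
    """Return the argument list for one EasySpeciesTree run.

    Defaults mirror the values printed by the script's help message.
    """
    return [
        "python", script,
        "-in1", species_ids,
        "-in2", single_copy,
        "-in3", orthogroups,
        "-in4", proteins,
        "-t", str(threads),
        "-nb", str(bootstrap),
        "-m", model,
    ]


# Example usage (file names as in this tutorial):
# import subprocess
# cmd = build_command("revised.py", "species_id.txt",
#                     "SingleCopyOrthogroups.txt", "Orthogroups.csv",
#                     "all-pep.fas", threads=16, bootstrap=10)
# subprocess.run(cmd, check=True)
```

Passing a list to subprocess.run avoids shell quoting problems when a file path contains spaces.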

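When the program finishes, it creates four folders in the current directory: SingleGene, SingleGene_MSA, Concatenation and Coalescence. They hold, respectively, the sequences of all single-copy genes, the aligned single-copy gene sequences, the results of the concatenation method, and the results of the coalescent method.
The detailed run log can be found in the nohup.out file.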

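1. RAxML_bipartitions.concatenation_out.nwk in the Concatenation folder is the final tree file produced by the concatenation method

[Figure: concatenation_out.nwk]

2. Astral.coalescence_tree.nwk in the Coalescence folder is the final tree file produced by the coalescent method.

[Figure: Astral.coalescence_tree.nwk]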

III. Visualization with FigTree or MEGA

Import the result files from the concatenation and coalescent methods, RAxML_bipartitions.concatenation_out.nwk and Astral.coalescence_tree.nwk, into FigTree for visualization.
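Before comparing the two trees by eye, a quick sanity check is to confirm that both contain the same set of species. The sketch below pulls leaf names out of a Newick string with a regular expression; it assumes unquoted leaf labels without spaces and is not a full Newick parser. The function name is hypothetical.

```python
import re


def leaf_names(newick):
    """Return the set of leaf labels in a Newick string.

    Leaf labels directly follow "(" or ","; internal labels such as
    support values follow ")" and are therefore skipped.
    """
    return set(re.findall(r"[(,]\s*([^\s(),:;]+)", newick))


# Example usage with the two result files from this tutorial:
# with open("Concatenation/RAxML_bipartitions.concatenation_out.nwk") as f:
#     concat = leaf_names(f.read())
# with open("Coalescence/Astral.coalescence_tree.nwk") as f:
#     coal = leaf_names(f.read())
# print("only in concatenation tree:", sorted(concat - coal))
# print("only in coalescent tree:", sorted(coal - concat))
```

Identical leaf sets say nothing about topology; they only confirm that both methods were run on the same species.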

1. Coalescent tree visualized in MEGA

[Figure: MEGA]

2. FigTree visualization

[Figure: FigTree]

Finally:

EasySpeciesTree download link: https://github.com/Davey1220/EasySpeciesTree.git

It can be downloaded directly with git clone https://github.com/Davey1220/EasySpeciesTree.git
