物种进化树构建

作者: 生信小白2018 | 来源:发表于2019-09-26 11:06 被阅读0次

使用单拷贝基因画物种进化树

所需软件：

# OrthoFinder寻找同源基因
conda install  orthofinder

#EasySpeciesTree
git clone https://github.com/Davey1220/EasySpeciesTree.git

该脚本依赖Mafft, TrimAI, RAxML和ASTRAL四个软件，需要自己提前安装好
修改脚本中相应依赖程序的绝对路径：vi EasySpeciesTree.py
conda 可以直接安装mafft、 trimal、raxmlHPC
astral.5.6.3.jar：github下载：https://github.com/smirarab/ASTRAL
############### MODIFY THE FOLLWINGS PATHS FOR ALL DEPENDENT PROGRAMS ###############
MAFFT = '/xxx/anaconda2/bin/mafft'
RAxML = '/xxx/anaconda2/bin/raxmlHPC'
ASTRAL = '/xxx/software/ASTRAL-master/astral.5.6.3.jar'
TRIMAL = '/xxx/anaconda2/bin/trimal'
#####################################################################################

步骤：

python 脚本更改物种蛋白序列的ID

#!/usr/bin/evn python3
# -*- coding: utf-8 -*-
__author__='LJS'

import sys, os

def renameID(infile, outfile):
        ini = open(infile, 'r')
        out = open(outfile, 'w')
        j = 0
        for i in ini:
                i = i.strip()
                if i.startswith('>'):
#                       print('>seq'+str(j).rjust(5, '0'), file=out)  #for python3
                        print >>out, '>Amborella'+str(j).rjust(5, '0')  #for python2
                        j += 1
                else:
#                       print(i, file=out)              #for python3
                        print >>out, i                  #for python2    

if __name__ == '__main__':
        infile =sys.argv[1]
        outfile = sys.argv[2]
        renameID(infile, outfile)

2)更改各个物种蛋白库文件的名字，与序列ID的名字前半部分一致；
例如Amborella.fa 其序列ID的名字为

Amborella00000
......
Amborella26845

3）运行orthofinder

orthofinder -f orthsp1 -M msa -S diamond -t 16 -a 16


-f 指定蛋白序列所在的文件夹（orthosp1）此命令仅需指定输入文件夹，其余为默认参数。
-t 序列搜索使用的线程数 （默认值为16）
-a 序列分析使用的线程数 （默认值为1）
-M 基因树推断方法（默认为dendroblast）可选：dendroblast ，msa
-S 序列比对使用的程序 （默认为Blast）可选：blast, mmseqs, blast_gz, diamond（推荐使用diamond，比对速度快）
-A 多序列联配方式，该选项仅当 -M msa 选项时才有效（默认为mafft）可选：muscle, mafft
-T 建树方式，该选项仅当 -M msa 选项时才有效 （默认为fasttree）可选：iqtree, raxml-ng, fasttree, raxml
-s 输入特定的根物种树
-I 设定MCL的膨胀系数（默认为1.5）


#如果要修改建树过程中的bootstrap，就要修改程序的config.json文件。该文件在conda环境里面的bin文件里
#如要在iqtree建树过程中增加bootstrap， 则在iqtree的"cmd_line":中添加 -bb 1000 （iqtree的超快bootstrap）或 -b 1000（传统bootstrap）

"iqtree":{
    "program_type": "tree",
    "cmd_line": "iqtree -s INPUT -pre PATH/IDENTIFIER -bb 1000 > /dev/null",
    "ouput_filename": "PATH/IDENTIFIER.treefile"
    },

4）orthofinder结果
所在文件夹：xxx/orthsp1/OrthoFinder/Results_Sep25/Orthogroups

image.png

orthsp1/OrthoFinder/Results_Sep25/Species_Tree

image.png

5）运行EasySpeciesTree 构建物种进化树

python EasySpeciesTree.py -in1 species_id.txt -in2 SingleCopyOrthogroups.txt -in3 Orthogroups.csv -in4 all-pep.fas -nb 100 -t 16

  -h, --help            show this help message and exit
  -in1 INPUT1, --input1 INPUT1
                        offer the prefix of all abbreviated species id
  -in2 INPUT2, --input2 INPUT2
                        offer the Single-copy Orthogroups file, SingleCopyOrthogroups.txt
  -in3 INPUT3, --input3 INPUT3
                        offer the all Orthogroups file, Orthogroups.csv
  -in4 INPUT4, --input4 INPUT4
                        offer all species protein sequences
  -t THREAD, --thread THREAD
                        set the number of thread, default=10
  -nb BOOTSTRAP, --bootstrap BOOTSTRAP
                        set the number of bootstrap, default=100
  -m MODEL, --model MODEL
                        set the model of amino acid substitution, default=PROTGAMMAJTT

输入文件-in2 和 -in3 来自xxx/orthsp1/OrthoFinder/Results_Sep25/Orthogroups

image.png

输入文件-in1来自
/public1/home/off_jslin/analysis/specieTres/orthsp2/OrthoFinder/Results_Sep25/WorkingDirectory/SpeciesIDs.txt的第二列
awk '{print $2}' SpeciesIDs.txt |sed -i 's/.fa//g' > species_id.txt

输入文件-in4: cat *.fa >>all.pep.fas

6）EasySpeciesTree运行结果，会生成4个文件夹

image.png

将串联法和并联法生成的结果文件RAxML_bipartitions.concatenation_out.nwk，Astral.coalescence_tree.nwk导入FigTree、MEGA或者iTOL进行可视化

串联法(Concatenation)（先将不同物种之间的每个单拷贝基因单独进行多序列比对，然后将这些比对后的单拷贝基因进行首尾相连串接成一个supergene矩阵，最后将这个supergene用于构建系统发育树）

并联法(Coalescence)（先将不同物种之间的每个单拷贝基因单独进行多序列比对，并构建每一个单拷贝基因对应的基因树，然后将所有单拷贝基因对应的基因树进行合并重构出相应的物种树）进行ML系统发育树的构建。

参考：

https://www.jianshu.com/p/bfa568c8f4c4
https://www.jianshu.com/p/9ef6d7f273b3
https://www.jianshu.com/p/90301eeb063a

物种进化树构建

相关文章

网友评论

延伸阅读

深度阅读

栏目导航

热点阅读

基因组

微生物信息学

OrthoFinder 基因扩张与收缩分析

基因家族_收缩与扩张分析

基因组