Calculating Ka/Ks values with PAML

Author: 斩毛毛 | Published 2020-07-22 21:53

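    An earlier post covered calculating Ka/Ks values with KaKs_Calculator; this post gives a brief introduction to doing the same calculation with PAML.

    Installation

    PAML supports phylogenetic tree construction, ancestral sequence estimation, evolutionary simulation, and Ka/Ks calculation. Estimating Ka/Ks for individual branches and sites is the package's signature feature.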

    wget http://abacus.gene.ucl.ac.uk/software/paml4.9j.tgz
    tar xf paml4.9j.tgz
    cd  paml4.9j
    rm bin/*.exe 
    cd src 
    make -f Makefile 
    rm *.o 
    mv baseml basemlg codeml pamp evolver yn00 chi2 ../bin
    

    The program used in this post is codeml.

    Basic usage

    Required files:

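    • Homologous gene pairs
    • The CDS and protein (pep) sequences of those genes
    • The PAML input file (generated from the two items above)
    • A tree file (with only 2 species you can write it by hand; with more species, build a tree with a program such as PHYLIP)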

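    1 PAML input sequence file

    This example uses my own recent data: homologous gene pairs from 2 species. For how to obtain the gene pairs, see the post on plotting with the Python version of MCScan (python版的MCScan绘图).

    Run: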

    ParaAT.pl -h test.homologs -n test.cds -a test.pep -p proc -m muscle -f paml  -o paml_result
    

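    If anything about this command is unclear, see the earlier post on calculating Ka/Ks values with KaKs_Calculator (Kaks_calculator计算ka/ks 值).

    The command produces a paml_result directory in which each homologous gene pair gets its own file ending in .paml.

    In this example there are 27 homologous gene pairs in total: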

    ls *.paml |wc -l
    27
    

    Merge all the *.paml files into a single input file for PAML:

    cat *.paml >>test.cod
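
Because >> appends, rerunning the merge command above would add every alignment to test.cod a second time. The following is a minimal sketch of a rerun-safe variant, not part of PAML or ParaAT: the function name build_codeml_input is hypothetical, and it assumes all alignments sit in one directory with a .paml suffix. It overwrites the output file and prints how many alignments were merged, which is the value needed later for ndata in the control file.

```shell
# Hypothetical helper: merge every .paml alignment in a directory into one file.
# Usage: build_codeml_input <alignment_dir> <output_file>
build_codeml_input() {
    dir=$1
    out=$2
    set -- "$dir"/*.paml                 # expand the alignment list once
    if [ ! -e "$1" ]; then               # unmatched glob stays literal
        echo "no .paml files found in $dir" >&2
        return 1
    fi
    cat "$@" > "$out"                    # ">" truncates, so reruns do not duplicate data
    echo "$#"                            # number of merged alignments (use for ndata)
}
```

For the data set in this post, build_codeml_input paml_result test.cod would print 27.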
    

    2 Tree file

    For the tree file format, see the *.trees files in the PAML installation directory:

      3  4
    
    (1,2,3);
    ((1,2),3);
    ((1,3),2);
    ((2,3),1);
    

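    In the header line, 3 is the number of species and 4 is the number of trees.

    I have only two species here, so the tree input file looks like this: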

    vi test.trees
      2  1
    
    (1,2);
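
As a non-interactive alternative to typing the file in vi, the same two-species tree file can be written from the shell. This is only a sketch; write_pair_tree is a hypothetical name, and the content is exactly the three lines shown above (header, blank line, tree).

```shell
# Hypothetical helper: write a tree file for 2 species and 1 tree.
# Usage: write_pair_tree <tree_file>
write_pair_tree() {
    # header "  2  1" = 2 species, 1 tree; then a blank line and the tree itself
    printf '  2  1\n\n(1,2);\n' > "$1"
}
```

Calling write_pair_tree test.trees creates the file used in the control file below.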
    

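    3 Control file

    Copy codeml.ctl from the PAML installation directory into your working directory and edit it. The settings that matter here are seqfile, treefile, and outfile, plus runmode = -2 (pairwise comparison), seqtype = 1 (codons), and ndata = 27 (the number of gene-pair alignments merged into test.cod):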

          seqfile = test.cod       * sequence data filename
         treefile = test.trees      * tree structure file name
          outfile = test.rlt           * main result file name
    
            noisy = 0  * 0,1,2,3,9: how much rubbish on the screen
          verbose = 0  * 0: concise; 1: detailed, 2: too much
          runmode = -2  * 0: user tree;  1: semi-automatic;  2: automatic
                       * 3: StepwiseAddition; (4,5):PerturbationNNI; -2: pairwise
    
          seqtype = 1  * 1:codons; 2:AAs; 3:codons-->AAs
        CodonFreq = 2  * 0:1/61 each, 1:F1X4, 2:F3X4, 3:codon table
    
            ndata = 27
            clock = 0  * 0:no clock, 1:clock; 2:local clock; 3:CombinedAnalysis
           aaDist = 0  * 0:equal, +:geometric; -:linear, 1-6:G1974,Miyata,c,p,v,a
       aaRatefile = dat/jones.dat  * only used for aa seqs with model=empirical(_F)
                       * dayhoff.dat, jones.dat, wag.dat, mtmam.dat, or your own
    
            model = 0
                       * models for codons:
                           * 0:one, 1:b, 2:2 or more dN/dS ratios for branches
                       * models for AAs or codon-translated AAs:
                           * 0:poisson, 1:proportional, 2:Empirical, 3:Empirical+F
                           * 6:FromCodon, 7:AAClasses, 8:REVaa_0, 9:REVaa(nr=189)
    
          NSsites = 0  * 0:one w;1:neutral;2:selection; 3:discrete;4:freqs;
                       * 5:gamma;6:2gamma;7:beta;8:beta&w;9:beta&gamma;
                       * 10:beta&gamma+1; 11:beta&normal>1; 12:0&2normal>1;
                       * 13:3normal>0
    
            icode = 0  * 0:universal code; 1:mammalian mt; 2-10:see below
            Mgene = 0
                       * codon: 0:rates, 1:separate; 2:diff pi, 3:diff kapa, 4:all diff
                       * AA: 0:rates, 1:separate
        fix_kappa = 0  * 1: kappa fixed, 0: kappa to be estimated
            kappa = 2  * initial or fixed kappa
        fix_omega = 0  * 1: omega or omega_1 fixed, 0: estimate
            omega = .4 * initial or fixed omega, for codons or codon-based AAs
    
        fix_alpha = 1  * 0: estimate gamma shape parameter; 1: fix it at alpha
            alpha = 0. * initial or fixed alpha, 0:infinity (constant rate)
           Malpha = 0  * different alphas for genes
            ncatG = 8  * # of categories in dG of NSsites models
    
            getSE = 0  * 0: don't want them, 1: want S.E.s of estimates
     RateAncestor = 1  * (0,1,2): rates (alpha>0) or ancestral states (1 or 2)
    
       Small_Diff = .5e-6
        cleandata = 1  * remove sites with ambiguity data (1:yes, 0:no)?
    *  fix_blength = 1  * 0: ignore, -1: random, 1: initial, 2: fixed, 3: proportional
           method = 0  * Optimization method 0: simultaneous; 1: one branch a time
    
    * Genetic codes: 0:universal, 1:mammalian mt., 2:yeast mt., 3:mold mt.,
    * 4: invertebrate mt., 5: ciliate nuclear, 6: echinoderm mt.,
    * 7: euplotid mt., 8: alternative yeast nu. 9: ascidian mt.,
    * 10: blepharisma nu.
    * These codes correspond to transl_table 1 to 11 of GENEBANK.
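
Since ndata has to equal the number of alignments merged into the sequence file, it is worth checking before a long run. The sketch below reads a value back from the control file and compares ndata with the number of .paml files in a directory. ctl_value and check_ndata are hypothetical helper names, and the parser assumes the "key = value * comment" layout used in the control file above.

```shell
# Hypothetical helper: print the value of one key from a PAML control file.
# Usage: ctl_value <control_file> <key>
ctl_value() {
    awk -v key="$2" '
        { sub(/\*.*/, "") }                          # drop "*" comments
        $1 == key && $2 == "=" { print $3; exit }    # expects "key = value"
    ' "$1"
}

# Hypothetical helper: compare ndata with the number of .paml alignments.
# Usage: check_ndata <control_file> <alignment_dir>
check_ndata() {
    ctl=$1
    dir=$2
    expected=$(ctl_value "$ctl" ndata)
    set -- "$dir"/*.paml
    [ -e "$1" ] || set --                # unmatched glob: count as zero files
    if [ "$expected" = "$#" ]; then
        echo "ok: ndata = $#"
    else
        echo "mismatch: ndata = ${expected:-unset}, alignments found = $#" >&2
        return 1
    fi
}
```

With the files from this post, check_ndata codeml.ctl paml_result should report ok when ndata is 27.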
    

    Running the analysis

    codeml codeml.ctl
    

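    The results, including Ka (dN) and Ks (dS) for each gene pair, are written to the outfile set in the control file (test.rlt).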
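
To collect the values into a table, the result file can be scanned for the summary line codeml prints for each pairwise comparison. The sketch below assumes that line has the form "t= ... S= ... N= ... dN/dS= ... dN = ... dS = ..."; that layout is an assumption here, so check it against your own test.rlt before relying on the output. extract_kaks is a hypothetical name, and pairs are numbered in the order they appear in the file.

```shell
# Hypothetical helper: pull dN (Ka), dS (Ks) and dN/dS from a codeml result file.
# Usage: extract_kaks <result_file>
extract_kaks() {
    awk '
        BEGIN { OFS = "\t"; print "pair", "dN", "dS", "dN/dS" }
        /^[[:space:]]*t=/ {
            gsub(/=/, " ")                   # turn "dN = x" and "dN/dS= x" into "name x"
            dn = ds = w = "NA"
            for (i = 1; i < NF; i++) {
                if ($i == "dN/dS")   w  = $(i + 1)
                else if ($i == "dN") dn = $(i + 1)
                else if ($i == "dS") ds = $(i + 1)
            }
            print ++n, dn, ds, w             # one tab-separated row per comparison
        }
    ' "$1"
}
```

Running extract_kaks test.rlt > kaks.tsv would give one row per gene pair; the output file name kaks.tsv is only an example.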

          Article link: https://www.haomeiwen.com/subject/xiqxlktx.html