Estimating Divergence Times from Nucleotide Sequences

Author: 多啦A梦的时光机_648d | Published: 2023-01-06 12:07

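    Required files

    1. A species tree with fossil calibration points
    2. An aligned sequence file in PHYLIP format

    Species tree with fossil calibrations (branch lengths and other unneeded information removed)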

     86 1
    
    (((((((((((((((((Distichlis_spicata,Distichlis_littoralis),Distichlis_bajaensis),((Bouteloua_dactyloides,Bouteloua_curtipendula),Bouteloua_gracilis)),(Hilaria_cenchroides,Hilaria_rigida)),(Muhlenbergia_huegelii,Muhlenbergia_japonica)),Tragus_berteronianus),Tridens_brasiliensis),((((((Oropetium_thomaeum,Oropetium_aristatum),Tripogon_chinensis),Tripogonella_loliiformis),Melanocenchris_abyssinica),Desmostachya_bipinnata),Halopyrum_mucronatum)),(((((Perotis_indica,Perotis_rara),Perotis_hildebrandtii),(Trichoneura_grandiglumis,Trichoneura_ciliata)),Vaseyochloa_multinervosa),((Dactyloctenium_radulans,Dactyloctenium_aegyptium),Odyssea_paucinervis))),((Orinus_thoroldii,Triodia_rigidissima),Cleistogenes_squarrosa)),(((((((((((Cynodon_radiatus,Cynodon_dactylon),Eustachys_glauca),(Microchloa_indica,Oxychloris_scariosa)),Lepturus_repens),((((Chloris_virgata,Enteropogon_dolichostachyus),Chloris_truncata),Chloris_barbata),Enteropogon_ramosus)),Astrebla_pectinata),(Eleusine_coracana,Eleusine_indica)),((Dinebra_retroflexa,Dinebra_panicea),Dinebra_chinensis)),Acrachne_racemosa),Diplachne_fusca),((Aeluropus_lagopoides,Aeluropus_littoralis),Aeluropus_sinensis))),((((((((Sporobolus_alterniflorus,Sporobolus_maritimus),Sporobolus_michauxianus),Sporobolus_heterolepis),Sporobolus_maximus),((Sporobolus_virginicus,Sporobolus_helvolus),Sporobolus_aculeatus)),(Sporobolus_fertilis,Sporobolus_diandrus)),Urochondra_setulosa),(((Zoysia_matrella,Zoysia_pacifica),(Zoysia_japonica,Zoysia_sinica)),(Zoysia_macrostachya,Zoysia_macrantha)))'>29.49<32.28'),((((((Eragrostis_cilianensis,Eragrostis_pilosa),Eragrostis_ferruginea),(Eragrostis_minor,Eragrostis_autumnalis)),(Eragrostis_atrovirens,Harpachne_harpachnoides)),(Tetrachne_dregei,Uniola_paniculata)),(Enneapogon_desvauxii,Schmidtia_pappophoroides))'>32.76<35.29'),(Triraphis_mollis,Neyraudia_reynaudiana)),Centropodia_glauca)'>42.87<43',Coelachyrum_piercei),Cortaderia_selloana)'>54.44<63.35';
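
In an mcmctree tree file, the first line gives the number of species and the number of trees, and fossil calibrations are written between straight single quotes. Typographic (curly) quotes are easy to pick up when a tree is copied from a web page or word processor. The following is a minimal sketch of a sanity check, not part of the original workflow. The function names are hypothetical, and it assumes the tree has no branch lengths and no commas inside labels.

```shell
# Hypothetical helpers for sanity-checking an mcmctree tree file.

# Number of species declared on the header line (first non-blank line).
declared_tips() {
    awk 'NF { print $1; exit }' "$1"
}

# Number of tips actually present: skip the header, remove quoted
# calibration labels, then count commas + 1.
count_tips() {
    awk 'NF && !seen { seen = 1; next } { print }' "$1" \
        | sed "s/'[^']*'//g" \
        | tr -cd ',' | wc -c | awk '{ print $1 + 1 }'
}

# Return 0 if the file looks consistent, 1 otherwise.
check_tree() {
    if grep -qF -e '‘' -e '’' "$1"; then
        echo "curly quotes found; use straight single quotes" >&2
        return 1
    fi
    if [ "$(declared_tips "$1")" -ne "$(count_tips "$1")" ]; then
        echo "header and tree disagree on the number of species" >&2
        return 1
    fi
    return 0
}

# Example (input.tree is the tree file name used in mcmctree.ctl):
# check_tree input.tree && echo "tree file looks consistent"
```

The tip count relies on the fact that a rooted tree with n tips contains exactly n - 1 commas once the quoted labels are removed.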
    

    Prepare the PHYLIP-format sequence file

    For example, the sequence header inside Zoysia_sinica.fasta is:
    >gi|1642520764|ref|NC_042187.1| Zoysia sinica chloroplast, complete genome
    GAAATACCCAATATCCTGTTGGAACAAGATATTGGGTATTTCTGGCTTTCCTTCCTTTAAAAATTCCTAT
    ATTTTAGGAGAAAAACCTTATCCATTAAGAGATGGAACTTCAAGAGCAGCTAAGTCTAGAGGGAAGTTGT
    GAGCATTACGTTCGTGCATTACTTCCATACCAAGATTAGCACGGTTGATGATATCAGCCCAAGTATTAAT
    

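    Rename the sequence inside each FASTA file so that it matches the file name

    # List the species names (file names without the .fasta extension)
    ls *.fasta | sed 's/\.fasta$//' > species.list
    # Replace each original header (gi|...) with the species name and write the sequence on a single line
    for species in $(cat species.list); do
        cat ./$species.fasta | seqkit seq -n | awk '{print $1}' | sed "s/gi.*/$species/g" > t1
        cat ./$species.fasta | seqkit seq -s -w 0 > t2
        paste t1 t2 | seqkit tab2fx | seqkit seq -w 0 > $species.fas
        rm t1 t2
    done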
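
If seqkit is not installed, the same renaming can be approximated with awk alone. This is a sketch under the assumption that each .fasta file holds exactly one sequence, as in the chloroplast genome example above. The function name rename_fasta is hypothetical.

```shell
# Hypothetical helper: write <name>.fas next to <name>.fasta, with the header
# replaced by the file name and the sequence joined onto a single line.
# Assumes one sequence per input file.
rename_fasta() {
    species=$(basename "$1" .fasta)
    awk -v name="$species" '
        /^>/ { print ">" name; next }   # replace the original header
        { seq = seq $0 }                # accumulate sequence lines
        END { print seq }               # emit the sequence as one line
    ' "$1" > "$(dirname "$1")/$species.fas"
}

# Example:
# for f in ./*.fasta; do rename_fasta "$f"; done
```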
    

    Result

    less Zoysia_sinica.fas
    >Zoysia_sinica
    GAAATACCCAATATCCTGTTGGAACAAGATATTGGGTATTTCTGGCTTTCCTTCCTTTAAAAATTCCTATATTTTAGGAGAAAAACCTTATCCATTAAGAGATGGAACTTCAAGAGCAGCTAAGTCTAGAGGGAAGTTGTGAGCATTACGTTCGTGCATTACTTCCATACCAAGATTAGCACGGTTGATGATATCAGCCCAAGTATTAATAACGCGACCTTGGCTATCAACTACAGATTGGTTGAAATTGAAACCATTTAGGTTGAATGCCATAGTACTAATACCTAAAGCAGTGAACCAGATCCCTACTACAGGCCAAGCAGCCAAGAAGAAGTGTAAAGAACGAGAGTTGTTGAAACTAGCATATTGGAAGATTAATCGACCAAAATAACCGTGAGCAGCCACAATGTTATAAGTCTCTTCCTCTTGACCAAATTTGTAACCCTCATTAGCAGATTCATTTTCAGTGGTTTCCCTGATCAAACTAGAGGTTACCAAGGAACCATGCATAGCACTGAATAGGGAACCGCCGAATACACCAGCTACACCTAACATGTGAAATGGATGCATAAGGATGTTGTGCTCTGCCTGGAATACAATCATAAAGTTGAAAGTACCAGAGATTCCTAAAGGCATACCATCAGAGAAACTTCCTTGACCAATAGGGTAAATCAAGAAAACAGCAGTAGCAGCTGCAACAGGAGCTGAATATGCAACAGCAATCCAAGGACGCATACCCAGACGGAAACTAAGTTCCCACTCACGACCCATATAACAAGCTACACCAAGTAAGAAGTGTAGAACAATTAGCTCATAAGGACCACCAT
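
The alignment step reads a single file named 86.fas, which the text does not show being created. Presumably it is the concatenation of the 86 renamed .fas files. A minimal sketch under that assumption follows. The function name combine_fas is hypothetical, and it reads names from species.list so that only the per-species files are picked up.

```shell
# Hypothetical helper: concatenate the per-species .fas files named in a
# list file (one species per line) into a single multi-FASTA file.
combine_fas() {
    list=$1
    out=$2
    : > "$out"                          # start from an empty output file
    while read -r species; do
        cat "./$species.fas" >> "$out"
    done < "$list"
}

# Example:
# combine_fas species.list 86.fas
# grep -c '>' 86.fas    # should print the number of species
```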
    
    

    Alignment

    /home/lx_sky6/yt/soft/miniconda3/bin/mafft --thread 30 86.fas > 86.mafft.fas
    

    Trim the alignment and save it in phylip_paml format

    trimal -in 86.mafft.fas -out 86.trimal.phy -automated1 -phylip_paml
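
mcmctree matches sequences to tree tips by name, so the names in the alignment and in the tree need to agree exactly. The sketch below compares the tip names in the tree file with the headers of the FASTA alignment (for example input.tree and 86.mafft.fas). The function names are hypothetical, and the tree is assumed to carry no branch lengths.

```shell
# Tip names in the tree file (header line and quoted calibrations removed).
tree_tips() {
    awk 'NF && !seen { seen = 1; next } { print }' "$1" \
        | sed "s/'[^']*'//g" \
        | tr '(),;' '\n\n\n\n' | awk 'NF { print $1 }' | sort
}

# Sequence names in a FASTA alignment (first word of each header).
fasta_names() {
    grep '^>' "$1" | sed 's/^>//' | awk '{ print $1 }' | sort
}

# Print names found in only one of the two files.
# Empty output means the two name sets are identical.
compare_names() {
    tips=$(mktemp) && names=$(mktemp) || return 1
    tree_tips "$1" > "$tips"
    fasta_names "$2" > "$names"
    comm -3 "$tips" "$names"
    rm -f "$tips" "$names"
}

# Example:
# compare_names input.tree 86.mafft.fas
```

In the output of comm -3, names that occur only in the tree appear in the first column and names that occur only in the alignment appear in the second, tab-indented column.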
    

    Run mcmctree (three runs in total; the first run produces the out.BV file)

    mcmctree mcmctree.ctl

              seed = -1
           seqfile = /home/lx_sky6/yt/0105_xianyu/0105_mcmc/86.trimal.phy
          treefile = /home/lx_sky6/yt/0105_xianyu/0105_mcmc/input.tree
           outfile = out.txt
    
             ndata = 1
           seqtype = 0  * 0: nucleotides; 1:codons; 2:AAs
           usedata = 3    * 0: no data; 1:seq like; 2:use in.BV; 3: out.BV
             clock = 2    * 1: global clock; 2: independent rates; 3: correlated rates
           RootAge =   * safe constraint on root age, used if no fossil for root.
    
             model = 7    * 0:JC69, 1:K80, 2:F81, 3:F84, 4:HKY85, 7:REV (GTR)
             alpha = 0.5    * alpha for gamma rates at sites
             ncatG = 5    * No. categories in discrete gamma
    
         cleandata = 0    * remove sites with ambiguity data (1:yes, 0:no)?
    
           BDparas = 1 1 0    * birth, death, sampling
       kappa_gamma = 6 2      * gamma prior for kappa
       alpha_gamma = 1 1      * gamma prior for alpha
    
       rgene_gamma = 2 2   * gamma prior for overall rates for genes
      sigma2_gamma = 1 10   * gamma prior for sigma^2     (for clock=2 or 3)
    
          finetune = 1: .1  .1  .1  .1 .01 .5  * auto (0 or 1) : times, musigma2, rates, mixing, paras, FossilErr
    
             print = 1
            burnin = 10000
          sampfreq = 5
           nsample = 30000
    
    *** Note: Make your window wider (100 columns) before running the program.
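
For the runs that actually sample the chain, burnin, sampfreq and nsample together set the chain length: mcmctree discards the first burnin iterations, then records one sample every sampfreq iterations until nsample samples have been collected. With the values above that is 10000 + 5 × 30000 = 160000 iterations. A small sketch that reads this total from a control file follows. The function name is hypothetical, and it assumes spaces around each "=" as in the file shown.

```shell
# Hypothetical helper: total MCMC iterations implied by a mcmctree control
# file, computed as burnin + sampfreq * nsample.
total_iterations() {
    awk '
        $1 == "burnin"   { b = $3 }
        $1 == "sampfreq" { f = $3 }
        $1 == "nsample"  { n = $3 }
        END { print b + f * n }
    ' "$1"
}

# Example:
# total_iterations mcmctree.ctl
```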
    

    Run mcmctree again (second run: rename out.BV to in.BV so that it is used as input, and change usedata = 3 to usedata = 2 in mcmctree.ctl)

    mv out.BV in.BV
    mcmctree mcmctree.ctl
    
              seed = -1
           seqfile = /home/lx_sky6/yt/0105_xianyu/0105_mcmc/86.trimal.phy
          treefile = /home/lx_sky6/yt/0105_xianyu/0105_mcmc/input.tree
           outfile = out.txt
    
             ndata = 1
           seqtype = 0  * 0: nucleotides; 1:codons; 2:AAs
           usedata = 2    * 0: no data; 1:seq like; 2:use in.BV; 3: out.BV
             clock = 2    * 1: global clock; 2: independent rates; 3: correlated rates
           RootAge =   * safe constraint on root age, used if no fossil for root.
    
             model = 7    * 0:JC69, 1:K80, 2:F81, 3:F84, 4:HKY85, 7:REV (GTR)
             alpha = 0.5    * alpha for gamma rates at sites
             ncatG = 5    * No. categories in discrete gamma
    
         cleandata = 0    * remove sites with ambiguity data (1:yes, 0:no)?
    
           BDparas = 1 1 0    * birth, death, sampling
       kappa_gamma = 6 2      * gamma prior for kappa
       alpha_gamma = 1 1      * gamma prior for alpha
    
       rgene_gamma = 2 2   * gamma prior for overall rates for genes
      sigma2_gamma = 1 10   * gamma prior for sigma^2     (for clock=2 or 3)
    
          finetune = 1: .1  .1  .1  .1 .01 .5  * auto (0 or 1) : times, musigma2, rates, mixing, paras, FossilErr
    
             print = 1
            burnin = 10000
          sampfreq = 5
           nsample = 30000
    
    *** Note: Make your window wider (100 columns) before running the program.
    

    Run mcmctree again (the third run uses the same settings as the second)

    mcmctree mcmctree.ctl
    
              seed = -1
           seqfile = /home/lx_sky6/yt/0105_xianyu/0105_mcmc/86.trimal.phy
          treefile = /home/lx_sky6/yt/0105_xianyu/0105_mcmc/input.tree
           outfile = out.txt
    
             ndata = 1
           seqtype = 0  * 0: nucleotides; 1:codons; 2:AAs
           usedata = 2    * 0: no data; 1:seq like; 2:use in.BV; 3: out.BV
             clock = 2    * 1: global clock; 2: independent rates; 3: correlated rates
           RootAge =   * safe constraint on root age, used if no fossil for root.
    
             model = 7    * 0:JC69, 1:K80, 2:F81, 3:F84, 4:HKY85, 7:REV (GTR)
             alpha = 0.5    * alpha for gamma rates at sites
             ncatG = 5    * No. categories in discrete gamma
    
         cleandata = 0    * remove sites with ambiguity data (1:yes, 0:no)?
    
           BDparas = 1 1 0    * birth, death, sampling
       kappa_gamma = 6 2      * gamma prior for kappa
       alpha_gamma = 1 1      * gamma prior for alpha
    
       rgene_gamma = 2 2   * gamma prior for overall rates for genes
      sigma2_gamma = 1 10   * gamma prior for sigma^2     (for clock=2 or 3)
    
          finetune = 1: .1  .1  .1  .1 .01 .5  * auto (0 or 1) : times, musigma2, rates, mixing, paras, FossilErr
    
             print = 1
            burnin = 10000
          sampfreq = 5
           nsample = 30000
    
    *** Note: Make your window wider (100 columns) before running the program.
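
By default the second and third runs both write mcmc.txt and out.txt into the working directory, so running them in the same place would overwrite the earlier results. One way to keep both, sketched here as an assumption rather than the author's method, is to give each run its own directory. The function name prepare_run and the directory names run2 and run3 are hypothetical. The seqfile and treefile entries in the control file are absolute paths, so they remain valid from any directory.

```shell
# Hypothetical helper: create a run directory containing in.BV and a copy of
# mcmctree.ctl in which usedata is set to 2 (read in.BV).
prepare_run() {
    run_dir=$1
    mkdir -p "$run_dir"
    sed 's/^\( *usedata *= *\)[0-9]*/\12/' mcmctree.ctl > "$run_dir/mcmctree.ctl"
    cp in.BV "$run_dir/in.BV"
}

# Example (run from the directory that holds mcmctree.ctl and in.BV):
# for d in run2 run3; do
#     prepare_run "$d"
#     (cd "$d" && mcmctree mcmctree.ctl)
# done
```

Because seed = -1 draws the seed from the clock, the two runs start from different random seeds even though their control files are identical.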
    

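    Finally, load the mcmc.txt files from the second and third runs into Tracer. If every ESS value is greater than 200 and the two runs give similar results, the dated tree is considered reliable.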
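
As a rough command-line complement to Tracer, the posterior means from the two runs can be placed side by side. The sketch below assumes that mcmc.txt is a whitespace-delimited table with one header line and the iteration number in the first column. The function name column_means and the paths run2/mcmc.txt and run3/mcmc.txt are hypothetical. It does not compute ESS.

```shell
# Hypothetical helper: print the mean of every column except the first
# (the iteration counter) of a whitespace-delimited table with a header line.
column_means() {
    awk '
        NR == 1 { for (i = 2; i <= NF; i++) name[i] = $i; ncol = NF; next }
        { for (i = 2; i <= ncol; i++) sum[i] += $i; n++ }
        END {
            if (n == 0) exit
            for (i = 2; i <= ncol; i++) printf "%s\t%.4f\n", name[i], sum[i] / n
        }
    ' "$1"
}

# Example:
# column_means run2/mcmc.txt > run2.means
# column_means run3/mcmc.txt > run3.means
# paste run2.means run3.means
```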


    If any ESS value is below 200, increase the number of MCMC iterations and rerun the second and third runs until ESS > 200.
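
One simple way to lengthen the chain is to raise nsample in mcmctree.ctl. The sketch below doubles it and prints the modified control file. The function name double_nsample is hypothetical, and doubling is only an illustrative choice, since the text does not say by how much to increase the run.

```shell
# Hypothetical helper: print a copy of a mcmctree control file in which the
# nsample value is doubled. All other lines are passed through unchanged.
double_nsample() {
    old=$(awk '$1 == "nsample" { print $3 }' "$1")
    sed "s/^\( *nsample *= *\)$old/\1$((old * 2))/" "$1"
}

# Example:
# double_nsample mcmctree.ctl > mcmctree.long.ctl
```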



    Article link: https://www.haomeiwen.com/subject/ooedcdtx.html