美文网首页js css html
分析 | kaks计算

分析 | kaks计算

作者: shwzhao | 来源:发表于2022-07-05 16:44 被阅读0次

    准备文件:test.cds.fatest.pep.fa

    $ grep ">" test.pep.fa
    >AT3G47090
    >AT3G47110
    >AT3G47570
    >AT3G47580
    >AT5G20480
    >AT5G39390
    

    1. 分步计算

    1.1 序列全局比对

    注意:这里我将同源基因两两配对,在实际分析中选择的是共线性块的基因对。

    $ # 这里用 mafft
    $ mafft --auto --thread 10 test.AT3G47110-AT3G47090.pep.fa > test.AT3G47110-AT3G47090.pep.fa.aln
    

    1.2 pal2nal.pl 回译

    • -output: 输出格式(clustal|paml|fasta|codon),默认clustal
    • -nogap: 删除 gap
    $ pal2nal.pl test.AT3G47110-AT3G47090.pep.fa.aln test.AT3G47110-AT3G47090.cds.fa -output clustal -nogap > test.AT3G47110-AT3G47090.fa.aln.clustal 
    

    1.3 KaKs_Calculator2

    • AXTConvertor
      Usage: AXTConvertor [Clustal/Msf/Nexus/Phylip/Pir] [AXT]
    AXTConvertor test.AT3G47110-AT3G47090.fa.aln.clustal test.AT3G47110-AT3G47090.fa.aln.axt
    
    • KaKs_Calculator
      -i: axt 文件,将各基因对axt文件cat到一起,生成test.pep.fa.axt
      -o: 结果文件
      -c: 密码子表
      -m: 计算方法(NG|YN|......),默认MA
    $ KaKs_Calculator -i test.pep.fa.axt -o test.pep.fa.kaks
    Method(s): MA
    Genetic code: 1-Standard Code
    Please wait while reading sequences and calculating...
    1 AT3G47110&AT3G47090   [OK]
    2 AT3G47570&AT3G47090   [OK]
    3 AT3G47570&AT3G47110   [OK]
    4 AT3G47580&AT3G47090   [OK]
    5 AT3G47580&AT3G47110   [OK]
    6 AT3G47580&AT3G47570   [OK]
    7 AT5G20480&AT3G47090   [OK]
    8 AT5G20480&AT3G47110   [OK]
    9 AT5G20480&AT3G47570   [OK]
    10 AT5G20480&AT3G47580  [OK]
    11 AT5G39390&AT3G47090  [OK]
    12 AT5G39390&AT3G47110  [OK]
    13 AT5G39390&AT3G47570  [OK]
    14 AT5G39390&AT3G47580  [OK]
    15 AT5G39390&AT5G20480  [OK]
    Mission accomplished. (Time elapsed: 0:25)
    

    1.4 yn00

    还没看,之后补充

    yn00
    

    2. 一步计算

    • ParaAT.pl
      -h: 同源基因对文件
      -n: cds.fa
      -a: pep.fa
      -p: 指定多线程文件
      -m: 指定比对工具(clustalw2|t_coffee|mafft|muscle)
      -g: 去除比对有gap的密码子
      -k: 用KaKs_Calculator 计算kaks值
      -o: 输出结果的目录
      -f: 输出比对文件的格式(axt|fasta|paml|codon|clustal)
    $ grep ">" test.pep.fa | sed 's/>//' | awk '{a[NR]=$1;for(i=1;i<NR;i++){print $1"\t"a[i]}}' > geneid.pairs.txt
    AT3G47110       AT3G47090
    AT3G47570       AT3G47090
    AT3G47570       AT3G47110
    AT3G47580       AT3G47090
    AT3G47580       AT3G47110
    AT3G47580       AT3G47570
    AT5G20480       AT3G47090
    AT5G20480       AT3G47110
    AT5G20480       AT3G47570
    AT5G20480       AT3G47580
    AT5G39390       AT3G47090
    AT5G39390       AT3G47110
    AT5G39390       AT3G47570
    AT5G39390       AT3G47580
    AT5G39390       AT5G20480
    
    $ echo 12 > proc # 设置线程数
    $ ParaAT.pl -h geneid.pairs.txt -n test.cds.fa -a test.pep.fa -o paraat -m mafft -f axt -g -p proc -kaks
    $ ls paraat | head -4
    AT3G47110-AT3G47090.cds_aln.axt
    AT3G47110-AT3G47090.cds_aln.axt.kaks
    AT3G47570-AT3G47090.cds_aln.axt
    AT3G47570-AT3G47090.cds_aln.axt.kaks
    

    参考:
    简书 | Kaks_calculator计算ka/ks 值
    简书 | kaks calculator批量计算多个基因的选择压力kaks值
    公众号 | 基因学苑 | 生物信息百Jia软件(三):Muscle
    简书 | Mr_我爱读文献 | 学会正确选择多序列比对(coding-sequences)软件

    补充:

    多序列比对的软件:musclemafftclustalw......
    现在常用的是前两者,上面用的mafft,这里看一下muscle

    什么时候要用到多序列比对,它的结果能用于什么呢?

    1. 用于构建基因树:
    1.1 用trimAl 修剪比对结果,用iqtreefasttree等进行建pep树;
    1.2 用pal2nal.plcds序列回帖到比对结果,用于构建cds树。

    2. 用于构建物种树:将单拷贝基因家族的比对结果串联建树。

    3. 共线性块上的基因对进行全局比对,回帖cds序列,用yn00等计算ka、ks值。

    如果不正确,希望批评指正!


    $ conda search muscle
    Loading channels: done
    # Name                       Version           Build  Channel
    muscle                        3.8.31               0  bioconda
    muscle                      3.8.1551               1  bioconda
    muscle                      3.8.1551               2  bioconda
    muscle                      3.8.1551      h2d50403_3  bioconda
    muscle                      3.8.1551      h6bb024c_4  bioconda
    muscle                      3.8.1551      h7d875b9_6  bioconda
    muscle                      3.8.1551      hc9558a2_5  bioconda
    muscle                           5.1      h7d875b9_0  bioconda
    muscle                           5.1      h9f5acd7_1  bioconda
    
    • 运行
    $ cat test.fa
    >gene1
    MRLFLLLAFNALMQLEAYGFTDESDRQALLEIKSQVSESKRDALSAWNNSFP
    >gene2
    MGVPCIVMRLILVSALLVSVSLEHSDMVCAQTIRLTEETDKQALLEFKETSRVVLG
    >gene3
    MRLFLLLAFNALMLLETHGFTDETDRQALLQFKSQVSEDKRVVLSSWNHSFPLCNWKGVT
    >gene4
    MKLFLLLSFSAHLLLGETDRQALLEFKSQVSEGKRDVLSSWNNSFPLCNWKWVT
    >gene5
    MKLSFSLVFNALTLLLQVCIFAQARFSNETDMQALLEFKSQVSENNKREVLASWNHSSPF
    >gene6
    MKVCILVFAQARFSNETDMQALLEFKSQVTENKREVLASWNHSFPL
    
    $ muscle -in test.fa -quiet | seqkit seq -w 0
    >gene2
    MGVPCIVMRLILVSALLVSVSLEHSDMVCAQTIRLTEETDKQALLEFKE-----TSRVVLG---------------
    >gene5
    -------MKLSFS--LVFNALTLLLQVCIFAQARFSNETDMQALLEFKSQVSENNKREVLASWNHSSPF-------
    >gene6
    -------MKVCIL---------------VFAQARFSNETDMQALLEFKSQVTE-NKREVLASWNHSFPL-------
    >gene4
    -------MKLFLL--LSFSAHL------LL------GETDRQALLEFKSQVSE-GKRDVLSSWNNSFPLCNWKWVT
    >gene1
    -------MRLFLL--LAFNALM------QLEAYGFTDESDRQALLEIKSQVSE-SKRDALSAWNNSFP--------
    >gene3
    -------MRLFLL--LAFNALM------LLETHGFTDETDRQALLQFKSQVSE-DKRVVLSSWNHSFPLCNWKGVT
    
    • 其他
    $ muscle
    
    MUSCLE v3.8.1551 by Robert C. Edgar
    
    http://www.drive5.com/muscle
    This software is donated to the public domain.
    Please cite: Edgar, R.C. Nucleic Acids Res 32(5), 1792-97.
    
    
    Basic usage
    
        muscle -in <inputfile> -out <outputfile>
    
    Common options (for a complete list please see the User Guide):
    
        -in <inputfile>    Input file in FASTA format (default stdin)
        -out <outputfile>  Output alignment in FASTA format (default stdout)
        -diags             Find diagonals (faster for similar sequences)
        -maxiters <n>      Maximum number of iterations (integer, default 16)
        -maxhours <h>      Maximum time to iterate in hours (default no limit)
        -html              Write output in HTML format (default FASTA)
        -msf               Write output in GCG MSF format (default FASTA)
        -clw               Write output in CLUSTALW format (default FASTA)
        -clwstrict         As -clw, with 'CLUSTAL W (1.81)' header
        -log[a] <logfile>  Log to file (append if -loga, overwrite if -log)
        -quiet             Do not write progress messages to stderr
        -version           Display version information and exit
    
    Without refinement (very fast, avg accuracy similar to T-Coffee): -maxiters 2
    Fastest possible (amino acids): -maxiters 1 -diags -sv -distance1 kbit20_3
    Fastest possible (nucleotides): -maxiters 1 -diags
    

    看来我的版本比较老了

    相关文章

      网友评论

        本文标题:分析 | kaks计算

        本文链接:https://www.haomeiwen.com/subject/mmpsbrtx.html