Gepard Linux Command line mode

Author: Yizhe_Lin | Published 2023-12-25 21:28

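      Gepard quickly compares two FASTA sequences for collinearity and draws the result as a two-dimensional dot plot. I use it to determine the orientation of the SSC (small single-copy) region in chloroplast genomes assembled with GetOrganelle. Although the software was released back in 2007 [1], it is more than sufficient for this purpose.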

    Figure 1. Gepard's graphical interface on Windows, here comparing two chloroplast genome files, Cas_hy and cas004-2.

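      I had already tried the Windows version (see [3]), but once there are many assembled chloroplast SSC regions to check, batch processing becomes necessary. Gepard is easy to install on Linux via conda or Docker; see https://github.com/univieCUBE/gepard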

      After installation, typing Gepardcmd prints the help message:

    Gepard 2.0 - command line mode
    
    Reference:
    Krumsiek J, Arnold R, Rattei T
    Gepard: A rapid and sensitive tool for creating dotplots on genome scale.
    Bioinformatics 2007; 23(8): 1026-8. PMID: 17309896
    
    Parameters are supplied as -name value
    
    Required parameters:
      -seq:        the sequences, seperated by spaces. The first gets paired to the second, third to fourth and so on.
      -matrix:      substitution matrix file
      -outfile:     output file name
    
    ... (rest omitted)
    

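      -seq, -matrix, and -outfile are required. -seq and -outfile are self-explanatory; unlike in the Windows GUI, you must also pass -matrix, a nucleotide substitution matrix file, for which the official tutorial recommends matrices/edna.mat. With a conda installation, the matrix can be located as follows: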

    $ which gepard
    ~/miniconda3/envs/gepard/bin/gepard
    $ cd ~/miniconda3/envs/gepard
    $ find -name edna.mat
    ./share/gepard/resources/matrices/edna.mat
    ./share/gepard/src/matrices/edna.mat
    $ cd ./share/gepard/resources/matrices
    $ less edna.mat
    #
    # This matrix was created by Todd Lowe 12/10/92
    #
    # Uses ambiguous nucleotide codes, probabilities rounded to
    # nearest integer
    #
    # Lowest score = -4, Highest score = 5
    #
    # modified for use with gepard (delimiter letter Z)
    
       A  T  G  C  N  W  R  Y  K  M  B  V  H  D  S  U  Z  X 
    A  1  0  0  0 -2 -4  1  1 -4 -4  1 -4 -1 -1 -1 -4 -9 -9
    T  0  1  0  0 -2 -4  1 -4  1  1 -4 -1 -4 -1 -1  5 -9 -9
    G  0  0  1  0 -2  1 -4  1 -4  1 -4 -1 -1 -4 -1 -4 -9 -9
    C  0  0  0  1 -2  1 -4 -4  1 -4  1 -1 -1 -1 -4 -4 -9 -9
    N -2 -2 -2 -2 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -2 -9 -9
    W -4 -4  1  1 -1 -1 -4 -2 -2 -2 -2 -1 -1 -3 -3 -4 -9 -9
    R  1  1 -4 -4 -1 -4 -1 -2 -2 -2 -2 -3 -3 -1 -1  1 -9 -9
    Y  1 -4  1 -4 -1 -2 -2 -1 -4 -2 -2 -3 -1 -3 -1 -4 -9 -9
    K -4  1 -4  1 -1 -2 -2 -4 -1 -2 -2 -1 -3 -1 -3 -1 -9 -9
    M -4  1  1 -4 -1 -2 -2 -2 -2 -1 -4 -1 -3 -3 -1  1 -9 -9
    B  1 -4 -4  1 -1 -2 -2 -2 -2 -4 -1 -3 -1 -1 -3 -4 -9 -9
    V -4 -1 -1 -1 -1 -1 -3 -3 -1 -1 -3 -1 -2 -2 -2 -1 -9 -9
    H -1 -4 -1 -1 -1 -1 -3 -1 -3 -3 -1 -2 -1 -2 -2 -4 -9 -9
    D -1 -1 -4 -1 -1 -3 -1 -3 -1 -3 -1 -2 -2 -1 -2 -1 -9 -9
    S -1 -1 -1 -4 -1 -3 -1 -1 -3 -1 -3 -2 -2 -2 -1 -1 -9 -9
    U -4  5 -4 -4 -2 -4  1 -4  1  1 -4 -1 -4 -1 -1  5 -9 -9
    Z -9 -9 -9 -9 -9 -9 -9 -9 -9 -9 -9 -9 -9 -9 -9 -9 -9 -9
    X -9 -9 -9 -9 -9 -9 -9 -9 -9 -9 -9 -9 -9 -9 -9 -9 -9 -9
    

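      The matrix file is plain text: lines starting with # are comments, the first remaining line lists the column symbols, and each following row begins with its own symbol. Purely to illustrate that layout, here is a small awk-based lookup; `matrix_score` is a hypothetical helper of my own, not part of Gepard, and it assumes the edna.mat layout shown above.

```shell
#!/usr/bin/env bash
# matrix_score FILE X Y: print the score for row symbol X and column symbol Y.
# Hypothetical helper; assumes the edna.mat layout shown above.
matrix_score() {
    awk -v a="$2" -v b="$3" '
        /^#/ || NF == 0 { next }                              # skip comments and blank lines
        !hdr { for (i = 1; i <= NF; i++) col[$i] = i; hdr = 1; next }  # header row
        $1 == a && (b in col) { print $(col[b] + 1); exit }   # field 1 is the row label
    ' "$1"
}
```

For example, `matrix_score edna.mat A G` prints 0, the A/G entry of the table above.
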
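      Besides the matrix, you also need the location of the program itself, because calling Gepard directly as a command from the environment did not work for me (I am not sure why). It lives in .../share/gepard/dist, which contains two files, Gepard-1.40.jar and Gepard-2.1.jar. Both work, but their commands differ slightly; the examples here use Gepard-2.1.jar.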
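
      Instead of navigating there by hand, both paths can be derived from the conda environment. A minimal sketch, assuming the gepard environment is activated (`conda activate` sets `CONDA_PREFIX`) and the directory layout found above; the variable names `ENV_DIR`, `GEPARD_JAR` and `GEPARD_MATRIX` are my own:

```shell
#!/usr/bin/env bash
# Derive Gepard's jar and matrix paths from the conda environment.
# Falls back to the path used in this post if no environment is active.
ENV_DIR="${CONDA_PREFIX:-$HOME/miniconda3/envs/gepard}"
GEPARD_JAR="$ENV_DIR/share/gepard/dist/Gepard-2.1.jar"
GEPARD_MATRIX="$ENV_DIR/share/gepard/resources/matrices/edna.mat"

# Warn (without aborting) if the layout differs from the one assumed here.
for f in "$GEPARD_JAR" "$GEPARD_MATRIX"; do
    [ -f "$f" ] || echo "warning: not found: $f" >&2
done
```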

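      Now the program can be run. Because the official tutorial has not been updated, the command that actually works (for Gepard 2.1, the version installed by default at the time of writing) is [2]: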

    java -cp ~/miniconda3/envs/gepard/share/gepard/dist/Gepard-2.1.jar org.gepard.client.cmdline.CommandLine \
    -seq ref.fasta test.fasta \
    -matrix ~/miniconda3/envs/gepard/share/gepard/resources/matrices/edna.mat \
    -outfile test1.png
    

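      Running it fails with the error below, and a window pops up asking you to install Xmanager (version 11 or so). I installed it as prompted; the first installation comes with a 30-day trial, so I am using that for now and will look for another solution later if needed: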

    Loading substitution matrix...
    Loading sequence from ref.fasta
    Loading sequence from test.fasta
    Calculating suffix array... 
    Calculating dotplot... 
    Creating image and writing to file... 
    Exception in thread "main" java.awt.AWTError: Can't connect to X11 window server using 'localhost:12.0' as the value of the DISPLAY variable.
        at java.desktop/sun.awt.X11GraphicsEnvironment.initDisplay(Native Method)
        at java.desktop/sun.awt.X11GraphicsEnvironment$1.run(X11GraphicsEnvironment.java:104)
        at java.base/java.security.AccessController.doPrivileged(Native Method)
        at java.desktop/sun.awt.X11GraphicsEnvironment.<clinit>(X11GraphicsEnvironment.java:63)
        at java.base/java.lang.Class.forName0(Native Method)
        at java.base/java.lang.Class.forName(Class.java:315)
        at java.desktop/java.awt.GraphicsEnvironment$LocalGE.createGE(GraphicsEnvironment.java:101)
        at java.desktop/java.awt.GraphicsEnvironment$LocalGE.<clinit>(GraphicsEnvironment.java:83)
        at java.desktop/java.awt.GraphicsEnvironment.getLocalGraphicsEnvironment(GraphicsEnvironment.java:129)
        at java.desktop/java.awt.image.BufferedImage.createGraphics(BufferedImage.java:1181)
        at java.desktop/java.awt.image.BufferedImage.getGraphics(BufferedImage.java:1170)
        at org.gepard.client.Plotter.<init>(Plotter.java:92)
        at org.gepard.client.cmdline.CommandLine.main(CommandLine.java:304)
    

      After installing it and running the command again, everything works:

    Loading substitution matrix...
    Loading sequence from ref.fasta
    Loading sequence from test.fasta
    Calculating suffix array... 
    Calculating dotplot... 
    Creating image and writing to file...
    

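      The error comes from Java's AWT trying to connect to the X server named in DISPLAY (localhost:12.0 in the trace) while Gepard draws the image. Installing an X server is one fix; another that may avoid it, which I have not tested with Gepard, is to start the JVM in headless mode via the standard `java.awt.headless` system property, so the image is drawn off-screen. A sketch wrapped in a hypothetical helper `gepard_plot`, with paths as above:

```shell
#!/usr/bin/env bash
# gepard_plot REF QUERY OUT.png  (hypothetical helper name)
# Paths assume the conda layout used in this post.
GEPARD_HOME="${CONDA_PREFIX:-$HOME/miniconda3/envs/gepard}/share/gepard"

gepard_plot() {
    # -Djava.awt.headless=true: standard JVM property; AWT renders into an
    # off-screen image instead of connecting to the X server in $DISPLAY.
    java -Djava.awt.headless=true \
        -cp "$GEPARD_HOME/dist/Gepard-2.1.jar" \
        org.gepard.client.cmdline.CommandLine \
        -seq "$1" "$2" \
        -matrix "$GEPARD_HOME/resources/matrices/edna.mat" \
        -outfile "$3"
}
```

Usage: `gepard_plot ref.fasta test.fasta test1.png`. On Linux, running with DISPLAY unset should have the same effect, because the JVM then defaults to headless mode.
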
      Finally, wrapping the command in a loop is all it takes to batch-process the assemblies!
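
      A minimal sketch of such a loop, assuming one reference FASTA and a directory of assemblies (*.fasta) to compare against it, producing one PNG per assembly. The function name `gepard_batch` and the directory names are hypothetical, and the headless flag can be dropped if an X server is available:

```shell
#!/usr/bin/env bash
# gepard_batch REF.fasta IN_DIR OUT_DIR  (hypothetical helper name)
# Paths assume the conda layout used in this post.
GEPARD_HOME="${CONDA_PREFIX:-$HOME/miniconda3/envs/gepard}/share/gepard"

gepard_batch() {
    local ref="$1" in_dir="$2" out_dir="$3" query name
    mkdir -p "$out_dir"
    for query in "$in_dir"/*.fasta; do
        [ -e "$query" ] || continue          # glob matched nothing
        name=$(basename "$query" .fasta)
        echo "plotting $name" >&2
        # Headless mode (standard JVM property) avoids the X11 error above.
        java -Djava.awt.headless=true \
            -cp "$GEPARD_HOME/dist/Gepard-2.1.jar" \
            org.gepard.client.cmdline.CommandLine \
            -seq "$ref" "$query" \
            -matrix "$GEPARD_HOME/resources/matrices/edna.mat" \
            -outfile "$out_dir/$name.png" \
            || echo "failed: $name" >&2    # keep going if one plot fails
    done
}
```

Usage: `gepard_batch ref.fasta assemblies dotplots`. Keep the reference outside the input directory so it is not plotted against itself.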


    References:
    [1] Krumsiek J, Arnold R, Rattei T. Gepard: a rapid and sensitive tool for creating dotplots on genome scale[J]. Bioinformatics, 2007, 23(8): 1026-1028.
    [2] How to start gepard on the commandline.
    [3] Angiosperms · Chloroplast assembly, annotation and comparative analysis · Framework
