Gepard Linux Command line mode

Author: Yizhe_Lin | Published: 2023-12-25 21:28

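Gepard quickly runs a collinearity comparison of two sequences in FASTA format and draws the result as a two-dimensional dotplot. I use it to determine the orientation of the SSC (small single-copy) region in chloroplast genomes assembled with GetOrganelle. Although the software dates back to 2007 [1], it is more than enough for this purpose.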

Figure 1. The Gepard GUI on Windows, here comparing two chloroplast genome files, Cas_hy and cas004-2.

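I had already tried the Windows version (see [3]), but once there are many assembled chloroplast SSC regions to check, batch processing becomes necessary. On Linux, Gepard is easy to install with conda or Docker; see https://github.com/univieCUBE/gepard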

After installation, typing Gepardcmd prints the help message:

Gepard 2.0 - command line mode

Reference:
Krumsiek J, Arnold R, Rattei T
Gepard: A rapid and sensitive tool for creating dotplots on genome scale.
Bioinformatics 2007; 23(8): 1026-8. PMID: 17309896

Parameters are supplied as -name value

Required parameters:
  -seq:        the sequences, seperated by spaces. The first gets paired to the second, third to fourth and so on.
  -matrix:      substitution matrix file
  -outfile:     output file name

... (rest of the output omitted)

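-seq, -matrix and -outfile are required. -seq and -outfile are self-explanatory; unlike the Windows GUI, however, the command line also needs -matrix, a nucleotide substitution matrix file, and the official tutorial recommends matrices/edna.mat. With a conda installation, the matrix can be located like this: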

$ which gepard
~/miniconda3/envs/gepard/bin/gepard
$ cd ~/miniconda3/envs/gepard
$ find -name edna.mat
./share/gepard/resources/matrices/edna.mat
./share/gepard/src/matrices/edna.mat
$ cd ./share/gepard/resources/matrices
$ less edna.mat
#
# This matrix was created by Todd Lowe 12/10/92
#
# Uses ambiguous nucleotide codes, probabilities rounded to
# nearest integer
#
# Lowest score = -4, Highest score = 5
#
# modified for use with gepard (delimiter letter Z)

   A  T  G  C  N  W  R  Y  K  M  B  V  H  D  S  U  Z  X 
A  1  0  0  0 -2 -4  1  1 -4 -4  1 -4 -1 -1 -1 -4 -9 -9
T  0  1  0  0 -2 -4  1 -4  1  1 -4 -1 -4 -1 -1  5 -9 -9
G  0  0  1  0 -2  1 -4  1 -4  1 -4 -1 -1 -4 -1 -4 -9 -9
C  0  0  0  1 -2  1 -4 -4  1 -4  1 -1 -1 -1 -4 -4 -9 -9
N -2 -2 -2 -2 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -2 -9 -9
W -4 -4  1  1 -1 -1 -4 -2 -2 -2 -2 -1 -1 -3 -3 -4 -9 -9
R  1  1 -4 -4 -1 -4 -1 -2 -2 -2 -2 -3 -3 -1 -1  1 -9 -9
Y  1 -4  1 -4 -1 -2 -2 -1 -4 -2 -2 -3 -1 -3 -1 -4 -9 -9
K -4  1 -4  1 -1 -2 -2 -4 -1 -2 -2 -1 -3 -1 -3 -1 -9 -9
M -4  1  1 -4 -1 -2 -2 -2 -2 -1 -4 -1 -3 -3 -1  1 -9 -9
B  1 -4 -4  1 -1 -2 -2 -2 -2 -4 -1 -3 -1 -1 -3 -4 -9 -9
V -4 -1 -1 -1 -1 -1 -3 -3 -1 -1 -3 -1 -2 -2 -2 -1 -9 -9
H -1 -4 -1 -1 -1 -1 -3 -1 -3 -3 -1 -2 -1 -2 -2 -4 -9 -9
D -1 -1 -4 -1 -1 -3 -1 -3 -1 -3 -1 -2 -2 -1 -2 -1 -9 -9
S -1 -1 -1 -4 -1 -3 -1 -1 -3 -1 -3 -2 -2 -2 -1 -1 -9 -9
U -4  5 -4 -4 -2 -4  1 -4  1  1 -4 -1 -4 -1 -1  5 -9 -9
Z -9 -9 -9 -9 -9 -9 -9 -9 -9 -9 -9 -9 -9 -9 -9 -9 -9 -9
X -9 -9 -9 -9 -9 -9 -9 -9 -9 -9 -9 -9 -9 -9 -9 -9 -9 -9

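Once you know where the matrix lives, you also need the location of the program itself, because calling it directly through the command on the PATH did not work for me (I'm not sure why). It is in .../share/gepard/dist, which holds two files, Gepard-1.40.jar and Gepard-2.1.jar. Both work, but their command lines differ slightly; the examples here use Gepard-2.1.jar.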
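Both lookups can also be scripted, which helps once the paths have to be reused in a batch job. The sketch below assumes the conda layout shown above (the matrix under share/gepard/resources/matrices, the jar under share/gepard/dist) and, if no directory is passed, searches the active environment ($CONDA_PREFIX) or else ~/miniconda3/envs/gepard. The function locate_gepard_files and the variables GEPARD_MATRIX and GEPARD_JAR are hypothetical names chosen for illustration, not part of Gepard.

```shell
#!/usr/bin/env bash
# Sketch: find the substitution matrix and the Gepard 2.1 jar inside a conda env.
# Assumption: the layout matches the conda install shown in this post.
# Hypothetical helper; pass the env directory, or rely on $CONDA_PREFIX / the
# ~/miniconda3/envs/gepard fallback.
locate_gepard_files() {
  local env_dir="${1:-${CONDA_PREFIX:-$HOME/miniconda3/envs/gepard}}"
  # Two copies of edna.mat exist (resources/ and src/); take the resources/ one.
  GEPARD_MATRIX=$(find "$env_dir/share/gepard" -path '*resources/matrices/edna.mat' 2>/dev/null | head -n 1)
  # dist/ holds Gepard-1.40.jar and Gepard-2.1.jar; the commands here use 2.1.
  GEPARD_JAR=$(find "$env_dir/share/gepard" -name 'Gepard-2.1.jar' 2>/dev/null | head -n 1)
}

locate_gepard_files "$@"
echo "matrix: ${GEPARD_MATRIX:-not found}"
echo "jar:    ${GEPARD_JAR:-not found}"
```

For example, running locate_gepard_files ~/miniconda3/envs/gepard sets both variables for the environment used in this post; an empty value means the file was not found under that directory.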

Now the software can be run. Since the official tutorial has not been kept up to date, the working command (for Gepard 2.1, the version installed by default today) is [2]:

java -cp ~/miniconda3/envs/gepard/share/gepard/dist/Gepard-2.1.jar org.gepard.client.cmdline.CommandLine \
-seq ref.fasta test.fasta \
-matrix ~/miniconda3/envs/gepard/share/gepard/resources/matrices/edna.mat \
-outfile test1.png

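Running it fails with the error below, and a window pops up asking you to install Xmanager 11 or something similar. I just installed it as instructed; the first install comes with a 30-day trial, so I'm using that for now and will deal with it later if needed: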

Loading substitution matrix...
Loading sequence from ref.fasta
Loading sequence from test.fasta
Calculating suffix array... 
Calculating dotplot... 
Creating image and writing to file... 
Exception in thread "main" java.awt.AWTError: Can't connect to X11 window server using 'localhost:12.0' as the value of the DISPLAY variable.
    at java.desktop/sun.awt.X11GraphicsEnvironment.initDisplay(Native Method)
    at java.desktop/sun.awt.X11GraphicsEnvironment$1.run(X11GraphicsEnvironment.java:104)
    at java.base/java.security.AccessController.doPrivileged(Native Method)
    at java.desktop/sun.awt.X11GraphicsEnvironment.<clinit>(X11GraphicsEnvironment.java:63)
    at java.base/java.lang.Class.forName0(Native Method)
    at java.base/java.lang.Class.forName(Class.java:315)
    at java.desktop/java.awt.GraphicsEnvironment$LocalGE.createGE(GraphicsEnvironment.java:101)
    at java.desktop/java.awt.GraphicsEnvironment$LocalGE.<clinit>(GraphicsEnvironment.java:83)
    at java.desktop/java.awt.GraphicsEnvironment.getLocalGraphicsEnvironment(GraphicsEnvironment.java:129)
    at java.desktop/java.awt.image.BufferedImage.createGraphics(BufferedImage.java:1181)
    at java.desktop/java.awt.image.BufferedImage.getGraphics(BufferedImage.java:1170)
    at org.gepard.client.Plotter.<init>(Plotter.java:92)
    at org.gepard.client.cmdline.CommandLine.main(CommandLine.java:304)

After the installation, rerunning the command works normally:

Loading substitution matrix...
Loading sequence from ref.fasta
Loading sequence from test.fasta
Calculating suffix array... 
Calculating dotplot... 
Creating image and writing to file...
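If installing an X server on the client is not an option, a possible alternative (not tried in this post) is Java's headless mode. The stack trace shows the crash happens when Plotter asks a BufferedImage for a Graphics object and AWT tries to open the display named in DISPLAY (localhost:12.0 here, which typically points to an SSH X11-forwarding tunnel). Drawing into an off-screen BufferedImage does not need a display when AWT runs headless. A sketch, assuming the conda paths used above; gepard_headless is a hypothetical wrapper name:

```shell
#!/usr/bin/env bash
# Sketch, untested in this post: run Gepard's command-line client with AWT in
# headless mode, so the PNG is rendered off-screen and Java never contacts the
# X11 server named in $DISPLAY.
# Assumed paths: the conda layout shown above; override via environment variables.
GEPARD_JAR="${GEPARD_JAR:-$HOME/miniconda3/envs/gepard/share/gepard/dist/Gepard-2.1.jar}"
GEPARD_MATRIX="${GEPARD_MATRIX:-$HOME/miniconda3/envs/gepard/share/gepard/resources/matrices/edna.mat}"

# Hypothetical wrapper. Usage: gepard_headless REF.fasta QUERY.fasta OUT.png
gepard_headless() {
  # JVM options such as -Djava.awt.headless=true must come before the class name.
  java -Djava.awt.headless=true \
    -cp "$GEPARD_JAR" org.gepard.client.cmdline.CommandLine \
    -seq "$1" "$2" \
    -matrix "$GEPARD_MATRIX" \
    -outfile "$3"
}
```

Usage would then be gepard_headless ref.fasta test.fasta test1.png. The same -Djava.awt.headless=true option can be added to any other java call that runs Gepard.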

Finally, wrapping the command in a loop gives you batch processing!
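The post does not show the loop itself, so here is one way it might look, assuming (hypothetically) that every assembly to check is a .fasta file in a single folder and is compared against the same reference. batch_dotplots, DRY_RUN and the folder names are illustrative, not part of Gepard; the java command is the same one used above.

```shell
#!/usr/bin/env bash
# Sketch of a batch loop: plot every assembly in one folder against a reference.
# Hypothetical layout: assemblies in IN_DIR/*.fasta, one PNG per assembly
# written to OUT_DIR/<name>.png.
# Jar and matrix default to the conda paths used in this post; override if needed.
GEPARD_JAR="${GEPARD_JAR:-$HOME/miniconda3/envs/gepard/share/gepard/dist/Gepard-2.1.jar}"
GEPARD_MATRIX="${GEPARD_MATRIX:-$HOME/miniconda3/envs/gepard/share/gepard/resources/matrices/edna.mat}"

# Usage: batch_dotplots REF.fasta IN_DIR OUT_DIR
# With DRY_RUN=1 the commands are printed (space-joined) instead of executed.
batch_dotplots() {
  local ref="$1" in_dir="$2" out_dir="$3" query name cmd
  mkdir -p "$out_dir"
  shopt -s nullglob   # an empty folder gives zero iterations, not a literal "*.fasta"
  for query in "$in_dir"/*.fasta; do
    name=$(basename "$query" .fasta)
    cmd=(java -cp "$GEPARD_JAR" org.gepard.client.cmdline.CommandLine
         -seq "$ref" "$query"
         -matrix "$GEPARD_MATRIX"
         -outfile "$out_dir/$name.png")
    if [[ "${DRY_RUN:-0}" == 1 ]]; then
      echo "${cmd[*]}"
    else
      # Keep going if one sample fails, but report it on stderr.
      "${cmd[@]}" || echo "Gepard failed for $query" >&2
    fi
  done
}
```

For example, DRY_RUN=1 batch_dotplots ref.fasta assemblies dotplots prints the commands for a quick check; dropping DRY_RUN=1 runs them and writes one dotplot per assembly, named after the input file, which makes it easy to flip through the PNGs and spot assemblies with an inverted SSC.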


References:
[1] Krumsiek J, Arnold R, Rattei T. Gepard: a rapid and sensitive tool for creating dotplots on genome scale. Bioinformatics, 2007, 23(8): 1026-1028.
[2] How to start gepard on the commandline.
[3] Angiosperms · Chloroplast assembly, annotation and comparative analysis · Framework.

Link: https://www.haomeiwen.com/subject/anikndtx.html