AliStat Installation and Usage (2022-01-24)

Author: 土雕艺术家 | Published 2022-01-24 17:37

AliStat (GitHub) quantifies the completeness of multiple sequence alignments for phylogenetic and phylogenomic studies.
https://github.com/thomaskf/AliStat

Installation: most servers already have a C++ compiler, so you only need to download the archive, extract it, and run make.

$ tar -zxvf AliStat_1.xx.tar.gz
$ cd AliStat_1.xx
$ make
$ ./alistat -h
================================================================================
                        Welcome to Alistat - Version 1.13

Syntax: /apps/AliStat/alistat <alignment file> <data type> [other options]
        /apps/AliStat/alistat -h

  <alignment file> : Multiple alignment file in FASTA format
  <data type>      : 1  - Single nucleotides (SN);
                     2  - Di-nucleotides (DN);
                     3  - Codons (CD);
                     4  - 10-state genotype data (10GT);
                     5  - 14-state genotype data (14GT);
                     6  - Amino acids (AA);
                     7  - Mixture of nucleotides and amino acids (NA)
                          (User has to specify the data type for each partition
                           inside the partition file)
other options:
  -b               : Report the brief summary of figures to the screen
                     Output format:
                     File,#seqs,#sites,Ca,Cr_max,Cr_min,Cc_max,Cc_min,Cij_max,Cij_min
                     (this option cannot work with the options: -o,-t,-r,-m,-i,-d)
  -c <coding_type> : 0  - A, C, G, T [default];
                     1  - C, T, R (i.e. C, T, AG);
                     2  - A, G, Y (i.e. A, G, CT);
                     3  - A, T, S (i.e. A, T, CG);
                     4  - C, G, W (i.e. C, G, AT);
                     5  - A, C, K (i.e. A, C, GT);
                     6  - G, T, M (i.e. G, T, AC);
                     7  - K, M    (i.e. GT,   AC);
                     8  - S, W    (i.e. GC,   AT);
                     9  - R, Y    (i.e. AG,   CT);
                     10 - A, B    (i.e. A,   CGT);
                     11 - C, D    (i.e. C,   AGT);
                     12 - G, H    (i.e. G,   ACT);
                     13 - T, V    (i.e. T,   ACG);
                     (this option is only valid for <data type> = 1)

  -o <FILE>        : Prefix for output files
                     (default: <alignment file> w/o .ext)
  -n <FILE>        : Only consider sequences with names listed in FILE
  -p <FILE>        : Specify the partitions
                     For <data type> = 1 - 6, partition file format:
                       "<partition name 1>=<start pos>-<end pos>, ..."
                       "<partition name 2>= ..."
                       Example:
                           part1=1-50,60-100
                           part2=101-200
                       (enumeration starts with 1)
                     For <data type> = 7, partition file format:
                       "<SN/DN/CD/10GT/14GT/AA>, <partition name 1>=<start pos>-<end pos>, ..."
                       "<SN/DN/CD/10GT/14GT/AA>, <partition name 2>= ..."
                       Example:
                           SN,part1=1-50,60-100
                           AA,part2=101-200
                           CD,part3=201-231
                     (this option cannot be used with '-s' at the same time)
  -s <n1,n2>       : Sliding window analysis: window size = n1; step size = n2
                     (this option cannot be used with '-p' at the same time)
  -t <n3,n4,...>   : Only output the tables n3, n4, ...
                     1 - C scores for individual sequences (Cr)
                     2 - C scores for individual sites (Cc)
                     3 - Distribution of C scores for individual sites (Cc)
                     4 - Matrix with C scores for pairs of sequences (Cij)
                     5 - Matrix with incompleteness scores for pairs of
                         sequences (Iij = 1 - Cij)
                     6 - Table with C score and incompleteness scores for pairs
                         of sequences (Cij & Iij)
                     (default: the program does not output any tables)
                     If "-t" option is used but no <n3,n4,...>, then the program
                     outputs all tables
  -r <row|col|both>: Reorder the rows/columns (or both) of the alignment
                     according to the Cr/Cc scores
                     All the tables are displayed according to the reordered
                     alignment
                     To output the reordered alignment, please also use the
                     option -m
  -m <n5>          : Mask the alignment; 0 <= n5 <= 1
                     Output (1) the alignment with columns Cc >= n5 in the file
                     'Mask.fst', (2) the alignment with columns Cc < n5 in the
                     file 'Disc.fst', and (3) the alignment with an extra row in
                     the first line to indicate whether the column is masked in
                     the file 'Stat.fst'.
                     (Special case: if no <n5>, whole alignment is outputted in
                      the file 'Mask.fst'; if <n5> is 0, the alignment with
                      columns Cc > 0 is outputted in the file 'Mask.fst')
  -i <1|2|3>       : Generate heat map image for Cij scores of sequence pairs
                     1 - Triangular heat map
                     2 - Rectangular heat map
                     3 - Both
                     (if no number, then both triangular & rectangular heat map
                      files are outputted)
  -d               : Report the p distances between sequences
                     in the file with extension '.p-dist.csv'
                     (default: disabled)
                     Note: computation of p distances may take long time for
                          large number of sequences
  -u               : Color scheme of the heatmaps
                     1 - Default color scheme, suitable for color-blind persons
                     2 - Another color scheme
  -h               : This help page
================================================================================

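<alignment file>: the input alignment in FASTA format.
<data type>: the type of data in the alignment (1-7, as listed in the help page above).
-c: the nucleotide recoding scheme (for example, 9 recodes the data as R/Y). It is not a codon table, and it is only valid when the data type is 1 (single nucleotides).
-b: prints a brief one-line summary to the screen. It cannot be combined with -o, -t, -r, -m, -i or -d.
-t: outputs the selected tables (1-6). If no number is given, all tables are written.
-p: specifies the partition file. For data type 7, each line of that file must also state the data type of its partition. It cannot be combined with -s.
-m (0 ≤ n5 ≤ 1): masks the alignment according to the completeness scores of individual sites (Cc). For example, with -m 0.6 the columns with Cc >= 0.6 are written to 'Mask.fst' and the columns with Cc < 0.6 to 'Disc.fst'; 'Stat.fst' holds the alignment with an extra first row marking which columns were masked. The filter acts on alignment columns, not on whole sequences.
-i: draws a heat map from the matrix of completeness scores for pairs of sequences (Cij): 1 for triangular, 2 for rectangular, 3 (or no number) for both.
-d: writes the p distances between sequences to the file with extension '.p-dist.csv'. This can be slow for a large number of sequences.
-o: the prefix for the output files (default: the alignment file name without its extension).
-s: sliding window analysis. -s 10,5 means a window size of 10 and a step size of 5. It cannot be combined with -p.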
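
The column filtering behind -m can be illustrated with a small Python sketch. It is not AliStat's own code: it assumes that Cc is the fraction of sequences carrying a fully specified character at a site, and that for amino-acid data only the 20 standard residues count as fully specified. The function names `column_scores` and `split_by_threshold` are made up for this illustration, and the special case for -m 0 described in the help page is not reproduced.

```python
# Sketch only: illustrates the idea behind -m, not AliStat's implementation.
# Assumption: a character is "complete" if it is one of the 20 standard
# amino acids; gaps, X, ? and anything else count as missing.
COMPLETE = set("ACDEFGHIKLMNPQRSTVWY")


def column_scores(seqs):
    """Return the completeness score Cc of every alignment column."""
    n_seqs = len(seqs)
    return [
        sum(ch.upper() in COMPLETE for ch in column) / n_seqs
        for column in zip(*seqs)
    ]


def split_by_threshold(seqs, threshold):
    """Split columns into kept (Cc >= threshold) and discarded (Cc < threshold)."""
    scores = column_scores(seqs)
    kept = ["".join(c for c, s in zip(seq, scores) if s >= threshold) for seq in seqs]
    discarded = ["".join(c for c, s in zip(seq, scores) if s < threshold) for seq in seqs]
    return kept, discarded


# Toy alignment of three sequences and five columns (hypothetical data).
alignment = ["MKV-A", "MK--A", "M-V-X"]
kept, discarded = split_by_threshold(alignment, 0.6)
print(kept)       # ['MKVA', 'MK-A', 'M-VX']
print(discarded)  # ['-', '-', '-']
```

Only the fourth column, which consists entirely of gaps, falls below the 0.6 threshold here, so it is the only one that would end up in the discarded set.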
Typical usage (here for an amino-acid alignment, data type 6):

./alistat input.fas 6 -o output -i -d -t
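
When many alignments need the same treatment, the command above can be wrapped in a short Python script. This is a minimal sketch under assumptions: the helper names `build_alistat_command` and `run_on_directory`, the `*.fas` pattern and the example path are hypothetical, and the binary is assumed to sit at `./alistat`. The flags themselves are exactly the ones used in the command above.

```python
import subprocess
from pathlib import Path


def build_alistat_command(alignment, data_type, binary="./alistat"):
    """Build the argument list for the invocation shown above (hypothetical helper)."""
    # Use the alignment name without its extension as the output prefix.
    prefix = Path(alignment).with_suffix("")
    return [binary, str(alignment), str(data_type),
            "-o", str(prefix), "-i", "-d", "-t"]


def run_on_directory(directory, data_type=6, pattern="*.fas"):
    """Run AliStat on every matching file; stops at the first failure."""
    for alignment in sorted(Path(directory).glob(pattern)):
        subprocess.run(build_alistat_command(alignment, data_type), check=True)


# Show the command that would be run for one (hypothetical) file.
print(build_alistat_command("genes/cox1.fas", 6))
```

Passing the arguments as a list avoids shell quoting problems with unusual file names, and `check=True` makes a failed run raise an error instead of passing silently.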
