AliStat (GitHub): quantifying the completeness of multiple sequence alignments for phylogenetic and phylogenomic studies
https://github.com/thomaskf/AliStat
Download and installation: most servers already have a C++ compiler, so you only need to download the archive, extract it, and run make.
$ tar -zxvf AliStat_1.xx.tar.gz
$ cd AliStat_1.xx
$ make
$ ./alistat -h
================================================================================
Welcome to Alistat - Version 1.13
Syntax: /apps/AliStat/alistat <alignment file> <data type> [other options]
/apps/AliStat/alistat -h
<alignment file> : Multiple alignment file in FASTA format
<data type> : 1 - Single nucleotides (SN);
2 - Di-nucleotides (DN);
3 - Codons (CD);
4 - 10-state genotype data (10GT);
5 - 14-state genotype data (14GT);
6 - Amino acids (AA);
7 - Mixture of nucleotides and amino acids (NA)
(User has to specify the data type for each partition
inside the partition file)
other options:
-b : Report the brief summary of figures to the screen
Output format:
File,#seqs,#sites,Ca,Cr_max,Cr_min,Cc_max,Cc_min,Cij_max,Cij_min
(this option cannot work with the options: -o,-t,-r,-m,-i,-d)
-c <coding_type> : 0 - A, C, G, T [default];
1 - C, T, R (i.e. C, T, AG);
2 - A, G, Y (i.e. A, G, CT);
3 - A, T, S (i.e. A, T, CG);
4 - C, G, W (i.e. C, G, AT);
5 - A, C, K (i.e. A, C, GT);
6 - G, T, M (i.e. G, T, AC);
7 - K, M (i.e. GT, AC);
8 - S, W (i.e. GC, AT);
9 - R, Y (i.e. AG, CT);
10 - A, B (i.e. A, CGT);
11 - C, D (i.e. C, AGT);
12 - G, H (i.e. G, ACT);
13 - T, V (i.e. T, ACG);
(this option is only valid for <data type> = 1)
-o <FILE> : Prefix for output files
(default: <alignment file> w/o .ext)
-n <FILE> : Only consider sequences with names listed in FILE
-p <FILE> : Specify the partitions
For <data type> = 1 - 6, partition file format:
"<partition name 1>=<start pos>-<end pos>, ..."
"<partition name 2>= ..."
Example:
part1=1-50,60-100
part2=101-200
(enumeration starts with 1)
For <data type> = 7, partition file format:
"<SN/DN/CD/10GT/14GT/AA>, <partition name 1>=<start pos>-<end pos>, ..."
"<SN/DN/CD/10GT/14GT/AA>, <partition name 2>= ..."
Example:
SN,part1=1-50,60-100
AA,part2=101-200
CD,part3=201-231
(this option cannot be used with '-s' at the same time)
-s <n1,n2> : Sliding window analysis: window size = n1; step size = n2
(this option cannot be used with '-p' at the same time)
-t <n3,n4,...> : Only output the tables n3, n4, ...
1 - C scores for individual sequences (Cr)
2 - C scores for individual sites (Cc)
3 - Distribution of C scores for individual sites (Cc)
4 - Matrix with C scores for pairs of sequences (Cij)
5 - Matrix with incompleteness scores for pairs of
sequences (Iij = 1 - Cij)
6 - Table with C score and incompleteness scores for pairs
of sequences (Cij & Iij)
(default: the program does not output any tables)
If "-t" option is used but no <n3,n4,...>, then the program
outputs all tables
-r <row|col|both>: Reorder the rows/columns (or both) of the alignment
according to the Cr/Cc scores
All the tables are displayed according to the reordered
alignment
To output the reordered alignment, please also use the
option -m
-m <n5> : Mask the alignment; 0 <= n5 <= 1
Output (1) the alignment with columns Cc >= n5 in the file
'Mask.fst', (2) the alignment with columns Cc < n5 in the
file 'Disc.fst', and (3) the alignment with an extra row in
the first line to indicate whether the column is masked in
the file 'Stat.fst'.
(Special case: if no <n5>, whole alignment is outputted in
the file 'Mask.fst'; if <n5> is 0, the alignment with
columns Cc > 0 is outputted in the file 'Mask.fst')
-i <1|2|3> : Generate heat map image for Cij scores of sequence pairs
1 - Triangular heat map
2 - Rectangular heat map
3 - Both
(if no number, then both triangular & rectangular heat map
files are outputted)
-d : Report the p distances between sequences
in the file with extension '.p-dist.csv'
(default: disabled)
Note: computation of p distances may take long time for
large number of sequences
-u : Color scheme of the heatmaps
1 - Default color scheme, suitable for color-blind persons
2 - Another color scheme
-h : This help page
================================================================================
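<alignment file>: the input multiple sequence alignment in FASTA format.
<data type>: the data type of the alignment (1 to 7, as listed in the help page).
-c selects the nucleotide coding (recoding) scheme, for example 9 for R/Y coding; it is only valid when the data type is 1 (single nucleotides).
-b prints a brief one-line summary to the screen; it cannot be combined with -o, -t, -r, -m, -i or -d.
-t outputs the selected tables; if no table numbers are given, all tables are written.
-p specifies the partition file; for data type 7, each line of that file also states the data type of the partition.
-m <n5> (0 ≤ n5 ≤ 1) masks the alignment according to the completeness scores for individual sites (Cc). For example, with -m 0.6 the columns with Cc ≥ 0.6 are written to 'Mask.fst' and the columns with Cc < 0.6 are written to 'Disc.fst'.
-i draws heat maps from the matrix of completeness scores for pairs of sequences (Cij).
-d writes the p-distances between sequences to the file with extension '.p-dist.csv'.
-o sets the prefix of the output files.
-s sets up a sliding window analysis as <window size,step size>. -s 10,5 means a window size of 10 and a step size of 5; it cannot be used together with -p.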
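To make the -m description above more concrete, here is a small sketch of column masking by Cc on a toy alignment. It is not AliStat's own code. It assumes that the Cc of a site is the fraction of sequences carrying an unambiguous nucleotide (A, C, G or T) at that site, which the help text does not spell out, and the file name toy.fas is made up for the illustration.

```shell
# Toy alignment: 3 sequences, 4 sites. Each sequence is on a single line;
# wrapped FASTA records are not handled by this sketch.
cat > toy.fas <<'EOF'
>seq1
ACGT
>seq2
A-GN
>seq3
ACG-
EOF

# Assumed definition: Cc(site) = (# sequences with A/C/G/T at the site) / (# sequences).
cc_scores=$(awk '
  /^>/ { next }                      # skip header lines
  {
    seq = toupper($0); nseq++
    if (length(seq) > nsites) nsites = length(seq)
    for (i = 1; i <= length(seq); i++)
      if (index("ACGT", substr(seq, i, 1)) > 0) ok[i]++
  }
  END {
    for (i = 1; i <= nsites; i++)
      printf "%d\t%.2f\n", i, ok[i] / nseq
  }' toy.fas)
echo "$cc_scores"

# Sites that would be kept with a threshold of 0.6 (the analogue of -m 0.6).
threshold=0.6
kept=$(echo "$cc_scores" | awk -v t="$threshold" \
  '$2 + 0 >= t + 0 { printf "%s%s", sep, $1; sep = "," } END { print "" }')
echo "sites kept at Cc >= $threshold: $kept"
```

In this toy case sites 1 to 3 pass the threshold and site 4 (one unambiguous character out of three) would go to the discarded set.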
General recommendation (data type 6 is amino acids; use 1 for a nucleotide alignment):
./alistat input.fas 6 -o output -i -d -t
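Beyond the command above, the partition, sliding window and masking options could be tried along the following lines. This is only a sketch built from the help page: the file names input.fas and partitions.txt and the output prefixes are placeholders, the alignment is assumed to be nucleotide data (data type 1), and the runs are skipped if ./alistat or the alignment is not present.

```shell
# Placeholder file names; adjust to your own data.
aln=input.fas
parts=partitions.txt

# Partition file in the format shown in the help page for data types 1-6
# (positions start at 1).
cat > "$parts" <<'EOF'
part1=1-50,60-100
part2=101-200
EOF

if [ -x ./alistat ] && [ -f "$aln" ]; then
  # Per-partition analysis, writing all tables (-t without numbers).
  ./alistat "$aln" 1 -p "$parts" -o by_partition -t
  # Sliding window: window size 10, step size 5 (cannot be combined with -p).
  ./alistat "$aln" 1 -s 10,5 -o sliding
  # Mask the alignment at Cc = 0.6.
  ./alistat "$aln" 1 -m 0.6 -o masked
else
  echo "alistat or $aln not found; skipping the runs" >&2
fi
```

Keeping -p and -s in separate runs follows the help page, which states that the two options cannot be used at the same time.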