Multiple sequence alignments can be stored in a large variety of formats. The most common ones are shown below, each with the same four-sequence example.
- FASTA
>ccsA1
ATGATATTTTCAACTTTAGAGCATATAT
>ccsA2
ATGATATTTTCAACTTTAGAGCATATAT
>ccsA3
ATGATATTTTCAACTTTAGAGCATATAT
>ccsA4
ATGATATTTTCAACTTTAGAGCATATAT
- Clustal
CLUSTAL W (1.8) multiple sequence alignment (ALTER 1.3.3)
ccsA1 ATGATATTTTCAACTTTAGAGCATATAT
ccsA2 ATGATATTTTCAACTTTAGAGCATATAT
ccsA3 ATGATATTTTCAACTTTAGAGCATATAT
ccsA4 ATGATATTTTCAACTTTAGAGCATATAT
****************************
- NEXUS
#NEXUS
BEGIN DATA;
dimensions ntax=4 nchar=28;
format missing=?
interleave=yes datatype=DNA gap=- match=.;
matrix
ccsA1 ATGATATTTTCAACTTTAGAGCATATAT
ccsA2 ATGATATTTTCAACTTTAGAGCATATAT
ccsA3 ATGATATTTTCAACTTTAGAGCATATAT
ccsA4 ATGATATTTTCAACTTTAGAGCATATAT
;
end;
- PHYLIP
4 28
ccsA1 atgatatttt caactttaga gcatatat
ccsA2 atgatatttt caactttaga gcatatat
ccsA3 atgatatttt caactttaga gcatatat
ccsA4 atgatatttt caactttaga gcatatat
- MEGA
#mega
TITLE: MSA converted with ALTER 1.3.3
#ccsA1 ATGATATTTT CAACTTTAGA GCATATAT
#ccsA2 ATGATATTTT CAACTTTAGA GCATATAT
#ccsA3 ATGATATTTT CAACTTTAGA GCATATAT
#ccsA4 ATGATATTTT CAACTTTAGA GCATATAT
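The formats above all carry the same information: sequence names, the aligned sequences, and (for PHYLIP and NEXUS) the alignment dimensions. As a rough illustration only, the Python sketch below parses FASTA text, checks that every sequence has the same length, and writes a relaxed sequential PHYLIP-style layout. The function names (`parse_fasta`, `alignment_dimensions`, `to_relaxed_phylip`) are made up for this example. This is not how ALTER works internally, and it does not reproduce the 10-column blocks shown in the PHYLIP example.

```python
from typing import List, Tuple

Record = Tuple[str, str]  # (sequence name, aligned sequence)


def parse_fasta(text: str) -> List[Record]:
    """Parse FASTA text into (name, sequence) pairs, joining wrapped lines."""
    names: List[str] = []
    chunks: List[List[str]] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:].strip()
            # Use the first whitespace-delimited token as the name.
            names.append(header.split()[0] if header else "")
            chunks.append([])
        elif chunks:
            chunks[-1].append(line)
    return [(name, "".join(parts)) for name, parts in zip(names, chunks)]


def alignment_dimensions(records: List[Record]) -> Tuple[int, int]:
    """Return (ntax, nchar); raise if the sequences differ in length."""
    lengths = {len(seq) for _, seq in records}
    if len(lengths) != 1:
        raise ValueError(f"not a valid alignment, lengths found: {sorted(lengths)}")
    return len(records), lengths.pop()


def to_relaxed_phylip(records: List[Record]) -> str:
    """Write 'ntax nchar' followed by one 'name sequence' row per taxon."""
    ntax, nchar = alignment_dimensions(records)
    width = max(len(name) for name, _ in records) + 1  # pad names to one column
    rows = [f"{name.ljust(width)}{seq}" for name, seq in records]
    return "\n".join([f"{ntax} {nchar}"] + rows)
```

Calling `to_relaxed_phylip(parse_fasta(open("aln.fasta").read()))` on a file like the FASTA example would give a header line with the taxon count and alignment length, followed by one row per sequence; `aln.fasta` is a placeholder file name.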
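Different alignment programs write different formats, and downstream analysis tools differ in the input formats they accept. For example, I usually align sequences with MAFFT, which writes FASTA by default and can optionally write Clustal; if I then want to build a Bayesian tree with MrBayes, the alignment has to be converted to NEXUS. I recommend ALTER for this kind of alignment format conversion. If you only have a few sequences, the web version is enough; for larger data sets you can install the local version from https://github.com/sing-group/ALTER. Just follow the installation steps; my own installation ran without any errors.
Installation steps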
git clone https://github.com/sing-group/ALTER.git
cd ALTER
mvn package
Dependencies
Git tool for cloning the last version
A Java Compiler and tool
The Maven tool
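All of the dependencies above can be installed with conda. For a conda installation tutorial, search WeChat for the post "教程价值999的全外显子教学视频--免费送".
- After installation, run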
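Before running `mvn package`, it can help to confirm that the dependencies are actually on your `PATH`. The snippet below is a small convenience sketch, not part of ALTER; it assumes the usual executable names `git`, `java`, `javac`, and `mvn`, and the helper names `find_tools` and `missing_tools` are invented for this example.

```python
import shutil
from typing import Dict, Iterable, List, Optional


def find_tools(tools: Iterable[str] = ("git", "java", "javac", "mvn")) -> Dict[str, Optional[str]]:
    """Map each executable name to its resolved path, or None if not on PATH."""
    return {tool: shutil.which(tool) for tool in tools}


def missing_tools(found: Dict[str, Optional[str]]) -> List[str]:
    """Return the sorted names of tools that could not be located."""
    return sorted(name for name, path in found.items() if path is None)


if __name__ == "__main__":
    found = find_tools()
    for name, path in found.items():
        print(f"{name}: {path or 'NOT FOUND'}")
    absent = missing_tools(found)
    if absent:
        print("Install these before building ALTER:", ", ".join(absent))
```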
java -jar alter-lib/target/ALTER-1.3.4-jar-with-dependencies.jar help
# Output
No argument is allowed: help
-c (--collapse) : Collapse sequences to haplotypes.
-cg (--collapseGaps) : Treat gaps as missing data when collapsing.
-cl (--collapseLimit) N : Connection limit (sequences differing at <= l sites will be collapsed) (default is l=0).
-cm (--collapseMissing) : Count missing data as differences when collapsing.
-i (--input) FILE : Input file.
-ia (--inputAutodetect) : Autodetect format (other input options are omitted).
-if (--inputFormat) VAL : Input format (ALN, FASTA, GDE, MEGA, MSF, NEXUS, PHYLIP or PIR).
-io (--inputOS) VAL : Input operating system (Linux, MacOS or Windows).
-ip (--inputProgram) VAL : Input program (Clustal, MAFFT, MUSCLE, PROBCONS or TCoffee).
-o (--output) FILE : Output file.
-of (--outputFormat) VAL : Output format (ALN, FASTA, GDE, MEGA, MSF, NEXUS, PHYLIP or PIR).
-ol (--outputLowerCase) : Lowe case output.
-om (--outputMatch) : Output match characters.
-on (--outputResidueNumbers) : Output residue numbers (only ALN format).
-oo (--outputOS) VAL : Output operating system (Linux, MacOS or Windows).
-op (--outputProgram) VAL : Output program (jModelTest, MrBayes, PAML, PAUP, PhyML, ProtTest, RAxML, TCS, CodABC, BioEdit, MEGA, dnaSP, Se-Al, Mesquite, SplitsTree, Clustal, MAFFT, MUSCLE, PROBCONS, TCoffee, Gblocks, SeaView, trimAl or GENERAL)
-os (--outputSequential) : Sequential output (only NEXUS and PHYLIP formats).
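If you convert many files, the options in the help text above can be assembled from a script. The following is a minimal sketch under a few assumptions: it only covers the autodetect case (`-ia`) together with `-i`, `-o`, `-of`, `-op`, and `-oo`, the allowed values are copied from the help text, and `build_alter_command` is a hypothetical helper name, not something ALTER provides. The default jar path is the one used in this post and will differ on your machine.

```python
from typing import List

# Allowed values copied from the ALTER help text shown above.
FORMATS = {"ALN", "FASTA", "GDE", "MEGA", "MSF", "NEXUS", "PHYLIP", "PIR"}
SYSTEMS = {"Linux", "MacOS", "Windows"}
PROGRAMS = {
    "jModelTest", "MrBayes", "PAML", "PAUP", "PhyML", "ProtTest", "RAxML",
    "TCS", "CodABC", "BioEdit", "MEGA", "dnaSP", "Se-Al", "Mesquite",
    "SplitsTree", "Clustal", "MAFFT", "MUSCLE", "PROBCONS", "TCoffee",
    "Gblocks", "SeaView", "trimAl", "GENERAL",
}

# Jar location as used in this post; adjust to your own build.
DEFAULT_JAR = "alter-lib/target/ALTER-1.3.4-jar-with-dependencies.jar"


def build_alter_command(
    input_file: str,
    output_file: str,
    output_format: str,
    output_program: str,
    output_os: str = "Linux",
    jar: str = DEFAULT_JAR,
) -> List[str]:
    """Build the argument list for an ALTER conversion with input autodetection."""
    if output_format not in FORMATS:
        raise ValueError(f"unknown output format: {output_format!r}")
    if output_program not in PROGRAMS:
        raise ValueError(f"unknown output program: {output_program!r}")
    if output_os not in SYSTEMS:
        raise ValueError(f"unknown output OS: {output_os!r}")
    return [
        "java", "-jar", jar,
        "-i", input_file,
        "-ia",  # let ALTER autodetect the input format
        "-o", output_file,
        "-of", output_format,
        "-op", output_program,
        "-oo", output_os,
    ]
```

The returned list can be passed to `subprocess.run(cmd, check=True)`. Validating the values up front catches typos such as `NEXUX` before Java is launched.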
Here is the command I used to convert my own FASTA alignment to NEXUS format:
java -jar alter-lib/target/ALTER-1.3.4-jar-with-dependencies.jar -i ~/mingyan/practice_assorted/Myrtales_CP_genomes/another/Myrtales_cp_genome_aligned.fasta-gb -ia -o ./output.nex -of NEXUS -op MrBayes -oo Linux
# Run output
<INFO> : FASTA format detected.
<INFO> : MSA read in FASTA format (Taxa = 90, Length = 106571).
<INFO> : Nucleotide MSA type inferred.
<INFO> : MSA successfully converted to NEXUS format!
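The `Taxa` and `Length` values in this log are a quick sanity check that ALTER read the whole alignment. If you run conversions in bulk, you could pull them out of the captured output with something like the sketch below. It assumes the log line looks exactly like the one shown above, which may not hold for other ALTER versions, and `parse_msa_summary` is a name made up for this example.

```python
import re
from typing import Optional, Tuple

# Matches e.g. "MSA read in FASTA format (Taxa = 90, Length = 106571)."
MSA_READ = re.compile(r"MSA read in (\w+) format \(Taxa = (\d+), Length = (\d+)\)")


def parse_msa_summary(log: str) -> Optional[Tuple[str, int, int]]:
    """Return (format, taxa, length) from ALTER's log text, or None if absent."""
    match = MSA_READ.search(log)
    if match is None:
        return None
    return match.group(1), int(match.group(2)), int(match.group(3))
```

Comparing the returned taxon count with the number of sequences you expected is a cheap way to spot a truncated or misread input file.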
Paper describing the tool
ALTER: program-oriented conversion of DNA and protein alignments
Journal
Nucleic Acids Research
2010
Feel free to follow my WeChat official account, 小明的数据分析笔记本.