The Most Complete Tutorial on Running BLAST Locally (Part 2): Basic BLAST Usage

Author: 大号在这里 | Published: 2020-08-06 08:28

I. Running BLAST+

1. Building the database

makeblastdb -in db.fasta -dbtype prot -out dbname

Main parameters

-in: the sequence file to format (FASTA)
-dbtype: database type, prot or nucl
-out: name of the database
For more parameters, see makeblastdb -help
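
When the database step is scripted, it can help to assemble the makeblastdb call in one place and validate -dbtype before anything runs. The following is a minimal Python sketch; the helper name makeblastdb_command is hypothetical, and only the three options described above are used. It builds and prints the command without executing it.

```python
import shlex


def makeblastdb_command(fasta_path, dbtype, out_name):
    """Return the makeblastdb argument list for -in, -dbtype and -out."""
    if dbtype not in ("prot", "nucl"):
        raise ValueError("dbtype must be 'prot' or 'nucl'")
    return ["makeblastdb", "-in", fasta_path, "-dbtype", dbtype, "-out", out_name]


cmd = makeblastdb_command("db.fasta", "prot", "dbname")
print(shlex.join(cmd))

# To actually build the database (requires BLAST+ on PATH):
# import subprocess
# subprocess.run(cmd, check=True)
```

Passing the command as a list to subprocess.run avoids shell quoting problems when file names contain spaces.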

2. Running the alignment

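blastp -query seq.fasta -out seq.blast -db dbname -outfmt 6 -evalue 1e-5 -num_threads 12 -max_target_seqs 10

Main parameters

-query: path and name of the input (query) file
-out: path and name of the output file
-db: path and name of the database built in the previous step
-outfmt: output format; 6 is the tabular format, equivalent to the m8 format of the older BLAST (the full list of formats is given in section 4 below)
-evalue: e-value threshold; only hits at or below this value are reported
-max_target_seqs: maximum number of aligned database sequences to keep
-num_alignments: number of database sequences to show alignments for; it cannot be combined with -max_target_seqs
-num_threads: number of threads
For more parameters, see blastp -help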
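
The option list above can be turned into a small command builder that also rejects the combination of -max_target_seqs and -num_alignments. This is a sketch only; the function name blastp_command and its keyword arguments are hypothetical, and the command is assembled but not executed.

```python
def blastp_command(query, db, out, evalue=1e-5, threads=1,
                   max_target_seqs=None, num_alignments=None, outfmt=6):
    """Assemble a blastp argument list from the options described above."""
    if max_target_seqs is not None and num_alignments is not None:
        raise ValueError("-max_target_seqs and -num_alignments cannot be combined")
    cmd = ["blastp", "-query", query, "-db", db, "-out", out,
           "-outfmt", str(outfmt), "-evalue", str(evalue),
           "-num_threads", str(threads)]
    # Append whichever hit-limit option was requested, if any.
    if max_target_seqs is not None:
        cmd += ["-max_target_seqs", str(max_target_seqs)]
    if num_alignments is not None:
        cmd += ["-num_alignments", str(num_alignments)]
    return cmd


cmd = blastp_command("seq.fasta", "dbname", "seq.blast",
                     threads=12, max_target_seqs=10)
print(" ".join(cmd))
```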

3. Nucleotide queries against a nucleotide database (blastn) and against a protein database (blastx)

blastn -query seq.fasta -out seq.blast -db dbname -outfmt 6 -evalue 1e-5 -num_threads 12
blastx -query seq.fasta -out seq.blast -db dbname -outfmt 6 -evalue 1e-5 -num_threads 12
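
Which program to call depends on the type of the query and the type of the database. The lookup below is a minimal sketch of that choice; the names PROGRAMS and choose_program are hypothetical, and tblastx (translated query against a translated nucleotide database) is left out because it shares the nucleotide/nucleotide pair with blastn.

```python
# (query type, database type) -> program name
PROGRAMS = {
    ("nucl", "nucl"): "blastn",
    ("nucl", "prot"): "blastx",
    ("prot", "prot"): "blastp",
    ("prot", "nucl"): "tblastn",
}


def choose_program(query_type, db_type):
    """Pick the BLAST program for a query/database type pair."""
    try:
        return PROGRAMS[(query_type, db_type)]
    except KeyError:
        raise ValueError(f"unsupported combination: {query_type} vs {db_type}")


print(choose_program("nucl", "prot"))
```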

4. File formats

The -outfmt option of BLAST+ selects the output format. There are 19 choices, numbered 0 to 18; the default is 0.

0 = Pairwise,                                                 1 = Query-anchored showing identities,
2 = Query-anchored no identities,                             3 = Flat query-anchored showing identities,
4 = Flat query-anchored no identities,                        5 = BLAST XML,
6 = Tabular,                                                  7 = Tabular with comment lines,
8 = Seqalign (Text ASN.1),                                    9 = Seqalign (Binary ASN.1),
10 = Comma-separated values,                                  11 = BLAST archive (ASN.1),
12 = Seqalign (JSON),                                         13 = Multiple-file BLAST JSON,
14 = Multiple-file BLAST XML2,                                15 = Single-file BLAST JSON,
16 = Single-file BLAST XML2,                                  17 = Sequence Alignment/Map (SAM),
18 = Organism Report

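Of these, outfmt 5 (used by the blast2go tool) and outfmt 6 are the most common. The columns of outfmt 6 output, from left to right, are as follows (the header line is shown for reference only; BLAST does not write it to the output file):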

Query_id    Subject_id  %_identity  alignment_length  mismatches  gap_openings  q.start  q.end  s.start  s.end  e-value   bit_score
AKS24976.1  ABU86350.1  25.446      224               149         9             713      931    2        212    3.23e-05  38.1
AKS24976.1  ABU86150.1  38.596      57                34          1             599      655    16       71     8.09e-05  36.6
AKS24976.1  ABU86161.1  38.667      75                42          2             578      652    14       84     9.06e-05  37.0
AKS24976.1  ABU86160.1  38.667      75                42          2             578      652    14       84     9.06e-05  37.0
AKS24976.1  ABU86162.1  38.667      75                42          2             578      652    14       84     9.31e-05  37.0
AKS24976.1  ABU86154.1  38.596      57                34          1             599      655    16       71     9.70e-05  36.6
AKS24976.1  ABU86152.1  38.596      57                34          1             599      655    16       71     9.70e-05  36.6
AKS24976.1  ABU86329.1  39.130      69                38          2             599      664    83       150    2.51e-04  34.7
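
Because outfmt 6 is plain tab-separated text, it is easy to load into a script for filtering. The sketch below parses two of the rows shown above and keeps hits with an e-value of at most 1e-4. The field names follow the default outfmt 6 column names in BLAST+ (qseqid, sseqid, pident, and so on); the function name parse_tabular and the inline sample are illustrative, and in practice the handle would be the opened seq.blast file.

```python
import csv
import io

COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]
INT_FIELDS = ["length", "mismatch", "gapopen", "qstart", "qend", "sstart", "send"]
FLOAT_FIELDS = ["pident", "evalue", "bitscore"]


def parse_tabular(handle):
    """Yield one dict per hit from tab-separated outfmt 6 (or 7) text."""
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):  # skip blanks and outfmt 7 comments
            continue
        record = dict(zip(COLUMNS, row))
        for key in INT_FIELDS:
            record[key] = int(record[key])
        for key in FLOAT_FIELDS:
            record[key] = float(record[key])
        yield record


sample = (
    "AKS24976.1\tABU86350.1\t25.446\t224\t149\t9\t713\t931\t2\t212\t3.23e-05\t38.1\n"
    "AKS24976.1\tABU86329.1\t39.130\t69\t38\t2\t599\t664\t83\t150\t2.51e-04\t34.7\n"
)
hits = list(parse_tabular(io.StringIO(sample)))
strong = [h for h in hits if h["evalue"] <= 1e-4]
for hit in strong:
    print(hit["qseqid"], hit["sseqid"], hit["evalue"], hit["bitscore"])
```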

II. Running BLASTALL

1. Building the database

formatdb -i db.seq -p T -o T -l logfile

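Main parameters:

-i: name of the source sequence file to format
-p: sequence type, nucleotide (F) or protein (T); default = T
-o: parse option, whether to parse sequence identifiers and build an index [T/F]; default = F
-l: name of the log file, default = formatdb.log; it records the run time, version number, number of sequences, and so on
-n: custom base name for the database files

Output:

For a nucleotide database, formatdb writes three files: db.seq.nhr, db.seq.nin and db.seq.nsq. With "-o T" it also writes db.seq.nsd, db.seq.nsi, db.seq.nni and db.seq.nnd, seven files in total. For a protein database, as built by the command above with -p T, the extensions start with p instead of n (db.seq.phr, db.seq.pin, db.seq.psq, and so on).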
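
A quick way to confirm that formatdb finished is to check for the expected files. The helper below is a sketch that only generates the expected names; formatdb_outputs is a hypothetical name, and the p prefix for protein databases follows the usual BLAST naming rather than anything in the nucleotide listing above, so compare it against your own output.

```python
def formatdb_outputs(db_name, protein, parse_ids=False):
    """List the file names formatdb is expected to write for one database."""
    prefix = "p" if protein else "n"
    extensions = [prefix + "hr", prefix + "in", prefix + "sq"]
    if parse_ids:  # corresponds to running formatdb with "-o T"
        extensions += [prefix + "sd", prefix + "si", prefix + "ni", prefix + "nd"]
    return [f"{db_name}.{ext}" for ext in extensions]


for name in formatdb_outputs("db.seq", protein=False, parse_ids=True):
    print(name)
```

Each name can then be tested with os.path.exists before the alignment step is started.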

2. Running the alignment

blastall -p blastp -i seq.fa -d db.seq -o blast.out -F F -m 8 -e 1e-5 -b 10 -v 10 -a 12

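Main parameters:

-p: program to run: blastn, blastp, blastx, tblastn, tblastx
-i: query sequence file
-d: name of the sequence database; default = nr
-o: output file for the BLAST results
-F: query filtering, which masks low-complexity regions that would otherwise distort the results; default = T
-m: output format of the alignment results; default = 0
-e: expectation value (E), the number of matches expected to occur by chance when searching a given database; default = 10.0
-b: maximum number of database sequences to show alignments for; default = 250
-v: maximum number of one-line descriptions; default = 500
-a: number of processors to use; default = 1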

-m output format options [Integer], default = 0:

0 = pairwise
1 = query-anchored showing identities
2 = query-anchored no identities
3 = flat query-anchored, show identities
4 = flat query-anchored, no identities
5 = query-anchored no identities and blunt ends
6 = flat query-anchored, no identities and blunt ends
7 = XML Blast output
8 = tabular
9 = tabular with comment lines
10 = ASN, text
11 = ASN, binary
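
Comparing this list with the BLAST+ -outfmt list in Part I shows how an old blastall command line can be rewritten for BLAST+: -m 8 becomes -outfmt 6, -i becomes -query, and so on. The following is a rough sketch of such a translation. The function name blastall_to_blastplus is hypothetical, the -m mapping is matched by format name between the two lists, and options with no direct one-to-one equivalent here (-F, -b, -v) are reported rather than converted.

```python
# Legacy -m value -> BLAST+ -outfmt value (legacy 5 and 6 have no counterpart here).
LEGACY_M_TO_OUTFMT = {0: 0, 1: 1, 2: 2, 3: 3, 4: 4, 7: 5, 8: 6, 9: 7, 10: 8, 11: 9}
FLAG_MAP = {"-i": "-query", "-d": "-db", "-o": "-out",
            "-e": "-evalue", "-a": "-num_threads"}


def blastall_to_blastplus(args):
    """Translate blastall flag/value tokens into a BLAST+ argument list.

    Returns (command, skipped_flags). Assumes every flag takes one value.
    """
    options = dict(zip(args[0::2], args[1::2]))
    command = [options.pop("-p")]  # the program name becomes the executable
    skipped = []
    for flag, value in options.items():
        if flag == "-m":
            command += ["-outfmt", str(LEGACY_M_TO_OUTFMT[int(value)])]
        elif flag in FLAG_MAP:
            command += [FLAG_MAP[flag], value]
        else:
            skipped.append(flag)
    return command, skipped


legacy = "-p blastp -i seq.fa -d db.seq -o blast.out -F F -m 8 -e 1e-5 -b 10 -v 10 -a 12"
command, skipped = blastall_to_blastplus(legacy.split())
print(" ".join(command))
print("not translated:", skipped)
```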

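The 12 columns of the m8 format:

Query id, Subject id, % identity, alignment length, mismatches, gap openings, q.start, q.end, s.start, s.end, e-value, bit score

Column 1: query sequence (the submitted sequence)
Column 2: database sequence (the subject, or target, sequence)
Column 3: percent identity
Column 4: alignment length
Column 5: number of mismatches
Column 6: number of gap openings
Columns 7 and 8: start and end positions of the alignment in the query
Columns 9 and 10: start and end positions of the alignment in the subject
Column 11: expectation value (e-value)
Column 12: bit score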
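
A common follow-up with m8 output is to keep only the best hit for each query. The sketch below does this by comparing column 12 (bit score) across the lines of each query; best_hits is a hypothetical helper, and the three sample lines are taken from the outfmt 6 example in Part I, which has the same column layout.

```python
def best_hits(lines):
    """Map each query id to its (subject id, bit score) with the highest bit score."""
    best = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 12:  # skip blank or malformed lines
            continue
        query, subject, bitscore = fields[0], fields[1], float(fields[11])
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return best


m8_lines = [
    "AKS24976.1\tABU86150.1\t38.596\t57\t34\t1\t599\t655\t16\t71\t8.09e-05\t36.6",
    "AKS24976.1\tABU86350.1\t25.446\t224\t149\t9\t713\t931\t2\t212\t3.23e-05\t38.1",
    "AKS24976.1\tABU86161.1\t38.667\t75\t42\t2\t578\t652\t14\t84\t9.06e-05\t37.0",
]
for query, (subject, score) in best_hits(m8_lines).items():
    print(query, subject, score)
```

Ranking by bit score rather than percent identity favors longer, more significant alignments over short high-identity matches.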

References:
https://www.jianshu.com/p/de28be1a3bea
https://www.jianshu.com/p/c9ef8b79436c
