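NCBI BLAST is a widely used sequence search tool that covers both proteins and nucleic acids. Querying through the web interface is usually enough, but development work sometimes needs local databases and programs. NCBI provides the BLAST+ toolkit, which bundles several BLAST tools; for an introduction, see the two documents (books) that NCBI provides.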
Download and installation
Downloading BLAST+
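- BLAST+ program downloads: ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST/
- Pick the installer for your platform, for example the `dmg` build for Mac or the `-win64.exe` build for Windows.
- On Mac, the installation ends up in `/usr/local/ncbi/blast`.
- The installation directory contains two subfolders, `bin` and `doc`. `bin` holds the executables, which are described below: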
Program | Function |
---|---|
blastdbcheck | Checks the integrity of a BLAST database |
blastdbcmd | Retrieves sequences or other information from a BLAST database |
blastdb_aliastool | Creates database alias (to tie volumes together for example) |
blastn | Searches a nucleotide query against a nucleotide database |
blastp | Searches a protein query against a protein database |
blastx | Searches a nucleotide query, dynamically translated in all six frames, against a protein database |
blast_formatter | Formats a blast result using its assigned request ID (RID) or its saved archive |
convert2blastmask | Converts lowercase masking into makeblastdb readable data |
deltablast | Searches a protein query against a protein database, using a more sensitive algorithm |
dustmasker | Masks the low complexity regions in the input nucleotide sequences |
legacy_blast.pl | Converts a legacy blast search command line into blast+ counterpart and execute it |
makeblastdb | Formats input FASTA file(s) into a BLAST database |
makembindex | Indexes an existing nucleotide database for use with megablast |
makeprofiledb | Creates a conserved domain database from a list of input position specific scoring matrix (scoremats) generated by psiblast |
psiblast | Finds members of a protein family, identifies proteins distantly related to the query, or builds position specific scoring matrix for the query |
rpsblast | Searches a protein against a conserved domain database to identify functional domains present in the query |
rpstblastn | Searches a nucleotide query, by dynamically translating it in all six-frames first, against a conserved domain database |
segmasker | Masks the low complexity regions in input protein sequences |
tblastn | Searches a protein query against a nucleotide database dynamically translated in all six frames |
tblastx | Searches a nucleotide query, dynamically translated in all six frames, against a nucleotide database similarly translated |
update_blastdb.pl | Downloads preformatted blast databases from NCBI |
windowmasker | Masks repeats found in input nucleotide sequences |
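
After installation it can help to confirm which of these programs the shell can actually find. The following is a minimal sketch, not part of the BLAST+ package: `check_programs` is a hypothetical helper name, and it relies only on the standard `command -v` lookup.

```shell
#!/bin/sh
# Hypothetical helper: report which of the given programs are on PATH.
# Prints one line per program: "<name><TAB>found" or "<name><TAB>missing".
check_programs() {
  for prog in "$@"; do
    if command -v "$prog" >/dev/null 2>&1; then
      printf '%s\tfound\n' "$prog"
    else
      printf '%s\tmissing\n' "$prog"
    fi
  done
}

# Check a few of the programs listed in the table.
check_programs blastn blastp blastx tblastn tblastx makeblastdb blastdbcmd
```

If every line says `missing`, the `bin` folder is most likely not on `PATH` yet.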
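Besides BLAST+, the `executables` directory also provides other tools:
- `magic-blast`: maps large next-generation RNA and DNA sequencing data against a whole genome or transcriptome. See Magic-BLAST.
- `IgBlast`: analyzes immunoglobulin and T-cell receptor variable-region sequences. See IgBlast and the related literature.
- `rmblast`
- `remote-fuser`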
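Configuration
- Export the BLAST installation directory to `PATH`, for example `export PATH=$PATH:$HOME/ncbi-blast-2.8.1+/bin`. This lets you run the programs directly.
- Manage the databases:
  - Create a folder to hold the databases: `mkdir $HOME/blastdb`
  - Set the `BLASTDB` environment variable to that folder: `export BLASTDB=$HOME/blastdb`
  - Download and unpack the sequence databases you need by hand
  - Or use `update_blastdb.pl` to manage the databases.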
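
The configuration steps above can be collected into one small shell function. This is a sketch under assumptions: `setup_blast_env` is a hypothetical name, and the two directories are passed in by the caller rather than taken from any BLAST+ convention.

```shell
#!/bin/sh
# Hypothetical helper: configure the current shell for a local BLAST+ install.
#   $1 = directory holding the BLAST+ executables (the "bin" folder)
#   $2 = directory that holds (or will hold) the BLAST databases
setup_blast_env() {
  blast_bin=$1
  blast_db=$2

  # Append the bin directory to PATH only if it is not already listed.
  case ":$PATH:" in
    *":$blast_bin:"*) ;;               # already present, nothing to do
    *) PATH="$PATH:$blast_bin" ;;
  esac
  export PATH

  # Create the database folder if needed and point BLASTDB at it.
  mkdir -p "$blast_db" || return 1
  BLASTDB=$blast_db
  export BLASTDB
}
```

With the paths used in this post, the call would be `setup_blast_env "$HOME/ncbi-blast-2.8.1+/bin" "$HOME/blastdb"`, for example from `~/.bashrc`. Because of the `case` guard, repeated calls do not add the same directory to `PATH` twice.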
Downloading databases
The NCBI FTP server provides a dedicated BLAST folder, ftp://ftp.ncbi.nlm.nih.gov/blast/, which holds the BLAST programs and databases. It contains the following subfolders:
- `db`: the databases; this is the important one
- `executables`: executable programs, including BLAST+
- `documents`: documentation
- `demo`: various demonstration packages for developers
- `matrices`: the supported and experimental scoring matrices
- `WGS_TOOLS`: tools for generating databases from WGS projects
- `temp`: miscellaneous files
- `windowmasker_files`: a collection of windowmasker files for various organisms/genomes, each in its own subdirectory named by its taxonomic ID
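Configuration
Add the executables to the environment: append the path of the `bin` folder inside the BLAST installation to the `PATH` environment variable (look up the exact method for your shell). For example, in Bash: `export PATH=$PATH:/usr/local/ncbi/blast/bin`
The other important setting is the `BLASTDB` environment variable, which tells BLAST where to look for databases when it searches. Set it to wherever your databases live, for example: `export BLASTDB=$HOME/blastdb`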
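Examples
Official simple example 1
- Use `blastdbcmd` to extract the `nm_000122` sequence from the installed database (`refseq_rna.00`) into the file `test_query.fa`.
- Run `blastn` to do a nucleotide search with that query against the same local database.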
$ blastdbcmd -db refseq_rna.00 -entry nm_000122 -out test_query.fa
$ blastn -query test_query.fa -db refseq_rna.00 -task blastn -dust no -outfmt "7 qseqid sseqid evalue bitscore" -max_target_seqs 2
# BLASTN 2.2.29+
# Query: gi|263191547|ref|NM_000122.3| Homo sapiens mutL homolog 1 (MLH1), transcript variant 1, mRNA
# Database: refseq_rna.00
# Fields: query id, subject id, evalue, bit score
# 2 hits found
gi|263191547|ref|NM_000122.3| gi|263191547|ref|NM_000122.3| 0.0 4801
gi|263191547|ref|NM_000122.3| gi|332816398|ref|XM_001170433.2| 0.0 4758
# BLAST processed 1 queries
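
Output in `-outfmt 7` is tabular with `#` comment lines, so it is easy to post-process. The sketch below pulls the subject ID and bit score out of each hit line. `extract_hits` is a hypothetical helper name, and the sample input is copied from the run above, with the fields joined by tabs as BLAST writes them.

```shell
#!/bin/sh
# Hypothetical helper: read -outfmt 7 output on stdin and print
# "subject id<TAB>bit score" for every hit line.
# Assumes the column order requested above: qseqid sseqid evalue bitscore.
extract_hits() {
  # Skip comment lines (starting with '#') and anything with fewer than 4 fields.
  awk -F '\t' '!/^#/ && NF >= 4 { print $2 "\t" $4 }'
}

# Sample input taken from the blastn run shown above.
{
  printf '# BLASTN 2.2.29+\n'
  printf '# Fields: query id, subject id, evalue, bit score\n'
  printf '# 2 hits found\n'
  printf 'gi|263191547|ref|NM_000122.3|\tgi|263191547|ref|NM_000122.3|\t0.0\t4801\n'
  printf 'gi|263191547|ref|NM_000122.3|\tgi|332816398|ref|XM_001170433.2|\t0.0\t4758\n'
  printf '# BLAST processed 1 queries\n'
} | extract_hits
```

In practice the same function can be fed directly from a search, for example `blastn ... -outfmt "7 qseqid sseqid evalue bitscore" | extract_hits`.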