Tools
blastn
-outfmt 6 default columns
-outfmt "6 qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore"
It appears that pident equals 100 * nident / length, i.e. the percentage of identical positions over the alignment length.
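To check this relationship on your own results, the sketch below recomputes pident from nident and length with awk. It assumes a hypothetical run with `-outfmt '6 qseqid sseqid pident length nident'` (so pident, length and nident are columns 3, 4 and 5), and `check_pident` is an illustrative helper, not a BLAST+ tool. A small tolerance is used because BLAST prints pident rounded to three decimals.

```shell
#!/usr/bin/env bash
# Sketch: verify that pident ~= 100 * nident / length in tabular BLAST output.
# Assumed (hypothetical) format: -outfmt '6 qseqid sseqid pident length nident'
#   $3 = pident, $4 = length, $5 = nident
# check_pident is an illustrative helper, not part of BLAST+.
check_pident() {
  awk -F'\t' '{
    recomputed = 100 * $5 / $4
    diff = $3 - recomputed
    if (diff < 0) diff = -diff
    # pident is printed with 3 decimals, so allow a small rounding tolerance
    status = (diff < 0.01) ? "OK" : "MISMATCH"
    printf "%s\t%s\t%s\t%.3f\t%s\n", $1, $2, $3, recomputed, status
  }' "$@"
}

# Usage: check_pident hits.tsv   (or pipe BLAST output into it)
```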
Field specifiers supported by -outfmt 6 (blastn)
qseqid Query Seq-id
qgi Query GI
qacc Query accession
qaccver Query accession.version
qlen Query sequence length
sseqid Subject Seq-id
sallseqid All subject Seq-id(s), separated by a ';'
sgi Subject GI
sallgi All subject GIs
sacc Subject accession
saccver Subject accession.version
sallacc All subject accessions
slen Subject sequence length
qstart Start of alignment in query
qend End of alignment in query
sstart Start of alignment in subject
send End of alignment in subject
qseq Aligned part of query sequence
sseq Aligned part of subject sequence
evalue Expect value
bitscore Bit score
score Raw score
length Alignment length
pident Percentage of identical matches
nident Number of identical matches
mismatch Number of mismatches
positive Number of positive-scoring matches
gapopen Number of gap openings
gaps Total number of gaps
ppos Percentage of positive-scoring matches
frames Query and subject frames separated by a '/'
qframe Query frame
sframe Subject frame
btop Blast traceback operations (BTOP)
staxids Subject Taxonomy ID(s), separated by a ';'
sscinames Subject Scientific Name(s), separated by a ';'
scomnames Subject Common Name(s), separated by a ';'
sblastnames Subject Blast Name(s), separated by a ';' (in alphabetical order)
sskingdoms Subject Super Kingdom(s), separated by a ';' (in alphabetical order)
stitle Subject Title
salltitles All Subject Title(s), separated by a '<>'
sstrand Subject Strand
qcovs Query Coverage Per Subject
qcovhsp Query Coverage Per HSP
example
blastn -db Sub_database -query query.fa -num_threads 10 -evalue 1e-6 \
-outfmt '6 qseqid sseqid pident nident qlen slen evalue bitscore' -out query_blastn.txt
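Format 6 output has no header line, so a custom column list like the one in the blastn example command above is easy to lose track of. A minimal sketch, assuming you keep the format string in a shell variable: the hypothetical helper `add_header` turns the same string into a tab-separated header and prepends it to the result file. (Alternatively, `-outfmt 7` writes comment lines, including a `# Fields:` line, into the output itself.)

```shell
#!/usr/bin/env bash
# Sketch: prepend a tab-separated header line to -outfmt 6 output.
# add_header is a hypothetical helper (not part of BLAST+). It takes the same
# format string that was passed to -outfmt, so header and columns stay in sync.
add_header() {
  local fmt="$1"
  local infile="$2"
  # Strip the leading "6 " format code, leaving only the field specifiers
  local fields="${fmt#6 }"
  # awk re-splits the specifiers on whitespace and re-joins them with tabs
  printf '%s\n' "$fields" | awk -v OFS='\t' '{ $1 = $1; print }'
  cat "$infile"
}

# Usage (file names taken from the blastn example):
# fmt='6 qseqid sseqid pident nident qlen slen evalue bitscore'
# blastn -db Sub_database -query query.fa -num_threads 10 -evalue 1e-6 \
#   -outfmt "$fmt" -out query_blastn.txt
# add_header "$fmt" query_blastn.txt > query_blastn.tsv
```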
word_size
Changing the initial word-size (-word_size) can either find more, but less accurate, hits or restrict the results to almost perfect hits.
- Decreasing the word-size increases the number of detected homologous sequences, but the hits can include more fragmented alignments caused by gaps and substitutions (example: searching for homologous genes between distant species; see also: -task blastn)
- Increasing the word-size gives fewer hits, because it requires longer contiguous regions of exact match. If the word-size is chosen to be almost the size of the query, BLAST will search for almost exact matches (example: locating gene sequences in the genome they originally came from)
For short sequences, word-size must be less than half the query length, otherwise reliable hits can be missed.
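The half-length rule above can be checked before picking a value for `-word_size`. This sketch reads a FASTA file and prints, for each record, its length and the largest integer word-size that is still strictly smaller than half of that length, i.e. floor((length - 1) / 2). `max_word_size` is a hypothetical helper; it only applies the rule of thumb and does not know about BLAST's own minimum or default word sizes.

```shell
#!/usr/bin/env bash
# Sketch: for each FASTA record, report name, length and the largest
# word-size w with w < length / 2, i.e. floor((length - 1) / 2).
# max_word_size is a hypothetical helper, not a BLAST+ tool.
max_word_size() {
  awk '
    /^>/ {
      if (name != "") report()
      name = substr($1, 2)   # record name = first word after ">"
      len = 0
      next
    }
    {
      gsub(/[ \t\r]/, "")    # ignore whitespace and Windows line endings
      len += length($0)
    }
    END { if (name != "") report() }
    function report() { printf "%s\t%d\t%d\n", name, len, int((len - 1) / 2) }
  ' "$@"
}

# Usage: max_word_size short_queries.fa
```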
E-value & Bit-score
- The smaller the E-value, the better the match.
- The higher the bit-score, the better the sequence similarity
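Putting both rules together, a common post-processing step is to drop hits above an E-value cutoff and then keep the highest bit-score hit per query. The sketch assumes the column order of the blastn example command above (`evalue` is column 7, `bitscore` column 8); `best_hits` is a hypothetical helper, and ties in bit score are resolved arbitrarily.

```shell
#!/usr/bin/env bash
# Sketch: E-value filter, then the best (highest bit-score) hit per query.
# best_hits is a hypothetical helper. Assumed column order:
#   -outfmt '6 qseqid sseqid pident nident qlen slen evalue bitscore'
#   $7 = evalue, $8 = bitscore
best_hits() {
  max_evalue="$1"
  shift
  tab="$(printf '\t')"
  # Step 1: "+ 0" forces a numeric comparison, which also parses values like 1e-50.
  # Step 2: sort by query, then bit score descending (-g understands exponents).
  # Step 3: the first line seen for each query is its best-scoring hit.
  awk -F'\t' -v max="$max_evalue" '($7 + 0) <= (max + 0)' "$@" |
    sort -t "$tab" -k1,1 -k8,8gr |
    awk -F'\t' '!seen[$1]++'
}

# Usage: best_hits 1e-6 query_blastn.txt > query_best_hits.txt
```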
descriptions
If you are interested in the descriptions of the matched sequences (useful not just for human interpretation but also for searching keywords such as an enzyme or organism name), the stitle specifier adds the titles/descriptions of the matches.
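Once stitle is in the output, keyword searches become a one-liner. This sketch assumes the column order of the blastp command below (`stitle` is column 8) and matches the keyword case-insensitively as a literal substring; `hits_with_keyword` is a hypothetical helper. Splitting on tabs matters here because titles usually contain spaces.

```shell
#!/usr/bin/env bash
# Sketch: keep only hits whose subject title (stitle) contains a keyword.
# hits_with_keyword is a hypothetical helper. Assumed column order:
#   -outfmt '6 qseqid sseqid pident nident qlen slen evalue stitle'
# so stitle is column 8; -F'\t' keeps multi-word titles in one field.
hits_with_keyword() {
  keyword="$1"
  shift
  # index() does a literal, case-folded substring match, so characters
  # like "(" or "." in the keyword are not treated as regex syntax
  awk -F'\t' -v kw="$keyword" 'index(tolower($8), tolower(kw)) > 0' "$@"
}

# Usage: hits_with_keyword kinase blast.out
```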
~/miniconda3/pkgs/blast-2.10.1-pl526he19e7b1_1/bin/blastp -num_threads 10 \
-evalue 1e-5 -outfmt '6 qseqid sseqid pident nident qlen slen evalue stitle' \
-db ath.db -query pep.fa -out blast.out -subject_besthit
References (copied from)
https://www.metagenomics.wiki/tools/blast/default-word-size
https://www.metagenomics.wiki/tools/blast/blastn-output-format-6
https://www.metagenomics.wiki/tools/blast/evalue
https://www.ncbi.nlm.nih.gov/books/NBK279684/#appendices.Options_for_the_commandline_a
The Biostar Handbook: 2nd Edition