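Introduction to Rfam
Rfam is a database for identifying non-coding RNAs, commonly used to annotate new nucleic acid sequences or genome sequences. It is searched with the Infernal software: http://eddylab.org/infernal/
Infernal user guide: http://eddylab.org/infernal/Userguide.pdf
1. Download the Infernal software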
# Download infernal-1.1.1.tar.gz
# Create an Rfam folder inside the directory where you install software
wget http://eddylab.org/software/infernal/infernal-1.1.1.tar.gz
tar xf infernal-1.1.1.tar.gz
cd infernal-1.1.1
./configure --prefix=`pwd`/../infernal_bin
# Build and install
make
make install
cd easel; make install
cd ../../infernal_bin/bin
ls
# The installed programs are listed in this folder
export PATH=${PATH}:`pwd` # add this folder to the PATH environment variable
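A quick way to confirm that the PATH change took effect is to check that the programs used in the later steps can be found. This is only a minimal sketch: `missing_tools` is a hypothetical helper name, not part of Infernal, and the list of programs is taken from the commands used further down in this post.

```shell
# missing_tools is a hypothetical helper: it prints every name given as an
# argument that cannot be found on the current PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}

# Programs used in the following steps.
missing=$(missing_tools cmpress cmscan esl-seqstat)
if [ -z "$missing" ]; then
  echo "all Infernal programs found"
else
  echo "not on PATH: $missing"
fi
```

Note that `export PATH=...` only affects the current shell session; a new terminal needs the same line again (or the line added to a shell startup file).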
2. Download the database
wget ftp://ftp.ebi.ac.uk/pub/databases/Rfam/12.2/Rfam.cm.gz
gunzip Rfam.cm.gz
wget ftp://ftp.ebi.ac.uk/pub/databases/Rfam/12.2/Rfam12.2.claninfo
# Index Rfam.cm with Infernal's cmpress
../infernal_bin/bin/cmpress Rfam.cm # in my case this had to be run from inside the folder that contains Rfam.cm
# Output
Working... done.
Pressed and indexed 2588 CMs and p7 HMM filters (2588 names and 2588 accessions).
Covariance models and p7 filters pressed into binary file: Rfam.cm.i1m
SSI index for binary covariance model file: Rfam.cm.i1i
Optimized p7 filter profiles (MSV part) pressed into: Rfam.cm.i1f
Optimized p7 filter profiles (remainder) pressed into: Rfam.cm.i1p
# This output means the indexing is complete
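The four files named in the cmpress output can also be checked from a script before a search is started. A minimal sketch, assuming the index files sit next to Rfam.cm; `pressed_ok` is a hypothetical helper name, and the four extensions are the ones printed in the output above.

```shell
# pressed_ok is a hypothetical helper: it succeeds only if all four index
# files that cmpress reported for the given CM file exist and are non-empty.
pressed_ok() {
  cm=$1
  for ext in i1m i1i i1f i1p; do
    [ -s "$cm.$ext" ] || { echo "missing: $cm.$ext"; return 1; }
  done
  echo "index complete: $cm"
}

pressed_ok Rfam.cm || echo "run cmpress Rfam.cm first"
```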
3. Determine the size of the genome to be searched [required]
../infernal_bin/bin/esl-seqstat ~/M.truncatula/Medtr_v4_0v1/JCVI.Medtr.v4.20130313.fasta
# Output
Format: FASTA
Alphabet type: DNA
Number of sequences: 230
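Total # residues: 532015 # This is the number we need. Because the genome is double-stranded and the -Z parameter in the next step is given in millions of bases, compute 532015 * 2 / 1000000 = 1.06403 and use it as the value of -Z.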
Smallest: 202
Largest: 21302
Average length: 2313.1
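The -Z arithmetic can be scripted so the number does not have to be copied by hand. A minimal sketch, using the esl-seqstat lines shown above as sample input; `compute_z` is a hypothetical helper name, and the only assumption about the format is that the residue count is the fourth whitespace-separated field of the `Total # residues:` line.

```shell
# Sample esl-seqstat output, copied from the run above.
seqstat_output='Format: FASTA
Alphabet type: DNA
Number of sequences: 230
Total # residues: 532015'

# compute_z is a hypothetical helper: it reads esl-seqstat output on stdin and
# prints total residues * 2 / 1000000 (both strands, in millions of bases).
compute_z() {
  awk '/^Total/ { printf "%.6f\n", $4 * 2 / 1000000 }'
}

Z=$(printf '%s\n' "$seqstat_output" | compute_z)
echo "$Z"   # prints 1.064030
```

With a real genome the sample text would be replaced by `esl-seqstat my-genome.fa | compute_z`.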
Run
# Rfam12.2.claninfo the downloaded claninfo file; give the path to it
# Rfam.cm the downloaded CM file
# my-genome.fa the sequences to search
# my-genome.cmscan the main output
# my-genome.tblout a second output file
cmscan -Z `esl-seqstat my-genome.fa | awk '/^Total/ {printf "%.6f", $4*2/1000000}'` --cut_ga --rfam --nohmmonly --tblout my-genome.tblout --fmt 2 --clanin Rfam12.2.claninfo Rfam.cm my-genome.fa > my-genome.cmscan
# This command follows the referenced blog post, but whenever I ran it myself it failed with errors and produced no results
Following the user guide from the official site instead, I ran:
~/software/infernal_bin/bin/cmscan ~/software/Rfam/Rfam.cm ../candidate_fasta/CPC_fasta/u_cpc.fasta
# cmscan :: search sequence(s) against a CM database
# INFERNAL 1.1.1 (July 2014)
# Copyright (C) 2014 Howard Hughes Medical Institute.
# Freely distributed under the GNU General Public License (GPLv3).
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
# query sequence file: ../candidate_fasta/CPC_fasta/u_cpc.fasta
# target CM database: /root/software/Rfam/Rfam.cm
# number of worker threads: 1
# - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Query: XLOC_000318::chr1:4780155-4784203() [L=4048]
Hit scores:
rank E-value score bias modelname start end mdl trunc gc description
---- --------- ------ ----- ---------- ------ ------ --- ----- ---- -----------
------ inclusion threshold ------
(1) ? 0.14 15.6 1.0 snR73 2280 2346 + hmm - 0.30 -
(2) ? 0.17 18.2 0.2 sroH 2044 1992 - cm no 0.25 -
(3) ? 1.5 12.2 0.2 Afu_328 3042 3012 - hmm - 0.29 -
(4) ? 5.5 10.7 1.6 adapt33_1 3099 3052 - hmm - 0.23 -
(5) ? 5.8 18.7 0.0 SNORD19 3298 3375 + cm no 0.40 -
(6) ? 6.5 16.5 0.1 snoR66 2441 2506 + cm no 0.26 -
(7) ? 7.6 9.3 2.3 DLX6-AS1_2 136 241 + hmm - 0.33 -
(8) ? 9.4 23.9 0.2 KRAS_3UTR 1432 1501 + cm no 0.26 -
Hit alignments:
>> snR73
rank E-value score bias mdl mdl from mdl to seq from seq to acc trunc gc
---- --------- ------ ----- --- -------- -------- ----------- ----------- ---- ----- ----
(1) ? 0.14 15.6 1.0 hmm 1 67 [. 2280 2346 + .. 0.66 - 0.30
::::::::::::::::::::.::::::::::::::::::::::::::::::::::::::::.::::::: CS
snR73 1 GUUUAUGAUGAuUucCacUU.aUCACGACGGUCAaCUGcGuUcuUCgAuUGUUUAuuuaaG.aACuUUG 67
GUU A GAUGAuUu a+UU +UCA C GUCAaCUG+G U+u C+ UG UUA a+G +A uUU
XLOC_000318::chr1:4780155-4784203() 2280 GUUGAGGAUGAUUUUUAUUUaUUCAUAUCUGUCAACUGUGAUUUCCU--UGAUUAAACAGGuGAGUUUA 2346
5778899******6666555499*****************9988774..55555544333323333333 PP
......................
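The ranked lines of the "Hit scores" table above can be pulled out and filtered by E-value with a few lines of awk. A minimal sketch, using three hit lines copied from the output above; `hits_below` is a hypothetical helper name, and the field positions (E-value in field 3, model name in field 6, start, end and strand in fields 7 to 9) are read off the rows shown in this post. In this table a `?` marks a hit below the inclusion threshold.

```shell
# Three hit lines copied from the cmscan output above.
hits='(1) ? 0.14 15.6 1.0 snR73 2280 2346 + hmm - 0.30 -
(2) ? 0.17 18.2 0.2 sroH 2044 1992 - cm no 0.25 -
(3) ? 1.5 12.2 0.2 Afu_328 3042 3012 - hmm - 0.29 -'

# hits_below is a hypothetical helper: it keeps ranked hit lines whose
# E-value (field 3) is below the cutoff given as its argument and prints
# model name, start, end, strand and E-value, tab-separated.
hits_below() {
  awk -v max="$1" -v OFS='\t' \
    '$1 ~ /^\([0-9]+\)$/ && $3 + 0 < max + 0 { print $6, $7, $8, $9, $3 }'
}

printf '%s\n' "$hits" | hits_below 1
```

For a full run, the saved output can be piped in instead, for example `hits_below 1 < my-genome.cmscan`.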
I got stuck at this step.
To be continued...
[2019.8.20]
Reference: Using Rfam locally