The method follows the description of the transcriptome analysis in the article "De Novo Transcriptome Assembly and Analysis of the Flat Oyster Pathogenic Protozoa Bonamia Ostreae":
Transcriptome Taxonomic Assignment
Taxonomic assignment of the whole collection of the de novo-assembled transcripts was performed with two different tools: (i) with MEGAN version 6.20.18 (Huson et al., 2016) after a DIAMOND-blastx (version 2.0.6) interrogation of the mRNA transcripts with parameters: --sensitive --max-target-seqs 100 --evalue 1e-3 -F 15 --range-culling against the RefSeq protein database, accessed in February 2021 (Buchfink et al., 2021); and (ii) with BlobTools2 implemented in BlobToolKit version 1.3.6 (Challis et al., 2020) after a BLASTn (version 2.9.0) interrogation of the mRNA transcripts against the nt database accessed in February 2021 with the following parameters: -max_hsps 1 -max_target_seqs 10 -evalue 1e-25 and a DIAMOND-blastx interrogation of the mRNA transcripts against the RefSeq protein database with parameters: --sensitive --max-target-seqs 1 --evalue 1e-25 (version 2.0.6, accessed in February 2021).
Taxonomic assignment of transcripts performed with BlobToolKit after a DIAMOND-BlastX interrogation of the RefSeq database and a BlastN interrogation of the nt database. A blob plot of length versus GC proportion for each transcript was created. Records are colored by phylum. Circles are sized in proportion to TPM on a square-root scale. Histograms show the distribution of TPM sum along each axis.
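1. Build a local diamond database
This requires the nr data, prot.accession2taxid.gz and taxdump/nodes.dmp (the database can be built for your own group of species, but it still needs the nr data).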
diamond makedb --in db_nr/nr_plants/plants.nr.fa.gz --db db_nr/db_nr_plants/nr_plants --taxonmap db_nr/prot.accession2taxid.gz --taxonnodes db_nr/taxdump/nodes.dmp
//Overall, the resulting database file is roughly twice the size of the fa file
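Since makedb reads all of the inputs listed above, it can be worth confirming that they are present and non-empty before starting a long build. A minimal sketch; `check_inputs` is a hypothetical helper written for this note, not a diamond command, and the paths in the example are the ones from the makedb command above.

```shell
# check_inputs FILE...: report every input that is missing or empty.
# Returns non-zero if at least one file fails the check.
# (check_inputs is an illustrative helper, not part of diamond.)
check_inputs() {
    missing=0
    for f in "$@"; do
        if [ ! -s "$f" ]; then
            echo "missing or empty: $f" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Example with the inputs of the makedb command above:
# check_inputs db_nr/nr_plants/plants.nr.fa.gz \
#     db_nr/prot.accession2taxid.gz db_nr/taxdump/nodes.dmp \
#   && echo "inputs look present"
```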
2. Alignment
The command below is taken from the official blobtools documentation and then adapted.
diamond blastx \
--query assembly.fasta \
--db /path/to/uniprot.db.with.taxids \
--outfmt 6 qseqid staxids bitscore qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore \
--sensitive \
--max-target-seqs 1 \
--evalue 1e-25 \
--threads 16 \
> diamond.out
//Adapted as follows
diamond blastx --query rna/Trinity.fasta --db db_nr_plants/nr_plants1.dmnd --outfmt 6 qseqid staxids bitscore qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore --sensitive --max-target-seqs 1 --evalue 1e-25 --threads 16 -o diamond_blast_hits/plants_nr
Total time = 37934.5s
Reported 29673 pairwise alignments, 29673 HSPs.
29673 queries aligned.
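The counts in the log above can be cross-checked against the tabular output itself. A minimal sketch, assuming the output is the tab-separated file with the 15 fields requested through --outfmt 6 above; `summarize_hits` is a hypothetical helper name, not a diamond subcommand.

```shell
# summarize_hits FILE: count rows, distinct query IDs (column 1) and rows
# whose field count differs from the 15 fields requested via --outfmt 6.
# (summarize_hits is an illustrative helper, not part of diamond.)
summarize_hits() {
    awk -F'\t' '
        { rows++; seen[$1] = 1 }
        NF != 15 { bad++ }
        END {
            n = 0
            for (q in seen) n++
            printf "alignments=%d queries=%d malformed=%d\n", rows, n, bad
        }
    ' "$1"
}

# Example with the file written by the command above:
# summarize_hits diamond_blast_hits/plants_nr
```

With --max-target-seqs 1 the alignment and query counts would be expected to match, as they do in the log above; a non-zero malformed count would suggest the file was truncated or written with a different --outfmt.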
3. Create the folder (BlobDir) and add data
blobtools create \
--fasta /path/to/assembly.fasta \
--taxid 13804 \
--taxdump ~/taxdump \
AssemblyName
blobtools add \
--hits blast.out \
--hits diamond.out \
--taxrule bestsumorder \
--taxdump ~/taxdump \
AssemblyName
//Commands actually used
blobtools create --fasta rna/Trinity_HB.fasta blobtools_dir/Trinity_blob
blobtools add --hits diamond_blast_hits/plants_nr --taxdump taxdump --taxrule bestsumorder blobtools_dir/Trinity_blob
blobtools host `pwd`
//Once it is running, open a browser on the local machine to explore the data and draw the plots
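Hits are attached to sequences by ID, so the query IDs in the hits file need to correspond to the sequence IDs of the FASTA given to blobtools create. A minimal sketch of a check that can be run before blobtools add, assuming the sequence ID is the first word of each FASTA header and the query ID is the first column of the hits file; `hits_in_fasta` is a hypothetical helper name.

```shell
# hits_in_fasta FASTA HITS: count distinct hit query IDs that are found /
# not found among the sequence IDs (first word of each header) of FASTA.
# (hits_in_fasta is an illustrative helper, not part of blobtools.)
hits_in_fasta() {
    awk -F'\t' '
        NR == FNR {
            # First file: collect sequence IDs from header lines
            if (/^>/) { split(substr($0, 2), h, /[ \t]/); ids[h[1]] = 1 }
            next
        }
        !($1 in seen) {
            # Second file: look up each query ID once
            seen[$1] = 1
            if ($1 in ids) found++; else missing++
        }
        END { printf "found=%d missing=%d\n", found, missing }
    ' "$1" "$2"
}

# Example with the files used above:
# hits_in_fasta rna/Trinity_HB.fasta diamond_blast_hits/plants_nr
```

A large missing count would indicate that the hits were produced from a different assembly file than the one loaded into the BlobDir.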
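Troubleshooting
#Installing blobtools ran into a series of problems. In the end I reinstalled WSL Ubuntu 22.04, created a new conda environment named bkb with Python 3.9, and installed gcc for the pip installation with the commands below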
sudo apt-get update
sudo apt-get install build-essential
//Then install blobtools2; it is usable after restarting the conda environment
blobtools
BlobTools2 - assembly exploration, QC and filtering.
usage: blobtools [<command>] [<args>...] [-h|--help] [--version]
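##Building the database with diamond failed with the error: Accession exceeds supported length
After checking, the cause turned out to be prot.accession2taxid.gz combined with a diamond version that was too old, so diamond was reinstalled with conda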
conda create -n diamond1
conda activate diamond1
conda config --add channels conda-forge
conda install libgcc-ng=12  # the gcc runtime dependency was too old and had to be upgraded
conda install -c bioconda diamond=2.1.8
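To see which entries in prot.accession2taxid.gz might trigger the length error, the accession column can be scanned directly. A minimal sketch, assuming the usual four-column NCBI layout (accession, accession.version, taxid, gi) with one header line; `long_accessions` is a hypothetical helper, and the length threshold is a placeholder argument because the limit enforced by a given diamond version is not stated here.

```shell
# long_accessions FILE MAXLEN: print accession.version values (column 2)
# longer than MAXLEN characters from a gzip-compressed accession2taxid file.
# (long_accessions is an illustrative helper; MAXLEN is chosen by the user.)
long_accessions() {
    gzip -dc < "$1" |
        awk -F'\t' -v max="$2" 'NR > 1 && length($2) > max { print $2 }'
}

# Example (20 is an arbitrary illustrative threshold):
# long_accessions db_nr/prot.accession2taxid.gz 20 | head
```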
![](https://img.haomeiwen.com/i14760287/8c26713d2aa78932.png)