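Once a genome has been assembled, its quality can be assessed with N50, BUSCO, or the LTR Assembly Index (LAI). This post gives a brief introduction to the LAI approach.
Repetitive sequences in a genome fall broadly into two classes:
Tandem repeats
Dispersed repeats
Tandem repeats include simple sequence repeats, satellite sequences, and so on;
dispersed repeats include transposable elements (TEs).
TEs are further divided into two classes:
DNA transposons: transposition is mediated by DNA.
RNA transposons: transposition is mediated by RNA; the RNA is reverse-transcribed into DNA, which then moves to another position in the genome.
There are currently two main types of RNA transposons:
1 LTR (long terminal repeat) elements: both ends carry long repeated sequences.
2 non-LTR TEs: the ends lack terminal repeats. These include LINEs (long interspersed nuclear elements, e.g. LINE1) and SINEs (short interspersed nuclear elements).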
[Figure: structure of LTR elements]
In the figure, TSD stands for target site duplication, and the red triangles mark LTR motifs. Panel A shows an intact LTR structure; the structures labeled a, b, and c are the targets that LTR_retriever analyzes.
The LAI is the ratio of the length of intact LTR retrotransposon sequence to the total length of LTR sequence in the assembly.
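The ratio above can be sketched in a few lines of shell. This is only an illustration of the arithmetic, not the LAI program itself: the function name raw_lai and the input lengths are hypothetical, and the scaling by 100 follows the "raw LAI" definition in the LAI publication as I understand it. The LAI value reported by the tool also includes an identity-based correction that is not reproduced here.

```shell
# Minimal sketch (assumption: raw LAI = intact LTR-RT length / total LTR length * 100).
# $1: total length of intact LTR-RTs in bp
# $2: total length of all LTR sequence in bp
raw_lai() {
    awk -v intact="$1" -v total="$2" 'BEGIN {
        # Guard against division by zero when no LTR sequence was found
        if (total <= 0) { print "NA"; exit }
        printf "%.2f\n", intact / total * 100
    }'
}

# Hypothetical lengths, for illustration only
raw_lai 1200000 9600000   # prints 12.50
```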
Installing LTR_retriever
git clone https://github.com/oushujun/LTR_retriever.git
Open the paths file and set the path to each dependency:
##This file will provide LTR_retriever paths to dependent programs.
##You can leave the respective paths empty if programs are accessible through ENV (i.e. exported to .bashrc)
##If you specify a path, please make sure that the required program(s) is directly contained in that path but not in any child directories.
##e.g. BLAST+=/opt/software/BLAST+/2.2.30--GCC-4.4.5/bin/
##LTR_retriever is build based on GenomeTools/1.5.4, BLAST+/2.2.28, BLAST/2.2.26, CDHIT/4.6.1c, HMMER/3.1b2, RepeatMasker/4.0.0 and Tandem Repeats Finder 4.07b
BLAST+=/public/home/fengting/miniconda3/bin/ #a path that contains makeblastdb, blastn, blastx
RepeatMasker=/public/home/fengting/miniconda3/envs/annotation/bin/ #a path that contains RepeatMasker
HMMER=/public/home/fengting/miniconda3/bin/ #a path that contains hmmsearch
CDHIT=/public/home/fengting/demo/cd-hit-v4.8.1-2019-0228/ #a path that contains cd-hit-est (preferred). CDHIT and BLAST are replaceable
BLAST=~/miniconda3/bin/ #a path that contains blastclust (optional)
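Since the paths file allows an entry to stay empty when the program is reachable through the environment, it can help to check which tools are already on PATH before filling it in. The sketch below is one way to do that; the function name missing_tools is hypothetical, and the tool names are taken from the comments in the paths file above.

```shell
# Print the name of every tool that cannot be found on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Tools named in the paths file; blastclust is listed there as optional
missing_tools makeblastdb blastn blastx RepeatMasker hmmsearch cd-hit-est blastclust
```

Any name printed by the last line needs an explicit path in the paths file (or needs to be installed); no output means everything was found.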
Installing LTR_Finder:
git clone https://github.com/xzhub/LTR_Finder.git
cd LTR_Finder/source/
make
Usage:
###Identify LTR sequences with LTR_Finder
/public/home/fengting/demo/lai/LTR_Finder/source/ltr_finder /public/home/fengting/demo/lai/LTR_Finder/source/test/3ds_72.fa >g.scn
###LTR_retriever takes the LTR_Finder output, identifies LTR-RTs, and builds a non-redundant LTR-RT library that can be used for genome annotation
/public/home/fengting/demo/lai/LTR_retriever/LTR_retriever -threads 4 -genome test/3ds_72.fa -infinder g.scn
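The two commands above can be chained in a small wrapper so that the second step only runs if the first one succeeds. This is a sketch under assumptions: the function name run_lai_pipeline, its argument order, and the name of the intermediate .finder.scn file are all hypothetical. Only the options already shown above (-threads, -genome, -infinder) are used.

```shell
# $1: genome FASTA, $2: path to ltr_finder, $3: path to LTR_retriever, $4: threads (default 4)
run_lai_pipeline() {
    genome="$1"
    finder="$2"
    retriever="$3"
    threads="${4:-4}"
    # Hypothetical naming convention for the intermediate LTR_Finder output
    scn="$(basename "$genome").finder.scn"

    # Step 1: identify candidate LTR sequences; stop here if ltr_finder fails
    "$finder" "$genome" > "$scn" || return 1

    # Step 2: hand the candidates to LTR_retriever
    "$retriever" -threads "$threads" -genome "$genome" -infinder "$scn"
}

# Example call (paths are placeholders, adjust to your installation):
# run_lai_pipeline test/3ds_72.fa ./ltr_finder /path/to/LTR_retriever 4
```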
The result is written to a file ending in .out.LAI; the last value on its second line is the LAI.
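Reading that value can be scripted. The sketch below simply prints the last whitespace-separated field of line 2, as described above; the function name extract_lai is hypothetical, and the file name in the example is an assumption based on the test genome used earlier.

```shell
# Print the last field on the second line of a .out.LAI file.
extract_lai() {
    awk 'NR == 2 { print $NF }' "$1"
}

# Example call (file name is an assumption):
# extract_lai 3ds_72.fa.out.LAI
```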