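Overview
SPRINT is a tool for detecting RNA editing sites, published by Zhang et al. in Bioinformatics in 2017 under the title "SPRINT: an SNP-free toolkit for identifying RNA editing sites". Unlike conventional RES (RNA editing site) detection methods, it does not rely on SNP sites recorded in databases.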
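[Figure: SNP-free RNA editing Identification Toolkit (SPRINT)]
Put simply, because RNA editing events tend to occur in clusters, SPRINT introduces the notion of an SNV duplet: when the distance between two adjacent SNVs on the genome is below a given threshold, the pair is called an SNV duplet, and the two SNVs are treated as RESs. The threshold can take different values in different genomic regions (for example, Alu regions are more prone to RNA editing, so a smaller threshold is used there).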
Reading the SPRINT paper
Introduction
RNA editing falls mainly into two types, A-to-I and C-to-U; A-to-I accounts for 95% of the RNA editing that occurs in human tissues.
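The conventional approach to RES detection first aligns RNA-Seq data against the reference genome or reference transcriptome to find all SNVs (single nucleotide variants), then filters out the SNP sites that already exist in the genome; whatever remains is taken to be RES sites.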
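The database-dependent filtering step described above can be sketched in shell. This is only an illustration under stated assumptions: both inputs are taken to be tab-separated files with the chromosome in column 1 and the position in column 2, and the function name `filter_known_snps` is hypothetical (it is not part of SPRINT or of any published pipeline).

```shell
# Hypothetical helper: print the SNVs (second file) whose chromosome/position
# pair does not appear in the known-SNP list (first file).
# Assumed layout for both files: tab-separated, column 1 = chromosome, column 2 = position.
filter_known_snps() {
    awk -F'\t' '
        # First file: remember every known SNP site.
        FILENAME == ARGV[1] { known[$1 FS $2] = 1; next }
        # Second file: keep only SNVs that are not known SNPs.
        { key = $1 FS $2; if (!(key in known)) print }
    ' "$1" "$2"
}

# Usage (file names are placeholders):
#   filter_known_snps known_snps.tsv called_snvs.tsv > candidate_res.tsv
```

The weakness of this strategy is visible in the sketch: the result is only as good as the SNP list, which is the dependency SPRINT is designed to avoid.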
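A-to-I RESs have been found to occur in clusters on the genome, whereas SNPs are sparse and different SNPs occur independently of one another. SPRINT therefore defines two adjacent SNVs of the same variant type as an SNV duplet and uses the differing distributions of SNV duplets to distinguish SNPs from RESs.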
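The duplet idea can be made concrete with a short sketch. It is not SPRINT's implementation: the function name `find_snv_duplets`, the three-column input layout (chromosome, position, variant type), and the strict "less than" comparison are all assumptions made for illustration, and the cluster-size thresholds that SPRINT exposes through its options are not modeled here.

```shell
# Hypothetical helper: read SNVs from stdin and print those that belong to at
# least one SNV duplet, i.e. that have a neighbouring SNV of the same variant
# type on the same chromosome closer than the cutoff.
# Assumed input: tab-separated (chromosome, position, type), sorted by
# chromosome and then by position.
# $1 = distance cutoff in bp.
find_snv_duplets() {
    awk -F'\t' -v cutoff="$1" '
        {
            key = $1 FS $3    # neighbours are tracked per chromosome and variant type
            if ((key in last_pos) && $2 - last_pos[key] < cutoff) {
                if (!last_printed[key]) print last_line[key]   # first member, printed once
                print                                           # second member
                last_printed[key] = 1
            } else {
                last_printed[key] = 0
            }
            last_pos[key] = $2
            last_line[key] = $0
        }
    '
}

# Usage (file name is a placeholder):
#   find_snv_duplets 200 < called_snvs.tsv
```

Isolated SNVs, which is how most SNPs are expected to appear, produce no output, while clustered SNVs of the same type are kept. When variant types interleave, the output is not guaranteed to stay position-sorted, so pipe it through `sort` if order matters.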
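[Figure: identifying RESs via SNV duplets]
In addition, for reads that fail to align to the genome, Porath et al. replaced every A with G and then re-aligned to the reference genome, which revealed regions of the genome carrying large numbers of RNA edits; this phenomenon is called RNA hyper-editing. Using this approach, SPRINT can also detect hyper-RES sites.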
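The read-side substitution mentioned above can be illustrated as follows. This sketch covers only the A-to-G replacement in the read sequences; it assumes a standard four-line-per-record FASTQ file, the function name `mask_a_to_g` is hypothetical, and it makes no claim about how SPRINT or Porath et al. implement the remaining steps.

```shell
# Hypothetical helper: replace every A with G in the sequence line of each
# FASTQ record read from stdin. Header, separator, and quality lines are
# left untouched. Assumes exactly four lines per record.
mask_a_to_g() {
    awk 'NR % 4 == 2 { gsub(/A/, "G") } { print }'
}

# Usage (file names are placeholders):
#   mask_a_to_g < unmapped.fq > unmapped.a2g.fq
```

After the substitution, an edited read (where sequencing reports G at an edited A) and its unedited source sequence look alike at A and G positions, which is why heavily edited reads that failed to align the first time can be placed on a second attempt.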
Methods
Specifically, the SPRINT workflow is as follows:
[Figure: SPRINT workflow diagram]
Installing SPRINT
Installing the latest version, SPRINT v0.1.8, is very simple: first download the source package from https://github.com/jumphone/SPRINT, then install it with pip in a Python 2.7 environment:
pip install SPRINT-master.zip
Using SPRINT
Prepare: Mask reference genome and build mapping index
sprint prepare [options] reference_genome(.fa) bwa_path
[options]:
-t transcript_annotation(.gtf) #Optional
Main: Identify regular- and hyper- RESs
sprint main [options] reference_genome(.fa) output_path bwa_path samtools_path
[options]:
-1 read1(.fq) # Required !
-2 read2(.fq) # Optional
-rp repeat_file # Optional, you can download it from http://sprint.software/SPRINT/dbrep/
-ss INT # when input is strand-specific sequencing data, please clarify the direction of read1. [0 for antisense; 1 for sense] (default is 0)
-c INT # Remove the first INT bp of each read (default is 0)
-p INT # Mapping CPU (default is 1)
-cd INT # The distance cutoff of SNV duplets (default is 200)
-csad1 INT # Regular - [-rp is required] cluster size - Alu - AD >=1 (default is 3)
-csad2 INT # Regular - [-rp is required] cluster size - Alu - AD >=2 (default is 2)
-csnar INT # Regular - [-rp is required] cluster size - nonAlu Repeat - AD >=1 (default is 5)
-csnr INT # Regular - [-rp is required] cluster size - nonRepeat - AD >=1 (default is 7)
-csrg INT # Regular - [without -rp] cluster size - AD >=1 (default is 5)
-csahp INT # Hyper - [-rp is required] cluster size - Alu - AD >=1 (default is 5)
-csnarhp INT # Hyper - [-rp is required] cluster size - nonAlu Repeat - AD >=1 (default is 5)
-csnrhp INT # Hyper - [-rp is required] cluster size - nonRepeat - AD >=1 (default is 5)
-cshp INT # Hyper - [without -rp] cluster size - AD >=1 (default is 5)
Start from aligned reads
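For reads that have already been aligned into a BAM file, the sprint_from_bam command can be used to find RESs. Hyper RESs cannot be found from the BAM file alone, however, because detecting them requires the unmapped reads left over by the aligner. To obtain hyper RESs, first extract the unmapped reads from the BAM file with samtools, convert them to FASTQ format, and then run the standard two-step sprint workflow above (prepare, then main) on those unmapped reads.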
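The extraction step described above might look like the following sketch. It assumes single-end data and a samtools version that provides the `fastq` subcommand (1.3 or later; the samtools copy bundled with SPRINT may be older). The function name `extract_unmapped_fastq` and the file names are hypothetical.

```shell
# Hypothetical helper: pull unmapped reads out of a BAM file and write them as FASTQ.
# $1 = input BAM, $2 = output FASTQ,
# $3 = samtools executable (optional, defaults to "samtools" on PATH).
extract_unmapped_fastq() {
    samtools_bin=${3:-samtools}
    # -f 4 keeps only records with the "read unmapped" flag (0x4) set;
    # "fastq -" converts the BAM stream arriving on stdin to FASTQ.
    "$samtools_bin" view -b -f 4 "$1" | "$samtools_bin" fastq - > "$2"
}

# Usage (paths are placeholders):
#   extract_unmapped_fastq aligned_reads.bam unmapped.fq /path/to/samtools
#   sprint main -1 unmapped.fq reference_genome.fa output_path bwa_path samtools_path
```

For paired-end libraries the two mates would need to be written to separate files before being passed to `-1` and `-2`; that case is left out of this sketch.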
sprint_from_bam [options] aligned_reads(.bam) reference_genome(.fa) output_path samtools_path
[options]:
-rp repeat_file # Optional, you can download it from http://sprint.software/SPRINT/dbrep/
-cd INT # The distance cutoff of SNV duplets (default is 200)
-csad1 INT # Regular - [-rp is required] cluster size - Alu - AD >=1 (default is 3)
-csad2 INT # Regular - [-rp is required] cluster size - Alu - AD >=2 (default is 2)
-csnar INT # Regular - [-rp is required] cluster size - nonAlu Repeat - AD >=1 (default is 5)
-csnr INT # Regular - [-rp is required] cluster size - nonRepeat - AD >=1 (default is 7)
-csrg INT # Regular - [without -rp] cluster size - AD >=1 (default is 5)
Hands-on example
cd /local/txm/txmdata/scRNA_editing/SRRdata/SRR7311317/sprinttest/
sprint prepare -t ./Homo_sapiens.GRCh38.87.chr.gtf ./hg38.fa /local/txm/anaconda3/envs/py2/bin/bwa
sprint main -rp ./hg38_repeat.bed -p 8 -1 ../SRR7311317_1.fastq -2 ../SRR7311317_2.fastq ./hg38.fa ./ /local/txm/anaconda3/envs/py2/bin/bwa /local/txm/txmdata/scRNA_editing/SPRINT-master/samtools_and_bwa/samtools
References
https://academic.oup.com/bioinformatics/article/33/22/3538/4004872
https://github.com/jumphone/SPRINT
https://github.com/jumphone/SPRINT/blob/master/SPRINT_manual.pdf