Overview

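SPRINT is a tool for detecting RNA editing sites, published by Zhang et al. in Bioinformatics in 2017 under the title "SPRINT: an SNP-free toolkit for identifying RNA editing sites". Unlike traditional methods for detecting RES (RNA Editing Sites), it does not rely on SNP sites from databases.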

SNP-free RNA editing Identification Toolkit (SPRINT)

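Put simply, because RNA editing events usually occur in clusters, SPRINT introduces the concept of an SNV duplet: if the distance between two adjacent SNV sites on the genome is below a given threshold, the pair is called an SNV duplet, and the two SNV sites are treated as RES candidates. The thresholds can differ between genomic regions; for example, Alu regions tend to carry more RNA editing, so the threshold there is set lower. In the command-line options listed later in this post, this shows up as region-specific cluster-size cutoffs (default 3 for Alu, 5 for non-Alu repeats, 7 for non-repeat regions), while the duplet distance cutoff -cd is a single value (default 200).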

Walkthrough of the SPRINT paper

Introduction

RNA editing falls mainly into two types, A-to-I and C-to-U; about 95% of the RNA editing events in human tissues are A-to-I.

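The traditional approach to RES detection first aligns the RNA-Seq data to a reference genome or reference transcriptome and calls all SNVs (Single Nucleotide Variants), then filters out the SNP sites already present in the genome; whatever remains is reported as RES.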
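The filtering step of this traditional approach can be sketched as follows. This is a minimal illustration only, not code from any published pipeline: the tuple layout `(chrom, pos, ref, alt)` and the function name `filter_known_snps` are assumptions made for the example.

```python
def filter_known_snps(snvs, known_snps):
    """Keep only the SNVs that are not listed as known SNPs.

    Both arguments are iterables of (chrom, pos, ref, alt) tuples.
    This tuple layout is a hypothetical choice for illustration.
    """
    known = set(known_snps)  # set lookup keeps the filter linear in len(snvs)
    return [snv for snv in snvs if snv not in known]


# SNVs called from the RNA-Seq alignment (toy data)
called = [
    ("chr1", 100, "A", "G"),
    ("chr1", 250, "C", "T"),
    ("chr2", 42, "A", "G"),
]
# SNPs taken from a database such as dbSNP (toy data)
database = [("chr1", 250, "C", "T")]

candidates = filter_known_snps(called, database)
print(candidates)  # [('chr1', 100, 'A', 'G'), ('chr2', 42, 'A', 'G')]
```

The weakness of this scheme is visible in the sketch: any SNP missing from the database survives the filter and is reported as RES, which is the dependence SPRINT tries to avoid.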

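A-to-I RES have been found to cluster on the genome, whereas SNPs occur at low density and independently of one another. SPRINT therefore defines two adjacent SNVs of the same variant type as an SNV duplet, and uses the different distribution of SNV duplets to tell SNPs apart from RES.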

Identifying RES with SNV duplets
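The duplet idea can be sketched in a few lines of Python. This is a simplified illustration and not SPRINT's actual implementation: the `SNV` record, the function names, and the choice to evaluate adjacency among SNVs of the same type on the same chromosome are all assumptions. The default of 200 mirrors the default of the `-cd` option listed in the usage section.

```python
from collections import namedtuple

# Hypothetical record type for this sketch; "change" is e.g. "AG" for A-to-G.
SNV = namedtuple("SNV", ["chrom", "pos", "change"])


def find_duplets(snvs, max_distance=200):
    """Return pairs of neighbouring same-type SNVs closer than max_distance."""
    groups = {}
    for snv in snvs:
        # Only SNVs on the same chromosome with the same variant type are compared.
        groups.setdefault((snv.chrom, snv.change), []).append(snv)

    duplets = []
    for key in sorted(groups):
        ordered = sorted(groups[key], key=lambda s: s.pos)
        for left, right in zip(ordered, ordered[1:]):
            if right.pos - left.pos < max_distance:
                duplets.append((left, right))
    return duplets


def candidate_sites(duplets):
    """Collect the distinct SNVs that take part in at least one duplet."""
    sites = set()
    for left, right in duplets:
        sites.add(left)
        sites.add(right)
    return sorted(sites)


snvs = [
    SNV("chr1", 100, "AG"),
    SNV("chr1", 150, "AG"),
    SNV("chr1", 160, "CT"),   # different variant type, ignored for the AG group
    SNV("chr1", 5000, "AG"),  # isolated, more SNP-like
]
print([(l.pos, r.pos) for l, r in find_duplets(snvs)])  # [(100, 150)]
```

Judging from the cluster-size options in the usage section, SPRINT additionally requires a minimum number of clustered sites before reporting RES; that step is left out here.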

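In addition, for reads that fail to align to the genome, Porath et al. replaced every A with G (in their method, in both the reads and the reference) and then aligned the converted reads to the reference genome. This reveals regions of the genome that carry a large number of RNA editing events, a phenomenon known as RNA hyper-editing. Using the same approach, SPRINT can also detect hyper-RES sites.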
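A toy example shows why the A-to-G conversion helps. The sketch below is illustrative only: the function names are hypothetical, real tools work on aligned reads rather than equal-length strings, and strand handling is ignored.

```python
def mask_a_to_g(seq):
    """Replace every A with G, so A-to-G mismatches no longer count."""
    return seq.replace("A", "G").replace("a", "g")


def a_to_g_positions(read, reference):
    """Return 0-based positions where the reference has A and the read has G."""
    return [
        i
        for i, (ref_base, read_base) in enumerate(zip(reference, read))
        if ref_base == "A" and read_base == "G"
    ]


reference = "ACGTAAGCTA"
read = "GCGTGGGCTA"  # three of the four reference A's are read as G

# Without masking, the read has three mismatches against the reference
mismatches = sum(1 for r, q in zip(reference, read) if r != q)
print(mismatches)  # 3

# After masking both sequences, the read matches perfectly
print(mask_a_to_g(read) == mask_a_to_g(reference))  # True

# Once the read is placed, the editing sites are recovered from the originals
print(a_to_g_positions(read, reference))  # [0, 4, 5]
```

A heavily edited read accumulates too many mismatches for an ordinary aligner to place it; after masking, those mismatches disappear and the read can be mapped, and the original sequences then reveal which positions were edited.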

Methods

In detail, the SPRINT workflow is as follows:

Schematic of the SPRINT workflow

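Installing SPRINT

Installing SPRINT v0.1.8, the latest version at the time of writing, is straightforward: download the source package from https://github.com/jumphone/SPRINT, then install it with pip in a Python 2.7 environment: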

pip install SPRINT-master.zip

Using SPRINT

Prepare: Mask reference genome and build mapping index

sprint prepare [options] reference_genome(.fa) bwa_path

[options]:

-t transcript_annotation(.gtf) #Optional

Main: Identify regular- and hyper- RESs

sprint main [options] reference_genome(.fa) output_path bwa_path samtools_path

[options]:

-1 read1(.fq)         # Required !

-2 read2(.fq)         # Optional

-rp repeat_file         # Optional, you can download it from http://sprint.software/SPRINT/dbrep/

-ss INT         # when input is strand-specific sequencing data, please clarify the direction of read1. [0 for antisense; 1 for sense] (default is 0)

-c INT         # Remove the first INT bp of each read (default is 0)

-p INT         # Mapping CPU (default is 1)

-cd INT         # The distance cutoff of SNV duplets (default is 200)

-csad1 INT         # Regular - [-rp is required] cluster size - Alu - AD >=1 (default is 3)

-csad2 INT         # Regular - [-rp is required] cluster size - Alu - AD >=2 (default is 2)

-csnar INT         # Regular - [-rp is required] cluster size - nonAlu Repeat - AD >=1 (default is 5)

-csnr INT         # Regular - [-rp is required] cluster size - nonRepeat - AD >=1 (default is 7)

-csrg INT         # Regular - [without -rp] cluster size - AD >=1 (default is 5)

-csahp INT         # Hyper - [-rp is required] cluster size - Alu - AD >=1 (default is 5)

-csnarhp INT         # Hyper - [-rp is required] cluster size - nonAlu Repeat - AD >=1 (default is 5)

-csnrhp INT         # Hyper - [-rp is required] cluster size - nonRepeat - AD >=1 (default is 5)

-cshp INT         # Hyper - [without -rp] cluster size - AD >=1 (default is 5)

Start from aligned reads

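If you already have aligned reads in a BAM file, you can use the sprint_from_bam command to find RES. A BAM file alone is not enough to find hyper RES, however, because detecting them requires the unmapped reads left over by the aligner. To obtain hyper RES, first use samtools to extract the unmapped reads from the BAM file, convert them to FASTQ format, and then run the standard two-step sprint workflow described above (prepare, then main) on those unmapped reads.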

sprint_from_bam [options] aligned_reads(.bam) reference_genome(.fa) output_path samtools_path

[options]:

-rp repeat_file         # Optional, you can download it from http://sprint.software/SPRINT/dbrep/

-cd INT         # The distance cutoff of SNV duplets (default is 200)

-csad1 INT         # Regular - [-rp is required] cluster size - Alu - AD >=1 (default is 3)

-csad2 INT         # Regular - [-rp is required] cluster size - Alu - AD >=2 (default is 2)

-csnar INT         # Regular - [-rp is required] cluster size - nonAlu Repeat - AD >=1 (default is 5)

-csnr INT         # Regular - [-rp is required] cluster size - nonRepeat - AD >=1 (default is 7)

-csrg INT         # Regular - [without -rp] cluster size - AD >=1 (default is 5)
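The extraction of unmapped reads mentioned at the start of this section might look like the sketch below. It is not part of SPRINT: the function names and file names are hypothetical, and it assumes a samtools 1.x binary that provides the `fastq` subcommand (the samtools copy bundled with SPRINT may be older and lack it). The sketch uses Python 3 (`subprocess.run`), independent of the Python 2.7 environment SPRINT itself runs in.

```python
import subprocess


def build_unmapped_commands(bam, unmapped_bam, samtools="samtools"):
    """Build the two samtools command lines as argument lists."""
    # -f 4 keeps only records whose "read unmapped" flag is set; -b writes BAM.
    view_cmd = [samtools, "view", "-b", "-f", "4", "-o", unmapped_bam, bam]
    # samtools fastq writes FASTQ records to stdout unless told otherwise.
    fastq_cmd = [samtools, "fastq", unmapped_bam]
    return view_cmd, fastq_cmd


def extract_unmapped(bam, unmapped_bam, fastq_path, samtools="samtools"):
    """Run both steps and store the unmapped reads in fastq_path."""
    view_cmd, fastq_cmd = build_unmapped_commands(bam, unmapped_bam, samtools)
    subprocess.run(view_cmd, check=True)
    with open(fastq_path, "w") as out:
        subprocess.run(fastq_cmd, stdout=out, check=True)
```

Calling `extract_unmapped("aligned.bam", "unmapped.bam", "unmapped.fq")` would produce a FASTQ file that can be passed to `sprint main` with `-1`. Paired-end data needs extra care (for example, separating read1 and read2 again), which this sketch does not attempt.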

Hands-on example

cd /local/txm/txmdata/scRNA_editing/SRRdata/SRR7311317/sprinttest/

sprint prepare -t ./Homo_sapiens.GRCh38.87.chr.gtf ./hg38.fa /local/txm/anaconda3/envs/py2/bin/bwa

sprint main -rp ./hg38_repeat.bed -p 8 -1 ../SRR7311317_1.fastq -2 ../SRR7311317_2.fastq ./hg38.fa ./ /local/txm/anaconda3/envs/py2/bin/bwa /local/txm/txmdata/scRNA_editing/SPRINT-master/samtools_and_bwa/samtools