scRNA Editing Project 1: Installing SPRINT

Author: 江湾青年 | Published: 2021-05-19 18:50

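    Overview

    SPRINT is a tool for detecting RNA editing sites, published by Zhang et al. in Bioinformatics in 2017 under the title "SPRINT: an SNP-free toolkit for identifying RNA editing sites". Unlike traditional methods for detecting RESs (RNA editing sites), it does not depend on SNP sites recorded in databases.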

    SNP-free RNA editing Identification Toolkit (SPRINT)

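    In short, because RNA editing usually occurs in clusters, SPRINT introduces the concept of an SNV duplet: if the distance between two adjacent SNV sites on the genome is below a given threshold, the pair is called an SNV duplet, and both SNV sites are called as RESs. The thresholds can take different values in different genomic regions (for example, Alu regions tend to harbor more RNA editing, so the threshold for Alu regions is set lower).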


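    Notes on the SPRINT paper

    Introduction

    RNA editing falls mainly into two types, A-to-I and C-to-U; about 95% of the RNA editing that occurs in human tissues is A-to-I.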

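    The traditional approach to RES detection first aligns the RNA-Seq data to a reference genome or reference transcriptome and calls all SNVs (single nucleotide variants), then filters out the SNP sites already known to exist in the genome; whatever remains is reported as RESs.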
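The traditional filtering step described above amounts to a set difference. The following is a minimal sketch, assuming each variant is represented as a hypothetical `(chrom, pos, ref, alt)` tuple; the function name and the toy data are illustrative and do not come from SPRINT or any SNP database.

```python
def filter_known_snps(snvs, known_snps):
    """Return the SNVs that are not listed among the known SNPs.

    Both arguments are iterables of (chrom, pos, ref, alt) tuples.
    This representation is an assumption made for illustration.
    """
    known = set(known_snps)
    return [variant for variant in snvs if variant not in known]


# Toy data: three SNVs called from RNA-Seq, one of which is a known SNP.
snvs = [
    ("chr1", 1000, "A", "G"),
    ("chr1", 2000, "C", "T"),
    ("chr2", 500, "A", "G"),
]
known_snps = [("chr1", 2000, "C", "T")]

candidates = filter_known_snps(snvs, known_snps)
print(candidates)
```

The weakness of this approach is visible in the sketch: any SNP missing from `known_snps` survives the filter and is reported as an RES, which is the dependency SPRINT is designed to avoid.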

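    A-to-I RESs have been found to occur in clusters on the genome, whereas SNPs occur at very low density and different SNPs occur independently of one another. SPRINT therefore defines two adjacent SNVs of the same variant type as an SNV duplet and uses the different distributions of SNV duplets to tell SNPs and RESs apart.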

    Identifying RESs via SNV duplets
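The duplet definition can be sketched in a few lines. This is an illustration only, not SPRINT's actual implementation: it assumes SNVs are hypothetical `(chrom, pos, variant_type)` tuples, that adjacency is evaluated among SNVs of the same type on the same chromosome, and that the cutoff is exclusive. The default of 200 mirrors the documented default of the `-cd` option.

```python
def find_snv_duplets(snvs, max_distance=200):
    """Return pairs of neighbouring same-type SNVs closer than max_distance.

    snvs: iterable of (chrom, pos, variant_type) tuples (assumed format).
    """
    # Sort so that SNVs of the same chromosome and type are neighbours,
    # ordered by position.
    ordered = sorted(snvs, key=lambda v: (v[0], v[2], v[1]))
    duplets = []
    for left, right in zip(ordered, ordered[1:]):
        same_group = left[0] == right[0] and left[2] == right[2]
        if same_group and right[1] - left[1] < max_distance:
            duplets.append((left, right))
    return duplets


snvs = [
    ("chr1", 100, "AG"),
    ("chr1", 180, "AG"),   # 80 bp from the previous AG SNV: forms a duplet
    ("chr1", 190, "CT"),   # different variant type: ignored for AG duplets
    ("chr1", 5000, "AG"),  # too far from its AG neighbour
    ("chr2", 120, "AG"),   # different chromosome
]

duplets = find_snv_duplets(snvs)
print(duplets)
```

Isolated SNVs such as the one at position 5000 never join a duplet, which is how sparse, independently occurring SNPs tend to be separated from clustered editing sites.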

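    In addition, for reads that fail to align to the genome, Porath et al. replaced every A with G and then aligned the reads to the reference genome again. This reveals regions of the genome that carry a large amount of RNA editing, a phenomenon known as RNA hyper-editing. Using this approach, SPRINT can also detect hyper-RESs.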
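The A-to-G replacement idea can be illustrated on a toy example. This is a simplified sketch under strong assumptions: the read and the reference segment have equal length and are compared without gaps at a fixed offset, and the same replacement is applied to the reference so that A-to-G mismatches no longer count against the alignment. A real pipeline would run an aligner such as BWA on the masked sequences; the function names here are hypothetical.

```python
def mask_a_to_g(seq):
    """Replace every A with G so that A-to-G mismatches become invisible."""
    return seq.upper().replace("A", "G")


def hyper_edit_sites(reference, read):
    """Return 0-based positions of A-to-G mismatches if the masked
    sequences match exactly; otherwise return an empty list."""
    reference, read = reference.upper(), read.upper()
    if mask_a_to_g(reference) != mask_a_to_g(read):
        return []
    return [
        i for i, (ref_base, read_base) in enumerate(zip(reference, read))
        if ref_base == "A" and read_base == "G"
    ]


reference = "TTAGCAATCA"
read = "TTGGCGGTCG"  # four A-to-G mismatches: too many for a normal alignment

sites = hyper_edit_sites(reference, read)
print(sites)
```

After masking, the heavily edited read matches the reference perfectly, and the editing sites are recovered by comparing the original, unmasked sequences.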


    Methods

    Specifically, the SPRINT workflow is as follows:

    Schematic of the SPRINT workflow

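    Installing SPRINT

    Installing SPRINT v0.1.8, the latest version at the time of writing, is very simple: first download the source package from https://github.com/jumphone/SPRINT, then install it with pip in a Python 2.7 environment.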

    pip install SPRINT-master.zip


    Using SPRINT

    Prepare: Mask reference genome and build mapping index

    sprint prepare [options] reference_genome(.fa) bwa_path

    [options]:

    -t transcript_annotation(.gtf)         #Optional


    Main: Identify regular- and hyper- RESs

    sprint main [options] reference_genome(.fa) output_path bwa_path samtools_path

    [options]:

    -1 read1(.fq)         # Required !

    -2 read2(.fq)         # Optional

    -rp repeat_file         # Optional, you can download it from http://sprint.software/SPRINT/dbrep/

    -ss INT         # when input is strand-specific sequencing data, please clarify the direction of read1. [0 for antisense; 1 for sense] (default is 0)

    -c INT         # Remove the first INT bp of each read (default is 0)

    -p INT         # Mapping CPU (default is 1)

    -cd INT         # The distance cutoff of SNV duplets (default is 200)

    -csad1 INT         # Regular - [-rp is required] cluster size - Alu - AD >=1 (default is 3)

    -csad2 INT         # Regular - [-rp is required] cluster size - Alu - AD >=2 (default is 2)

    -csnar INT         # Regular - [-rp is required] cluster size - nonAlu Repeat - AD >=1 (default is 5)

    -csnr INT         # Regular - [-rp is required] cluster size - nonRepeat - AD >=1 (default is 7)

    -csrg INT         # Regular - [without -rp] cluster size - AD >=1 (default is 5)

    -csahp INT         # Hyper - [-rp is required] cluster size - Alu - AD >=1 (default is 5)

    -csnarhp INT         # Hyper - [-rp is required] cluster size - nonAlu Repeat - AD >=1 (default is 5)

    -csnrhp INT         # Hyper - [-rp is required] cluster size - nonRepeat - AD >=1 (default is 5)

    -cshp INT         # Hyper - [without -rp] cluster size - AD >=1 (default is 5)


    Start from aligned reads

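    If the reads have already been aligned and you have a BAM file, you can use the sprint_from_bam command to find RESs. However, hyper RESs cannot be found from the BAM file alone, because detecting them requires the unmapped reads left over by the aligner. To obtain hyper RESs, first extract the unmapped reads from the BAM file with samtools, convert them to FASTQ format, and then run the two standard sprint steps above (prepare and main) on these unmapped reads.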

    sprint_from_bam [options] aligned_reads(.bam) reference_genome(.fa) output_path samtools_path

    [options]:

    -rp repeat_file         # Optional, you can download it from http://sprint.software/SPRINT/dbrep/

    -cd INT         # The distance cutoff of SNV duplets (default is 200)

    -csad1 INT         # Regular - [-rp is required] cluster size - Alu - AD >=1 (default is 3)

    -csad2 INT         # Regular - [-rp is required] cluster size - Alu - AD >=2 (default is 2)

    -csnar INT         # Regular - [-rp is required] cluster size - nonAlu Repeat - AD >=1 (default is 5)

    -csnr INT         # Regular - [-rp is required] cluster size - nonRepeat - AD >=1 (default is 7)

    -csrg INT         # Regular - [without -rp] cluster size - AD >=1 (default is 5)
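Extracting the unmapped reads for hyper-RES detection, as described in this section, could look roughly like the sketch below. It assumes a reasonably recent samtools that provides the `fastq` subcommand (older releases may lack it) and treats the reads as single-end; the file names and helper functions are hypothetical. `samtools view -f 4` keeps only reads whose unmapped flag is set.

```python
import subprocess


def unmapped_reads_commands(bam_path, unmapped_bam, samtools="samtools"):
    """Build the two samtools commands: extract unmapped reads, then
    convert them to FASTQ (written to standard output)."""
    extract = [samtools, "view", "-b", "-f", "4", "-o", unmapped_bam, bam_path]
    to_fastq = [samtools, "fastq", unmapped_bam]
    return extract, to_fastq


def extract_unmapped_fastq(bam_path, unmapped_bam, fastq_path, samtools="samtools"):
    """Run both commands; requires samtools to be installed."""
    extract, to_fastq = unmapped_reads_commands(bam_path, unmapped_bam, samtools)
    subprocess.run(extract, check=True)
    with open(fastq_path, "w") as out:
        subprocess.run(to_fastq, stdout=out, check=True)


# Only print the commands here, so the sketch runs without samtools.
extract_cmd, fastq_cmd = unmapped_reads_commands("aligned.bam", "unmapped.bam")
print(" ".join(extract_cmd))
print(" ".join(fastq_cmd))
```

The resulting FASTQ file can then be passed to `sprint main` through the `-1` option, after `sprint prepare` has been run on the reference genome.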


    Hands-on example

    cd /local/txm/txmdata/scRNA_editing/SRRdata/SRR7311317/sprinttest/

    sprint prepare -t ./Homo_sapiens.GRCh38.87.chr.gtf ./hg38.fa /local/txm/anaconda3/envs/py2/bin/bwa

    sprint main -rp  ./hg38_repeat.bed  -p  8  -1  ../SRR7311317_1.fastq  -2  ../SRR7311317_2.fastq  ./hg38.fa  ./  /local/txm/anaconda3/envs/py2/bin/bwa  /local/txm/txmdata/scRNA_editing/SPRINT-master/samtools_and_bwa/samtools
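When the same analysis has to be repeated for many samples, the `sprint main` call above can be assembled programmatically. The helper below is a hypothetical sketch: it uses only the options and the argument order shown in the usage section (`-1`, `-2`, `-rp`, `-p`, followed by the reference genome, output path, bwa path and samtools path), and the paths in the example are placeholders.

```python
def sprint_main_command(reference, output_path, bwa_path, samtools_path,
                        read1, read2=None, repeat_file=None, threads=1):
    """Build the argument list for `sprint main` from the documented options."""
    cmd = ["sprint", "main", "-1", read1]
    if read2 is not None:
        cmd += ["-2", read2]          # optional second read file
    if repeat_file is not None:
        cmd += ["-rp", repeat_file]   # optional repeat annotation
    cmd += ["-p", str(threads)]       # mapping CPUs
    # Positional arguments follow the options, in the documented order.
    cmd += [reference, output_path, bwa_path, samtools_path]
    return cmd


cmd = sprint_main_command(
    reference="./hg38.fa",
    output_path="./",
    bwa_path="/path/to/bwa",            # placeholder path
    samtools_path="/path/to/samtools",  # placeholder path
    read1="sample_1.fastq",
    read2="sample_2.fastq",
    repeat_file="./hg38_repeat.bed",
    threads=8,
)
print(" ".join(cmd))
```

The list form can be handed directly to `subprocess.run(cmd, check=True)` in an environment where SPRINT is installed.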


    References

    https://academic.oup.com/bioinformatics/article/33/22/3538/4004872

    https://github.com/jumphone/SPRINT

    https://github.com/jumphone/SPRINT/blob/master/SPRINT_manual.pdf
