Determining strand specificity (strand-specific) with RSeQC


Author: 生信编程日常 | Published 2020-07-24 15:30

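    For strand-specific RNA-seq, we must know which library preparation method was used before any downstream quantification can be done, because the quantification tools take strandedness as an option.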

    stringtie:

    --rf    Assumes a stranded library fr-firststrand.
    --fr    Assumes a stranded library fr-secondstrand.
    

    kallisto:

    --fr-stranded runs kallisto in strand specific mode, only fragments where the first read in the pair pseudoaligns to the forward strand of a transcript are processed. If a fragment pseudoaligns to multiple transcripts, only the transcripts that are consistent with the first read are kept.
    
    --rf-stranded same as --fr-stranded but the first read maps to the reverse strand of a transcript.
    

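    The most commonly used type today is fr-firststrand, i.e. the dUTP-based library preparation method. To be on the safe side, though, we can check this with a tool from RSeQC. RSeQC is a package published in Bioinformatics in 2012 that bundles many functions.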

    1. Installation
    # install with pip
    pip3 install RSeQC
    
    # install from source
    tar zxf RSeQC-VERSION.tar.gz
    
    cd RSeQC-VERSION
    
    #type 'python setup.py install --help' to see options
    python setup.py install        #Note this requires root privilege
    # or
    python setup.py install --root=/home/user/XXX/         #install RSeQC to user specified location, does NOT require root privilege
    
    #This is only an example. Change path according to your system configuration
    export PYTHONPATH=/home/user/lib/python2.7/site-packages:$PYTHONPATH
    
    #This is only an example. Change path according to your system configuration
    export PATH=/home/user/bin:$PATH
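
After installation it can help to confirm that the RSeQC scripts are actually reachable from the shell. The following is only a minimal sketch: `require_tool` is a hypothetical helper name introduced here for illustration and is not part of RSeQC.

```shell
#!/bin/sh
# require_tool is a hypothetical helper (not part of RSeQC):
# it reports whether a command can be found on the current PATH.
require_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "found: $1"
    else
        echo "missing: $1 (check PATH and PYTHONPATH)" >&2
        return 1
    fi
}

# Example: check the script used in the next section.
require_tool infer_experiment.py || echo "install RSeQC or adjust PATH first"
```

If the script is reported as missing after a source install, the PATH and PYTHONPATH exports above are the first things to check.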
    
    2. infer_experiment.py

    Single-end data:

    infer_experiment.py -r hg19.refseq.bed12 -i SingleEnd_StrandSpecific_36mer_Human_hg19.bam
    
    #Output:
    This is SingleEnd Data
    Fraction of reads failed to determine: 0.0170
    Fraction of reads explained by "++,--": 0.9669
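    Fraction of reads explained by "+-,-+": 0.0161
    

    The fraction for "++,--" far exceeds the other one, so this is strand-specific data. "++,--" means that a read mapped to the + strand comes from a gene on the + strand, and a read mapped to the - strand comes from a gene on the - strand.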


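    If the two fractions are close to 1:1, as in the non-strand-specific single-end example shown as a figure in the original post, the data are non-strand-specific; if the two fractions differ greatly, the data are strand-specific.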
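
The rule of thumb above can be sketched as a small filter over the text that infer_experiment.py prints for single-end data. This is an illustration only: `classify_single_end` is a hypothetical helper name, and the 0.8 / 0.2 cutoffs are assumptions chosen for the example, not values defined by RSeQC.

```shell
#!/bin/sh
# classify_single_end is a hypothetical helper: it reads infer_experiment.py
# text output on stdin and applies the rule of thumb from the text.
# The 0.8 / 0.2 cutoffs are illustrative assumptions, not RSeQC defaults.
classify_single_end() {
    awk -F': ' '
        /explained by "\+\+,--"/ { same = $2 }   # read strand == gene strand
        /explained by "\+-,-\+"/ { opp  = $2 }   # read strand != gene strand
        END {
            total = same + opp
            if (total == 0) { print "undetermined"; exit }
            share = same / total
            if (share >= 0.8)
                print "stranded (++,--)"
            else if (share <= 0.2)
                print "stranded (+-,-+)"
            else
                print "unstranded"
        }'
}

# Example with the single-end output shown above.
classify_single_end <<'EOF'
This is SingleEnd Data
Fraction of reads failed to determine: 0.0170
Fraction of reads explained by "++,--": 0.9669
Fraction of reads explained by "+-,-+": 0.0161
EOF
```

In practice the function could be fed directly from a pipe, e.g. `infer_experiment.py -r hg19.refseq.bed12 -i sample.bam | classify_single_end`, where `sample.bam` stands in for your own file.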

    Paired-end data:

    infer_experiment.py -r hg19.refseq.bed12 -i Pairend_StrandSpecific_51mer_Human_hg19.bam
    
    #Output:
    
    This is PairEnd Data
    Fraction of reads failed to determine: 0.0072
    Fraction of reads explained by "1++,1--,2+-,2-+": 0.9441
    Fraction of reads explained by "1+-,1-+,2++,2--": 0.0487
    

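    This is clearly strand-specific, and it is fr-secondstrand: read1 maps to the + strand and its gene is also on the + strand, while read2 maps to the + strand and its gene is on the - strand. This corresponds to --fr-stranded in kallisto and --fr in stringtie.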

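    Libraries of this type are relatively uncommon nowadays. The more common case is the one in which "1+-,1-+,2++,2--" dominates, meaning that read1 maps to the + strand while its gene is actually on the - strand (reverse). This is "fr-firststrand", which corresponds to the --rf option in stringtie and --rf-stranded in kallisto.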

    Likewise, when both fractions are around 0.5, the library is non-strand-specific:

    infer_experiment.py -r hg19.refseq.bed12 -i Pairend_nonStrandSpecific_36mer_Human_hg19.bam
    
    #Output:
    
    This is PairEnd Data
    Fraction of reads failed to determine: 0.0172
    Fraction of reads explained by "1++,1--,2+-,2-+": 0.4903
    Fraction of reads explained by "1+-,1-+,2++,2--": 0.4925
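
To tie the paired-end cases together, the sketch below reads the paired-end output and prints the library type along with the stringtie and kallisto flags discussed in this post. `paired_end_flags` is a hypothetical helper name, and the 0.8 / 0.2 cutoffs are assumptions made for illustration; they are not defined by RSeQC.

```shell
#!/bin/sh
# paired_end_flags is a hypothetical helper: it reads infer_experiment.py
# paired-end output on stdin and prints the library type with the matching
# stringtie and kallisto flags discussed in the text.
# The 0.8 / 0.2 cutoffs are illustrative assumptions, not RSeQC defaults.
paired_end_flags() {
    awk -F': ' '
        /explained by "1\+\+,1--,2\+-,2-\+"/ { fr = $2 }  # read1 on the gene strand
        /explained by "1\+-,1-\+,2\+\+,2--"/ { rf = $2 }  # read1 on the opposite strand
        END {
            total = fr + rf
            if (total == 0) { print "undetermined"; exit }
            share = fr / total
            if (share >= 0.8)
                print "fr-secondstrand: stringtie --fr, kallisto --fr-stranded"
            else if (share <= 0.2)
                print "fr-firststrand: stringtie --rf, kallisto --rf-stranded"
            else
                print "unstranded: no strand flag"
        }'
}

# Example with the non-strand-specific paired-end output shown above.
paired_end_flags <<'EOF'
This is PairEnd Data
Fraction of reads failed to determine: 0.0172
Fraction of reads explained by "1++,1--,2+-,2-+": 0.4903
Fraction of reads explained by "1+-,1-+,2++,2--": 0.4925
EOF
```

Borderline results that fall between the cutoffs are reported as unstranded here; for real data such cases deserve a closer look rather than a fixed threshold.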
    

    The RefSeq BED file required for this check can be downloaded from the RSeQC documentation page (reference 1 below).


    References:

    1. http://rseqc.sourceforge.net/#download
    2. https://www.biostars.org/p/295344/
