美文网首页生信蛋糕一口一口吃转录组分析转录组
一个超简单的转录组项目全过程--iMac+RNA-Seq(四)f

一个超简单的转录组项目全过程--iMac+RNA-Seq(四)f

作者: 冻春卷 | 来源:发表于2019-04-30 09:19 被阅读14次

    好久没更新啦,因为最近没有在跑项目哦,所以就搁置了,最近要做一两张图,所以把前面写的教程先发布一下~

    前期文章

    一个超简单的转录组项目全过程--iMac+RNA-Seq(一)
    一个超简单的转录组项目全过程--iMac+RNA-Seq(二)QC
    一个超简单的转录组项目全过程--iMac+RNA-Seq(三)Alignment 比对

    4 featureCounts

    比对结束后需要进行计数咯!

    老规矩,先看说明书。

    featureCounts -h ## 好像发现了什么不得了的
    $featureCounts -h
    featureCounts: invalid option -h
    

    Version 1.6.2

    Usage: featureCounts [options] -a <annotation_file> -o <output_file> input_file1 [input_file2] ...

    -T number of threads

    -p If specified, fragments (or templates) will be counted instead of reads. This option is only applicable for paired-end reads.

    -t <string> Specify feature type in GTF annotation. 'exon' by default. Features used for read counting will be extracted from annotation using the provided value.

    -g <string> Specify attribute type in GTF annotation. 'gene_id' by default. Meta-features used for read counting will be extracted from annotation using the provided value.

    -a <string> Name of an annotation file. GTF/GFF format by default. See -F option for more format information. Inbuilt annotations (SAF format) is available in 'annotation' directory of the package. Gzipped file is also accepted.

    -o <string> Name of the output file including read counts. A separate file including summary statistics of counting results is also included in the output ('<string>.summary')

    用法一

    ## 教程代码,使用for循环
    mkdir $wkd/align   #这样会生成单个的,每个都有自己的名字
    cd $wkd/align 
    
    source activate rna 
    for fn in {512..520} 
    do 
    featureCounts -T 4 -p -t exon -g gene_id \
    -a /Users/bioinformatic/reference/hg38/hg38.gtf \
    -o counts.SRR1039$fn.txt /home/rna/clean/SRR1039$fn.hisat.bam 
    done 
    source deactivate 
    
    

    用法二:酷酷的

    vim count.sh ## 创建一个脚本叫做count
    
    #!/bin/bash
    set -e
    set -u
    set -o pipefail
    
    # set PATH 设置路径
    HUMAN=/Users/bioinformatic/reference/hg38
    OUTDIR=/Users/Desktop/project/clean/align
    
    # source activate rna
    cd $OUTDIR  # 打开目标文件夹
    
    pwd  ## 显示当前路径
    
    ls *bam | cut -d"_" -f 1| sort -u |\
    while read id 
    do   # 用echo语句来提示脚本运行进度
         echo "processing  ${id}_combined.hisat.bam" 
         if [ ! -f ${id}.hisat.done ] 
         then
         featureCounts -T 4 -p \
             -t exon -g gene_id -a $HUMAN/hg38.gtf \
             -o counts.${id}.txt \
             $OUTDIR/${id}_combined.hisat.bam && touch ${id}.hisat.done
          fi
    
    done
    
    # source deactivate 
    

    Note:脚本很长的时候,记得使用“\”符号加回车,让行文变得整洁易读。

    运行脚本

    nohup bash count.sh &

    查看结果

    cat nohup.out就可以看到featureCounts的运行日志,请注意以下两个数据

    26834745 (92.89%) aligned concordantly exactly 1 time
    97.89% overall alignment rate
    

    小结:本章所需知识背景和技能

    (1)基因表达量的计算到底在算什么?

    (2)TPM, RPKM, FPKM这几个计量方式的区别

    (3)学会查看帮助文档,了解软件用法

    (4)nohup是什么,nohup和&的区别等等

    用法二的知识点

    在酷酷的用法二里面有几个很酷的地方:

    #!/bin/bash
    set -e
    set -u
    set -o pipefail
    
    # touch的用法
    
    # if then fi 语句的使用
    

    shell脚本和正则表达式:必学

    相关文章

      网友评论

        本文标题:一个超简单的转录组项目全过程--iMac+RNA-Seq(四)f

        本文链接:https://www.haomeiwen.com/subject/rigynqtx.html