RNAseq分析流程-Hisat2+Samtools+Strin

作者: 生信编程日常 | 来源:发表于2019-12-31 23:53 被阅读0次

首先，分析RNAseq要对整个分析流程有个整体的了解：
参考https://tiramisutes.github.io/2018/12/04/ref-RNA-seq.html
详细介绍了主要用到的分析工具和流程。这里我主要介绍一下我常用的分析流程
拿到原始数据首先需要对文件完整性进行检查

md5sum file #输出的MD5和测序公司给的进行对比，一样则说明文件完整
#或者，如果公司给出了md5.txt，md5sum -c 
#  -c 根据已生成的md5值，对现存文件进行校验
md5sum -c md5.txt#文件完整的会输出ok

文件完整之后要进行质控，用到fastqc，或者multiqc

 fastqc -o fastqc_out/ -t 10 *.clean.fq.gz

然后再是比对的流程

hisat2 -p 20 --dta -x hisat2_index/hg38 -1 clean/sample_1.clean.fq.gz -2 clean/sample_2.clean.fq.gz -S align.sam/sample.align.sam > align.log/sample.align.log 2>&1

        #samfiles sorting
        samtools sort -@ 20 -o bam/sample.bam align.sam/sample.align.sam

        #stringtie processing
        stringtie -e -p 20 -G reference/annotation/human/hg38/gencode.v29.annotation.gtf -o stringtie_out/sample/sample.gtf bam/sample.bam

因为我有多个fastq文件，每个都写太慢了，一般会用一些shell命令直接将文件批量在for sample in 你的文件名,如我的fastq文件都在clean文件夹下，clean下面有4个文件夹control1,control2,treat1,treat2，每个文件夹下面又有双端测序的fastq.gz文件，如control1_1.clean.fq.gz，control1_2.clean.fq.gz则

echo "start! \n"

for sample in `du -sh clean/* |cut -d '/' -f2`;#取clean下面的文件夹的名字，这个也可以自己写，如for sample in control1 control2 treat1 treat2;
do

        #fastqc 质控这个一般公司给的数据会给这个结果，所以可选
        #fastqc -o fastqc_out/ -t 10 clean/${sample}/*.clean.fq.gz

        #hisat mapping
        hisat2 -p 20 --dta -x reference/index/hg38/hisat2_index/hg38 -1 clean/${sample}/${sample}_1.clean.fq.gz -2 clean/${sample}/${sample}_2.clean.fq.gz -S align.sam/${sample}.align.sam > align.log/${sample}.align.log 2>&1

        #samfiles sorting
        samtools sort -@ 20 -o bam/${sample}.bam align.sam/${sample}.align.sam

        #stringtie processing
        stringtie -e -p 20 -G reference/annotation/human/hg38/gencode.v29.annotation.gtf -o stringtie_out/${sample}/${sample}.gtf bam/${sample}.bam
done

python prepDE.py -i stringtie_out -g gene_count_matrix.csv -t gene_transcript_matrix.csv

echo "end!"

需要注意的是reference/index/hg38/hisat2_index/hg38这里的hisat2需要的index需要自己建立，stringtie的annotation需要GENCODE上下载

#建立human的参考基因组：human或者小鼠的genome.fa可以UCSC ，Ensembl下载
hisat2-build –p 8 genome.fa genome

网友评论

本文标题：RNAseq分析流程-Hisat2+Samtools+Strin

本文链接：https://www.haomeiwen.com/subject/bkaboctx.html

延伸阅读

深度阅读

您也可以注册成为美文阅读网的作者，发表您的原创作品、分享您的心情！

RNAseq分析流程-Hisat2+Samtools+Strin

相关文章

网友评论

延伸阅读

深度阅读

栏目导航

热点阅读

生物信息学

RNA-seq转录组分析