Smart-seq2上游处理(Hisat2版)

作者: Insc | 来源:发表于2022-05-09 12:21 被阅读0次

    最近试了很多Smart-seq2上游处理的方法,参照了Jimmy老师的教程,以及阿则的生信小站公众号文章。对代码进行了整理。但是过程中仍然有一些问题尚待解决。

    1. 数据、软件准备

    • SRR文件下载到SRRfile文件下
    mkdir SRRfile
    prefetch --option-file SRR_Acc_List.txt
    
    • hisat2下载
    conda install hisat2
    
    • hisat2比对索引下载
    axel https://genome-idx.s3.amazonaws.com/hisat/grch38_tran.tar.gz
    # wget https://genome-idx.s3.amazonaws.com/hisat/grch38_tran.tar.gz
    
    • conda安装其他软件
    conda install fastqc multiqc samtools sambamba featureCounts
    

    2. 数据处理

    • 解压缩
    gunzip *.gz
    
    • SRA文件转fastq
    mkdir fastq
    ls SRRfile/SRR* | while read id;do \
        (fastq-dump --gzip --split-3 -A `basename $id` -O fastq/$id &);done
    
    • 质控
    mkdir fastqc
    fastqc -o ./fastqc -t 20 ./fastq/*.fastq &&multiqc ./fastqc -o ./fastqc
    
    • 去接头
    # trim_galore
    mkdir clean
    cat SRR_Acc_List.txt | while read id;do\
      trim_galore --quality 20 --phred33 \
      --length 20 -j 20 \
      -o ../clean \
      --paried ./fastq/${id}_1.fastq ./fastq/${id}_2.fastq
    done
    gunzip ./clean/*.gz
    

    3. 序列比对

    • hisat2
    mkdir aligned
    cat SRR_Acc_List.txt | while read id;do
      hisat2 -p 20 -x ~/ref/genome_tran/genome_tran \
      -S ./aligned/${id}.sam \
     -1./clean/${id}_1_val_1.fq -2 ./clean/${id}_2_val_2.fq
    done
    
    
    • sam转bam
    mkdir bam
    cat SRR_Acc_List.txt | while read id;do
      samtools sort -n -@ 20 \
      -o ./bam/${id}.bam ./aligned/${id}.sam
    done
    
    • bam文件排序
    cat SRR_Acc_List.txt | while read id;do
      sambamba sort -t 20 -o ./bam/${id}.bam ./bam/${id}.bam
    done
    
    • 输出为count
    mkdir count
    cat SRR_Acc_List.txt | while read id;do
      featureCounts -T 20 -p -t exon -g gene_name \
      -a ~/ref/gencode.v40.annotation.gtf \
      -o count/${id}.all_feature.txt \
      bam/${id}.sorted.bam
    done
    

    4.批处理文件

    根据以上,写成批处理文件,批量处理SRR格式文件。

    ### srr2fastq
    mkdir fastq
    ls SRRfile/SRR* | while read id;do \
        (fastq-dump --gzip --split-3 -A `basename $id` -O fastq/$id &);done
    
    ### fastqc
    mkdir fastqc
    fastqc -o ./fastqc -t 20 ./fastq/*.fastq &&multiqc ./fastqc -o ./fastqc
    
    ### trim_galore
    mkdir clean
    cat SRR_Acc_List.txt | while read id;do\
      trim_galore --quality 20 --phred33 \
      --length 20 -j 20 \
      -o ../clean \
      --paried ./fastq/${id}_1.fastq ./fastq/${id}_2.fastq
    done
    
    ### hisat2
    mkdir aligned
    cat SRR_Acc_List.txt | while read id;do
      hisat2 -p 20 -x ~/ref/genome_tran/genome_tran \
      -S ./aligned/${id}.sam \
     -1./clean/${id}_1_val_1.fq -2 ./clean/${id}_2_val_2.fq
    done
    
    ### sam转bam
    mkdir bam
    cat SRR_Acc_List.txt | while read id;do
      samtools sort -n -@ 20 \
      -o ./bam/${id}.bam ./aligned/${id}.sam
    done
    
    ### bam文件排序
    cat SRR_Acc_List.txt | while read id;do
      sambamba sort -t 20 -o ./bam/${id}.bam ./bam/${id}.bam
    done
    
    ### featurecounts 
    mkdir count
    cat SRR_Acc_List.txt | while read id;do
      featureCounts -T 20 -p -t exon -g gene_name \
      -a ~/ref/gencode.v40.annotation.gtf \
      -o count/${id}.all_feature.txt \
      bam/${id}.sorted.bam
    done
    
    
    ### 
    echo "------- end -------"
    

    参考资料

    1. Jimmy老师的教程
    2. 阿则的生信小站公众号文章

    相关文章

      网友评论

        本文标题:Smart-seq2上游处理(Hisat2版)

        本文链接:https://www.haomeiwen.com/subject/jmucurtx.html